SEO Best Practices for Canonical URLs + the Rel=Canonical Tag – Whiteboard Friday



Posted by randfish

If you’ve ever had any questions about the canonical tag, well, have we got the Whiteboard Friday for you. In today’s episode, Rand defines what rel=canonical means and its intended purpose, when it’s recommended you use it, how to use it, and sticky situations to avoid.

SEO best practices for canonical URLs

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat about some SEO best practices for canonicalization and use of the rel=canonical tag.

Before we do that, I think it pays to talk about what a canonical URL is, because a canonical URL doesn’t just refer to a page upon which we are targeting or using the rel=canonical tag. Canonicalization has been around, in fact, much longer than the rel=canonical tag itself, which came out in 2009, and there are a bunch of different things that a canonical URL means.

What is a “canonical” URL?

So first off, what we’re trying to say is this URL is the one that we want Google and the other search engines to index and to rank. These other URLs that potentially have similar content or that are serving a similar purpose or perhaps are exact duplicates, but, for some reason, we have additional URLs of them, those ones should all tell the search engines, “No, no, this guy over here is the one you want.”

So, for example, I’ve got a canonical URL, ABC.com/a.

Then I have a duplicate of that for some reason. Maybe it’s a historical artifact or a problem in my site architecture. Maybe I intentionally did it. Maybe I’m doing it for some sort of tracking or testing purposes. But that URL is at ABC.com/b.

Then I have this other version, ABC.com/a?ref=twitter. What's going on there? Well, that's a URL parameter. The URL parameter doesn't change the content. The content is exactly the same as A, but I really don't want Google to get confused and rank this version, which can happen, by the way. You'll sometimes see URLs that are not the original version, that have some weird URL parameter, ranking in Google. Sometimes the parameterized version gets more links than the original because it's the one that was shared on Twitter, and so that's the one everybody picked up and copied and pasted and linked to. That's all fine and well, so long as we canonicalize it.

Or this one, it’s a print version. It’s ABC.com/aprint.html. So, in all of these cases, what I want to do is I want to tell Google, “Don’t index this one. Index this one. Don’t index this one. Index this one. Don’t index this one. Index this one.”

I can do that using this, the link rel=canonical, the href telling Google, "This is the page." You put this in the <head> of any document, and Google will know, "Aha, this is a copy or a clone or a duplicate of this other one. I should canonicalize all of my ranking signals, and I should make sure that this other version ranks."
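
For instance, the duplicate at ABC.com/b (and the parameter and print versions) could carry a tag like this in its <head>. A minimal sketch, using the illustrative URLs from the whiteboard:

    <!-- In the <head> of ABC.com/b, the ?ref=twitter version, the print version, etc. -->
    <link rel="canonical" href="https://abc.com/a" />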

By the way, you can be self-referential. So it is perfectly fine for ABC.com/a to go ahead and use this as well, pointing to itself. That way, in the event that someone you’ve never even met decides to plug in question mark, some weird parameter and point that to you, you’re still telling Google, “Hey, guess what? This is the original version.”

Great. So since I don’t want Google to be confused, I can use this canonicalization process to do it. The rel=canonical tag is a great way to go. By the way, FYI, it can be used cross-domain. So, for example, if I republish the content on A at something like a Medium.com/@RandFish, which is, I think, my Medium account, /a, guess what? I can put in a cross-domain rel=canonical telling them, “This one over here.” Now, even if Google crawls this other website, they are going to know that this is the original version. Pretty darn cool.
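
On the republished copy, the cross-domain version of the tag is the same idea, just pointing back to the original domain. A sketch, using the illustrative Medium URL from the video:

    <!-- In the <head> of medium.com/@RandFish/a, pointing back to the original -->
    <link rel="canonical" href="https://abc.com/a" />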

Different ways to canonicalize multiple URLs

There are different ways to canonicalize multiple URLs.

1. Rel=canonical.

I mention that rel=canonical isn’t the only one. It’s one of the most strongly recommended, and that’s why I’m putting it at number one. But there are other ways to do it, and sometimes we want to apply some of these other ones. There are also not-recommended ways to do it, and I’m going to discuss those as well.

2. 301 redirect.

The 301 redirect, this is basically a status code telling Google, “Hey, you know what? I’m going to take /b, I’m going to point it to /a. It was a mistake to ever have /b. I don’t want anyone visiting it. I don’t want it clogging up my web analytics with visit data. You know what? Let’s just 301 redirect that old URL over to this new one, over to the right one.”
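
As a rough illustration, here's what that redirect could look like on a Node.js/Express server (an assumption for the example; the video doesn't specify a server stack):

    const express = require('express');
    const app = express();

    // Permanently redirect the duplicate URL to the canonical one.
    app.get('/b', (req, res) => {
      res.redirect(301, '/a');
    });

    app.listen(3000);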

3. Passive parameters in Google Search Console.

Some parts of me like this, some parts of me don’t. I think for very complex websites with tons of URL parameters and a ton of URLs, it can be just an incredible pain sometimes to go to your web dev team and say like, “Hey, we got to clean up all these URL parameters. I need you to add the rel=canonical tag to all these different kinds of pages, and here’s what they should point to. Here’s the logic to do it.” They’re like, “Yeah, guess what? SEO is not a priority for us for the next six months, so you’re going to have to deal with it.”

Probably lots of SEOs out there have heard that from their web dev teams. Well, guess what? You can do an end run around it, and this is a fine way to do that in the short term. Log in to your Google Search Console account that's connected to your website. Make sure you're verified. Then you can basically tell Google, through the URL Parameters section, to make certain kinds of parameters passive.

So, for example, you have sessionid=blah, blah, blah. You can set that to be passive. You can set it to be passive on certain kinds of URLs. You can set it to be passive on all types of URLs. That helps tell Google, “Hey, guess what? Whenever you see this URL parameter, just treat it like it doesn’t exist at all.” That can be a helpful way to canonicalize.

4. Use location hashes.

So let’s say that my goal with /b was basically to have exactly the same content as /a but with one slight difference, which was I was going to take a block of content about a subsection of the topic and place that at the top. So A has the section about whiteboard pens at the top, but B puts the section about whiteboard pens toward the bottom, and they put the section about whiteboards themselves up at the top. Well, it’s the same content, same search intent behind it. I’m doing the same thing.

Well, guess what? You can use the hash in the URL. So it's /a#b, and that will jump someone — it's also called a fragment URL — to that specific section on the page. You can see this, for example, at Moz.com/about/jobs. I think if you plug in #listings, it will take you right to the job listings. Instead of reading about what it's like to work here, you can just get directly to the list of jobs themselves. Now, Google considers that all one URL. So they're not going to rank them differently. They don't get indexed differently. They're essentially canonicalized to the same URL.
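
In markup, the fragment simply targets an element ID on the same page, something like this (the ID is illustrative):

    <!-- /a#whiteboard-pens scrolls to this element; Google treats /a and /a#whiteboard-pens as one URL -->
    <h2 id="whiteboard-pens">Whiteboard pens</h2>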

NOT RECOMMENDED

I do not recommend…

5. Blocking Google from crawling one URL but not the other version.

Because guess what? Even if you use robots.txt and you block Googlebot’s spider and you send them away and they can’t reach it because you said robots.txt disallow /b, Google will not know that /b and /a have the same content on them. How could they?

They can't crawl it. So they can't see anything that's here. It's invisible to them. Therefore, they'll have no idea about any ranking signals (any links that happen to point there, any engagement signals, any content signals, whatever signals might have helped A rank better), because they can't see them. If you canonicalize in one of these ways, now you're telling Google, yes, B is the same as A, combine their forces, give me all the ranking ability.

6. I would also not recommend blocking indexation.

So you might say, "Ah, well Rand, I'll use the meta robots noindex tag, so that way Google can crawl it, they can see that the content is the same, but I won't allow them to index it." Guess what? Same problem. They can see that the content is the same, but unless Google is smart enough to automatically canonicalize, which I would not trust them on (I would always trust yourself first), you are essentially, again, preventing them from combining the ranking signals of B into A, and that combination is something you really want.
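
For reference, that tag looks like the following, and for duplicate URLs it's the wrong tool (a sketch of what not to do, not a recommendation):

    <!-- Keeps /b out of the index, but does NOT pass its ranking signals to /a -->
    <meta name="robots" content="noindex">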

7. I would not recommend using the 302, the 307, or any other 30x other than the 301.

This is the guy that you want. It is a permanent redirect. It is the most likely to be successful in canonicalization, even though Google has said, "We often treat 301s and 302s similarly." The caveat to that is that a 301 is probably better for canonicalization. Guess what we're trying to do? Canonicalize!

8. Don’t 40x the non-canonical version.

So don't take /b and be like, "Oh, okay, that's not the version we want anymore. We'll 404 it." Don't 404 it when you could 301. If you send it over here with a 301 or you use the rel=canonical in your header, you take all the signals and you point them to A. You lose them if you 404 B. Now, all the signals from B are gone. That's a sad and terrible thing. You don't want to do that either.

The only time I might do this is if the page is very new or it was just an error. You don’t think it has any ranking signals, and you’ve got a bunch of other problems. You don’t want to deal with having to maintain the URL and the redirect long term. Fine. But if this was a real URL and real people visited it and real people linked to it, guess what? You need to redirect it because you want to save those signals.

When to canonicalize URLs

Last but not least, when should we canonicalize URLs versus not?

I. If the content is extremely similar or exactly duplicate.

Well, if it is the case that the content is either extremely similar or exactly duplicate on two different URLs, two or more URLs, you should always collapse and canonicalize those to a single one.

II. If the content is serving the same (or nearly the same) searcher intent (even if the KW targets vary somewhat).

If the content is not duplicate, maybe you have two pages that are completely unique about whiteboard pens and whiteboards, but even though the content is unique, meaning the phrasing and the sentence structures are different, that does not mean that you shouldn't canonicalize.

For example, this Whiteboard Friday about using the rel=canonical, about canonicalization is going to replace an old version from 2009. We are going to take that old version and we are going to use the rel=canonical. Why are we going to use the rel=canonical? So that you can still access the old one if for some reason you want to see the version that we originally came out with in 2009. But we definitely don’t want people visiting that one, and we want to tell Google, “Hey, the most up-to-date one, the new one, the best one is this new version that you’re watching right now.” I know this is slightly meta, but that is a perfectly reasonable use.

What I’m trying to aim at is searcher intent. So if the content is serving the same or nearly the same searcher intent, even if the keyword targeting is slightly different, you want to canonicalize those multiple versions. Google is going to do a much better job of ranking a single piece of content that has lots of good ranking signals for many, many keywords that are related to it, rather than splitting up your link equity and your other ranking signal equity across many, many pages that all target slightly different variations. Plus, it’s a pain in the butt to come up with all that different content. You would be best served by the very best content in one place.

III. If you’re republishing or refreshing or updating old content.

Like the Whiteboard Friday example I just used, you should use the rel=canonical in most cases. There are some exceptions. If you want to maintain that old version, but you'd like the old version's ranking signals to come to the new version, you can take the content from the old version and republish it at /a-old. Then publish the new version at /a (or redirect the old URL there) and have that version be the one that is canonical, with the old version living on at the /a-old URL you've just created. So for republishing, refreshing, or updating old content, canonicalization is generally the way to go, and you can preserve the old version if you want.

IV. If content, a product, an event, etc. is no longer available and there’s a near best match on another URL.

If you have content that is expiring, a piece of content, a product, an event, something like that that’s going away, it’s no longer available and there’s a next best version, the version that you think is most likely to solve the searcher’s problems and that they’re probably looking for anyway, you can canonicalize in that case, usually with a 301 rather than with a rel=canonical, because you don’t want someone visiting the old page where nothing is available. You want both searchers and engines to get redirected to the new version, so good idea to essentially 301 at that point.

Okay, folks. Look forward to your questions about rel=canonicals, canonical URLs, and canonicalization in general in SEO. And we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!



How Content Can Succeed By Making Enemies – Whiteboard Friday



Posted by randfish

Getting readers on board with your ideas isn’t the only way to achieve content success. Sometimes, stirring up a little controversy and earning a few rivals can work incredibly well — but there’s certainly a right and a wrong way to do it. Rand details how to use the power of making enemies work to your advantage in today’s Whiteboard Friday.

How content can succeed by making enemies

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. Today, we’re going to chat about something a little interesting — how content can succeed by making enemies. I know you’re thinking to yourself, “Wait a minute, I thought my job was to make friends with my content.” Yes, and one of the best ways to make close friends is to make enemies too.

So, in my opinion, I think that companies and businesses, programs, organizations of all kinds, efforts of all kinds tend to do really well when they get people on their side. So if I’m trying to create a movement or I’m trying to get people to believe in what I’m doing, I need to have positions, data, stories, and content that can bring people to my site. One of the best ways to do that is actually to think about it in opposition to something else, basically try and figure out how you can earn some enemies.

A few examples of content that makes enemies & allies

I'll give you a few examples, because I think that will help add some context here. I did a little bit of research. My share data is from BuzzSumo, and my link data here is from Ahrefs. For example, this piece called "There Are Now Twice as Many Solar Jobs as Coal Jobs in the US" is essentially just data-driven content, but it clearly makes friends and enemies. It makes enemies with that classic, old-school Americana belief set around how important coal jobs are, and through the enemy it builds simply by sharing data, it also creates allies: people who are on the side of this story, who want to share it, amplify it, and help it reach its potential and reach more people.

Same is true here. So this is a story called “Yoga Is a Good Alternative to Physical Therapy.” Clearly, it did extremely well, tens of thousands of shares and thousands of links, lots of ranking keywords for it. But it creates some enemies. Physical therapists are not going to be thrilled that this is the case. Despite the research behind it, this is frustrating for many of those folks. So you’ve created friends, allies, people who are yoga practitioners and yoga instructors. You’ve also created enemies, potentially those folks who don’t believe that this might be the case despite what the research might show.

Third one, “The 50 Most Powerful Public Relations Firms in America,” I think this was actually from The Observer. So they’re writing in the UK, but they managed to rank for lots and lots of keywords around “best PR firms” and all those sorts of things. They have thousands of shares, thousands of links. I mean 11,000 links, that’s darn impressive for a story of this nature. And they’ve created enemies. They’ve created enemies of all the people who are not in the 50 most powerful, who feel that they should be, and they’ve created allies of the people who are in there. They’ve also created some allies and enemies deeper inside the story, which you can check out.

“Replace Your Lawn with These Superior Alternatives,” well, guess what? You have now created some enemies in the lawn care world and in the lawn supply world and in the passionate communities, very passionate communities, especially here in the United States, around people who sort of believe that homes should have lawns and nothing else, grass lawns in this case. This piece didn’t do that well in terms of shares, but did phenomenally well in terms of links. This was on Lifehacker, and it ranks for all sorts of things, 11,000+ links.

Before you create, ask yourself: Who will help amplify this, and why?

So you can see that these might not be things that you naturally think of as earning enemies. But when you’re creating content, if you can go through this exercise, I have this rule, that I’ve talked about many times over the years, for content success, especially content amplification success. That is before you ever create something, before you brainstorm the idea, come up with the title, come up with the content, before you do that, ask yourself: Who will help amplify this and why? Why will they help?

One of the great things about framing things in terms of who are my allies, the people on my side, and who are the enemies I’m going to create is that the “who” becomes much more clear. The people who support your ideas, your ethics, or your position, your logic, your data and want to help amplify that, those are people who are potential amplifiers. The people, the detractors, the enemies that you’re going to build help you often to identify that group.

The “why” becomes much more clear too. The existence of that common enemy, the chance to show that you have support and beliefs in people, that’s a powerful catalyst for that amplification, for the behavior you’re attempting to drive in your community and your content consumers. I’ve found that thinking about it this way often gets content creators and SEOs in the right frame of mind to build stuff that can do really well.

Some dos and don’ts

Do… back up content with data

A few dos and don’ts if you’re pursuing this path of content generation and ideation. Do back up as much as you can with facts and data, not just opinion. That should be relatively obvious, but it can be dangerous in this kind of world, as you go down this path, to not do that.

Do… convey a world view

I do suggest that you try and convey a world view, not necessarily a political one from all the way left to all the way right or those kinds of things. I think it's okay to convey a world view around it, but I would urge you to provide multiple angles of appeal.

So if you’re saying, “Hey, you should replace your lawn with these superior alternatives,” don’t make it purely that it’s about conservation and ecological health. You can also make it about financial responsibility. You can also make it about the ease with which you can care for these lawns versus other ones. So now it becomes something that appeals across a broader range of the spectrum.

Same thing with something like solar jobs versus coal jobs. If you can get it to be economically focused and you can give it a capitalist bent, you can potentially appeal to multiple ends of the ideological spectrum with that world view.

Do… collect input from notable parties

Third, I would urge you to get inputs from notable folks before you create and publish this content, especially if the issue that you’re talking about is going to be culturally or socially or politically charged. Some of these fit into that. Yoga probably not so much, but potentially the solar jobs/coal jobs one, that might be something to run the actual content that you’ve created by some folks who are in the energy space so that they can help you along those lines, potentially the energy and the political space if you can.

Don’t… be provocative just to be provocative

Some don’ts. I do not urge you and I’m not suggesting that you should create provocative content purely to be provocative. Instead, I’m urging you to think about the content that you create and how you angle it using this framing of mind rather than saying, “Okay, what could we say that would really piss people off?” That’s not what I’m urging you to do. I’m urging you to say, “How can we take things that we already have, beliefs and positions, data, stories, whatever content and how do we angle them in such a way that we think about who are the enemies, who are the allies, how do we get that buy-in, how do we get that amplification?”

Don’t… choose indefensible positions

Second, I would not choose enemies or positions that you can't defend against. So, for example, if you were considering a path that you think might get you into a world of litigious danger, you should probably stay away from that. Likewise, if your positions are relatively indefensible and you've talked to some folks in the field and done your due diligence and they're like, "I don't know about that," you might not want to pursue it.

Don’t… give up on the first try

Third, do not give up if your first attempts in this sort of framing don’t work. You should expect that you will have to, just like any other form of content, practice, iterate, and do this multiple times before you have success.

Don’t… be unprofessional

Don’t be unprofessional when you do this type of content. It can be a little bit tempting when you’re framing things in terms of, “How do I make enemies out of this?” to get on the attack. That is not necessary. I think that actually content that builds enemies does so even better when it does it from a non-attack vector mode.

Don’t… sweat the Haterade

Don't forget: if you're getting some Haterade for the content you create, don't panic. A lot of people, when the Haterade starts flowing online, they run. They think, "Okay, we've done something wrong." That's actually not the case. In my experience, that means you're doing something right. You're building something special. People don't tend to fight against and argue against ideas and people and organizations for no reason. They do so because they're a threat.

If you’ve created a threat to your enemies, you have also generally created something special for your allies and the people on your side. That means you’re doing something right. In Moz’s early days, I can tell you, back when we were called SEOmoz, for years and years and years we got all sorts of hate, and it was actually a pretty good sign that we were doing something right, that we were building something special.

So I look forward to your comments. I’d love to see any examples of stuff that you have as well, and we’ll see you again next week for another edition of Whiteboard Friday. Take care.

Video transcription by Speechpad.com

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!



JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience



Posted by alexis-sanders

Understanding JavaScript and its potential impact on search performance is a core skillset of the modern SEO professional. If search engines can’t crawl a site or can’t parse and understand the content, nothing is going to get indexed and the site is not going to rank.

The most important questions for an SEO relating to JavaScript: Can search engines see the content and grasp the website experience? If not, what solutions can be leveraged to fix this?


Fundamentals

What is JavaScript?

When creating a modern web page, there are three major components:

  1. HTML – Hypertext Markup Language serves as the backbone, or organizer of content, on a site. It is the structure of the website (e.g. headings, paragraphs, list elements, etc.) and defines static content.
  2. CSS – Cascading Style Sheets are the design, glitz, glam, and style added to a website. They make up the presentation layer of the page.
  3. JavaScript – JavaScript is the interactivity and a core component of the dynamic web.

Learn more about webpage development and how to code basic JavaScript.


JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.
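
As a quick illustration, both forms look something like this (the file path and script contents are illustrative):

    <!-- JavaScript embedded directly in the HTML document -->
    <script>
      document.querySelector('h1').textContent = 'Hello, world';
    </script>

    <!-- JavaScript referenced as an external file -->
    <script src="/js/app.js"></script>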


What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess “AJAJ” doesn’t sound as clean as “AJAX” [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).
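
Here's a minimal sketch of the pattern (the endpoint and element IDs are illustrative): fetch data in the background and update only part of the page, with no full reload.

    fetch('/api/products?page=2')
      .then((response) => response.json())
      .then((data) => {
        // Append the new items to the existing list instead of reloading the whole page.
        document.querySelector('#product-list').insertAdjacentHTML('beforeend', data.html);
      });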


What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source vs. DOM:
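
Here's a minimal sketch of the idea (the title text is illustrative): the HTML source ships with an empty <title>, and the DOM ends up with the populated one after the script runs.

    <!-- HTML source: the <title> is empty until the script executes -->
    <title></title>
    <script>
      // After execution, the DOM contains the populated <title>.
      document.title = 'Whiteboard pens: a complete guide';
    </script>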

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

PhantomJS and Zombie.js are scripted headless browsers, typically used for automating web interaction for testing purposes, and rendering static HTML snapshots for initial requests (pre-rendering).


Why can JavaScript be challenging for SEO? (and how to fix issues)

There are three (3) primary reasons to be concerned about JavaScript on your site:

  1. Crawlability: Bots’ ability to crawl your site.
  2. Obtainability: Bots’ ability to access information and parse your content.
  3. Perceived site latency: AKA the Critical Rendering Path.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

  1. Blocking search engines from your JavaScript (even accidentally).
  2. Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

Fetch as Google and TechnicalSEO.com’s robots.txt and Fetch and Render testing tools can help to identify resources that Googlebot is blocked from.

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.
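
A quick side-by-side sketch (the URL is illustrative):

    <!-- Recommended: a plain anchor tag with a crawlable href -->
    <a href="/whiteboard-pens">Whiteboard pens</a>

    <!-- Not recommended: a JavaScript event standing in for a link -->
    <span onclick="window.location.href='/whiteboard-pens'">Whiteboard pens</span>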

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.

URL structure

Historically, JavaScript-based websites (aka “AJAX sites”) were using fragment identifiers (#) within URLs.

  • Not recommended:
    • The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links), the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server, and it causes the page to automatically scroll to the first element with a matching ID (or the first <a> element with a matching name attribute). Google recommends avoiding the use of "#" in URLs.
    • Hashbang (#!) (and escaped_fragment URLs) – Hashbang URLs were a hack to support crawlers (one Google now wants to avoid, and one that only Bing still supports). Many a moon ago, Google and Bing developed a complicated AJAX solution whereby a pretty (#!) URL for the user experience co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. With escaped fragments, there are two experiences:
      • Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment, or a meta element indicating that an escaped fragment exists (<meta name="fragment" content="!">).
      • Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with "_escaped_fragment_" and serves the HTML snapshot. It is called the ugly URL because it's long and looks like (and for all intents and purposes is) a hack.


  • Recommended:
    • pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar, and only what needs to change on the page is updated. It allows JS sites to leverage "clean" URLs. pushState is currently supported by Google when it's used to support browser navigation for client-side or hybrid rendering.
      • A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page, the URL will update; see the sketch after this list). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.
      • Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages the replaceState(), which doesn’t include the same back button functionality as pushState.
      • Read more: Mozilla PushState History API Documents
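
Here's a minimal sketch of pushState for that infinite-scroll case. nearBottomOfPage() and loadNextResults() are hypothetical helpers standing in for your own scroll detection and content loading; history.pushState() is the actual browser API.

    let page = 1;

    window.addEventListener('scroll', () => {
      if (nearBottomOfPage()) {                  // hypothetical helper: detects scroll depth
        loadNextResults(page + 1).then(() => {   // hypothetical helper: appends the next batch to the DOM
          page += 1;
          // Update the address bar without reloading, so the URL reflects what's on screen.
          history.pushState({ page: page }, '', '/results?page=' + page);
        });
      }
    });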

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

  • If the user must interact for something to fire, search engines probably aren’t seeing it.
    • Google is a lazy user. It doesn't click, it doesn't scroll, and it doesn't log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience (see the sketch after this list).
  • If the JavaScript executes more than ~5 seconds* after the load event fires, search engines may not be seeing it.
    • *John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.
    • *Screaming Frog tests show a correlation to five seconds to render content.
    • *The load event plus five seconds is what Google’s PageSpeed Insights, Mobile Friendliness Tool, and Fetch as Google use; check out Max Prin’s test timer.
  • If there are errors within the JavaScript, both browsers and search engines may fail to execute the code, and they can miss sections of the page if the entire script doesn't run.
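
As a small sketch of the "lazy user" problem (selectors and endpoint are illustrative), content that only loads after a click is content a non-interacting bot will likely never see:

    document.querySelector('#show-reviews').addEventListener('click', () => {
      fetch('/api/reviews')
        .then((response) => response.json())
        .then((reviews) => {
          document.querySelector('#reviews').textContent =
            reviews.map((review) => review.text).join('\n');
        });
    });
    // If this content matters for search, render it on page load (or server-side) instead.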

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to "better understand the web (i.e., JavaScript)" in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette's 2015 piece, Google can crawl JavaScript and leverages the DOM, confirmed it. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

Adam Audette: "I don't always JavaScript, but when I do, I know Google can crawl the DOM and dynamically generated HTML."

Recently, Bartosz Goralewicz performed a cool experiment testing a combination of various JavaScript libraries and frameworks to determine how Google interacts with the pages (e.g., are they indexing URLs/content? How does GSC interact? Etc.). It ultimately showed that Google is able to interact with many forms of JavaScript and highlighted certain frameworks as perhaps more challenging. John Mueller even started a JavaScript search group (from what I've read, it's fairly therapeutic).

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you decide that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think of Jim Collins's "bullets, then cannonballs" philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

  1. Confirm that your content is appearing within the DOM.
  2. Test a subset of pages to see if Google can index your content.
    • Manually search for quotes from your content.
    • Fetch as Google and see if your content appears.
    • Fetch as Google supposedly occurs around the load event or before timeout. It's a great test to check whether Google will be able to see your content and whether or not you're blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it's a good starting point.
    • Note: If you aren't verified in GSC, try TechnicalSEO.com's Fetch and Render As Any Bot tool.

After you've tested all this, what if something's not working and search engines and bots are struggling to index and obtain your content? Perhaps you're concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you're leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS

What are HTML snapshots?

HTML snapshots are a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

Google introduced HTML snapshots in 2009, deprecated (but still supported) them in 2015, and awkwardly mentioned them as an element to "avoid" in late 2016. HTML snapshots are a contentious topic with Google. However, they're important to understand, because in certain situations they're necessary.

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.
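
A minimal sketch of that server-side detection, assuming a Node.js/Express server and a prerender step that has already written static snapshot files to disk (both are assumptions, not something the post prescribes):

    const express = require('express');
    const path = require('path');
    const app = express();

    const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot/i;

    app.get('/products/:id', (req, res) => {
      if (BOT_PATTERN.test(req.get('user-agent') || '')) {
        // Serve the pre-rendered snapshot. It must match what users see, or it risks cloaking.
        res.sendFile(path.join(__dirname, 'snapshots', 'product-' + req.params.id + '.html'));
      } else {
        // Serve the normal JavaScript-driven experience.
        res.sendFile(path.join(__dirname, 'app.html'));
      }
    });

    app.listen(3000);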

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).

Considerations

When considering HTML snapshots, you must remember that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found not to represent the experience on the page, it's considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.”
Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

  1. Knowledge that search engines and crawlers will be able to understand the experience.
    • Certain types of JavaScript may be harder for Google to grasp (cough… Angular (also colloquially referred to as AngularJS 2) …cough).
  2. Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.
    • Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

"It's not just Google understanding your JavaScript. It's also about the speed." -DOM - "It's not just about Google understanding your Javascript. it's also about your perceived latency." -DOM

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google's critical rendering path is to load what the user needs as soon as possible, which can be translated to: "get everything above-the-fold in front of the user, ASAP."

Critical Rendering Path – Optimized Rendering Loads Progressively ASAP.

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

  1. Inline: Add the JavaScript directly in the HTML document.
  2. Async: Make the JavaScript asynchronous (i.e., add the "async" attribute to the script tag).
  3. Defer: Defer the JavaScript by placing it lower within the HTML (or add the "defer" attribute). See the snippet below.
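
A short sketch of what those options look like in markup (file names are illustrative):

    <!-- Render-blocking: parsing pauses while this script downloads and executes -->
    <script src="/js/app.js"></script>

    <!-- Async: downloads in parallel and runs as soon as it arrives -->
    <script src="/js/analytics.js" async></script>

    <!-- Defer: downloads in parallel and runs only after the document is parsed -->
    <script src="/js/widgets.js" defer></script>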

!!! Important note: It’s important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.

Read more: Google Developer’s Speed Documentation


TL;DR – Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable and obtainable, and that it isn't creating site latency problems. The key: every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!


