Meta robots tags are nothing new.
In fact, there are many instances where you may want to use some meta robots tags such as the noarchive tag.
On larger websites, managing sections can become challenging.
So much so that you may want to use meta robots tags to control whether Google sees, indexes, or ignores a page entirely.
Let’s take a look at the noarchive tag and see what it can do.
About the Noarchive Tag
There are certain meta tags that will help you determine how you want Google to crawl or index your page.
The noarchive tag has to do with whether there’s a cached copy of the page.
When you create a webpage, you normally want all potential options enabled.
But, throughout the life of a website, you may want to limit what the page can do.
For example, say for some reason you don’t want Google to cache the page (especially if you are updating it soon).
By using the noarchive tag, you will be able to tell Google that “I don’t want you to cache this.”
Using the noarchive tag has no major impact on search ranking.
How Do You Create the Noarchive Tag?
You can use the following snippet of code:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Or, you can use the Google-specific version:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
The first implementation applies to all robots. The second one applies to Googlebot.
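For context, here is a minimal sketch of where the tag sits in a page. The title and body text are placeholders, and the Googlebot-only alternative is left commented out on the assumption that you would pick one of the two.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page (placeholder)</title>
  <!-- Applies to all crawlers that honor the robots meta tag -->
  <meta name="robots" content="noarchive">
  <!-- Alternative: target only Google's crawler instead -->
  <!-- <meta name="googlebot" content="noarchive"> -->
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```

The name and content values are not case-sensitive, so the lowercase form here should behave the same as the uppercase form. The quotes must be straight quotes, though, because curly quotes pasted from a word processor are not valid attribute delimiters.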
Is Google Caching Your Pages a Good Thing?
It can be.
For example, people can access your pages even if your site is down.
Google also provides a “text-only” version of the page, which gives you an idea of how it “sees” your page.
When Should You Use the Noarchive Tag?
Time-sensitive content and other content you don’t want available to everybody should be noarchived.
This will prevent Google from offering a cached copy of that content.
Examples include:
- Advertising you don’t want Google to cache.
- Any PPC landing pages you don’t want viewable to everybody.
- Internal documents you don’t want to be historically public.
- Any other sensitive documentation you don’t want a cache history of.
For some of these situations, you will likely already have noindex tags or robots.txt disallow directives in place.
For others, the noarchive tag can be your best friend.
Can You Get Penalized for Using Noarchive?
No.
In the past, some people have worried that it may be a red flag to Google that a site is cloaking.
Officially, however, Google has stated that there’s nothing wrong with using the tag.
This tag only removes the “Cached” link for the page.
Google will continue to index the page and display a snippet.
What Other Tags Can You Use?
The tags discussed above are not the only ones you can use to give crawlers directives.
These basic directives are nothing new, but a lot of confusion exists about their best practices.
We would like to put some of this confusion to rest with the remainder of this list.
These tags can help with indexation, following, caching, and other essential functions.
By using these tags, you can prioritize and make sure the sections of your site you want to be indexed are indexed. They are also great for excluding content you may not want to show to everyone.
When You Want to Prevent Indexing: Noindex
Code implementation:
<meta name="robots" content="noindex">
When used properly, this tag tells a search engine that it should not index this particular page.
If you have sections you prefer to leave for users only (such as PPC advertisements or other ads you may not want indexed), then you can use the noindex tag on these pages.
Allowing Search Engines to Index the Page: Index
Code implementation:
<meta name="robots" content="index">
There’s a problem with this: you don’t have to use the tag. It’s redundant. The default behavior of crawlers is to crawl and index your site!
And they will do so when your site proves its worth. Adding a tag like this just adds redundant code that does not need to be there.
Letting Search Engines Follow Your Links
Code implementation:
<meta name="robots" content="index,follow">
This one lets crawlers index the page and follow the links on it. Following links in this manner passes all-important link juice, which further boosts the page receiving it. Like index, follow is the default behavior, so this tag is optional as well.
When You Don’t Want Search Engines to Follow Your Links
Code implementation:
<meta name="robots" content="noindex,nofollow">
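Nofollowing your links means they do not pass link equity or otherwise boost the PageRank of the page they point to. Note that the snippet above also contains noindex, so it keeps the page itself out of the index as well; use content="nofollow" on its own if you only want to stop the links from being followed.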
Follow/nofollow was used extensively back in the days of PageRank sculpting, which is why you find some old sites nowadays with so many nofollowed links.
By using nofollow and follow directives in this manner, you could “sculpt” the PageRank of the receiving page if done right.
Nowadays, such practices are considered spam. Unless you know what you are doing, or you want to hoard PageRank like there’s no tomorrow, you should not use nofollow.
Why would you want to block a particular page from receiving all-important PR?
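To make the distinction concrete, here is a short sketch of the two places nofollow can appear: once for the whole page, and once on an individual link. The URL and link text are placeholders.

```html
<!-- Page-level: ask crawlers not to follow any link on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: nofollow a single link and leave the rest alone -->
<a href="https://example.com/" rel="nofollow">Example link</a>
```

The link-level form is the one people used for PageRank sculpting, since it lets you choose link by link.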
New Rules for Nofollow
As if that were not enough, earlier this year Google introduced new rules for nofollow, which complicate matters further.
Before, you could simply designate any advertising links as nofollow if you wanted to.
They would not pass value, and Google would ignore them.
Now, the new rules add a separate designation for links created for advertising reasons.
User-generated content has also been given its own designation.
You can now mark up links within user-generated content (such as blog comments and reviews) as nofollow if they violate your site’s policies.
You can read more about these new rules elsewhere on Search Engine Journal.
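As a rough illustration of those designations, the sketch below assumes the rel values from Google's announcement: sponsored for advertising links and ugc for user-generated content, alongside the existing nofollow. All URLs and link text are placeholders.

```html
<!-- Paid or sponsored placement -->
<a href="https://example.com/product" rel="sponsored">Advertiser link</a>

<!-- Link inside user-generated content, such as a blog comment -->
<a href="https://example.com/commenter-site" rel="ugc">Commenter's site</a>

<!-- General case: a link you do not want to vouch for -->
<a href="https://example.com/other" rel="nofollow">Other link</a>

<!-- Values can be combined, separated by a space -->
<a href="https://example.com/forum-post" rel="ugc nofollow">Forum link</a>
```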
Nofollow, Noarchive, Nocache, No More!
Helping crawlers distinguish between the content you want them to crawl and the content you don’t gives you control over what is displayed to everyone.
Controlling crawlers in this way is not complicated.
What is hard, however, is assessing your overall strategy and where you should go next.
That comes with testing, learning, and doing.