Chris Simmance

Technical Tips To Avoid A Duplicate Content Penalty

12th Mar 2013 SEO | 23 Comments


Since Google’s most recent Panda algorithm updates, the search engine has become more adept at penalising sites where the quality and quantity of content is poor. This has been most evident on minimalist sites that offer more rich images than text, but it has also affected sites with limited content above the fold of the page.

The update is also continually penalising sites whose content is duplicated across their own site or elsewhere on the Web. In this blog post I aim to give you some technical tips that could help you combat content duplication on your site. I will also explain a few best practices for content presentation and promotion.

Simply put, the best way to deal with content duplication is to remove it (or not create it in the first place). Sometimes, though, duplicated content needs to stay live for the user’s benefit. If that’s the case, the tips below can help prevent search engines from picking up on the duplication and penalising it.

Robots disallow
This should be used with caution: if you disallow content incorrectly, it could result in your entire site, or entire sections of it, being removed from the index. This step may also be used in conjunction with the two following tips. Basically, all you need to do is identify the page(s) that are duplicates of other pages and disallow them, as in this example:

The page www.mysite.com/home/ contains the same content as the page www.mysite.com/ . What you can do is add a disallow rule (Disallow: /home/) to your robots.txt file to tell search engines not to crawl ‘/home/’ or anything beneath it. Be sure that none of the pages you want indexed sit under ‘/home/’, like ‘/home/important-stuff/’, as that content will be blocked too and can drop out of the index.
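Before uploading a rule like this, it can be worth checking what it actually blocks. The snippet below is only a rough sketch using Python's standard `urllib.robotparser`; the helper name `is_crawlable` and the extra `/homepage/` URL are my own illustrations, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule described above: block /home/ and everything beneath it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /home/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/",
        "http://www.mysite.com/home/",
        "http://www.mysite.com/home/important-stuff/",
        "http://www.mysite.com/homepage/",  # hypothetical page, not under /home/
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

The trailing slash matters in this check: ‘/home/’ blocks the folder and its children, while a page such as ‘/homepage/’ stays crawlable.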

For more information about robots.txt files there is an excellent guest post on the Koozai site here.

Google Webmaster Tools URL Parameters
This works in much the same way as the robots.txt file, but it lives in Google Webmaster Tools. The URL Parameters section lets you manually tell Google what to ignore and what to track in terms of pagination or translation of pages on your site. It is really important that this is implemented properly and checked thoroughly because, like the robots.txt file (but potentially worse), it can de-index more than you wanted. There is a warning on the first screen for a reason, and it’s best to learn more before using it.

.xml Sitemap Omission
This is relatively straightforward. If you know which duplicated content you don’t want indexed, and you have implemented the other two tips I have mentioned, it’s best to remove those pages from the sitemap.xml file as well. Once you have taken the links out of the file, you can upload it to your domain and re-submit it to Google Webmaster Tools.
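If the sitemap is too large to edit by hand, the removal can be scripted. Here is a minimal sketch with Python's built-in `xml.etree.ElementTree`; the function name `drop_from_sitemap` and the two-URL sample sitemap are assumptions made for illustration, not taken from a real site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical two-page sitemap used only for this example.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.mysite.com/</loc></url>"
    "<url><loc>http://www.mysite.com/home/</loc></url>"
    "</urlset>"
)


def drop_from_sitemap(sitemap_xml, urls_to_drop):
    """Return the sitemap XML with every <url> whose <loc> is in urls_to_drop removed."""
    # Keep the sitemap namespace as the default one when serialising.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in urls_to_drop:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_from_sitemap(SAMPLE_SITEMAP, {"http://www.mysite.com/home/"}))
```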

‘nofollow’ Internal Links
This is an old-school approach, but if you have gone to the trouble of telling the search engines to keep pages out of the index, it might be worth adding the ‘nofollow’ attribute to links that point from pages you want indexed to pages you don’t. This acts as an additional flag to search engines that the duplicate pages are there for the user and not for search gains.
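As a rough illustration of what that looks like in a template, the sketch below builds an internal link and adds rel="nofollow" only when the target is a known duplicate. The `DUPLICATE_PATHS` set and the `internal_link` helper are hypothetical names of mine; your CMS will have its own way of rendering links.

```python
from html import escape

# Hypothetical set of duplicate pages that stay live for users only.
DUPLICATE_PATHS = {"/home/"}


def internal_link(href, text, duplicate_paths=DUPLICATE_PATHS):
    """Build an <a> tag, adding rel="nofollow" when href is a known duplicate."""
    rel = ' rel="nofollow"' if href in duplicate_paths else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(internal_link("/home/", "Home"))
    print(internal_link("/about/", "About us"))
```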

Canonicalisation
This is something that may need to be done for lots of different reasons, so I’ll list a few first to give you an idea of when you may want to canonicalise a page:

  • When a page (through site structure) can be reached from more than one location, e.g. www.mysite.com/folder1/interesting-topic and www.mysite.com/folder3/interesting-topic
  • Pages that use session IDs, like shopping carts or booking pages
  • Pages that are the same after login but are secured (http before login and https after)
  • Pages that are reached from an affiliate link, e.g. www.mysite.com/click-4240873-3215923
  • Pages where the URL changes depending on a change to a field, e.g. a travel site with a built-in calendar option

In the examples above you would add a canonical link element to the duplicate pages. It tells search engines that the content on these pages is a known duplicate and that the original source can be found at the URL the canonical link points to.
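For the session ID case in particular, the canonical URL can often be worked out automatically. The following is a hedged sketch rather than a drop-in solution: the parameter names in `SESSION_PARAMS` are assumptions (check what your own platform appends), and the helpers `canonical_url` and `canonical_tag` are illustrative names.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that do not change the page content.
SESSION_PARAMS = {"sessionid", "sid"}


def canonical_url(url):
    """Strip session-style query parameters and any fragment from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Return the <link rel="canonical"> element for the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


if __name__ == "__main__":
    print(canonical_tag("http://www.mysite.com/cart?sessionid=abc123"))
```

Where the same page lives in two folders, as in the first bullet, there is no rule to compute: you simply pick one URL as the original and point the other at it.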

noindex / nofollow in Meta
This is a relatively simple exercise, usually implemented on pages such as blog category, tag and archive pages, where the content is there to help the user find the page they are looking for but is the same as the content in the blog itself. Adding nofollow and noindex to the meta robots tag, as well as removing the pages from the index, is a good idea. There are also some other good ‘best practices’ to follow in that respect and I will cover them later in this post. You can also mix and match the values, for example ‘noindex, follow’ if you want robots to crawl the content and follow its links but not index it.
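To make the mix-and-match idea concrete, here is a small sketch that prints the meta robots tag for each combination. The `robots_meta` helper is purely illustrative; the tag itself goes in the head of the page you want to control.

```python
def robots_meta(index=True, follow=True):
    """Return a meta robots tag for the chosen index/follow combination."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    # Keep a category page out of the index and don't pass its links on.
    print(robots_meta(index=False, follow=False))
    # Let robots read the page and follow its links, but not index it.
    print(robots_meta(index=False, follow=True))
```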

www vs. Non-www Redirects
When your site can be reached at both http://www.sitename and http://sitename, the content could well be seen as duplicated. The best way around this is to redirect one version to the other, ideally with a permanent (301) redirect, so that only one version can be indexed. You can do this with a URL rewrite at server level.
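In practice the rewrite usually lives in the web server configuration, but the logic is easy to see in a few lines of Python. This is only a sketch of the decision a rewrite rule makes; `www_redirect` is a made-up helper and ‘sitename.com’ stands in for your own domain.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url):
    """Return (301, target) if url lacks the www. prefix, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("www."):
        return None  # already the preferred version
    target = urlunsplit(
        (parts.scheme, "www." + host, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, target


if __name__ == "__main__":
    print(www_redirect("http://sitename.com/page?x=1"))
    print(www_redirect("http://www.sitename.com/page?x=1"))
```

Whichever version you prefer, the important part is that the path and query string are carried over so that every old URL lands on its exact counterpart.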

Rel=Prev & Rel=Next
This can be a little tricky to implement and is mainly used on component pages on a site. It works in much the same way as canonicalisation: it indicates to search engines the relationship between URLs in a paginated series. Google explains it really well on its Webmaster Central Blog but, as I mentioned, it’s easiest to think of it as canonicalisation for paginated content.
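As a sketch of how the tags relate to each other across a series, the function below returns the link elements for any given page. The ‘?page=N’ URL pattern is an assumption of mine, as is the `pagination_links` name; swap in whatever pattern your site uses.

```python
def pagination_links(base_url, page, last_page):
    """Return the rel=prev / rel=next link tags for one page of a paginated series."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def page_url(number):
        # Assumed URL pattern: page 1 is the bare URL, later pages use ?page=N.
        return base_url if number == 1 else f"{base_url}?page={number}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}" />')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}" />')
    return tags


if __name__ == "__main__":
    for tag in pagination_links("http://www.mysite.com/blog/", 2, 3):
        print(tag)
```

The first page only gets a ‘next’ tag and the last page only gets a ‘prev’ tag; every page in between gets both.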

What To Do With International Duplicates
Sometimes sites have duplicate versions that are set up for audiences in different countries that speak the same language. For example, you may have www.mysite.co.uk for a UK audience and www.mysite.com.au for an Australian audience. For whatever reason, rather than having all audiences reach the site from the same URL, a separate site was set up for each country, as in the example above.

There are a few ways to stop search engines from thinking the content across the different TLDs is duplicated. Some of them are really simple, and a few may be out of scope for your site for whatever reason, but most can be implemented.

In no particular order:

  • Go into the settings tab in Google Webmaster Tools and set the geographic target to the country the site’s audience is intended for.
  • Where possible have the site available from a server in the country for the intended audience.
  • Ensure that all addresses, telephone numbers and currencies are for the audience country.
  • Add geo-meta location tags into the pages. You can create them {here}.
  • Where possible use a local profile site for the audience’s country, e.g. ‘yelp.co.uk’ for ‘mysite.co.uk’. When creating local profiles be specific and link to the relevant site.
  • Be careful not to get too many links from other countries that the site is not intended for.
  • Implement hreflang (More information)
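For the last point, the hreflang annotations for the two example sites could be generated along these lines. Treat it as a sketch: the `ALTERNATES` mapping and the `hreflang_tags` helper are my own illustrations, and the language-region codes assume English content for the UK and Australia.

```python
from html import escape

# Assumed mapping of language-region codes to the example sites above.
ALTERNATES = {
    "en-gb": "http://www.mysite.co.uk/",
    "en-au": "http://www.mysite.com.au/",
}


def hreflang_tags(alternates=ALTERNATES):
    """Return one <link rel="alternate" hreflang=...> tag per country version."""
    return [
        f'<link rel="alternate" hreflang="{code}" href="{escape(url, quote=True)}" />'
        for code, url in alternates.items()
    ]


if __name__ == "__main__":
    for tag in hreflang_tags():
        print(tag)
```

The same full set of tags goes on every version of the page, including a tag that points at the page itself, so that the UK and Australian pages reference each other.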

What To Do With Test Sites
The simplest solution is not to have them live! If possible, keep them on a test server that is not accessible from the internet. However, there may be a reason to have them live, perhaps on a subdomain. In that case, the best way to ensure they aren’t indexed is to disallow them in the robots.txt file and use the parameters in Webmaster Tools. Having done this, you may also need to (if the test site allows) add canonical links to the test pages pointing to the live versions.

You should also set a password on the test site so a random user doesn’t accidentally get to your test site by mistyping your URL.

Other Instances Of Content Duplication You May Not Have Considered

You may have duplicate content on your site and not even know it. I’m not talking about entire pages, as they should be pretty obvious to spot and resolve. I am talking about dynamic content on pages or category-style pages. I will list a few places on a site where content could be deemed duplicated, each with a quick solution that might fix it before it earns you a penalty.

Blog Snippets
On your site you may have a section to promote your blog that contains the most recent articles. Often the blog snippets are dynamically pulled from the blog itself and will therefore duplicate the blog’s own content.

How to resolve this – You could remove the snippet altogether and add a blog link in the top navigation bar instead. Alternatively, some blog platforms allow the author to write a separate excerpt for each post, which then takes the place of the dynamic snippet.

Blog Category and Archive Pages
As I mentioned earlier in the post, you can use the tips above to keep these out of the index, but if you don’t really need them for the user then just remove them. Some blog platforms have plugins to do this (e.g. many of the great plugins by Yoast), so it can be easy. Failing that, redirect the category or archive URL to the main blog page.

Testimonial Snippets
In much the same way as blog snippets, these are likely to be duplicated. If you have a fair few testimonials, you can use a few in the promotion on your home page and keep the rest on the main testimonials page. Either that, or treat the section as an advert for the testimonials page and invite the user to visit it, without showing a snippet.

Scrolling Product Banners
If you have a few ‘Top Products’ that you like to promote on a scrolling banner or anywhere else on your site, the description is quite likely to be duplicated. The best way to resolve this is to write unique product descriptions for the banner or promotion where possible, or to use a separate frame so it is seen as only one page.

The end
These are just a few examples, but there are lots of other ways you can inadvertently duplicate your content. Hopefully you now have a good idea of some ways to resolve it, or at least to protect yourself from penalties as best you can.

If you have any other tips or ideas please feel free to share them!

For more information or to read something very similar elsewhere I have posted this article on several different sites….*

* not

Image Credits:

Duplicate Stamp, International Flags and Website Redirect from BigStock.

About the author

Chris Simmance

Chris has worked in the travel industry for the last 8 years, much of that working overseas in ski resorts, so he has a fantastic understanding of thriving in competitive sectors. His last project was social media management and website development for a leading travel company.
