Since Google’s most recent Panda algorithm updates, the search engine has become more adept at penalising sites where the quality and quantity of content are poor. This has been most evident on minimalist sites that favour rich imagery over content, but it has also affected sites with limited content above the fold of the page.
The update also continues to penalise sites whose content is duplicated across their own pages and across the wider Web. In this blog post I aim to give you some good technical tips that could help you combat any instances of content duplication on your site. I will also explain a few best practices regarding content presentation and promotion.
Simply put, the best way to deal with content duplication is to remove it (or not create it in the first place). Sometimes, though, duplicated content needs to stay live for the user’s benefit. If that’s the case and the content has to stay as it is, the tips below can help prevent search engines from picking up on the duplication and penalising it.
Robots disallow
This should be used with caution: if you disallow content incorrectly, it could result in your entire site, or entire sections of it, being removed from the index. This step may also be used in conjunction with the two following tips. All you need to do is identify the page(s) that are duplicates of other pages and disallow them, as in the example below:
Say the page www.mysite.com/home/ contains the same content as www.mysite.com/. You can add a disallow rule to your robots.txt file to tell search engines not to index anything at or below ‘/home/’. Be sure that none of the pages you want indexed sit under ‘/home/’ (such as ‘/home/important-stuff/’), as that content would also be removed from the index.
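For example, a minimal robots.txt sketch for the ‘/home/’ scenario above (assuming the file sits at the root of www.mysite.com) might look like this:

  User-agent: *
  Disallow: /home/

The trailing slash matters: this rule covers ‘/home/’ and everything beneath it, which is exactly why you need to check that nothing you want indexed lives under that path.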
For more information about robots.txt files there is an excellent guest post on the Koozai site here.
Google Webmaster Tools URL Parameters
Working in much the same way as the robots.txt file, the URL Parameters section in Google Webmaster Tools lets you manually tell Google what to ignore and what to crawl in terms of pagination or translation of pages on your site. It is really important that this is implemented properly and checked thoroughly, as, in the same way as the robots.txt file (but potentially worse), it can de-index more than you wanted it to. There is a warning on the first screen for a reason, and it’s best to learn more before using it.
.xml Sitemap Omission
This is a relatively straightforward one: if you know which duplicated content you don’t want indexed, and you have implemented the other two tips I have mentioned, it’s best to remove those pages from the sitemap.xml file. Once you have taken the links out of the file, you can upload it to your domain and re-submit it to Google Webmaster Tools.
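As a simple illustration (using the hypothetical URLs from the robots.txt example earlier), the duplicate page is simply left out of the sitemap while the canonical version stays in:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>http://www.mysite.com/</loc>
    </url>
    <!-- the duplicate http://www.mysite.com/home/ entry has been removed -->
  </urlset>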
‘nofollow’ Internal Links
An old-school approach, but if you have gone to the trouble of telling search engines to keep pages out of the index, it may be worth adding the ‘nofollow’ attribute to links on pages you want indexed that point to the pages you don’t. This acts as an additional flag to search engines that the duplicate pages are there for the user and not for search gains.
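As a quick sketch, an internal link pointing at a duplicate page would carry the attribute like this (the URL is just an example):

  <a href="http://www.mysite.com/home/" rel="nofollow">Home</a>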
Canonicalisation
This is something that may need to be done for lots of different reasons, so I’ll list a couple first to give you an idea of when you may want to canonicalise a page:
In examples like these you would add a link element to the duplicate pages that tells search engines the content is a known duplicate and that the original source can be found at the canonical URL.
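A minimal sketch of the tag, placed in the <head> of the duplicate page and pointing at the original (the URL is a made-up example):

  <link rel="canonical" href="http://www.mysite.com/original-page/" />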
noindex / nofollow in Meta
A relatively simple exercise, usually implemented on pages such as blog category, tag and archive pages, where the content is there to help the user find the post they are looking for but is the same as the content in the blog itself. Adding noindex and nofollow to the meta, as well as removing the pages from the index, is a good idea. There are also some other good best practices to follow in that respect, which I will cover later in this post. You can also mix and match the directives (e.g. if you want robots to see the content but not index it), such as below:
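For example, the tag for a page you want completely excluded, and the mix-and-match version that keeps a page out of the index while still letting crawlers follow its links, would look like this:

  <meta name="robots" content="noindex, nofollow" />
  <meta name="robots" content="noindex, follow" />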
www vs. Non-www Redirects
When your site can be reached at both https://www.sitename and https://sitename, the content could well be seen as duplicated. The best way around this is to redirect one version to the other, so that only one version can ever be indexed and there is nothing to be seen as a duplicate. You can do this with a URL rewrite at server level.
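As a hedged example, on an Apache server a .htaccess rewrite that forces the non-www version onto the www version might look like this (swap in your own domain):

  RewriteEngine On
  RewriteCond %{HTTP_HOST} ^sitename\.com$ [NC]
  RewriteRule ^(.*)$ https://www.sitename.com/$1 [R=301,L]

The 301 status also passes any link equity pointing at the non-www URLs across to the www versions.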
Rel=Prev & Rel=Next
This can be a little tricky to implement and is mainly used on the component pages of a paginated series. It works in much the same way as canonicalisation, indicating to search engines the relationship between URLs in that series. Google explain it really well on their Webmaster Central Blog, but as I mentioned, it’s easiest to think of it as canonicalisation for paginated content.
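A quick sketch for page 2 of a three-page series (example URLs), placed in the <head> of that page:

  <link rel="prev" href="http://www.mysite.com/products/page/1/" />
  <link rel="next" href="http://www.mysite.com/products/page/3/" />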
What To Do With International Duplicates
Sometimes sites have duplicate versions aimed at audiences in different countries that speak the same language. For example, you may have www.mysite.co.uk for a UK audience and www.mysite.com.au for an Australian audience. For whatever reason, rather than serving all audiences from a single URL, the sites were set up to be reachable separately, as in the example above.
There are a few ways to stop search engines from treating the content across the different TLDs as duplicated. Some of them are really simple, a few may be out of scope for the site for whatever reason, but most can be implemented.
In no particular order:
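One of the most common, sketched below for the UK/Australian example (assuming the equivalent page exists on both domains), is to add hreflang annotations to each version so search engines know which one to serve to which audience:

  <link rel="alternate" hreflang="en-gb" href="http://www.mysite.co.uk/page/" />
  <link rel="alternate" hreflang="en-au" href="http://www.mysite.com.au/page/" />

Geographic targeting in Webmaster Tools and hosting each site in its own country can also help search engines tie each domain to the right audience.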
What To Do With Test Sites
The simplest solution is not to have them live! If possible, keep them on a test server that is not accessible from the internet. There may, however, be a reason to have them live, for example on a subdomain. The best way to ensure they aren’t indexed is to disallow them in the robots.txt file and use the parameters in Webmaster Tools. Having done this, you may also need to (if the test site allows it) add canonical links on the test versions of the site pointing to the live version.
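As a rough sketch, assuming the test site lives on a hypothetical subdomain such as test.mysite.com, its robots.txt would block everything:

  User-agent: *
  Disallow: /

and each test page could point its canonical tag at the live equivalent:

  <link rel="canonical" href="https://www.mysite.com/page/" />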
You should also set a password on the test site so that a user doesn’t stumble across it by mistyping your URL.
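On an Apache server, for instance, HTTP Basic Authentication via a .htaccess file is a common way to do this (a sketch, assuming a .htpasswd file has already been created at the path shown):

  AuthType Basic
  AuthName "Test Site"
  AuthUserFile /path/to/.htpasswd
  Require valid-user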
You may have duplicate content on your site and not even know it. I’m not talking about entire pages, as they should be pretty obvious to spot and resolve; I am talking about dynamic content on pages or category-style pages. Below are a few examples of places on a site that could be deemed duplicate content, each with a quick solution that might fix it for you without a penalty.
Blog Snippets
On your site you may have a section that promotes your blog by showing the most recent articles. Often these snippets are dynamically pulled from the posts themselves and will therefore be a duplicate of the blog content.
How to resolve this – you could remove the snippet altogether and add a blog link to the top navigation bar on the site, or, as many blog platforms allow, have the author write a unique excerpt for each post; this then takes the place of the dynamic snippet.
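In WordPress, for example, a theme’s snippet loop can call the_excerpt() rather than the_content(), so the hand-written excerpt is shown where one exists (a minimal sketch, not a drop-in for any particular theme):

  <?php
  // Pull the three latest posts for a homepage promo area
  $recent = new WP_Query( array( 'posts_per_page' => 3 ) );
  while ( $recent->have_posts() ) : $recent->the_post(); ?>
    <h3><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a></h3>
    <?php the_excerpt(); // uses the manual excerpt if one has been written ?>
  <?php endwhile;
  wp_reset_postdata(); ?>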
Blog Category and Archive Pages
As I mentioned earlier in the post, you can use the tips above to keep these out of the index, but if you don’t really need them for the user then just remove them. Some blog platforms have plugins that make this easy (e.g. many of the great plugins by Yoast). Failing that, redirect the category or archive URL to the main blog page.
Testimonial Snippets
In much the same way as blog snippets, these are likely to be duplicated. If you have a fair few testimonials, you can use a handful in the promotion on your home page and keep the rest on the main testimonials page. Either that, or treat the section as an advert for the testimonials page and invite the user to view it without showing the snippet.
Scrolling Product Banners
If you have a few ‘top products’ that you like to promote on a scrolling banner, or anywhere else on your site, the descriptions are quite likely to be duplicated. The best way to resolve this is to write unique descriptions for the products in the banner or promotion where possible, or to serve the banner in a separate frame so its content is only counted against one page.
The end / The end
These are just a few examples, but there are lots of other ways you can inadvertently duplicate your content. Hopefully you now have a good idea of some ways to resolve it, or to insulate yourself as well as possible from penalties.
If you have any other tips or ideas please feel free to share them!
For more information or to read something very similar elsewhere I have posted this article on several different sites….*
* not
This is an interesting article with tips on avoiding duplicate content that I hadn’t considered. The blog snippet is one I never would have thought of. How do you remove the snippet if you’re using WordPress? The only options I know of are showing a full preview or showing the snippet on the /blog page of the WordPress site. I would prefer to have just a link, from a design perspective, and also because of this article. :)
Hi Chris,
And what about duplicate content due to price comparison sites? If an ecommerce site has submitted feeds to some price comparators and lost traffic, what do you suggest? Should the website content (product details) be differentiated from the feed data sent to the comparators?
And if only some product details are duplicated (those included in the comparator feeds), is the whole site penalised or only the product pages with duplicate content?
Thank you
Hi Danilo,
It is a difficult issue. Basically, any page that has a lot of its content duplicated will be devalued. If a lot of your product pages have the majority of their content duplicated, then a lot of those pages will be devalued, dragging down rankings across the whole site.
I would argue that 90% of the top-level product information is the same for a product across all sites that sell it: things like name, serial number, specifications or tagline. I think that if you concentrate on creating a unique description of the product on the page and send only the top-level data to the feed for the comparison sites, then the majority of your page will be unique and will be valued more highly.
Has this helped with your question? Feel free to get in touch on the social platforms in my bio above if you want to discuss it further.
Great one Chris. Canonicalization, 301/302 redirections, meta robots, and rel=prev/next for duplication in pagination are certainly the best technical ways to avoid duplication.
As usual a very high quality and insightful piece Chris. Keep it coming.
Hi Gavin,
Thanks for that. It’s a good point and one to remember!
There are so many ways to resolve duplication, and I think the key one moving forward is not to create it in the first place ;) Sadly it sometimes happens as part of a site build, and that can’t be helped.
With regard to the robots.txt and /home example, simply blocking /home will not guarantee that URL won’t get indexed if someone links to it. The best thing to do, IMO, is to ensure that this variation of the homepage is not linked to anywhere internally, and that a 301 is set up so anyone linking to this URL is redirected accordingly and any page equity is consolidated to the canonical URL. Failing that, slap a canonical on it :-)
Thanks for that Sitebee.
I think you are right, and it’s never easy, especially as that is an instance of duplication you can’t resolve any other way than by taking it down or not allowing it to be indexed. That’s rather against the point of the post in most cases.
Out of interest, what was your solution? Did you keep it live and risk it or take it down?
I have fallen victim to duplicate content from guest bloggers, more than once.
They offer you what they say is original, well-written guest content. I always check Google search and Copyscape for duplicate content initially, but I’ve found a few times that if I check again a few weeks later, the content from those guest posts has popped up on other websites.
So I have gone back to the source, who denies all knowledge. I’ve then contacted the other webmasters where I found the duplicate content, and they’ve said they got it from the same source as me, and have even proven it with email addresses.
So, keep an eye out for those dubious tactics.
But anyway, whether it’s a penalty or not… the tips on here will still help people solve any dupe problems they have :)
Yeah I guess… just would have thought they’d happily be open about that one, and that there’d be more buzz about it if it changed from the issue it was to an actual penalty.
I agree with the tactics Stanislav.
I think that search pages are (hopefully) there for the user experience and could even be treated as such by default in the algorithm. We’ll never know unless we are told expressly, I guess.
Steve, I think the fact there was nothing publicly announced means nothing these days when Google makes an update. Looking at rankings and traffic, as well as known site issues (and commentary within the industry), is confirmation enough in most cases.
This is from 08 but I don’t think the stance has changed on G purposely dishing out penalties https://googlewebmastercentral.blogspot.co.uk/2008/09/demystifying-duplicate-content-penalty.html
Ah okay… I just mean there was never a filter or penalty for dupe before, algorithmically or manual. It was always just other incidental issues affecting ranking with dupe rather than any penalty having been created.
Hi Chris,
In many cases my approach for internal search pages that create dupe content is to noindex/follow them (so they still pass link equity), or if this is not possible – the hatchet approach (robot them out with a wild card), or exclude them through the URL parameter handling in GWT. In all cases, if they paginate – prev/next tag is a good approach as well.
I would agree that it’s an algorithmic update, and sorry for the confusion. We have seen ranking drops on sites where there were outstanding duplicate content issues on site. This coincided with Panda 24, and having fixed the issues we have seen rankings rise again.
I would expect more content-based woes to come, as it seems Google is on the verge of a Panda 25 update.
Thanks Chris. When you say heavier penalties since the update, it sounds as if you’re saying there were penalties before.
This is where I’m confused, as there have never been penalties for dupe before. There were issues which affected ranking, but not penalties. So I’m wondering if you mean those issues have increased since the update, or if actual penalties have started, algorithmic or manual. I’m guessing the former, as we’d surely have been hearing a lot about it from everywhere if it were the latter.
Hi Guys,
Thanks for the comments. Steve, it seems that since the last Panda update there have been heavier penalties for duplicate content. We have certainly seen that here, and having fixed the issues with some of the methods above we have seen rankings return and in some cases increase.
Stanislav, I agree on the internal search duplication issues. I think the best way (and it isn’t easy to do) is to use rel=prev and rel=next for search pagination. Unless you have another idea to resolve it?
Hi Chris, great post. I’d love to know how you fixed your issues and started seeing results again.
My site was hit really hard because of duplicate content (I think) around March. I had about 60 posts from 2010, when I first started the site, each offering different audio files relating to different football teams, but I naively used the same descriptive text for each.
About a week ago I 301 redirected them to a single descriptive, relevant post, but I’m not seeing any results yet.
Did you submit a reconsideration request to Google or anything?
Any help would be hugely appreciated.
Thanks, Chris!
Pretty good overview of the possible duplicate issues. I’d only add internal search and additive filtering (common with e-commerce sites) to the list.
They’re actually actively penalising for dupe content now? Is there any link to details on that regarding the last update?
Is it not just still the indirect negative consequences it was before with equity split, etc… rather than penalties?
Thanks for sharing Chris!
Really interesting reading!