Do you know the different types of penalties associated with onsite content? With so many potential issues arising from content, it’s worth looking at each type of penalty, be it algorithmic or manual. This post is designed to give you a better understanding of how each type of content issue could affect your site’s rankings.
Not so long ago, websites created content in every way imaginable. Some of this content was useful to users, but a substantial proportion was irrelevant, scraped or just pure nonsense designed to help sites rank for keywords.
The web was becoming populated with worthless spam, largely in response to the way search engine algorithms were ranking websites. Since 2011, Google has been addressing this issue with a range of algorithmic penalties and human-reviewed manual penalties to help keep the search results clean. The most notable algorithmic content penalty is known as Google Panda, and it is regularly updated by the Google Web Spam team.
One of the most common areas of confusion with content penalties is understanding what constitutes duplicate, spam or manipulative content. Once a webmaster or site owner is aware of how their content will be perceived by Google, they can begin to detect whether they have issues that are losing them traffic or could even result in a penalty, leading to a huge drop in the site’s visibility in search results. This post examines the different classifications of low quality content and the potential risk that they pose to a website.
Duplicate content is text and images that appear on more than one page. It can be caused by site design, generated dynamically by a CMS, introduced through shared product descriptions or created by copying content from another website.
Google has regularly stated that duplicate content will not result in a penalty, as in this comment from Matt Cutts via SEL. This means that as long as the site isn’t completely copied from another site, and as long as a significant proportion of its articles haven’t been taken from other websites, the site is unlikely to be at risk of an algorithmic or manual penalty.
That said, the issue of duplicate content should not be ignored, as it can result in key pages not appearing in searches, especially if the content is duplicated on a website that is judged to have more authority or on another page of your site that Google identifies as more relevant.
One of the biggest sources of duplicate content is product descriptions on e-commerce websites. Although this is very unlikely to result in a penalty, a page with a duplicate description may be “filtered” out of search results in favour of more established sites, which would naturally reduce the traffic to the duplicate page.
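If you want a rough idea of how much of your own copy overlaps with another page, a simple comparison can help. The sketch below is an illustrative assumption, not how Google measures duplication: it splits each text into overlapping three-word sequences and reports the share the two texts have in common. The product descriptions and function names are made up for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical product descriptions, used only for illustration.
supplier_copy = "Lightweight waterproof jacket with taped seams and an adjustable hood."
our_copy = "Lightweight waterproof jacket with taped seams and an adjustable hood."
rewritten_copy = "A packable shell that keeps the rain out on long hill walks."

print(jaccard_similarity(supplier_copy, our_copy))        # 1.0
print(jaccard_similarity(supplier_copy, rewritten_copy))  # 0.0
```

A score close to 1.0 suggests the description is a straight copy and is a candidate for rewriting. Where you draw the line between “similar” and “duplicate” is a judgement call for your own site.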
Thin content is where a URL on a website contains almost no content, or very little content that “adds value” for a user. Google believes that users don’t want to see empty pages and, to a certain extent, this is usually true, as such pages don’t provide the detailed information that the user is most probably looking for.
A site with a large proportion of thin content pages is most likely to suffer from an algorithmic penalty such as Panda, which relies on automated analysis of a site’s content quality. Google has suggested that even a few pages of thin content could impact traffic to a site. Manual penalties are unlikely to be given unless the page is deemed by Google to exist to manipulate the search results through doorway or template pages.
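A quick way to see what proportion of your pages might count as thin is to count the words on each one. The sketch below assumes you already have the visible text of each page. The 150-word cut-off is an arbitrary assumption for illustration, since Google publishes no word-count threshold, and the URLs are hypothetical.

```python
import re

# Assumed cut-off for illustration only; Google publishes no word-count threshold.
MIN_WORDS = 150


def word_count(text: str) -> int:
    """Count word-like tokens in a page's visible text."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def thin_page_report(pages: dict[str, str], min_words: int = MIN_WORDS) -> tuple[list[str], float]:
    """Return the URLs below the threshold and their share of all pages."""
    thin = sorted(url for url, text in pages.items() if word_count(text) < min_words)
    share = len(thin) / len(pages) if pages else 0.0
    return thin, share


# Hypothetical URLs and body text.
pages = {
    "/guides/choosing-a-jacket": "word " * 400,
    "/tags/blue": "Blue products.",
    "/tags/red": "",
}
thin_urls, thin_share = thin_page_report(pages)
print(thin_urls)  # ['/tags/blue', '/tags/red']
print(f"{thin_share:.0%} of pages are below {MIN_WORDS} words")
```

Word count is only a starting point: a short page can still be useful, so treat the output as a list of pages to review rather than a verdict.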
Doorway, template & automated pages tend to contain generic content that is replicated across multiple URLs but with minor changes such as a place name, product name or keywords that are designed to rank for specific search queries.
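To spot this pattern on your own site, you can compare pages against each other and flag pairs whose wording is almost identical. The sketch below uses Python’s standard `difflib` module as one possible approach; it is not Google’s method. The 0.8 threshold, the URLs and the page text are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def template_similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity ratio between two page texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, text_a.lower().split(), text_b.lower().split()).ratio()


def likely_template_pairs(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of URLs whose text is nearly identical."""
    return [
        (url_a, url_b)
        for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2)
        if template_similarity(text_a, text_b) >= threshold
    ]


# Hypothetical pages: two differ only by place name, one is unrelated.
pages = {
    "/plumber-london": "Need a plumber in London? Our London plumbers are available today for repairs and installations.",
    "/plumber-manchester": "Need a plumber in Manchester? Our Manchester plumbers are available today for repairs and installations.",
    "/about-us": "We are a family firm founded by two engineers who trained together.",
}
print(likely_template_pairs(pages))  # [('/plumber-london', '/plumber-manchester')]
```

Pairs that score highly are worth a manual look. Comparing every page with every other page becomes slow on large sites, so this is best run on one section at a time.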
Recently Google announced a crackdown on doorway pages and introduced a “ranking adjustment” (Google Webmasters Blog) that it claims will reduce the visibility of pages that were clearly made just to rank for certain terms.
The images below are a prime example of pages that appear to have been created to manipulate search results: the information provides no real value, and the pages target specific locations without the service being specific to each location or the company having separate business locations for London or Manchester.
Scraped content is text, or even a complete website, that has been copied almost word for word from another source. This exists largely as a legacy of previous search algorithms, which rewarded sites that had more pages with a lot of content on them and didn’t necessarily consider that the content may have been taken from a more reputable website.
As well as being arguably illegal, scraped content is not useful to web users, who gain nothing from seeing multiple websites showing the same content.
If Google identifies this as an attempt to improve rankings, the site would probably receive a site-wide manual penalty. Even if there is no intent to improve rankings, the algorithm may still identify the site as providing low quality, duplicate content that doesn’t provide value, and search visibility would decline as a result.
It’s important to determine which type of content issue your site faces so you can identify the real risk it poses and the damage it may already be doing to your traffic levels, whether for individual sections or across the entire site. Search engines are always keen to reinforce that blocking duplicate pages through robots.txt is not the best way to resolve these issues. Instead, they should be fixed at source wherever possible by removing low quality content pages and ensuring that dynamic duplication is avoided.
For more information on how your site’s rankings may be affected by onsite content issues, speak to Koozai today.