
Dean Marsden

Boosting Your SEO by Helping Googlebot

24th Nov 2015 SEO 5 minutes to read



What’s the secret to getting Google to love your website and index it? Steer crawlers to the most important and useful pages of your site and away from pages with little value to searchers. You may think you want Google or Bing to index every URL of your website, but you could be preventing them from crawling your most important pages by using up your ‘crawl budget’ on poor or irrelevant pages.

What is ‘crawl budget’?

When Google’s crawler (or Bing’s, for that matter) visits your website, it has a finite amount of resources to spend on accessing your website’s pages; after all, Google has millions of other websites to crawl that day too.

Your crawl budget is influenced by the existing value and quality of your website, including, but not limited to, the quality of backlinks pointing to it. I won’t speculate on further factors here, but there are some good research pieces on the web where people have tried to identify them.

Why control access?

Give important pages priority

By controlling where Googlebot is allowed to crawl within your website, you are increasing the likelihood that important and valuable pages will be crawled every time Google visits your website.

Examples of these could be your product or service pages, blog post pages or even your contact details page. All of these are pages you want ranked highly in the search results so users can find this information more quickly.

Ignore pages that don’t need to be ranked

There will be pages of your website that have no need to be indexed in the search results. These include pages that a user wouldn’t typically look for in the search results but perhaps will browse to whilst on your website. These could be your privacy policy page, terms & conditions page or your blog tag or category pages.

How to help Googlebot access the right pages of your website

There are a number of different ways in which you can help Googlebot access your website. The more of the following you can adjust or implement, the more control you should have over Googlebot or Bingbot.

Robots.txt file

The first thing to look at is setting disallow rules in your robots.txt file for all pages, folders or file types on your site that do not need to be crawled. The first place a crawler looks when visiting a site is your robots.txt file (it must always be located at the root, e.g. http://www.mydomain.com/robots.txt). This indicates to crawlers which parts of your website they should not attempt to crawl, and you can set different rules depending on which crawler bot you want to control.

You can learn all about robots.txt and common issues in this Koozai blog post from Irish Wonder. Always test your rules in Google Search Console’s robots.txt tester tool before setting any changes live, as some rules could block your whole website or pages you didn’t want blocked.
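As a starting point, a minimal robots.txt might look like the sketch below. The paths here are hypothetical examples – substitute the sections of your own site that don’t need crawling:

```txt
# Rules for all crawlers
User-agent: *
Disallow: /search/          # internal site-search results
Disallow: /tag/             # blog tag archives
Disallow: /*?sessionid=     # URLs with session-tracking parameters

# Point crawlers at your XML sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
```

Remember that disallow rules stop crawling, not indexing – a disallowed URL can still appear in results if other sites link to it, which is where the noindex tag below comes in.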

NoIndex tags

To help prevent certain pages from being indexed, it is also recommended that you add a noindex meta tag to the <head> section of those pages. Once it has been added to a page, you should test it by doing a ‘Fetch as Google’ request on the URL in Google Search Console.
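The tag itself is a single meta element in the page’s <head>. In this sketch, noindex tells crawlers not to list the page in results, while follow still lets them crawl the links on it:

```html
<head>
  <meta name="robots" content="noindex, follow">
</head>
```

Note that for the tag to be seen, the page must not also be blocked in robots.txt – a crawler that can’t fetch the page can’t read the tag.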

URL parameter rules

URL Parameters in Google Search Console

If your site is powered by a CMS or an e-commerce system, you’ll need to be careful with dynamically generated URLs causing duplicate pages. Googlebot can easily get caught up and waste time crawling these URLs. The URL Parameters section in Google Search Console shows you which dynamic URLs Google has found and lets you set a preference for the parameters it can safely ignore.

Be aware that this is a powerful tool and you must use it with caution as it could prevent crawling of important parts of your website.
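To illustrate the duplication problem the URL Parameters tool addresses, here’s a minimal Python sketch that collapses parameterised duplicates down to one canonical URL. The parameter names are hypothetical examples – telling Search Console to ignore a parameter has a similar effect on crawling:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Parameters that change presentation but not content (hypothetical examples)
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium"}

def canonicalise(url):
    """Strip ignored query parameters so duplicate URLs collapse to one."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

urls = [
    "http://www.mydomain.com/shoes?sort=price",
    "http://www.mydomain.com/shoes?sessionid=abc123",
    "http://www.mydomain.com/shoes",
]
print({canonicalise(u) for u in urls})  # all three collapse to a single URL
```

Parameters that do change content (such as a pagination parameter) are kept, which is exactly the distinction the Search Console tool asks you to make.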

Up-to-date XML sitemaps

Although Google won’t treat your XML sitemap as a strict rule about which pages to crawl, it does take it as a hint – so make sure it’s up to date to help reinforce which pages of your site it should be indexing.

Remove any old pages from your site and add any new pages.
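A minimal XML sitemap follows the sitemaps.org format. The URLs and dates below are placeholders – list each page you want indexed, with an optional last-modified date:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mydomain.com/</loc>
    <lastmod>2015-11-24</lastmod>
  </url>
  <url>
    <loc>http://www.mydomain.com/services/</loc>
    <lastmod>2015-11-20</lastmod>
  </url>
</urlset>
```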

Fix internal links

Googlebot will follow links it finds in your webpage content so make sure you aren’t going to waste its time by letting it crawl links to missing pages. Use a crawling tool such as Screaming Frog’s SEO Spider tool to find these broken internal links and fix them at the source.
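Screaming Frog does this at scale, but the underlying idea can be sketched in a few lines of Python: extract the internal links from an already-fetched page and compare them against the URLs you know are live. The example page and the set of live URLs here are hypothetical:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_broken_links(html, live_pages):
    """Return internal links that don't resolve to a known live page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [l for l in parser.links if l.startswith("/") and l not in live_pages]

page = '<a href="/services/">Services</a> <a href="/old-page/">Old</a>'
print(find_broken_links(page, {"/", "/services/", "/contact/"}))
# ['/old-page/']
```

A real crawler would also fetch each link and check its HTTP status code, but the principle is the same: find links pointing at pages that no longer exist, then fix them at the source.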

Bulk Export Broken Links in Screaming Frog

Page load times

Googlebot will need to load each of your pages when it visits them so by reducing the load time of each you can allow it to crawl and index more pages within the same overall time. There are a number of free tools available to help you analyse and improve site speed.

Site Structure

A good site structure is an underrated method of helping Googlebot crawl your website more easily. Clearly categorising page content and not hiding pages away too deep in your site structure increases the likelihood they’ll be found by the crawler.
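One way to think about structure is click depth from the homepage. This hypothetical Python sketch runs a breadth-first search over a page-to-links map to show how many clicks away each page sits – pages buried several clicks deep are the ones a crawler is least likely to reach:

```python
from collections import deque

def crawl_depths(links, start="/"):
    """Breadth-first search from the homepage: clicks needed to reach each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site: each page maps to the pages it links to
site = {
    "/": ["/services/", "/blog/"],
    "/blog/": ["/blog/post-1/"],
    "/blog/post-1/": ["/blog/post-2/"],
}
print(crawl_depths(site))
# {'/': 0, '/services/': 1, '/blog/': 1, '/blog/post-1/': 2, '/blog/post-2/': 3}
```

If an important page comes back with a high depth, adding a link to it from a category page or the homepage flattens the structure and makes it easier to find.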

The SEO Benefits

If you’ve managed to implement some or all of the above recommendations and tested them using the tools mentioned, you should begin to see some changes in crawl stats shown within Google Search Console.

Here we’re looking for the number of pages crawled to be close to, or just above, the number of actual pages on your site in the first blue graph. The reduction in kilobytes downloaded (in red) should mirror the reduction in pages crawled if you previously had lots of unnecessary pages being crawled.

Below is an example of a site with a significant number of URL parameter issues in which Googlebot crawled up to 12,000 URLs when in fact there were just a few hundred actual pages of the site. Through the application of URL parameter rules and the other factors mentioned above, the number of pages crawled became much more consistent and realistic.

Crawl Changes

If Google crawls your useful pages on each visit, their rankings are more likely to be refreshed – and most likely for the better. Fresh content will be indexed and ranked much more quickly, and your ‘crawl budget’ won’t be wasted.

