(This is the transcript from our new video so it may not read as well as a normal blog post would)
Hi. Today we’re going to be talking about helping Googlebot in order to improve your SEO results. You may think that you want Google to come and crawl and index every page of your website. However, there’s a thing called crawl budget, which is the finite amount of resources that Googlebot has to crawl and index your website every time it visits.
Crawl budget varies from site to site, and it’s going to be based on your site’s strength. There is some research out on the web that tries to identify which factors affect the crawl budget assigned to each website.
An example of this may be the quality of your backlinks. If Google sees that your site has good quality backlinks, it may assign it a bigger crawl budget than a site that doesn’t have them. It’s going to try to crawl and index more of your content because it thinks that content will be more useful to the people searching for it.
Why should you control access? Basically, you’re giving the important pages of your site priority when it comes to Googlebot crawling them. Examples of these may be your product pages, service pages, blog posts, and even your contact details page as well, because people are going to be looking for that in Google.
We’re kind of making the most of this crawl budget by telling Google which pages are the ones we really want crawled every time, and which ones aren’t so good and can be ignored.
So how can we do that? The first thing to look at is the robots.txt file. You should make sure that you have a robots.txt file in the root of your website. This is a simple text file that tells Googlebot which areas of your site to crawl and which areas not to crawl. The rules can cover pages, folders, and even file types. So if there are PDF files on your website that you don’t need crawled, you can put a rule for them in the robots.txt file.
When you are working with this, though, make sure you go to Google Search Console’s robots.txt testing tool just to make sure you haven’t got any rules in there that could accidentally block your whole website from being crawled. So be careful with that one.
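As a rough local sanity check before you go to the testing tool, here is a minimal sketch using Python’s standard library robots.txt parser. The rules, the example.com URLs, and the function names are all made up for illustration. Bear in mind that this parser does simple prefix matching and, as far as I know, does not understand the wildcard patterns Google supports, so it is not a replacement for Google’s own tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep bots out of the basket and a folder of PDFs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /basket/
Disallow: /downloads/pdf/
"""

# Hypothetical pages that we really want crawled.
IMPORTANT_URLS = [
    "https://www.example.com/",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/contact",
]


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def blocked_important_urls(robots_txt: str, urls: list) -> list:
    """List the important URLs that the rules would block."""
    return [url for url in urls if not is_crawlable(robots_txt, url)]


if __name__ == "__main__":
    # Ideally this prints an empty list.
    print(blocked_important_urls(ROBOTS_TXT, IMPORTANT_URLS))
    # A single "Disallow: /" blocks everything, which is the mistake to watch for.
    print(blocked_important_urls("User-agent: *\nDisallow: /\n", IMPORTANT_URLS))
```

If the first list is not empty, one of your rules is catching a page you care about.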
If Googlebot finds a page through an external link, for example somebody has linked straight to it, the URL may still end up in the index even though robots.txt tells Googlebot not to crawl it. In that case, for the pages that really must stay out of the index, you’re probably going to want to add noindex tags to the head code of the specific pages. This is used for telling Google that the page shouldn’t be in the index. Of course, Googlebot has to come to the page and crawl it to see the tag, so that page can’t be blocked in robots.txt at the same time. But you’re just specifying that you don’t want it in the index, and then next time it probably won’t attempt to come back to that page.
Again, in Google Search Console, there is a testing tool you can use. If you go to Fetch as Google, you can see if that noindex tag is actually working.
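If you want to check your own page templates as well, here is a small sketch that looks for a robots meta tag containing noindex in a page’s HTML. It only uses Python’s standard library. The class and function names are made up for this example, and it only looks at meta tags, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```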
URL parameter rules are something you can set in Google Search Console as well. You can only do it in there. Basically, it’s a really powerful way of telling Google about dynamically generated URLs that may be duplicates of the normal URLs of your website. So if you’ve got a CMS or an ecommerce system that you use on your domain, chances are it’s probably generating these dynamic URLs that just reorder, sort, and narrow content.
In this tool, you can tell Google which URLs are duplicates, based on what each parameter does to the page. So you can help prevent a lot of wasted time on those duplicate pages and have Googlebot focus only on the ones that need to be indexed.
Be careful with that tool, because again there’s potential to deindex your whole website if you’re not careful with what rules you set for the different parameters. So you need to understand what the parameters are and make sure there are no issues with that.
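To get a feel for which of your URLs are really the same page, here is a minimal sketch that strips out parameters that only reorder or re-present content and then groups the URLs that collapse to the same address. The parameter names in NON_CONTENT_PARAMS are assumptions for illustration, so you would need to swap in whatever your own CMS or ecommerce system generates.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameters that change ordering or presentation, not the content itself.
NON_CONTENT_PARAMS = {"sort", "order", "view", "sessionid"}


def canonical_form(url: str) -> str:
    """Drop the non-content parameters and sort the rest for a stable comparison."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls: list) -> dict:
    """Map each canonical URL to its variants, keeping only groups with duplicates."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {canonical: variants for canonical, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/shoes",
        "https://www.example.com/shoes?sort=price",
        "https://www.example.com/shoes?sort=name&order=desc",
        "https://www.example.com/shoes?colour=red",
    ]
    for canonical, variants in group_duplicates(crawled).items():
        print(canonical, variants)
```

Here the colour parameter is treated as changing the content, which is a judgment call. If a parameter really does narrow what the page shows, you may well want it crawled.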
Keep an up-to-date XML sitemap. Google doesn’t live by the rules of what’s in your sitemap, so it won’t go and specifically index every page that you list there, but the sitemap does give it hints about what content it should be indexing. So make sure your new pages are in there and that old pages that no longer exist are taken out, because if Googlebot tries to follow those links and gets a 404 page, that’s a bit of the crawl budget wasted.
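One way to keep on top of this is to compare the sitemap against a list of pages you know are live. The sketch below assumes you already have that list from somewhere, for example an export from your CMS. The sample sitemap, the URLs, and the function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap contents.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/services</loc></url>
  <url><loc>https://www.example.com/old-offer</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> set:
    """Return the set of <loc> URLs listed in a sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def audit_sitemap(xml_text: str, live_urls: list) -> dict:
    """Find sitemap entries that are no longer live, and live pages that are missing."""
    listed = sitemap_urls(xml_text)
    live = set(live_urls)
    return {"stale": sorted(listed - live), "missing": sorted(live - listed)}


if __name__ == "__main__":
    live_pages = [
        "https://www.example.com/",
        "https://www.example.com/services",
        "https://www.example.com/blog/new-post",
    ]
    print(audit_sitemap(SAMPLE_SITEMAP, live_pages))
```

The stale entries are the ones likely to lead Googlebot to a 404, and the missing entries are new pages the sitemap isn’t hinting at yet.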
So moving on to fixing broken internal links: again, if Googlebot is following broken links within your website to other pages of your website and those pages don’t exist, it has wasted that crawl budget. Use a tool such as Screaming Frog to crawl your website, identify the broken links, and fix them at source.
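To show the idea behind that kind of crawl, here is a toy sketch. It is not how Screaming Frog works internally, just an illustration. It assumes you already have the HTML of every page that exists, held in a dictionary keyed by URL, and it reports each internal link that points at a page not in that dictionary, along with the page the link sits on so you can fix it at source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_internal_links(pages: dict) -> list:
    """pages maps URL -> HTML for pages that exist. Returns (source, target) pairs."""
    broken = []
    for source, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        host = urlsplit(source).netloc
        for href in collector.hrefs:
            # Resolve relative links and ignore any #fragment.
            target = urljoin(source, href).split("#")[0]
            if urlsplit(target).netloc == host and target not in pages:
                broken.append((source, target))
    return broken


if __name__ == "__main__":
    site = {
        "https://www.example.com/": '<a href="/services">Services</a> <a href="/old-page">Old</a>',
        "https://www.example.com/services": '<a href="/">Home</a>',
    }
    for source, target in broken_internal_links(site):
        print(source, "links to missing page", target)
```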
This kind of plays well into site structure as well. Having a good site structure is a really underrated way of guiding the users and search bots that visit your website towards the pages that matter. So if you’ve got an important page that’s three or four levels deep, hidden within navigation, chances are the crawl budget will be used up before Googlebot reaches it, and users won’t be able to find it either because it’s hidden away.
Plan a good site structure and move pages around. Put your important pages top level or second folder down in the URL structure. Have them in your main navigation. Make sure that they’re crawlable through links that are easy to find on your page.
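A quick way to see how deep your pages sit is to count the clicks needed to reach each one from the homepage. Here is a minimal sketch that does this with a breadth-first search over a map of internal links. The link map and the page paths are invented for illustration, and the cut-off of two clicks is just an example threshold, not a rule from Google.

```python
from collections import deque


def click_depths(links: dict, home: str) -> dict:
    """Breadth-first search from home; returns the fewest clicks needed per page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(links: dict, home: str, max_depth: int = 2) -> list:
    """Pages that take more than max_depth clicks to reach from home."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_depth)


if __name__ == "__main__":
    # Hypothetical internal link map: page -> pages it links to.
    site_links = {
        "/": ["/services", "/blog"],
        "/blog": ["/blog/post-1"],
        "/blog/post-1": ["/blog/archive/old-guide"],
    }
    print(deep_pages(site_links, "/"))
```

Any important page that shows up in that list is a candidate for a link from the main navigation or from a top level page.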
The last one I’ve got here is page load times. As you can imagine, the faster your pages load, the more pages Google can get through within its allocated crawl budget. There are a lot of blog posts and tools out there to help you identify ways to speed up your page load time. If you do it across your website, you can really make a big difference to how many pages are being crawled and indexed within that crawl budget.
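To put some rough numbers on that, here is a back-of-the-envelope sketch. It treats the crawl budget as a fixed number of seconds per visit, which is a big simplification and purely an assumption for the sake of the arithmetic, since as far as I know Google doesn’t describe crawl budget that way. The figures are made up.

```python
def pages_per_visit(budget_seconds: float, avg_load_seconds: float) -> int:
    """Whole pages that fit in the budget, under the fixed-time assumption above."""
    if avg_load_seconds <= 0:
        raise ValueError("avg_load_seconds must be positive")
    return int(budget_seconds // avg_load_seconds)


if __name__ == "__main__":
    budget = 60  # hypothetical seconds of crawling per visit
    for load_time in (2.0, 1.0, 0.5):
        print(load_time, "seconds per page:", pages_per_visit(budget, load_time), "pages")
```

Under that assumption, halving the average load time doubles the number of pages that fit into the same visit.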
So there we have why you should concentrate on this. I think it’s a really important and underrated way of boosting your crawl and indexation, and even your rankings in Google as well. Here are some ideas on how you can control that.
So I recommend giving that a go. If you’ve got any questions, contact me on Twitter @DeanMarsden22, or just get in contact with the Koozai sales team, and we’ll be happy to help you. Thanks.