‘Crawl Budget’: you’ve probably heard the phrase many times, but what is it and what does it mean? The Internet is an ever-expanding resource for information, and as much as Google would love to crawl every piece of content that exists on it, this is practically impossible. As such, domains are assigned a crawl budget, something you must consider for your SEO campaigns.
It’s important to Google (and other search engines alike) to crawl and index ‘the good stuff’ on the Internet and so to ensure they’re doing this whilst making the most of their limited resources, they allocate each domain a certain amount of crawl budget.
The crawl budget assigned to a domain is how much time they (the search engines) spend crawling a domain each day. This budget varies from domain to domain as it is based on a huge number of factors including the authority and trust of a website, how often it’s updated and much more.
So, as Google allocates your website a finite crawl budget, isn’t it a good idea to ensure they’re able to crawl your website efficiently? Well, of course.
It’s important that Google (and in turn users) are able to navigate around your site with ease. This increases the likelihood of Google being able to crawl those important pages on your website and improves the experience for users on your website.
There are a number of common errors found across many websites that can really waste crawl budget. I have highlighted 6 of those and ways you can ensure the wastage of your allocated crawl budget is minimal.
There are a number of errors to be aware of when it comes to internal and external linking issues. It goes without saying that if Google and other search engines crawl your website and are met with continual link errors, valuable crawl budget is being wasted. Below are two types of linking issues every webmaster should be aware of:
As a rule of thumb, redirects should be 301 redirects wherever possible (as opposed to 302) in order to flow ‘link juice’ through to the new page. If internal links point at URLs that 301 redirect, update them to point directly at the live destination rather than through the redirect, as crawlers that follow a redirected link take more time to reach the destination page. This wastes valuable crawl budget and means search engines spend less time looking at the live pages you want them to crawl.
Whilst reviewing internal redirects, you also want to ensure no redirect chains or loops exist on your website, as these make it a lot more difficult for both users and crawlers to access your pages. There are a number of desktop SEO spider tools available, such as Screaming Frog, that help to identify technical issues including those discussed.
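To make the difference between a single redirect, a chain and a loop concrete, here is a minimal sketch in Python. It assumes you already have a map of redirecting URLs to their targets (for example, exported from a crawl); the `REDIRECTS` data and the `trace_redirects` name are hypothetical and purely for illustration.

```python
# Hypothetical redirect map: each key redirects to its value.
REDIRECTS = {
    "/old-page": "/new-page",  # a single redirect
    "/a": "/b",                # first hop of a chain
    "/b": "/c",                # second hop of the same chain
    "/x": "/y",                # these two form a loop
    "/y": "/x",
}


def trace_redirects(url, redirects):
    """Follow `url` through `redirects` and return (path, status).

    status is "ok" (no redirect), "redirect" (one hop),
    "chain" (more than one hop) or "loop".
    """
    path = [url]
    seen = {url}
    while path[-1] in redirects:
        target = redirects[path[-1]]
        path.append(target)
        if target in seen:
            # We have come back to a URL already visited.
            return path, "loop"
        seen.add(target)
    hops = len(path) - 1
    if hops == 0:
        return path, "ok"
    return path, ("redirect" if hops == 1 else "chain")


for start in ["/new-page", "/old-page", "/a", "/x"]:
    print(start, trace_redirects(start, REDIRECTS))
```

Any internal link whose status is not "ok" is a candidate for updating so that it points straight at the final URL in the path.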
It is of course important to ensure no broken links exist on your website. Not only are they detrimental to a user’s experience on your site, they also make it very difficult for crawlers to navigate around your website. If a crawler can’t get to a page, it can’t index it. Regular link checks should be undertaken across a website so that any broken links are fixed as soon as they are discovered. These checks can be done using a variety of tools, such as Google’s Search Console and Screaming Frog.
Meaningful and user-friendly internal linking helps to pass link value and keyword relevancy around your website whilst also allowing users and robots to navigate through your pages. By not ensuring internal links are used where relevant, you’re missing an opportunity to channel users and robots through your site and build keyword relevancy through natural use of keyword anchor text.
By ensuring proper interlinking is in place and pages are linked to where relevant, you’re making the most of the crawl budget that has been allocated to your website, vastly improving site crawlability.
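One simple way to sanity-check interlinking is to count how many pages link to each page and flag any with no inbound links at all. The sketch below assumes a small, hand-written `link_graph` (page to the pages it links to); in practice this data would come from a crawl of your own site, and the function name is hypothetical.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct other pages link to each page."""
    # Start every known page at zero so unlinked pages still appear.
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical site structure: page -> pages it links to.
link_graph = {
    "/": ["/services", "/blog"],
    "/services": ["/", "/contact"],
    "/blog": ["/", "/services"],
    "/contact": ["/"],
    "/case-studies": ["/"],
}

counts = inbound_link_counts(link_graph)
orphans = sorted(page for page, n in counts.items() if n == 0)
print(orphans)
```

In this made-up example `/case-studies` links out to the homepage but nothing links to it, so a crawler following internal links would have no route to reach it.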
Page speed is an important factor for improving site crawlability. Not only is this an important ranking factor, it can also determine whether or not those all-important pages on your website get seen by search engines.
It may be common sense, but the faster a website loads, the more pages crawlers can get through in the time they spend on your website. Along with increasing the number of pages that get crawled, improved page speed also gives the user a better experience on your website (winning all around). So make sure time is spent improving the speed of your website, if not for site crawlability, then for the user!
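A back-of-the-envelope calculation shows why this matters. The sketch below takes the definition used earlier (crawl budget as an amount of time per day) and assumes, purely for illustration, that pages are fetched one after another; real crawlers behave in more complicated ways, and the figures here are invented.

```python
def pages_crawlable(budget_seconds, avg_response_seconds):
    """Rough estimate of pages fetched within a fixed time budget.

    Simplifying assumption: pages are fetched sequentially and each
    takes `avg_response_seconds` to respond.
    """
    return int(budget_seconds // avg_response_seconds)


# Hypothetical figures: a 10 minute daily budget.
budget = 600
slow_site = pages_crawlable(budget, 2.0)   # 2 seconds per page
fast_site = pages_crawlable(budget, 0.5)   # 0.5 seconds per page
print(slow_site, fast_site)
```

Under these assumptions, cutting the average response time from 2 seconds to half a second takes the estimate from 300 pages to 1,200 pages for the same budget.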
If correctly used, a robots.txt file can increase the crawl rate of your website; however, it is quite often used incorrectly, and when it is, it can greatly affect the crawlability and indexation of your website.
When blocking pages via robots.txt you’re telling a crawler not to access the page, so it’s important to be certain that the pages being blocked do not need to be crawled. Bear in mind that blocking a page stops it being crawled but does not by itself guarantee it stays out of the index, as a blocked URL can still be indexed if other pages link to it. The best way to decide what to block is by asking yourself: would I want my audience to see this page in search engine results pages?
By efficiently instructing crawlers to not crawl certain pages on your website, crawlers are able to spend their crawl budget navigating pages that are important to you.
As the robots.txt file is one of the first places a crawler looks when first going to a website, it is best practice to use this to direct search engines to your sitemap. This makes it easier for crawlers to index the whole site.
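As an illustration, the sketch below uses Python’s standard `urllib.robotparser` module to read a small robots.txt and check which URLs a crawler would be allowed to fetch, and which sitemap it is pointed to. The rules, paths and `example.com` domain are hypothetical, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block basket and internal search pages,
# and point crawlers at the XML sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="*"):
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/products/widget"))
print(is_crawlable("https://www.example.com/checkout/basket"))
print(parser.site_maps())
```

Running a check like this against your own file before deploying it is a cheap way to confirm you are not blocking pages you want crawled.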
URL parameters are often a major cause of wasted crawl budget, especially on ecommerce websites. The ‘URL Parameter’ feature in Google Search Console (formerly Webmaster Tools) offers the easiest way to indicate to Google how to handle parameters in URLs found across your website.
Before using the ‘URL Parameter’ feature, it’s important to understand how parameters work, as you could end up excluding important pages from the crawl. Google provides a handy resource to learn more about this.
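To see how parameters multiply the number of URLs a crawler has to visit, the sketch below strips parameters that do not change the page content and counts how many distinct pages remain. The list of ignorable parameters (`sort`, `sessionid`, `utm_source`) and the URLs are assumptions for illustration; which parameters are safe to ignore depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only re-order or track, not change content.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source"}


def normalise(url):
    """Drop ignorable parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


crawled = [
    "https://www.example.com/shoes?colour=red&sort=price",
    "https://www.example.com/shoes?sort=name&colour=red",
    "https://www.example.com/shoes?colour=red&sessionid=abc123",
    "https://www.example.com/shoes?colour=blue",
]
unique_pages = {normalise(url) for url in crawled}
print(len(crawled), "URLs crawled,", len(unique_pages), "distinct pages")
```

Here four crawled URLs collapse to two distinct pages, so half of the crawl in this made-up example was spent on duplicates.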
Sitemaps are used by both users and search engines to discover important pages around your website. An XML Sitemap is used specifically by search engines; it helps crawlers discover new pages across your website.
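For reference, a bare-bones XML Sitemap can be generated with Python’s standard library as sketched below. The URLs are hypothetical, only the required `<loc>` element is included, and the `build_sitemap` name is my own; most CMS platforms and SEO plugins will produce this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap listing `urls` as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/services",
])
print(sitemap_xml)
```

When saving this to a file you would normally add an XML declaration at the top and serve it at the address referenced in your robots.txt.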
HTML Sitemaps are used by both users and search engines and are again useful in helping crawlers find pages across your site. As Matt Cutts discusses in the video below, it is best practice to have both an XML and HTML Sitemap in place on your website.
It is clear that there are a number of ways a website can make the most of the crawl budget it has been allocated. Making it easy for a crawler to navigate your site ensures that your important pages are being seen. By following the 6 tips I have provided, you will be greatly improving your website’s crawlability.
Have any more tips on improving site crawlability? I’d love to hear them. Leave a comment below or contact me via Twitter @LukeTheMono