You might have heard the term “Crawl Budget” flying around in SEO circles, but what is it and what does it mean?
At its simplest level, it is how much time and resource Google and other search engines put into crawling your website. From what they crawl, they then decide what is worth indexing and what isn’t, which in turn determines which pages can be returned when users enter search queries.
So, although it goes on in the background without any direct involvement from the user or the SEO, it still needs consideration – otherwise you can publish all you want and never be found.
For smaller sites it might not be an issue, but regardless of the size of the site, the tips below can help you get crawled effectively and make sure that your content gets in front of the right users.
Make The Most Of Your Crawl Budget
With the above description in mind, you can imagine that Google endeavours to crawl every page on the internet, but it simply can’t.
So the idea behind making the most of your crawl budget is to ensure that your top content is easily accessible and crawled so it can rank. This is done by making it easy to find and making sure that the crawlers aren’t wasting time looking at pages which you don’t want indexed.
Most of the practices below will also help users navigate the site, not just crawlers and robots. More specific issues can crop up too, but there are too many to list, so below are the most common ones we come across on a fairly regular basis (unfortunately!).
1. Linking Issues – Both In And Out
Link problems are one of the most fundamental ways to waste crawl budget. If crawlers find broken links or errors, or have to go through long redirect chains, this slows down the crawling, uses up budget unnecessarily and could result in the crawler never reaching the desired page.
Firstly, with redirects you should use a 301 (permanent) redirect in 99.99% of cases. There are other types, such as the temporary 302, but you should only use them when you know what you’re doing.
Setting up redirects so that you don’t have 404s is good, but your internal links shouldn’t pass through those redirects when they don’t need to. Whenever a page URL changes on your site, set up the 301 redirect and also update the internal links to point at the new URL. With menu links and other templated content, it is simple to make site-wide updates. This saves crawl budget, improves the user experience and helps cut down pressure on your server.
Finding these links can be done easily through crawling tools such as Screaming Frog – arguably my favourite SEO tool! If you have to do this manually, then it should be fairly straightforward for menu links, but others may need some hunting down.
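As a rough illustration of the clean-up described above, here is a minimal Python sketch. It assumes you have already exported your redirects as a simple old-URL to new-URL mapping and have a list of internal link targets; the function names and the sample URLs are made up for the example and don’t come from any particular tool.

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow `url` through the redirect map; return (final_url, hops)."""
    hops = 0
    seen = {url}
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
        if url in seen:  # redirect loop, stop following
            break
        seen.add(url)
    return url, hops


def links_to_update(internal_links, redirects):
    """Map every internal link that hits a redirect to its final URL."""
    updates = {}
    for link in internal_links:
        final_url, hops = resolve_redirect(link, redirects)
        if hops > 0:
            updates[link] = final_url
    return updates


# Hypothetical data: "/old-shoes" goes through a two-step redirect chain.
REDIRECTS = {
    "/old-shoes": "/shoes-2023",
    "/shoes-2023": "/shoes",
}
INTERNAL_LINKS = ["/old-shoes", "/shoes", "/about"]

print(links_to_update(INTERNAL_LINKS, REDIRECTS))
```

Any link that appears in the result is one a crawler has to hop through before it reaches real content, so those are the references worth changing first.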
This applies to both internal and external broken links on your site. As well as keeping on top of any broken links pointing within your site through crawling tools or Search Console, you should do the same for external links.
Google Search Console won’t report on these, but you can use free crawling tools to periodically check your blog posts, products and other pages which point to external URLs and assets.
2. Internal Linking Structures
Your menu structure has been mentioned previously and it should be functional and perform well for both users and crawlers.
With a well thought out menu and linking structure, users should be able to get to the most important pages on your site with as few clicks as possible. Including the correct keywords and terminology will also pass on other SEO benefits, so this can help across the board.
Your header and footer menus are probably the bulk of internal links, but you can include closely related content links, such as blog posts related to a product or topic, as well as using the keywords in your content to link to relevant pages. Again, this will help users find information, crawlers find your deeper pages, and also increase relevancy of keywords & pages.
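One way to put a number on “as few clicks as possible” is click depth: the fewest clicks needed to reach a page from the homepage. The sketch below is only an illustration. It assumes you already have your internal links as a page-to-links mapping (for example from a crawl export), and the URLs are invented.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search: fewest clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: page -> pages it links to.
SITE_LINKS = {
    "/": ["/shoes", "/blog"],
    "/shoes": ["/shoes/trainers"],
    "/blog": ["/blog/choosing-trainers"],
    "/blog/choosing-trainers": ["/shoes/trainers"],
}

for page, depth in click_depths(SITE_LINKS).items():
    print(depth, page)
```

Pages that come out with a high depth, or that don’t appear at all, are the ones that would benefit most from an extra menu or in-content link.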
3. Page Speeds & Technologies
These two are closely linked and are becoming more of an issue than in the past. Crawlers are catching up with technology, but real-world development often outpaces the crawlers looking to index your site.
Google introduced Core Web Vitals in 2020 and made them part of its ranking signals in 2021. The Core Web Vitals report in Search Console is a fancy way of showing how quickly your pages load and respond to interaction.
In simple page speed terms, the longer pages take to load, the fewer pages the crawler will get to in your allotted time. If Google gives you an hour a week, you could conceivably double your crawl budget or more if you currently have low page speeds.
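The arithmetic behind that claim can be sketched in a few lines of Python. This is a deliberately simplified model: it assumes one page is fetched at a time and uses the illustrative “hour a week” figure from above, whereas real crawlers fetch in parallel and adjust their rate to how your server responds.

```python
def pages_crawled(window_seconds, seconds_per_page):
    """Pages a crawler could fetch one after another in the time window."""
    return int(window_seconds // seconds_per_page)


HOUR = 60 * 60  # the illustrative one-hour crawl window

slow = pages_crawled(HOUR, 2.0)  # pages taking 2 seconds each
fast = pages_crawled(HOUR, 1.0)  # pages taking 1 second each

print(slow, fast)  # 1800 3600
```

Halving the load time doubles the number of pages reached in the same window, which is where the “double your crawl budget” figure comes from.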
Technologies & Development
4. Your Robots.txt File
This simple file directs crawlers when they get to your site, helping them find sitemaps and making sure they don’t go where you don’t want them.
Blocking directories and URLs through your robots.txt file is a good way to keep crawlers out of certain parts of your site – from whole directories to individual URLs.
For example, on a WordPress site, your default admin directory shouldn’t be crawled, so you can include the line “Disallow: /wp-admin/” under a “User-agent: *” line and you’re good to go. This can apply to other directories you don’t want crawled, as well as to individual or bulk duplicated content. Coupled with canonicals, these can help control your crawl budget – but they are powerful, so you should only use them if you know what you’re doing. It’s all too easy to disallow your important content!
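Because it is so easy to block the wrong thing, it can be worth testing a robots.txt against a list of URLs you care about before it goes live. The sketch below uses Python’s standard `urllib.robotparser`; the domain and the list of important URLs are placeholders. Bear in mind this parser uses simple prefix matching and isn’t identical to Google’s, so treat it as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser

# The WordPress example from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

# Placeholder URLs that must stay crawlable.
IMPORTANT_URLS = [
    "https://www.example.com/",
    "https://www.example.com/shoes/",
    "https://www.example.com/blog/crawl-budget/",
]


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that this robots.txt would stop a crawler fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_urls(ROBOTS_TXT, IMPORTANT_URLS))
print(blocked_urls(ROBOTS_TXT, ["https://www.example.com/wp-admin/options.php"]))
```

An empty list for your important URLs is what you want to see; anything that shows up there is content you are about to disallow by accident.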
The Sitemap directive in robots.txt points crawlers to the sitemap of your site, giving them access straight away to the list of important URLs, without the need to rely on anything else.
It is often missing from websites, but it is simple to add: put it in as “Sitemap: [URL]”, where [URL] is the full URL of your sitemap.
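To check whether a robots.txt actually declares a sitemap, Python’s standard `urllib.robotparser` can read the Sitemap lines back out (its `site_maps()` method needs Python 3.8 or later). This is a small sketch with a placeholder domain, and the helper name is made up for the example.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs listed in a robots.txt, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() gives None if there are none


WITH_SITEMAP = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://www.example.com/sitemap.xml
"""

WITHOUT_SITEMAP = """\
User-agent: *
Disallow: /wp-admin/
"""

print(declared_sitemaps(WITH_SITEMAP))
print(declared_sitemaps(WITHOUT_SITEMAP))
```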
5. URL Parameters
URL parameters are mostly different ways of showing the same information – such as when you are looking at a category page and sort by price. The URL will change to include a question mark followed by other code or words. This is very useful for users, but can quickly create hundreds, if not thousands, of URLs which have basically the same content. This ties in closely with robots.txt, which you can use to keep crawlers away from these often-duplicated pages.
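To see how much duplication parameters are creating, you can group crawled URLs by what is left once the presentation-only parameters are stripped out. The following is a sketch using Python’s standard `urllib.parse`; the parameter names in `PRESENTATION_PARAMS` and the sample URLs are assumptions, so swap in whatever your own site uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of parameters that only change how a page is displayed.
PRESENTATION_PARAMS = {"sort", "order", "view"}


def strip_presentation_params(url):
    """Remove display-only parameters, keeping everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs that collapse to the same page once parameters are removed."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_presentation_params(url)].append(url)
    return dict(groups)


CRAWLED = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/shoes?sort=name&order=desc",
    "https://www.example.com/shoes?page=2&sort=price",
]

for page, variants in group_duplicates(CRAWLED).items():
    print(page, len(variants))
```

Groups with lots of variants are the places where crawl budget is most likely being spent on the same content over and over.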
Filters & Sorting
As mentioned above, filtering and sorting data or products can result in duplications. Crawlers are better at working with these than they used to be, which is why the URL Parameters section of Google Search Console has now been removed.
You can still run into issues with runaway crawl budgets though, so you should ensure that your robots.txt is set up to stop crawlers working their way through these pages.
That being said, if you have certain links which are only visible with filters or other parameters, you should add links elsewhere or use other methods to make sure that crawlers can access the pages you need. Again, if you are unsure, then it’s best to leave it to professionals.
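Google’s crawler understands two wildcards in robots.txt paths: `*` matches any run of characters and `$` anchors the end of the URL. Python’s built-in robots parser doesn’t handle these, so the sketch below is a small hand-rolled matcher for trying out parameter rules before you publish them. It is a simplification that only looks at Disallow patterns and ignores Allow rules and rule precedence, and the `sort` and `colour` parameter names are assumptions.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Turn a robots.txt path pattern (with * and $) into a regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_blocked(url, disallow_rules):
    """True if the URL's path and query match any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# Hypothetical rules: block any URL whose query string contains these parameters.
RULES = ["/*?*sort=", "/*?*colour="]

for url in [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?page=2",
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/shoes?page=2&colour=red",
]:
    print(is_blocked(url, RULES), url)
```

Running your key landing pages through a check like this is a cheap way to confirm a parameter rule only catches the filtered and sorted variants, not the pages you want crawled.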
With many filters – particularly on ecommerce sites – you can apply one filter on top of another. This is great for users, letting you view just your size, preferred colour, price range, etc.
For robots, this can create huge lists of pages – some of which never end!
Think of a 4-digit PIN: with 10 possible digits in each position there are 10,000 combinations, which is enough for reasonable security. If you have 4 filter types with around 10 selections each, you’ve reached that number already – and that’s just for 1 category page!
Another example we’ve seen in numerous guises is calendars. If you can click forward to next month, you can often keep doing this indefinitely. You probably won’t realise until your Screaming Frog crawl has been going a long time and is looking 50 years into the future.
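The numbers are easy to check. The sketch below multiplies the options per filter together; it also shows the slightly larger figure you get if each filter can be left unset, which is an extra assumption on top of the example above.

```python
from math import prod


def filter_combinations(options_per_filter, optional=False):
    """Count distinct filter combinations (and so potential URLs)."""
    extra = 1 if optional else 0  # "not selected" counts as one more choice
    return prod(count + extra for count in options_per_filter)


FILTERS = [10, 10, 10, 10]  # 4 filter types, around 10 selections each

print(filter_combinations(FILTERS))                 # 10000
print(filter_combinations(FILTERS, optional=True))  # 14641
```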
6. Sitemaps
Sitemaps are integral to crawlers navigating your site, and user-friendly sitemaps are also a great boon to accessibility.
XML sitemaps list out the URLs on your site in their simplest form.
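There is more to them than this though, as they can also suggest how often certain sections & URLs should be crawled. Setting frequency and importance is meant to help allocate the crawl budget around your site. For example, your blog pages may be updated more frequently than your main landing pages, so these should be crawled more often to help index the most relevant information. Treat these values as hints rather than instructions: Google has said it ignores the frequency and priority values, so an accurate last modified date is the more useful signal there.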
They should also be dynamically generated through your site, saving you having to update them periodically.
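As an idea of what “dynamically generated” can look like, here is a minimal Python sketch that builds an XML sitemap from a list of pages using the standard library. In practice your CMS or a plugin will normally do this for you; the page list, dates and frequencies below are invented, and the `build_sitemap` name is just for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from dicts with 'loc' and optional hint fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        # Optional hints; only written when the page supplies them.
        for field in ("lastmod", "changefreq"):
            if field in page:
                ET.SubElement(url, field).text = page[field]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, e.g. pulled from your CMS database on each request.
PAGES = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-10", "changefreq": "monthly"},
    {"loc": "https://www.example.com/blog/", "lastmod": "2024-02-01", "changefreq": "weekly"},
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

Because the list is built from whatever pages exist at the time, new posts and products appear in the sitemap without anyone having to edit a file by hand.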
HTML sitemaps for users are less common than they used to be, but they are still important.
Accessibility is the main point here: they let users get around your site when they can’t navigate it in the traditional manner, or when certain elements don’t load due to compatibility issues.
Whilst not strictly needed, they are simple pages to create and can help the user greatly when needed.
Crawl budget is a powerful thing and can help you get indexed quickly and effectively. By assessing each page on your site and deciding what should and shouldn’t be indexed, you can figure out how to effectively manage the crawl budget.
As with all things SEO, you should consider what is needed for your site and not mess around if you’re unsure – as you could easily deindex your website.