6 Ways To Minimise Wasted Crawl Budget

23rd Jun 2015

The term “Crawl Budget” might be something you’ve heard flying around in SEO circles, but what is it and what does it mean?

At its simplest level, it is how much time and resource Google or other search engines put into crawling your website. From this they then decide what is worth indexing and what isn’t, which in turn determines which pages are returned when users enter search queries.

So, although it goes on in the background and doesn’t have any involvement from the user or SEO side, it is still something which needs consideration – otherwise you can publish all you want and never be found.

For smaller sites it might not be an issue, but regardless of the size of the site, the tips below can help you get crawled effectively and make sure that your content gets in front of the right users.

Make The Most Of Your Crawl Budget

With the above description in mind, you can imagine how Google endeavours to crawl every page on the internet, but it simply can’t.

So the idea behind making the most of your crawl budget is to ensure that your top content is easily accessible and crawled so it can rank. This is done by making it easy to find and making sure that the crawlers aren’t wasting time by looking at pages which you don’t want indexed.

Most of the practices below will also help with users navigating the site, not just crawlers and robots. There are also more specific issues which could crop up, but they are too many to count, so the below are the most common issues we come across on a fairly regular basis (unfortunately!).

1. Linking Issues – Both In And Out

Link problems are among the most fundamental causes of wasted crawl budget. If crawlers hit broken links or errors, or have to follow long redirect chains, this slows down the crawling, uses up budget unnecessarily and could result in the crawler not reaching the desired page.

Redirects

Firstly, with redirects you should use a 301 redirect in 99.99% of cases. There are other directives, but you should only use them when you know what you’re doing.

Setting up redirects to ensure that you don’t have 404s is good, but your internal links shouldn’t go through these redirects when they don’t need to. Whenever a page URL is changed on your site, you should set up the 301 redirect and also update internal links to point at the new target URL. With menu links and other templated content, it is simple to make site-wide updates. This saves crawl budget, improves user experience and helps cut down pressure on your server.

Finding these links can be done easily through crawling tools such as Screaming Frog – arguably my favourite SEO tool! If you have to do this manually, then it should be fairly straightforward for menu links, but others may need some hunting down.
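
If you can export your redirect rules as a simple list, spotting chains doesn't have to be manual. The snippet below is a minimal sketch in Python, assuming the redirects are available as a source-to-target mapping; the paths and the `resolve_redirects` helper are made up for illustration and are not part of any particular tool.

```python
def resolve_redirects(redirects, max_hops=10):
    """Follow each redirect to its final target and count the hops taken.

    `redirects` maps a source path to the path it redirects to.
    Returns {source: (final_target, hops)}.
    """
    resolved = {}
    for source, target in redirects.items():
        seen = {source}
        hops = 1
        # Keep following while the target is itself redirected.
        while target in redirects and hops < max_hops:
            if target in seen:  # redirect loop, stop following
                break
            seen.add(target)
            target = redirects[target]
            hops += 1
        resolved[source] = (target, hops)
    return resolved


# Hypothetical redirect rules: /old-page needs two hops to reach its target.
redirects = {
    "/old-page": "/newer-page",
    "/newer-page": "/newest-page",
    "/summer-sale": "/sale",
}

resolved = resolve_redirects(redirects)
chains = {src: res for src, res in resolved.items() if res[1] > 1}
print(chains)  # {'/old-page': ('/newest-page', 2)}
```

Anything that shows up in `chains` is a candidate for pointing the first redirect, and any internal links, straight at the final URL.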

Broken Links

This applies to both internal and external broken links on your site. As well as keeping on top of any broken links pointing within your site through crawling tools or Search Console, you should do the same for external links.

Google Search Console won’t report on these, but you can use free crawling tools and periodically check your blog posts, products and other pages which point to external URLs and assets.
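
As a rough idea of what a crawling tool does first, here is a small Python sketch that pulls the links out of a page and splits them into internal and external lists. The HTML, the URLs and the `split_links` helper are invented for the example, and it deliberately stops short of requesting each link to check its status code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def split_links(html, base_url):
    """Return (internal, external) link lists for one page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).netloc
    internal = [u for u in collector.links if urlsplit(u).netloc == host]
    external = [u for u in collector.links if urlsplit(u).netloc != host]
    return internal, external


# Hypothetical page content.
page = '<p><a href="/blog/">Blog</a> <a href="https://partner.example.org/tool">Tool</a></p>'
internal, external = split_links(page, "https://www.example.com/post")
print(internal)  # ['https://www.example.com/blog/']
print(external)  # ['https://partner.example.org/tool']
```

The `external` list is the one Search Console won't help you with, so that is the list to feed into whatever status checker you use.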

2. Internal Linking Structures

Your menu structure has been mentioned previously and it should be functional and perform well for both users and crawlers.

With a well thought out menu and linking structure, users should be able to get to the most important pages on your site with as few clicks as possible. Including the correct keywords and terminology will also pass on other SEO benefits, so this can help across the board.

Your header and footer menus are probably the bulk of internal links, but you can include closely related content links, such as blog posts related to a product or topic, as well as using the keywords in your content to link to relevant pages. Again, this will help users find information, crawlers find your deeper pages, and also increase relevancy of keywords & pages.
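
To put a number on "as few clicks as possible", you can measure click depth from the homepage. The following is a minimal Python sketch using a breadth-first search over a link graph; the `site_links` data is hypothetical and would normally come from a crawl export.

```python
from collections import deque


def click_depths(links, start="/"):
    """Return the minimum number of clicks from `start` to each reachable page.

    `links` maps a page to the list of pages it links to.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph.
site_links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/widget-guide"],
    "/blog/widget-guide": ["/products/widget", "/products/widget/spec-sheet"],
}

depths = click_depths(site_links)
print(depths["/products/widget"])             # 2
print(depths["/products/widget/spec-sheet"])  # 3
```

Pages that matter but sit at a high depth, or don't appear in the result at all, are the ones that would benefit from an extra internal link.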

3. Page Speeds & Technologies

These two are closely linked and are becoming more of an issue than in the past. Crawlers are catching up with technology, but real-world development often outpaces the crawlers looking to index your site.

Page Speeds

Google’s Core Web Vitals report, introduced in 2020 with the metrics feeding into rankings from 2021, is a fancy way of showing how quickly your pages load and respond to interaction.

In simple page speed terms, the longer pages take to load, the fewer pages the crawler will get to in your allotted time. If Google gives you an hour a week and your pages are currently slow, halving load times could conceivably double the number of pages crawled, or more.
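
The arithmetic behind that is simple enough to sketch in Python. This is a deliberate simplification: it assumes the crawler fetches one page at a time, and both the one-hour allowance and the load times are made-up figures.

```python
def pages_per_crawl(crawl_seconds, seconds_per_page):
    """Whole pages a crawler could fetch one after another in the time given."""
    return int(crawl_seconds // seconds_per_page)


one_hour = 60 * 60  # hypothetical weekly allowance, in seconds

slow_site = pages_per_crawl(one_hour, seconds_per_page=4.0)
fast_site = pages_per_crawl(one_hour, seconds_per_page=2.0)

print(slow_site)  # 900
print(fast_site)  # 1800
```

Halving the load time doubles the number of pages reached in the same window, which is the effect described above.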

Technologies & Development

Page speeds and accessibility are linked to crawlability, especially as new technologies are used and internal links might not be reached by crawlers. For example, if your page scrolls infinitely, a crawler would never reach the footer menu. Also, if you rely on JavaScript for rendering, then you should ensure that a non-JS version of your links is available for crawlers.

4. Your Robots.txt File

This simple file directs crawlers when they get to your site, helping them find sitemaps and making sure they don’t go where you don’t want them.

Blocking

Blocking directories and URLs through your robots.txt file is a good way to hide certain aspects of your site – from whole directories to individual URLs.

For example, on a WordPress site, your default admin directory shouldn’t be indexed, so you can include the line “disallow: /wp-admin/” and you’re good to go. This can apply to other directories for sensitive information as well as blocking individual or bulk duplicated content. Coupled with canonicals, these can help control your crawl budget – but they are powerful so you should only use them if you know what you’re doing. It’s all too easy to disallow your important content!
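
Because it is so easy to block too much, it can be worth testing a robots.txt against a list of pages you definitely want crawled. Below is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the example.com URLs and the `blocked_urls` helper are assumptions for illustration, with a deliberately careless `Disallow: /blog` line included. Python's parser does simple prefix matching, so treat it as a sanity check and not as a copy of how any one search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the second rule is an accidental over-block.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs from `urls` that the rules stop `user_agent` fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Pages we definitely want crawled (made up for the example).
important = [
    "https://www.example.com/",
    "https://www.example.com/blog/crawl-budget",
    "https://www.example.com/products/widget",
]

blocked = blocked_urls(ROBOTS_TXT, important)
print(blocked)  # ['https://www.example.com/blog/crawl-budget']
```

An empty list is what you want to see before a new robots.txt goes live.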

Sitemap

This simple directive points crawlers to the sitemap of your site, giving them access straight away to the list of important URLs, without the need to rely on anything else here.

It is a simple directive which is often missing from websites. It should be put in simply as “Sitemap: [URL]”, where [URL] is the full URL of your sitemap.
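
To check whether the directive is present, the same standard-library parser can read it back. This is a small sketch assuming Python 3.8 or later (where `site_maps()` is available); the robots.txt content and sitemap URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in a robots.txt, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


sitemaps = declared_sitemaps(ROBOTS_TXT)
print(sitemaps)  # ['https://www.example.com/sitemap.xml']
```

An empty result means the directive is missing and is worth adding.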

5. URL Parameters

URL parameters mostly offer different ways of showing the same information – such as when you are looking at a category page and sort by price. The URL will change to include a question mark followed by other code or words. This is very useful for users, but can quickly create hundreds, if not thousands, of URLs with basically the same content. Working closely with your robots.txt, you can keep crawlers away from these often-duplicated pages.
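
To see how many of your crawled URLs are really the same page, you can strip the parameters that only change presentation and count what is left. Here is a minimal Python sketch; the `IGNORED_PARAMS` set and the URLs are assumptions, and which parameters are safe to ignore depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that re-order or re-style a page without changing its content.
IGNORED_PARAMS = {"sort", "order", "view"}


def canonical_url(url):
    """Drop presentation-only parameters and sort the rest for easy comparison."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Made-up crawl export for one category page.
crawled = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/shoes?sort=price&order=desc",
    "https://www.example.com/shoes?page=2",
]

unique_pages = {canonical_url(url) for url in crawled}
print(len(crawled), "crawled URLs,", len(unique_pages), "distinct pages")
# 4 crawled URLs, 2 distinct pages
```

A big gap between the two numbers is a good sign that parameters are eating into your crawl budget.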

Filters & Sorting

As mentioned above, filtering and sorting data or products can result in duplications. Crawlers are better at working with these than they used to be, which is why the URL Parameters section of Google Search Console has now been removed.

You can still run into issues with runaway crawl budgets though, so you should ensure that your robots.txt is set up to stop crawlers working through these pages.

That being said, if you have certain links which are only visible with filters or other parameters, you should add links elsewhere or use other methods to make sure that crawlers can access the pages you need. Again, if you are unsure, then it’s best to leave it to professionals.

Recursive Errors

With many filters – particularly on ecommerce sites – you can apply one filter on top of another. This is great for users, letting you just view your size, preferred colour, price range, etc.

For robots, this can create huge lists of pages – some of which never end!

Think of a 4-digit PIN: just 4 digits give 10,000 possible combinations, which is enough for good security. If you have 4 filter types with around 10 selections each, you’ve reached that number of URLs already – and that’s just for 1 category page!

Another example we’ve seen in numerous guises is the calendar. If you can click forward to next month, you can often keep doing this indefinitely. You probably won’t realise until your Screaming Frog crawl has been going a long time and is looking 50 years into the future.
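
The PIN comparison is easy to check with a couple of lines of Python. The filter names and counts below are hypothetical; the second figure assumes each filter can also be left unset, which is usually the case on a real category page.

```python
from math import prod

# Hypothetical filters on one category page, each with 10 options.
filter_options = {"size": 10, "colour": 10, "price": 10, "brand": 10}

# One option chosen from every filter, like the digits of a 4-digit PIN.
all_filters_set = prod(filter_options.values())

# Each filter either unset or set to one of its options.
filters_optional = prod(count + 1 for count in filter_options.values())

print(all_filters_set)   # 10000
print(filters_optional)  # 14641
```

Allow more than one option per filter, or add sorting on top, and the number of crawlable URLs grows far beyond this.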

6. Sitemaps

Sitemaps are integral to crawlers navigating your site, and user friendly sitemaps are also a great boon to accessibility.

XML Sitemaps

XML sitemaps list out the URLs on your site in their simplest form.

There is more to them than this though, as they can also hint at how often certain sections & URLs should be crawled. Setting change frequency and priority is intended to help allocate the crawl budget around your site, although search engines treat these values as hints at most (Google has said it ignores them and relies on an accurate last-modified date instead). For example, your blog pages may be updated more frequently than your main landing pages, so these should be crawled more often to help index the most relevant information.

They should also be dynamically generated through your site, saving you having to update them periodically.
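
For a feel of what "dynamically generated" means in practice, here is a minimal Python sketch that builds a sitemap from a list of pages using the standard library. The page data is invented, the `build_sitemap` helper is hypothetical, and a real site would pull URLs and dates from its CMS or database. How much weight a given search engine gives `changefreq` and `priority` varies, so treat them as hints.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
        ET.SubElement(url, "changefreq").text = page["changefreq"]
        ET.SubElement(url, "priority").text = page["priority"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Made-up pages: the blog changes more often than the landing page.
pages = [
    {"loc": "https://www.example.com/", "lastmod": "2015-06-01",
     "changefreq": "monthly", "priority": "1.0"},
    {"loc": "https://www.example.com/blog/", "lastmod": "2015-06-23",
     "changefreq": "daily", "priority": "0.8"},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Hooking something like this into your publishing process means the sitemap updates itself whenever a page is added or changed.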

HTML Sitemaps

HTML sitemaps for users are less frequently used than they used to be, but they still have importance.

Accessibility is the main point here, letting users get around your site when they can’t access it in a traditional manner, or when certain aspects don’t load due to compatibility issues.

Whilst not 100% needed, they are simple pages to create and can help the user greatly when needed.

Conclusion

Crawl budget is a powerful thing and can help you get indexed quickly and effectively. By assessing each page on your site and deciding what should and shouldn’t be indexed, you can figure out how to effectively manage the crawl budget.

As with all things SEO, you should consider what is needed for your site and not mess around if you’re unsure – as you could easily deindex your website.

Responses

Optimizando el Crawl Budget con códigos de estado HTTP 304
24th Aug 2016
[…] if you want to see other ways to improve that budget, here is an article that will answer your […]
Luke Monaghan
25th Jun 2015
Thanks Barry.
Of course, using Meta robots noindex tags is fine if you explicitly don’t want pages indexed, but blocking via robots.txt is a good way to conserve crawl budget which is what the tips in this post are to help achieve.
If it is important that a page isn’t indexed, I would definitely recommend Meta noindex, but for the benefit of saving crawl budget for more valuable pages, blocking via robots.txt is an effective technique.
Barry Allen
25th Jun 2015
Hmm.. Yes, big search engines respect robots.txt but having a backlink from another website to a page being blocked by robots may still result to that page being indexed.
Just clarifying the part “you’re telling a crawler not to access the page or index it”. Crawling and indexing are different. You can explicitly tell search engines not to index something by using meta robots noindex.
Luke Monaghan
24th Jun 2015
Hi Barry,
Thanks for your comment.
Although this is partly true, all of the big searches do obey the rules set in a robots.txt file. Of course another blocking method is to password-protect the relevant directory, this way all web crawlers won’t be able to access and index the confidential/ private content.
Barry Allen
24th Jun 2015
Blocking pages/directories through robots.txt doesn’t guarantee that it won’t be indexed…
SearchCap: Google Mobile Tests, DuckDuckGo Queries & Retargeting AdWords
23rd Jun 2015
[…] 6 Ways To Minimise Wasted Crawl Budget, koozai.com […]

Gary Hainsworth

Senior Organic Data Specialist

Gary is our technical SEO specialist and boasts more than 10 years’ experience in the industry. With in-depth knowledge on site migrations and all aspects of technical SEO, he’s a valuable asset to our team. Gary’s worked with the likes of the V&A, Warburtons, the NHS and the Lake District National Park. He has a passion for guitars too, be that playing them, modifying them or even building them. Gary has appeared in Startups Magazine, Portsmouth News and Southampton.gov.uk.
