
SEO Unpacked: What Is Indexation And Why Does It Matter?

1st Jul 2026

20 minutes to read

SEO Unpacked is all about breaking down the parts of SEO that sound far more complicated than they need to be. This time, we’re looking at indexation, which might not sound like the most exciting topic in the world, but it is one of the most important parts of technical SEO. You can write brilliant content, build a great landing page, optimise every title tag, add schema, improve your internal links and spend hours debating whether a heading should say “solutions” or “services”. However, if that page is not indexed, it is not going to appear in Google’s search results.

Indexation sits between your website existing and your website being eligible to appear in search. It is one of those areas where SEOs can quickly fall into technical rabbit holes, but at its core, it is fairly simple: Google needs to find your page, understand it, store it, and then decide whether it is useful enough to show for relevant searches. So, let’s unpack what indexation means, how crawling, indexing and ranking are different, why some pages are crawled but not indexed, and why not every single page on your website should be indexed.

What is indexation?

Indexation is the process of a search engine storing information about a page so it can be considered for search results. A simple way to think about it is to imagine Google’s index as a huge library. Crawling is Google discovering that a book exists. Indexing is Google reading enough of that book to understand what it is about and adding it to the library system. Ranking is Google deciding whether that book is the best recommendation when someone asks a particular question.

If your page is not indexed, Google may know the URL exists, but it has not stored it in a way that allows it to appear in search results. That is why indexation is so important. It is the stage where a page moves from simply existing online to becoming eligible to appear when someone searches for something relevant. That does not mean the page is guaranteed to rank; it just means it can be considered.

Crawling, indexing and ranking: what is the difference?

Crawling, indexing and ranking often get grouped together, but they are not the same thing. This is where a lot of confusion starts, especially when Google Search Console says a page has been crawled but it still is not appearing in search (one of many reasons why your technical SEO may have a thousand-yard stare). To make sense of indexation properly, it helps to understand each stage and what Google is doing at each point.

Crawling

Crawling is when search engines discover and visit URLs. Googlebot follows links, reads sitemaps, revisits known pages and looks for new or updated content. At this stage, Google is essentially trying to understand what pages are available and whether it can access them. A page can be discovered through internal links, external links, XML sitemaps, redirects, previously known URLs and other sources Google has found over time.

Once Google discovers a URL, it may crawl it. This means it requests the page, checks what comes back from the server, and tries to understand what content or information is available. However, crawling does not automatically mean indexing. This is one of the most common misunderstandings in SEO: just because Google has crawled a page, it doesn’t mean Google has decided that page deserves a place in the index. It is a bit like someone walking into a shop, having a look around, and then leaving without buying anything. Yes, they visited. But that does not mean the visit led to the outcome you wanted.

Indexing

We then move to step two, which is indexing. At this stage Google tries to understand the page. It looks at the content, title tag, headings, images, videos, canonical tags, structured data, internal links, duplicate signals and other information to work out what the page is about and whether it should be stored in the index. This is where quality, clarity and uniqueness start to matter.

If a page is thin, duplicated, blocked, canonicalised elsewhere, low value, confusing, or simply not useful enough compared with other pages Google already knows about, it may not be indexed. Again, not every crawled page gets indexed. That is not always a bad thing either. In some cases, it is exactly what you want!

Ranking

Ranking is the next stage. Once a page is indexed, Google can consider it for relevant search queries. Ranking is where Google decides whether that page should appear, and where it should appear, based on how useful, relevant and trustworthy it appears for a particular search. This means an indexed page can still get no clicks, no impressions and no visibility.

Indexing is not the same as ranking. It simply means the page is eligible to appear. You still need the page to match search intent, be useful, be technically accessible, have decent internal links, compete with what is already ranking and actually deserve to appear. In simple terms, crawled means Google has visited the page, indexed means Google has stored and understood the page, and ranking means Google is showing the page for relevant searches. You need all three, but each one solves a different problem.

Why does indexation matter?

Indexation matters because pages that are not indexed cannot generate organic search traffic from Google. That sounds obvious, but it is easy to overlook. You might have a product page with strong commercial value, a service page targeting an important keyword, or a blog post designed to answer a high-intent query. If that page is not indexed, it does not matter how good it is. It is invisible in search.

Indexation issues can affect organic visibility, traffic, leads, revenue, product discoverability, content performance and how search engines understand your site. For small sites, one or two indexation issues might not seem like much. For larger websites, especially ecommerce, property, recruitment, healthcare, finance or multi-location sites, indexation can quickly become a much bigger problem. A few messy filters, duplicate parameter URLs or low-value tag pages can suddenly turn into thousands of URLs being crawled, discovered or excluded. That is when indexation stops being a simple SEO check and starts becoming a site quality and crawl efficiency issue.

Why some pages are crawled but not indexed

This is where people tend to panic, and it has caused many SEOs to scratch their heads. You open Google Search Console, go into the Page Indexing report, and there it is: “Crawled – currently not indexed.” This means Google has crawled the page but has not indexed it. That does not always mean something is technically broken, but it does mean Google has chosen not to include that page in the index at that point. There are a few common reasons this can happen. In some cases it may be a technical issue; in others it may be a content quality issue, a duplication issue, a canonical issue, or simply a sign that the page is not giving Google enough reason to store it in the index. The possibilities are (unfortunately) endless!

The page is too thin

If a page has very little useful content, Google may decide it is not worth indexing. This often happens with empty category pages, very short blog posts, product pages with barely any description, location pages with copied text and only the town name changed, auto-generated pages, or filter pages with no unique value.

Thin content is not just about word count. A short page can be useful, and a long page can still be completely useless. The real question is whether the page gives users enough information to justify its existence. If the answer is “not really”, indexation may be a struggle.

The page is too similar to another page

Google does not need ten versions of the same thing. If you have multiple URLs with near-identical content, Google may choose one version to index and ignore the others. This is common on ecommerce websites where filters, sorting options, tracking parameters and product variants create lots of duplicate or near-duplicate URLs.

For example:

/mens-trainers/
/mens-trainers/?sort=price-low
/mens-trainers/?colour=black
/mens-trainers/?utm_source=email
/mens-trainers/page/2/

Some of those URLs may be useful for users, but that does not mean they all need to be indexed. If Google sees a lot of duplication, it may simply decide another URL is the better version.
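To get a feel for how much duplication your parameters create, you can group URLs by a normalised form. The sketch below is only an illustration: the `IGNORED_PARAMS` list is an assumption based on the example URLs above, and the right list for your site will depend on which parameters actually change the content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that only change sorting, filtering or tracking.
# Adjust it to match how your own site uses query strings.
IGNORED_PARAMS = {"sort", "colour", "utm_source", "utm_medium", "utm_campaign"}


def normalise(url):
    """Return the URL without ignored query parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs that collapse to the same normalised form."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return dict(groups)


urls = [
    "/mens-trainers/",
    "/mens-trainers/?sort=price-low",
    "/mens-trainers/?colour=black",
    "/mens-trainers/?utm_source=email",
    "/mens-trainers/page/2/",
]
groups = group_duplicates(urls)
for normalised_url, members in groups.items():
    print(normalised_url, len(members))
```

Here the four parameter variants collapse into `/mens-trainers/`, while the paginated URL stays in its own group because its path is different. Whether paginated pages should be indexed is a separate decision.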

The canonical points somewhere else

A canonical tag tells search engines which version of a page you prefer to be treated as the main version. If Page A has a canonical tag pointing to Page B, you are essentially saying, “Google, please treat Page B as the main one.” That can be useful for managing duplication, but it can also cause confusion if implemented badly.

If you are wondering why a page is not indexed, check whether the canonical tag points to itself or another URL. Then check what Google has selected as the canonical in Search Console. Sometimes you suggest one thing with your canonical tag, but Google decides another URL looks like the stronger or more appropriate version.
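The first half of that check (what the page itself declares) is easy to script. This is a minimal sketch using only Python's standard library; the function name and the status labels are made up for illustration, and it only reads the canonical in the HTML. It cannot tell you which canonical Google selected, which still has to come from Search Console.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_status(page_url, html):
    """Classify the declared canonical as missing, conflicting, self or elsewhere."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "missing"
    # Resolve relative hrefs against the page URL before comparing.
    targets = {urljoin(page_url, href) for href in parser.canonicals}
    if len(targets) > 1:
        return "conflicting"
    return "self" if targets == {page_url} else "points elsewhere"


html = '<head><link rel="canonical" href="https://example.com/mens-trainers/"></head>'
print(canonical_status("https://example.com/mens-trainers/", html))
print(canonical_status("https://example.com/mens-trainers/?sort=price-low", html))
```

The comparison is an exact string match after resolving relative URLs, so a trailing slash or protocol mismatch will show up as "points elsewhere". That is deliberate here, because those small inconsistencies are often worth knowing about.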

The page has a noindex tag

A noindex tag tells search engines not to index the page. This is useful when used intentionally, but painful when added by accident. Common accidental noindex issues include staging sites pushed live with noindex still in place, blog categories or tags noindexed by a plugin, product pages noindexed due to template rules, development settings left on after launch, or CMS-level SEO settings applying wider than expected.

If a page should be indexed, make sure it does not have a noindex tag in the HTML or HTTP header. This is one of those checks that sounds basic, but it has caught out plenty of websites. Nobody is above the accidental noindex.
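If you want to script that check, something like the sketch below would cover both places. It assumes you already have the page's HTML and response headers (for example from a crawler export), and the function name is hypothetical. It is deliberately cautious: it flags a noindex aimed at any crawler, not just Googlebot, and it also treats the `none` directive as a noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in {"robots", "googlebot"}:
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def noindex_sources(html, headers):
    """Return where a noindex was found: "meta", "header", both or neither."""
    sources = []
    parser = RobotsMetaParser()
    parser.feed(html)
    if parser.directives & {"noindex", "none"}:
        sources.append("meta")

    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            for directive in value.lower().split(","):
                # Drop an optional "googlebot:" style prefix before comparing.
                header_directives.add(directive.split(":")[-1].strip())
    if header_directives & {"noindex", "none"}:
        sources.append("header")
    return sources


print(noindex_sources('<meta name="robots" content="noindex, follow">', {}))
print(noindex_sources("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))
```

Bear in mind this only sees the HTML you give it. If a noindex is injected by JavaScript, you would need the rendered HTML rather than the raw source.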

The page is blocked from crawling

If a page is blocked in robots.txt, Google may not be able to crawl it properly. This is different from noindex. Robots.txt controls crawling, while noindex controls indexing. If you block Google from crawling a page, Google may not be able to see the page content or the noindex tag on the page.

That can create messy situations where Google knows a URL exists but cannot properly evaluate it. So, if you want a page removed from the index, do not just block it in robots.txt and hope for the best. Use the right method for the job.
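A quick way to test whether a path is blocked is Python's built-in `urllib.robotparser`. The rules below are invented for the example, and the helper name is hypothetical. Treat the result as a rough guide rather than a replica of Googlebot: as far as I am aware, this module uses simple prefix matching and does not support the `*` and `$` wildcards that Google does.

```python
from urllib.robotparser import RobotFileParser

# Example rules only. Swap in the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the parsed rules allow this user agent to fetch the path."""
    return parser.can_fetch(user_agent, path)


for path in ["/mens-trainers/", "/checkout/basket", "/search?q=boots"]:
    print(path, is_crawlable(path))
```

If a page you want removed from the index shows up as blocked here, that is the messy situation described above: Google cannot crawl the page to see its noindex tag.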

The page is not internally linked well

Internal links help search engines discover pages and understand their importance. If a page exists but is buried deep in the site, only linked from an old sitemap, or not linked from any meaningful page, Google may see it as low priority. This is especially common with orphan pages, old campaign landing pages, blog posts not linked from category hubs, product pages hidden behind filters, or service pages missing from navigation and supporting content.

If a page matters, link to it like it matters. Internal linking is one of the few SEO levers you fully control, so do not make Google work harder than it needs to. Go get those links!
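If you can export a list of pages and a list of internal links from your crawler, finding pages with no inbound links takes a few lines. This is a rough sketch: the `(source, target)` pair format is an assumption, not the export format of any particular tool.

```python
from collections import Counter


def inbound_link_counts(pages, links):
    """Count unique internal links pointing at each known page.

    pages: iterable of URLs; links: iterable of (source, target) pairs.
    Self-links and repeated links between the same two pages are ignored.
    """
    counts = Counter({page: 0 for page in pages})
    for source, target in set(links):
        if source != target and target in counts:
            counts[target] += 1
    return counts


def orphans(pages, links, homepage="/"):
    """Return pages with no inbound internal links, excluding the homepage."""
    counts = inbound_link_counts(pages, links)
    return sorted(page for page, n in counts.items() if n == 0 and page != homepage)


pages = ["/", "/services/", "/services/seo/", "/old-campaign/"]
links = [
    ("/", "/services/"),
    ("/services/", "/services/seo/"),
    ("/services/seo/", "/"),
    ("/old-campaign/", "/"),
]
print(orphans(pages, links))
```

In this example `/old-campaign/` links out to the homepage but nothing links to it, so it is reported as an orphan.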

The page has quality issues

Sometimes the technical setup is fine, but the page still does not get indexed because it is not strong enough. That can include poor or duplicated content, weak search intent alignment, no clear purpose, too much boilerplate content, lack of useful information, poor page experience, confusing structure or weak trust signals.

This is where SEOs need to avoid treating indexation as purely technical. Yes, check the tags, canonicals, robots.txt and status codes. But also ask the uncomfortable question: is this page actually worth indexing? Because sometimes the answer is no, and Google has simply noticed before you have.

Why not every page should be indexed

This is the bit that often gets missed. A healthy website does not need every single URL indexed. In fact, trying to force every page into the index can make your website weaker, not stronger. Some pages exist for users, functionality or tracking, but do not need to appear in search results. That includes things like basket pages, checkout pages, internal search results, login pages, thank you pages, filter combinations with no search value, duplicate parameter URLs, thin tag pages, admin or account pages, staging URLs and PPC landing pages not built for organic search.

Indexation should be intentional. You want Google spending time on the pages that matter: your key services, products, categories, guides, resources and commercially useful content. If your site is full of low-value indexed pages, it can dilute quality signals and make it harder for search engines to understand what your important pages are. Think of it like inviting people to a meeting. You do not need everyone there. You need the right people there. Otherwise, suddenly there are 43 people on a call, nobody knows who owns the action, and someone is still screen sharing from the previous agenda item. Your index should be focused.

How to check indexation in Google Search Console

Google Search Console is usually the best starting point for checking indexation issues. There are two main areas to use: the Page Indexing report and the URL Inspection tool. The Page Indexing report is useful for spotting wider patterns across the site, while the URL Inspection tool is better when you want to look into one specific URL.

Used together, they can help you understand whether a page is indexed, whether Google has crawled it, whether indexing is being blocked, what canonical Google has selected, and whether there are wider issues affecting groups of URLs.

Page Indexing report

The Page Indexing report gives you an overview of which pages are indexed and which are not. You can use it to identify issues such as Crawled – currently not indexed, Discovered – currently not indexed, Duplicate without user-selected canonical, Alternate page with proper canonical tag, Excluded by noindex tag, Blocked by robots.txt, Not found (404), server errors and redirect issues.

The important thing here is not to panic when you see excluded URLs. Some exclusions are perfectly fine. For example, if checkout pages, account pages or duplicate parameter URLs are not indexed, that is usually expected. If important service pages, product pages or articles are excluded, that is where you need to investigate. The trick is to separate “this is a problem” from “this is Google doing exactly what we wanted”.

URL Inspection tool

The URL Inspection tool lets you check a specific URL. This is useful when you want to understand what Google knows about an individual page. You can check whether the URL is indexed, when it was last crawled, whether crawling is allowed, whether indexing is allowed, what canonical Google selected, whether the page is in a sitemap, whether Google could fetch the page and whether there are enhancement issues.

This is especially useful when a client or stakeholder asks, “Why is this page not on Google?” Instead of guessing, you can inspect the URL and look at the actual signals. That said, remember that “URL is on Google” does not mean the page is ranking well. It means the page is eligible to appear. Ranking is still a separate battle.

How to use crawl data to check indexation issues

Search Console tells you what Google is reporting. Crawl data tells you what your website is actually doing. Tools like Screaming Frog, Sitebulb and other crawlers help you check the technical setup at scale. This is where you can start spotting patterns that are harder to see one URL at a time.

A crawl can help you review status codes, indexability, canonical tags, noindex tags, robots.txt blocks, internal links, orphan URLs, sitemap URLs, duplicate titles, duplicate content, thin pages, redirect chains, pagination, parameter URLs and JavaScript-rendered content. The real value comes from comparing crawl data with GSC data.

For example, if GSC says a page is excluded by noindex, your crawl should confirm where that noindex is coming from. If GSC says Google selected a different canonical, your crawl can show whether your canonical signals are consistent. If GSC shows lots of discovered but not indexed URLs, your crawl can help identify whether they are internally linked, included in sitemaps or buried deep in the site. If important pages are not indexed, your crawl can show whether they are technically indexable in the first place. This is where SEO becomes less about staring at one report and more about joining the dots.
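As a rough illustration of joining those dots, the sketch below compares two small dictionaries: one standing in for a GSC export and one for crawl data. The state labels (`indexed`, `excluded_noindex`, `discovered_not_indexed`) and the field names are simplified placeholders, not the real export columns, so you would need to map them to whatever your tools produce.

```python
def compare(gsc_states, crawl_rows):
    """Return (url, note) pairs where GSC and the crawl disagree or need a look."""
    findings = []
    for url, state in gsc_states.items():
        row = crawl_rows.get(url)
        if row is None:
            findings.append((url, "in GSC but not reached by the crawl"))
        elif state == "excluded_noindex" and not row["noindex"]:
            findings.append((url, "GSC reports noindex, crawl found none"))
        elif state == "discovered_not_indexed" and row["inlinks"] == 0:
            findings.append((url, "no internal links point here"))
        elif state == "indexed" and row["noindex"]:
            findings.append((url, "indexed, but the live page now has noindex"))
    return findings


gsc = {
    "/services/seo/": "indexed",
    "/blog/old-post/": "discovered_not_indexed",
    "/landing/ppc/": "excluded_noindex",
    "/guides/new/": "excluded_noindex",
    "/ghost/": "indexed",
}
crawl = {
    "/services/seo/": {"noindex": False, "inlinks": 12},
    "/blog/old-post/": {"noindex": False, "inlinks": 0},
    "/landing/ppc/": {"noindex": True, "inlinks": 1},
    "/guides/new/": {"noindex": False, "inlinks": 4},
}
for url, note in compare(gsc, crawl):
    print(url, "->", note)
```

A mismatch is not automatically a problem. For instance, GSC data can lag behind a recent fix, so "GSC reports noindex, crawl found none" may simply mean Google has not recrawled the page yet.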

A simple indexation investigation process

If you are checking why a page is not indexed, start simple. The first question should always be whether the page should be indexed in the first place. Before trying to force a page into the index, you need to understand whether it actually deserves to be there. If the page has no organic purpose, no unique value and no reason to appear in search, then keeping it out of the index may be the right outcome.

If it is commercially important, useful and technically clean, then it is worth investigating properly. A simple checklist would be: does it return a 200 status code, is it blocked in robots.txt, does it have a noindex tag, does the canonical point to itself, is it included in the XML sitemap, is it internally linked from relevant pages, is the content unique and useful, does it match a clear search intent, is Google selecting a different canonical, has Google crawled it recently, and are similar pages being indexed?
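The mechanical parts of that checklist can be turned into a small script. The sketch below assumes you have already gathered the signals for a page (from a crawl, for example); the `PageSignals` structure and its field names are hypothetical. It intentionally leaves out the judgement calls, such as whether the content is useful or matches search intent, because those still need a human.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Technical signals for one URL, gathered from a crawl (illustrative fields)."""
    url: str
    status_code: int
    blocked_by_robots: bool
    noindex: bool
    canonical: str  # declared canonical URL, or "" if none was found
    in_sitemap: bool
    inlinks: int


def failed_checks(page):
    """Return a message for every technical check the page does not pass."""
    checks = [
        (page.status_code == 200, "does not return a 200 status code"),
        (not page.blocked_by_robots, "blocked in robots.txt"),
        (not page.noindex, "has a noindex directive"),
        (page.canonical == page.url, "canonical is missing or points to another URL"),
        (page.in_sitemap, "missing from the XML sitemap"),
        (page.inlinks > 0, "no internal links"),
    ]
    return [message for passed, message in checks if not passed]


page = PageSignals(
    url="https://example.com/services/seo/",
    status_code=200,
    blocked_by_robots=False,
    noindex=True,
    canonical="https://example.com/services/seo/",
    in_sitemap=False,
    inlinks=5,
)
print(failed_checks(page))
```

An empty list means the page is technically indexable, which brings you back to the harder questions about quality and whether Google is selecting a different canonical.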

Common indexation mistakes

Indexation issues often come from small decisions that scale badly. These mistakes are not always dramatic on their own, but when they happen across lots of pages, templates or URL types, they can create much bigger indexation problems. That is why it is important to look for patterns rather than only checking one URL at a time.

Adding every URL to the sitemap

Your XML sitemap should include the URLs you want search engines to crawl and index. It should not be a dumping ground for every URL your CMS can find. If your sitemap includes redirected URLs, noindexed URLs, canonicalised URLs, 404s, filtered URLs and low-value pages, you are sending mixed signals.

A clean sitemap should mainly contain indexable, canonical, important URLs. Basically, do not hand Google a map where half the roads lead to a wall.
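One way to audit this is to read the URLs out of the sitemap and compare them with what your crawl found. This is a minimal sketch for a standard `urlset` sitemap (not a sitemap index file); the crawl data format and the example URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/services/seo/</loc></url>
  <url><loc>https://example.com/old-page/</loc></url>
  <url><loc>https://example.com/thank-you/</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def sitemap_problems(xml_text, crawl):
    """Return sitemap URLs that are not a 200, not indexable, or missing from the crawl.

    crawl maps URL -> (status_code, indexable).
    """
    problems = {}
    for url in sitemap_urls(xml_text):
        if url not in crawl:
            problems[url] = "not found in crawl data"
            continue
        status_code, indexable = crawl[url]
        if status_code != 200:
            problems[url] = f"returns {status_code}"
        elif not indexable:
            problems[url] = "not indexable"
    return problems


crawl = {
    "https://example.com/services/seo/": (200, True),
    "https://example.com/old-page/": (301, False),
    "https://example.com/thank-you/": (200, False),
}
print(sitemap_problems(SITEMAP_XML, crawl))
```

Anything this returns is a road that leads to a wall: either fix the page or take it out of the sitemap.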

Noindexing pages that should rank

This happens more often than people like to admit. A page gets noindexed during development, testing or migration, then everyone forgets about it. Weeks later, someone asks why traffic has dropped. Cue panic, Slack messages, and someone asking whether anything has changed recently, even though yes, obviously, lots of things have changed recently.

Always check noindex tags during launches, migrations and template changes. It is a simple check, but it can prevent a lot of unnecessary confusion.

Blocking important pages in robots.txt

Robots.txt is powerful, but it is also very easy to misuse. Blocking low-value crawl paths can be useful, but blocking important sections of the site can stop search engines from accessing content they need.

This is especially risky during migrations or redesigns, when staging rules accidentally end up on the live site. If an important page is not being crawled, robots.txt should be one of your first checks.

Ignoring internal linking

If a page matters, it should not be an orphan. Important pages need clear internal links from relevant areas of the site. That might include navigation, category pages, hub pages, related blog posts, breadcrumbs or contextual links.

Internal links help users move through the site, but they also help search engines understand which pages are important and how topics connect. If your key page is only linked from a forgotten blog post from 2019, do not be shocked when Google treats it like a forgotten blog post from 2019.

Creating too many low-value pages

This is a big one for larger sites. More pages does not automatically mean more traffic. Creating hundreds or thousands of pages with little unique value can cause indexation problems, especially when those pages are repetitive, thin or generated at scale.

Examples include location pages with near-identical copy, filter pages targeting tiny variations, tag pages with no useful content, AI-generated pages with no real editorial value, and product variant pages with duplicated descriptions. The goal is not to create as many URLs as possible. The goal is to create useful pages that serve a purpose. Very boring. Very effective.

Submitting URLs again and again

Requesting indexing in GSC can be useful for a small number of important URLs, especially after updates. But repeatedly submitting the same URL is not a strategy. If a page has underlying quality, duplication or technical issues, pressing “request indexing” like it owes you money will not fix the problem.

Find the issue, fix the issue, and then request indexing if needed. Otherwise, you are just asking Google to look at the same unresolved problem again.

So what about AI?

Well, it wouldn’t be normal to talk about 2026 SEO without mentioning AI, would it?

AI search is slightly different from traditional search, but it still relies on many of the same foundations and principles. AI systems need to discover, access, understand and retrieve information before they can use it in an answer. Depending on the platform, this may happen through traditional search indexes, AI-specific crawlers, or real-time retrieval when a user asks a question. For example, Google’s AI experiences are still closely tied to Google Search, which means your important pages still need to be crawlable, indexable, useful, well structured and clearly linked from relevant areas of the site.

The important thing to remember is that being indexed in Google and being used in an AI-generated answer are not exactly the same thing. A page might be indexed and ranking, but still not be selected as a source in an AI response. Equally, some AI tools may retrieve information from the web in ways that do not perfectly mirror traditional rankings. This is why the basics still matter: your content needs to be accessible, easy to understand, clearly structured and genuinely useful.

For AI visibility, the goal is not to create strange content written purely for our algorithmic overlords. It is to make your key information easy for both users and machines to find, interpret and trust. Clear headings, strong internal links, visible on-page content, helpful summaries, specific answers, schema where appropriate and clean technical foundations all help. AI may change how answers are presented, but it has not removed the need for good technical SEO, strong content and a clean, machine-friendly website.

So what’s the key takeaway?

Indexation is not just a technical SEO checkbox. It is the bridge between your website existing and your website being eligible to appear in search. Crawling means Google has found and visited a page. Indexing means Google has understood and stored it. Ranking means Google has decided it is relevant enough to show for a search (that’s the part your stakeholders will care about).

The mistake is assuming those three things happen automatically, when in reality each stage depends on a mixture of technical accessibility, content quality, internal linking, duplication signals and overall page value. Some pages will be crawled but not indexed. Some pages should not be indexed at all. Some pages are technically indexable but not good enough to earn visibility. Others are blocked, canonicalised, duplicated, orphaned or accidentally noindexed because websites, much like marketing teams, are held together by processes, plugins and mild panic.

The best approach is to be intentional. Know which pages matter. Make sure they are crawlable, indexable, internally linked, useful and included in your sitemap. At the same time, keep low-value pages out of the index where appropriate. A clean, focused index helps search engines understand your website properly. More importantly, it helps your important pages stand a better chance of being found by the people actually looking for them. And that is the whole point of SEO.


Liam Fernie

Strategic SEO Specialist

Liam Fernie is an experienced Strategic SEO Specialist, having worked across many agency roles and in freelance SEO consultancy for major websites. With a strong technical SEO background and a degree in Business and Technology, Liam has worked extensively in SEO with clients such as the leading international retailer Joules and across multiple industries, ranging from health and fashion to technology and education. Liam’s expertise covers technical SEO, content optimisation, on-page strategy, and aligning search activity with wider business objectives. He has a proven track record of uncovering growth opportunities that drive measurable ROI, such as identifying new audience segments and building strategies that open additional revenue streams for clients in highly competitive sectors. He has delivered SEO solutions for high-profile clients, including Joules, Where the Trade Buys, and Vivo Life, as well as supporting agencies such as Convertex, Time54, and Fruity Llama. At Summit Media, he quickly rose from Executive to Technical Manager, overseeing multimillion-pound accounts and driving both strategic and operational improvements. He has also contributed to scaling SEO teams through process development, SOPs, and mentoring junior staff. Outside of work, Liam describes himself as a bit of a geek, with a love for gaming, keeping up with the latest tech news, and watching Formula 1. He also enjoys making games, fishing, Sunday morning car boots, and catching up over a pint.
