Search Console Indexing Anomalies: When Crawl Budget Is the Wrong Diagnosis

When Search Console starts showing Discovered – currently not indexed, Crawled – currently not indexed, Soft 404, or missing pages after a redesign, many teams jump straight to crawl budget. On small and mid-size WordPress and WooCommerce sites, that is usually the wrong first diagnosis.

Google’s Search Central documentation is explicit: crawl-budget work mainly matters for very large sites or sites with rapidly changing URLs. That does not mean crawl budget never matters. It does mean SMB and mid-market teams should usually look first at discovery, duplication, quality signals, and rendering.

That matters because the wrong diagnosis wastes time. I see teams tweaking robots.txt, resubmitting sitemaps, or chasing “budget” myths while the real issue is parameter sprawl, weak internal linking, thin template pages, or JavaScript-dependent content that Google cannot render reliably after a theme or plugin change.

What Search Console symptoms usually mean instead of crawl-budget trouble

Discovered – currently not indexed is often a discovery or prioritization signal, not proof of a budget crisis. Check whether the URL is buried in faceted navigation, only appears in XML sitemaps, or sits behind weak internal links. On WooCommerce sites, filtered category URLs, search-result pages, tag archives, and parameter variants are common culprits.

Crawled – currently not indexed usually means Google fetched the page and still did not add it. That often points to duplication, thin near-empty templates, placeholder product pages, weak differentiating content, or canonical conflicts. It is a signal to inspect page usefulness and uniqueness, not to assume Googlebot ran out of capacity.

Soft 404 often appears when pages return 200 OK but look empty, temporary, unavailable, or functionally dead. Common WordPress patterns include thin location pages, out-of-stock product templates with no useful alternatives, filtered URLs with no real inventory, and custom error states rendered as normal pages.
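One quick way to triage soft-404 candidates at scale is to compare each URL's status code against how much text its body actually contains. The Python sketch below is a rough heuristic, not a Google rule: the 120-word threshold and the status handling are assumptions to adapt to your own templates.

```python
import re

def classify_response(status: int, html: str, min_words: int = 120) -> str:
    """Rough soft-404 triage: a 200 response with a near-empty body is suspect.

    min_words is an arbitrary starting threshold, not a Google-defined cutoff.
    """
    word_count = len(re.findall(r"\w+", html))
    if status in (404, 410):
        return "hard-404"  # correct status for a page that should not exist
    if status == 200 and word_count < min_words:
        return "soft-404-candidate"  # looks alive to crawlers, but nearly empty
    return "ok"
```

Run over a crawl export, this surfaces out-of-stock templates and empty filter pages that return 200 with almost nothing on them.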

Pages disappearing after changes frequently trace back to migrations, theme swaps, app embeds, plugin conflicts, changed canonicals, broken internal links, blocked assets, or client-side rendering problems. Google’s rendering guidance is especially relevant here: if important content or resources depend on JavaScript and those resources are blocked, erroring, or delayed, indexing can drop even when the page “looks fine” in a browser.
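Before reaching for a headless browser, one cheap first check is to confirm that critical copy actually exists in the raw server HTML. The Python sketch below flags content that only appears after client-side rendering; the phrase list is an assumption you would fill in per template.

```python
def missing_from_source(html: str, critical_phrases) -> list:
    """Return the critical phrases absent from the raw, pre-JavaScript HTML.

    If headings, product copy, or links only show up after client-side
    rendering, Google may index a much thinner page than the one you see.
    """
    lowered = html.lower()
    return [phrase for phrase in critical_phrases if phrase.lower() not in lowered]
```

Pair this with the rendered HTML from URL Inspection: phrases missing from the source but visible in your browser point straight at JavaScript-dependent content.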

Use Search Console in this order: the Page Indexing report for pattern recognition, URL Inspection on representative examples, then Crawl Stats for host failures, response anomalies, and resource fetch clues. Google’s crawling troubleshooting docs support this workflow and specifically call out host status, soft 404s, and crawl patterns as diagnostic inputs.

What to do next

Start with a short triage pass this week:

1. Inspect representative URLs, not just counts.
Pull examples from each bucket: discovered, crawled, soft 404, and recently dropped pages. Compare live URL, canonical, referring sitemap, internal links, and rendered HTML.

2. Clean up URL inventory.
Prune low-value archives, filter combinations, internal search pages, duplicate parameters, and thin taxonomy pages. Make sure canonicals align with indexable destinations. Do not assume the sitemap will fix poor URL hygiene by itself.

3. Strengthen internal discovery.
Important products, services, and category pages should be reachable through crawlable links from high-value pages, not only through faceted navigation, JavaScript interactions, or sitemap inclusion.

4. Check rendering after theme or plugin changes.
Test whether critical content, links, and metadata appear in the rendered version Google sees. Review blocked resources, JavaScript errors, lazy-loaded content, and app layers that replace server-rendered content.

5. Validate status codes and soft-404 patterns.
If a page should not exist, return the correct status. If it should exist, make it substantively useful. Thin “empty state” pages with 200 responses often create index noise and poor signals.

6. Review Crawl Stats for infrastructure clues.
Look for spikes in errors, slow response behavior, or host issues before blaming indexing systems. If the server is unstable, crawl efficiency and indexing confidence both suffer.
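Step 1 above can be partially scripted. As a minimal sketch using only Python's standard library (the class and function names are illustrative), the parser below pulls the declared canonical and a crawlable-link count out of a page's HTML so you can compare them against what Search Console reports:

```python
from html.parser import HTMLParser

class PageAudit(HTMLParser):
    """Collect the canonical URL and crawlable <a href> links from raw HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel", "").lower() == "canonical":
            self.canonical = attr.get("href")
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])

def audit_html(html: str) -> dict:
    """Return the declared canonical and the number of plain anchor links."""
    parser = PageAudit()
    parser.feed(html)
    return {"canonical": parser.canonical, "link_count": len(parser.links)}
```

A page whose canonical points somewhere unexpected, or whose important links exist only in JavaScript (and so never show up here), is a stronger lead than any aggregate count.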

Only after those checks should you spend serious time on crawl-budget theory. For most WordPress and WooCommerce sites, the faster wins come from cleaner URL inventory, better internal links, corrected canonicals, and fixing rendering or resource failures.
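For the URL-inventory cleanup, a sitemap or crawl export can be triaged with a few lines of Python. The parameter and path patterns below are common WooCommerce suspects, but they are assumptions; tune both lists to your own site before acting on the output.

```python
from urllib.parse import urlparse, parse_qs

# Common low-value URL patterns on WordPress/WooCommerce sites (adjust per site).
NOISY_PARAMS = {"orderby", "min_price", "max_price", "filter_color", "s", "add-to-cart"}
NOISY_PATH_PARTS = ("/tag/", "/page/")

def flag_low_value(urls) -> list:
    """Return URLs that look like filter, sort, search, or archive noise."""
    flagged = []
    for url in urls:
        parsed = urlparse(url)
        params = set(parse_qs(parsed.query))
        if params & NOISY_PARAMS or any(part in parsed.path for part in NOISY_PATH_PARTS):
            flagged.append(url)
    return flagged
```

Flagged URLs are candidates for noindex, canonicalization, or removal from the sitemap, not automatic deletions; verify that each pattern genuinely carries no search value first.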


This article is for informational purposes only and reflects general marketing, technology, website, and small-business guidance. Platform features, policies, search behavior, pricing, and security conditions can change. Verify current requirements with the relevant platform, provider, or professional advisor before acting. Nothing in this article should be treated as legal, tax, financial, cybersecurity, or other professional advice.