How to Use Google Search Console’s Crawl Stats and Page Indexing Reports to Fix Technical SEO Issues in WordPress

If important pages aren’t indexed, they don’t rank. If Googlebot wastes time crawling junk URLs, your best content gets crawled less often. Google Search Console’s Page Indexing and Crawl Stats reports are the fastest way to see both problems clearly—and fix them in WordPress before they cost you traffic and leads.

Google’s Page Indexing report documents why URLs are indexed or not indexed, including statuses such as Crawled – currently not indexed, Discovered – currently not indexed, Duplicate without user-selected canonical, Soft 404, and Blocked by robots.txt (developers.google.com/search/docs/crawling-indexing/page-indexing-report). The Crawl Stats report shows how Googlebot actually interacts with your site: total crawl requests, response codes, file types, and host status (developers.google.com/search/docs/crawling-indexing/crawl-stats-report).

For small businesses running WordPress or WooCommerce, these two reports connect directly to business outcomes:

Lost impressions because service pages are not indexed.
Wasted content spend when blog posts never move past “Crawled – currently not indexed.”
Lead leakage when key landing pages return soft 404s.
Higher hosting costs and slower growth when Googlebot spends its crawl requests on low-value URLs.

Start with the Page Indexing Report: What’s Not Indexed and Why

Open Search Console → Indexing → Pages. Focus on the “Why pages aren’t indexed” section. Don’t panic about volume. Prioritize URLs that should drive revenue: core services, product categories, high-margin products, location pages.
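
One way to make that prioritization repeatable is to label exported URLs by path. The sketch below is only an illustration: the path prefixes (/services/, /product-category/, /locations/, /product/, /blog/) are hypothetical and should be replaced with your own URL structure.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes; replace them with your site's real URL structure.
PRIORITY_PREFIXES = {
    "high": ("/services/", "/product-category/", "/locations/"),
    "medium": ("/product/", "/blog/"),
}


def label_priority(url: str) -> str:
    """Return 'high', 'medium', or 'low' based on the URL path prefix."""
    path = urlparse(url).path
    for label, prefixes in PRIORITY_PREFIXES.items():
        if path.startswith(prefixes):
            return label
    return "low"


def sort_by_priority(urls: list[str]) -> list[tuple[str, str]]:
    """Pair each URL with its label, most important first."""
    order = {"high": 0, "medium": 1, "low": 2}
    labeled = [(label_priority(u), u) for u in urls]
    return sorted(labeled, key=lambda pair: order[pair[0]])
```

Feed it the URL column from the report export and work down the list from the top.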

1. Crawled – Currently Not Indexed

Google confirms the page was crawled but not indexed. The documentation states this does not necessarily indicate an error, but it does mean Google chose not to include the page in the index at that time.

Common WordPress causes:

Thin or near-duplicate service pages across multiple cities.
Auto-generated WooCommerce tag archives.
Low internal linking to the page.

Fix:

Improve uniqueness and depth. Add FAQs, examples, and local proof.
Strengthen internal linking from relevant posts and category pages.
Consolidate weak variations into a stronger canonical page.

Business impact: If these are revenue pages, you are effectively running paid ads and social campaigns to a page Google is ignoring.

2. Discovered – Currently Not Indexed

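Google knows the URL exists but hasn’t crawled it yet. Google’s documentation notes that this typically means Google wanted to crawl the URL but postponed the crawl to avoid overloading the site. It can also relate to crawl prioritization and site quality signals.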

Common causes:

Large XML sitemaps with low-value URLs.
Parameter-heavy WooCommerce URLs.
Weak internal link structure.

Review your sitemap. In WordPress SEO plugins, remove tag archives, author archives, and filtered URLs unless they serve a clear search intent. A sitemap should represent pages you actually want indexed.
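
A quick sitemap audit can be scripted. This is a minimal sketch that assumes a standard urlset sitemap and WordPress-style archive bases (/tag/, /author/, /product-tag/); those prefixes are assumptions, so adjust them to match your permalink settings. It flags any URL that sits under one of those prefixes or carries a query string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed archive bases; change these if your permalinks differ.
LOW_VALUE_PREFIXES = ("/tag/", "/author/", "/product-tag/")


def flag_low_value_urls(sitemap_xml: str) -> list[str]:
    """Return sitemap URLs that are archives or carry query parameters."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        if parsed.query or parsed.path.startswith(LOW_VALUE_PREFIXES):
            flagged.append(url)
    return flagged
```

Anything it returns is a candidate for removal, not an automatic deletion. Keep an archive in the sitemap if it serves a clear search intent.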

3. Duplicate Without User-Selected Canonical

Google’s canonicalization guidance explains how it selects a canonical URL based on signals like rel=canonical, redirects, and internal links (developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls).

WooCommerce often generates duplicates via:

Product sorting and filter parameters.
HTTP vs. HTTPS inconsistencies.
Trailing slash vs. non-trailing slash conflicts.

Fix:

Ensure rel=canonical points to the primary URL.
Standardize internal links to one format.
Use 301 redirects for legacy variants.

Implementation caution: Over-aggressive canonicalization can collapse legitimate location or product variants. Always confirm that canonical targets match true intent and do not suppress unique revenue pages.
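
To see which crawled URLs are really the same page, you can normalize them to one preferred format and group the variants. The sketch below assumes HTTPS with a trailing slash as the preferred format and treats orderby, min_price, max_price, and filter_* as sorting or filter parameters. Both choices are assumptions, so check them against what your store actually emits before relying on the output.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Assumed sorting/filter parameters; adjust to what your store actually emits.
IGNORED_PARAMS = {"orderby", "min_price", "max_price"}
IGNORED_PARAM_PREFIXES = ("filter_",)


def preferred_url(url: str) -> str:
    """Normalize to one assumed format: HTTPS, trailing slash, no sort/filter params."""
    parts = urlparse(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS and not key.startswith(IGNORED_PARAM_PREFIXES)
    ]
    return urlunparse(("https", parts.netloc.lower(), path, "", urlencode(kept), ""))


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Map each preferred URL to the crawled variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[preferred_url(url)].append(url)
    return {target: found for target, found in groups.items() if len(found) > 1}
```

Each group is a set of URLs that should share one rel=canonical target or redirect to it. Review the groups by hand, because the normalization will also merge variants you may want to keep separate.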

4. Soft 404

A soft 404 occurs when a page returns a 200 status code but appears empty or “not found.”

Common WordPress examples:

Expired WooCommerce products left live but stripped of content.
Custom templates that output minimal content when fields are empty.

Fix:

Return a proper 404 or 410 for discontinued items.
Redirect to the closest relevant category when appropriate.

This protects crawl efficiency and prevents Google from wasting time on dead inventory.
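
Google does not publish the exact rules it uses to classify soft 404s, so the following is only a rough heuristic for finding likely candidates in your own crawl data. The phrases and the 50-word threshold are assumptions to tune against your templates.

```python
from html.parser import HTMLParser

# Assumed phrases and threshold; tune them against your own templates.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "no products were found")
MIN_WORDS = 50


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def looks_like_soft_404(status_code: int, html: str) -> bool:
    """Flag 200 responses whose visible text is thin or reads like an error page."""
    if status_code != 200:
        return False
    parser = _TextExtractor()
    parser.feed(html)
    text = " ".join(" ".join(parser.chunks).split()).lower()
    too_thin = len(text.split()) < MIN_WORDS
    says_missing = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    return too_thin or says_missing
```

Pages it flags still need a human decision: restore the content, return a 404 or 410, or redirect to the closest relevant category.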

5. Blocked by robots.txt

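Google’s robots.txt documentation explains how directives control crawling (developers.google.com/search/docs/crawling-indexing/robots/intro). A Disallow rule stops Googlebot from fetching a URL, so Google cannot read the page’s content. It does not guarantee the URL stays out of the index: a blocked URL can still be indexed without its content if other pages link to it. To keep a page out of the index, let Google crawl it and use a noindex directive instead.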

Common mistakes:

Blocking /wp-content/ and unintentionally preventing CSS/JS access.
Blocking /shop/ or /services/ during staging and forgetting to remove it.

Tradeoff: Blocking low-value filters can reduce crawl waste, but blocking core content eliminates index eligibility. Use robots.txt surgically.
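
Before deploying a robots.txt change, it helps to test your most important URLs against the rules. This sketch uses Python's standard urllib.robotparser. Note the assumption: that parser applies rules in file order and does not implement Google's wildcard or longest-match behavior, so treat it as a first-pass check for simple prefix rules, not a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Run it with the contents of your live robots.txt and a list of revenue pages. If anything under /shop/ or /services/ comes back, a leftover staging rule is the likely cause.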

Use the Crawl Stats Report to Find Server and Crawl Waste Issues

The Crawl Stats report shows:

Total crawl requests.
Requests by response code (200, 301, 404, 500).
Requests by file type (HTML, CSS, JS, images).
Host status (availability and response performance).

This data shows where Googlebot’s requests actually go and how reliably your server answers them.

High 404 or 301 Volume

If a large percentage of crawl requests hit 404s or redirect chains, you are wasting crawl capacity. Clean up:

Old campaign URLs.
Broken internal links.
Legacy product URLs after migrations.

In cPanel or server logs, confirm that each redirect reaches its final URL in one hop. Multiple 302s or chained 301s add latency to every request and waste crawl requests on intermediate URLs.
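
If you can export your redirect rules as source and target pairs, a few lines of code will find the chains. The sketch below assumes the rules are already loaded into a dictionary; how you export them depends on your redirect plugin or server config, and that step is not shown.

```python
def resolve_chain(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        next_url = redirects[chain[-1]]
        if next_url in chain:  # loop detected; record it and stop
            chain.append(next_url)
            break
        chain.append(next_url)
    return chain


def multi_hop_sources(redirects: dict[str, str]) -> dict[str, list[str]]:
    """Return redirect sources that need more than one hop to reach a final URL."""
    chains = {source: resolve_chain(source, redirects) for source in redirects}
    return {source: chain for source, chain in chains.items() if len(chain) > 2}
```

For every source it returns, point the rule directly at the last URL in the chain.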

Server Errors (5xx) or Host Status Issues

If host status shows failures, Google may reduce crawl rate. That slows indexing and updates.

Core Web Vitals documentation on web.dev explains how metrics such as Largest Contentful Paint (LCP) and Interaction to Next Paint (INP) relate to user experience. These are not crawl metrics, but the slow server responses that hurt them also limit how much Googlebot can crawl and raise the risk that visitors abandon the page.

Common causes in WordPress:

Overloaded shared hosting.
Uncached dynamic WooCommerce queries.
Bloated plugins generating heavy PHP execution.

Fixes:

Enable full-page caching for non-cart pages.
Use object caching (Redis or Memcached).
Upgrade hosting if time to first byte (TTFB) is consistently high.

Maintenance consideration: Aggressive caching can break cart, checkout, and membership logic. Always exclude dynamic endpoints and test logged-in user flows.
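
The exclusion rule is easier to review when written out as logic. This sketch only models the decision; your caching plugin or server applies it through its own settings. The paths assume WooCommerce's default page slugs, and the cookie prefixes (wordpress_logged_in_, woocommerce_items_in_cart) are the usual WordPress and WooCommerce names, so confirm both against your install.

```python
# Assumed WooCommerce default page slugs and cookie prefixes; confirm against your install.
DYNAMIC_PATH_PREFIXES = ("/cart/", "/checkout/", "/my-account/", "/wp-admin/")
BYPASS_COOKIE_PREFIXES = ("wordpress_logged_in_", "woocommerce_items_in_cart")


def should_bypass_cache(method: str, path: str, cookie_names: list[str]) -> bool:
    """Return True when a request should skip the full-page cache."""
    if method.upper() not in ("GET", "HEAD"):
        return True  # form posts and other writes are never served from cache
    if path.startswith(DYNAMIC_PATH_PREFIXES):
        return True  # cart, checkout, and account pages are per-user
    # Logged-in users and shoppers with items in the cart need fresh pages.
    return any(name.startswith(BYPASS_COOKIE_PREFIXES) for name in cookie_names)
```

Use it as a checklist when you configure exclusions: every case that returns True here should also be excluded in your cache settings.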

Connect Reports to Real Business Metrics

Don’t treat these reports as academic diagnostics.

If a service page is “Crawled – currently not indexed,” check impressions in Search Console’s Performance report. Zero impressions equals zero pipeline from organic search.
If Crawl Stats show heavy image crawling, confirm images are optimized and not duplicated across parameters.
If server errors spike during peak ad campaigns, you may be paying for traffic that lands on failing pages, and Googlebot is likely hitting the same errors.

Search Engine Land frequently documents how indexing and crawl inefficiencies affect real businesses during site migrations and large-scale updates. The pattern is consistent: technical precision accelerates recovery and growth; neglect delays both.

What to do next

Export non-indexed URLs from the Page Indexing report and label them by business importance (high, medium, low).
Fix revenue-critical pages first: noindex conflicts, canonical mismatches, thin content.
Audit your XML sitemap and remove junk archives or parameter URLs.
Review Crawl Stats for 30–90 days and identify trends in 404s, 301s, and 5xx errors.
Check server logs in cPanel or via SSH to confirm Googlebot behavior matches Search Console trends.
Validate fixes inside Search Console and monitor changes over several weeks.
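
For the server log check, a short script can summarize what Googlebot received. This is a minimal sketch that assumes the Apache/Nginx "combined" log format and matches on the user-agent string alone. User agents can be spoofed, so treat the counts as approximate unless you also verify the requesting IPs.

```python
import re
from collections import Counter

# Matches the common "combined" log format; other formats need a different pattern.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines: list[str]) -> Counter:
    """Count response status codes for requests whose user agent mentions Googlebot."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts


def error_share(counts: Counter) -> float:
    """Fraction of Googlebot requests that returned a 4xx or 5xx status."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items() if status[0] in "45")
    return errors / total if total else 0.0
```

Compare the result with the response-code breakdown in Crawl Stats over the same period. Large gaps usually mean the log covers a different host or that some of the traffic is not really Googlebot.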

If this process feels overwhelming, it’s usually because the issues span SEO, development, and hosting. That’s normal. At Doyjo, we handle these audits routinely for WordPress and WooCommerce teams that need crawl efficiency, clean canonical signals, and stable hosting aligned with business growth.

Search visibility in 2026 rewards technical clarity. The Page Indexing and Crawl Stats reports give you the roadmap. The upside isn’t theoretical—it’s measurable in impressions, qualified traffic, and reduced operational drag.

For Web Development, E-Commerce Development, SEO & Internet Marketing Services and Consultation, visit https://doyjo.com/

This article is for informational purposes only and reflects general marketing, technology, website, and small-business guidance. Platform features, policies, search behavior, pricing, and security conditions can change. Verify current requirements with the relevant platform, provider, or professional advisor before acting. Nothing in this article should be treated as legal, tax, financial, cybersecurity, or other professional advice.
