You open the Page Indexing report in Google Search Console, scroll the "Why pages aren't indexed" table, and find a row labeled "Soft 404" with a four- or five-digit count next to it. You click in, expecting to see a list of error pages, and instead Google hands you a list of URLs that load perfectly fine in a browser. They return 200 OK. They have content. Some of them used to rank.
The Help Center entry explains, in two sentences, that Google has concluded the page does not exist even though the server says it does. It does not tell you why Google reached that conclusion, what part of the page triggered it, or which of your URLs you should fight to recover and which you should let go.
This guide is the long version of the answer. It walks through what the soft 404 status actually means, the four real patterns that cause Google to apply it to a working URL, a diagnostic flow that tells you which pattern applies to a given URL in under fifteen minutes, the fix path for each pattern, and the underrated case where the correct response is to let the soft 404 stand and convert the page to a hard 404 or 410. It also covers the bulk-triage workflow that scales when you have hundreds or thousands of URLs in this bucket and the native UI starts to fall over.
What "Soft 404" Means When Google Says It
A hard 404 is what most engineers think of as a 404. The server returns HTTP status 404 (or 410) on a request, the browser renders an error page, and Google removes the URL from the index. There is no ambiguity. The URL is gone, both sides agree.
A soft 404 is what Google records when those two sides disagree. The server returns 200 OK on a request, the browser renders what looks like a real page, and Google's classifier still concludes that no real content lives at that URL. The classifier reads the rendered HTML, compares it to patterns it has learned from millions of error pages across the web, and decides that whatever is at this URL is functionally an error page even though the response code says otherwise.
Once a URL is flagged as a soft 404, three things happen. The URL is excluded from the index, so it cannot rank. Any link equity flowing into it is functionally orphaned, because Google does not consolidate ranking signals into pages it considers error states. And the URL persists in the report until one of three things happens: you fix it, you actually return a 4xx, or Google reverses the classification on a future crawl.
The third point is the one that catches teams off guard. A soft 404 is sticky. You can fix the underlying issue today and the URL may sit in the bucket for weeks before Google re-crawls, re-classifies, and removes it. The remediation is judged on a 30 to 90 day cycle, not on an overnight cycle.
This is also different from a few similar-looking statuses in the same report. "Not found (404)" is what Google calls a normal hard 404, where the server returns the right code. "Excluded by 'noindex' tag" is a deliberate exclusion that you triggered with a meta tag. "Crawled, currently not indexed" is a separate quality-classifier verdict that means the page passed the does-it-exist check but failed the is-it-worth-indexing check. If your URL is in "Crawled, currently not indexed" instead, the playbook for that status applies. If it is in soft 404, this one does.
Why Google Flags a Working Page as a Soft 404
There are four real causes. Almost every URL in the bucket fits one of them. The diagnostic flow downstream of this section is built to let you tell them apart in fifteen minutes per URL or thirty minutes per cluster.
Cause 1: The Page Looks Like an Error Page to the Classifier
This is the most common cause and the one site owners are most reluctant to accept. Google's classifier learned what an error page looks like from millions of real error pages. The signals it weighs include the word "404" or "not found" or "page not available" in the rendered HTML, an unusually short body, a prominent search box or "go back to homepage" link, a stripped layout with no navigation, and the absence of structured content blocks that the rest of the site has.
Custom 404 pages are the textbook offender. A site designs a friendly 404 page with the message "Oops, the page you're looking for doesn't exist" and a search box, deploys it, and then misconfigures the server to return 200 on that page instead of 404. Every URL that hits the friendly page now returns 200 with content that screams "I am an error page" to the classifier. The classifier obliges.
Empty category pages and empty search-result pages produce the same outcome. A product category that has no products left renders a layout with the phrase "no items found" and a 200 status. A site search results page for a query that returned no hits renders "no results for your search" and a 200. The classifier reads these as error states even though the templates are technically functioning correctly.
The fix is one of two things, depending on whether you want the URL to exist or not. If the URL should exist (because it has content that just is not loading), you fix the content. If the URL should not exist, you change the server response from 200 to 404 or 410 and let the classifier stop guessing. See the section on "the case for actually returning a 404" below.
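For the second branch, the change is usually small. The sketch below uses Python's standard-library HTTP server to show the idea: the friendly error page is still served, but with a 404 status attached. The KNOWN_PATHS table, the page bodies, and the port are hypothetical stand-ins for a real routing layer, not a recommendation for production serving.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing table standing in for a real site.
KNOWN_PATHS = {"/": "<h1>Home</h1>", "/widgets": "<h1>Widgets</h1>"}

FRIENDLY_404 = "<h1>Oops, the page you're looking for doesn't exist</h1>"


def resolve(path: str) -> tuple[int, str]:
    """Return (status, body). Unknown paths get the friendly page AND a 404 status."""
    if path in KNOWN_PATHS:
        return 200, KNOWN_PATHS[path]
    return 404, FRIENDLY_404


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = resolve(self.path)
        payload = body.encode("utf-8")
        # The status line is the part the misconfigured setup gets wrong.
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Start the toy server; call this manually to try it with curl -I."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting any unknown path should show the friendly message in the body and 404 in the status line, which is the combination the classifier no longer has to guess about.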
Cause 2: The Main Content Failed to Render
Google rendered the page, the rendered HTML had little or no main content, and the classifier decided that an empty page is functionally a missing page. This is the JavaScript-rendering trap.
A single-page application loads a shell of HTML that contains the navigation, the footer, and an empty main element. JavaScript then fetches the article body, the product details, or the listing items and injects them into the main element. If the JavaScript fails to execute, fetches a 404 from its data API, races against the renderer's timeout, or depends on a third-party script that Googlebot does not run, the rendered HTML stays empty. The server returned 200, the shell looks fine, but the main content slot is blank.
The classifier reads the rendered DOM. If the rendered DOM has no substantive content in the main content area, Google treats the URL as an empty page and flags it as a soft 404. Server-rendered sites have the same problem when a template depends on a database call that returned no rows and the template silently degrades to empty markup instead of returning an error.
The fix is to verify the rendered HTML actually contains the content. Run the URL through the URL Inspection tool, click "Test live URL", click "View tested page", and switch to the "HTML" tab. The HTML you see there is what Google's renderer produced. If the main content is missing or truncated, that is your problem. Either move the rendering server-side (Next.js or Remix on a React stack, Nuxt on a Vue stack, ASP.NET on a .NET stack), pre-render the page at build time, or make the JavaScript more defensive so a slow API call returns content within the renderer's budget.
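If you save the HTML from the "View tested page" panel to a file, the same check can be scripted. The following is a rough sketch that assumes the template wraps its primary content in a main element and treats anything under 30 words as effectively empty; both the element choice and the threshold are assumptions to adjust for your own templates.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collects visible text found inside <main> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # nesting depth inside <main>
        self.skip = 0   # nesting depth inside <script>/<style> within <main>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1
        elif tag in ("script", "style") and self.depth:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.depth and not self.skip and data.strip():
            self.chunks.append(data.strip())


def main_text(html: str) -> str:
    """Return the text content of the main content area."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def main_content_missing(html: str, min_words: int = 30) -> bool:
    """True when the main content area holds fewer than min_words words."""
    return len(main_text(html).split()) < min_words
```

Running main_content_missing on the rendered HTML of a healthy sibling page first is a quick way to confirm the threshold is sensible for your layout.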
Cause 3: The Page Is Thin or Has No Indexable Body
Thin content gets misclassified as soft 404 more often than people expect. The classifier's job is to separate pages that have substance from pages that do not. If your page is technically successful, has 200 OK, has rendered content, but the rendered content is a fifty-word product description with no images and no review snippets, the classifier reads it as functionally indistinguishable from an empty page.
Doorway pages, programmatic SEO pages with the same boilerplate text and only the city name swapped, archive pages with a date header and three links, and old blog tag archives with the tag name and a single post all fall into this trap. They are not error pages, but they have so little unique content that the classifier scores them at error-page-equivalent.
The fix is not a server change, it is a content change. Either flesh out the page so it has substantive unique content, or roll it up into a parent page and 301 redirect. The programmatic SEO case in particular benefits from this: a thousand near-duplicate city pages with two paragraphs each will frequently fall into soft 404 and stay there, whereas a hundred meatier city pages with three hundred words of unique data each will index. See the GSC content decay playbook for the broader retention pattern.
Cause 4: The Page Returns the Wrong Content Because of a Routing or Parameter Bug
This is the least obvious cause and the one that most often produces "what is going on" tickets to engineering. The page renders something, but what it renders does not match what the URL implies. The classifier reads the mismatch as an error state and flags it.
Common cases. A product page URL renders the parent category's content because the product slug failed to resolve. A localized URL like /fr/article-name falls back to the English version with no French content, and the rendered page is a thin shell that mostly says "no translation available". A pagination URL renders page 1 instead of the requested page because the parameter parser failed silently. A search results page renders an empty results template even though the query had matches, because a database connection timed out and the catch block degraded to the empty template.
Google's classifier sees a URL that promises one thing and renders another, with markers that look like an error fallback, and flags it. The fix is to chase the routing or data bug and confirm that each URL renders the content the URL implies. This is the cause that benefits most from a real fetch-and-render audit using the URL Inspection tool rather than a code review.
The Fifteen-Minute Per-URL Diagnostic Flow
When a single URL appears in soft 404 and you want to know which cause applies, run this flow. It is faster than guessing and it gives you a clear answer.
Step one, two minutes. Open the URL in a private browser window logged out of everything. Confirm it returns 200, renders content, and does not redirect. If it redirects, the URL is not actually returning 200, and the soft 404 is downstream of redirect chain confusion (see Cause 4).
Step two, three minutes. Run curl -I https://example.com/your-url and confirm the response code is 200 and there are no X-Robots-Tag: noindex or similar response headers. Run curl -s https://example.com/your-url | head -200 and look at the raw HTML the server returns. If the main content is missing from the raw HTML, your problem is server-side rendering and Cause 2 is in play even on a non-JavaScript page.
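When you need to repeat step two across a sample of URLs, the header check can be scripted. This is a minimal sketch using only the standard library; the function names are made up for illustration, and it looks only at the status code and the X-Robots-Tag header. Redirects are deliberately not followed so that a 3xx is reported as such.

```python
import urllib.error
import urllib.request


def header_issues(status: int, headers: dict[str, str]) -> list[str]:
    """Flag what step two looks for: a non-200 code or a noindex header."""
    issues = []
    if status != 200:
        issues.append(f"status is {status}, not 200")
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    robots = lowered.get("x-robots-tag", "")
    if "noindex" in robots:
        issues.append(f"X-Robots-Tag blocks indexing: {robots}")
    return issues


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; the 3xx then surfaces as an HTTPError


def fetch_status_and_headers(url: str) -> tuple[int, dict[str, str]]:
    """Fetch one URL without following redirects; return (status, headers)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers.items())
```

A typical call would be header_issues(*fetch_status_and_headers("https://example.com/your-url")), where an empty list means the response passed this step.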
Step three, five minutes. Open the URL in GSC's URL Inspection tool. Click "Test live URL". Click "View tested page" and switch to the HTML tab. This is the rendered HTML Google sees. Search the HTML for the main article text or product description that should be there. If it is missing, you have a rendering problem (Cause 2). If it is present, you do not.
Step four, two minutes. Read the rendered HTML for error-page markers. Search for "not found", "no results", "page not available", "page does not exist", "404", and similar strings. If any of those appear prominently in the body (not just in metadata or footer disclaimers), you are getting flagged by the content classifier (Cause 1).
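The string search in step four is easy to automate once the rendered HTML is saved locally. The sketch below is a rough approximation rather than a model of Google's classifier: it strips the head, scripts, styles, and footer with regular expressions (a shortcut that will miss unusual markup) and then looks for the marker phrases listed above as whole words.

```python
import re

# Marker phrases taken from the step above; extend the tuple for your own templates.
ERROR_MARKERS = (
    "not found",
    "no results",
    "page not available",
    "page does not exist",
    "404",
)


def visible_body_text(html: str) -> str:
    """Approximate the visible body text: drop head, scripts, styles, footer, tags."""
    body = re.search(r"<body\b[^>]*>(.*)</body>", html, flags=re.S | re.I)
    fragment = body.group(1) if body else html
    fragment = re.sub(
        r"<(script|style|footer)\b.*?</\1\s*>", " ", fragment, flags=re.S | re.I
    )
    text = re.sub(r"<[^>]+>", " ", fragment)
    return re.sub(r"\s+", " ", text).strip().lower()


def find_error_markers(html: str) -> list[str]:
    """Return the marker phrases that appear as whole words in the body text."""
    text = visible_body_text(html)
    return [
        marker
        for marker in ERROR_MARKERS
        if re.search(r"\b" + re.escape(marker) + r"\b", text)
    ]
```

A hit is a prompt to look at where the phrase sits on the page, since a marker buried in a sidebar matters less than one in the headline.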
Step five, three minutes. Compare the rendered content length to other URLs of the same type on your site that are successfully indexed. A product page that has fifty words of body when sibling product pages have three hundred is a Cause 3 thin-content flag. A category page with three products when sibling categories have thirty is the same.
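The sibling comparison can be reduced to one ratio. In this sketch the 0.3 cutoff is an assumption chosen only so the fifty-versus-three-hundred example above trips it; it is not a threshold Google publishes, and the word counts are expected to come from whatever extraction you already trust.

```python
from statistics import median


def is_thin_outlier(word_count: int, sibling_counts: list[int], ratio: float = 0.3) -> bool:
    """True when a page's body is far shorter than indexed pages of the same type."""
    if not sibling_counts:
        raise ValueError("need at least one indexed sibling page to compare against")
    return word_count < ratio * median(sibling_counts)


# A 50-word product page against siblings of roughly 300 words is flagged;
# a 250-word page against the same siblings is not.
flagged = is_thin_outlier(50, [300, 280, 320])
not_flagged = is_thin_outlier(250, [300, 280, 320])
```

Using the median rather than the mean keeps one unusually long sibling page from skewing the comparison.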
If after these five steps you cannot tell which cause applies, the cause is almost always Cause 4 in disguise, where the rendered content is fine but does not match the URL's promise. Read the URL slug, read the rendered H1, and ask whether they are coherent. If the URL slug says "blue widgets" and the H1 says "widgets", you have a routing or fallback bug.
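The slug-versus-H1 comparison can be roughed out in a few lines. This sketch is deliberately crude: it treats the last path segment as the slug, splits on non-alphanumeric characters, and does no stemming, so a slug of "widget" against an H1 of "Widgets" would be reported as a mismatch. Treat any output as a prompt for a manual look, not a verdict.

```python
import re
from urllib.parse import urlparse


def slug_tokens(url: str) -> set[str]:
    """Split the last path segment of a URL into lowercase word tokens."""
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    return {token for token in re.split(r"[^a-z0-9]+", slug.lower()) if token}


def missing_from_h1(url: str, h1: str) -> set[str]:
    """Return slug words that never appear in the rendered H1."""
    h1_tokens = set(re.findall(r"[a-z0-9]+", h1.lower()))
    return slug_tokens(url) - h1_tokens
```

For the example above, missing_from_h1("https://example.com/product/blue-widgets", "Widgets") returns {"blue"}, which is the hint that the product slug fell back to the parent category.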
Bulk Triage When You Have a Thousand URLs in the Bucket
The per-URL flow scales to about four URLs an hour. When the bucket has a thousand or ten thousand entries, you need a different approach. The bucket usually decomposes into a few large clusters that share a cause, and once you identify the cause for one cluster, you can fix the cluster at the template level rather than the URL level.
The first move is to export the URL list. From the soft 404 row in the Page Indexing report, click "Export" and pull the list to a spreadsheet. The native export caps at 1,000 rows. If the bucket is larger, the workarounds are narrower than they first appear. The URL Inspection API inspects one URL per request under a daily per-property quota, so it suits checking the status of a URL list you already hold (from your sitemaps or a crawl) rather than dumping the bucket. The BigQuery bulk data export covers performance data, not index coverage, so it will not list soft 404 URLs either. Filtering the Page Indexing report by individual sitemap, or verifying subfolders as separate properties, gives you a separate sample of up to 1,000 rows per slice. The 1,000-row cap is a hard wall on diagnostics at this scale, so plan the export before you plan the fix.
Once you have the list, cluster it by URL pattern. The simplest version of this is to extract the path prefix or the URL template and group. URLs under /product/, URLs under /category/, URLs under /search?q=, URLs under /blog/tag/. The cluster sizes tell you where the leverage is. A bucket of 5,000 soft 404 URLs that decomposes into 4,200 under /search?q= and 800 scattered elsewhere is really a single bug in the search results template plus a long tail.
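The grouping described above can be sketched with the standard library. The pattern here is simply the first path segment plus the names of any query parameters; the depth argument and the exact pattern format are illustrative choices, and a real site may need a deeper prefix (for example depth=2 to separate /blog/tag/ from the rest of /blog/).

```python
from collections import Counter
from urllib.parse import parse_qsl, urlparse


def url_pattern(url: str, depth: int = 1) -> str:
    """Reduce a URL to a template such as /product/ or /search?q=."""
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    pattern = "/" + "/".join(segments[:depth])
    if len(segments) > depth:
        pattern += "/"  # something deeper follows the shared prefix
    keys = sorted({key for key, _ in parse_qsl(parsed.query, keep_blank_values=True)})
    if keys:
        pattern += "?" + "&".join(key + "=" for key in keys)
    return pattern


def cluster_sizes(urls: list[str], depth: int = 1) -> list[tuple[str, int]]:
    """Count URLs per pattern, largest cluster first."""
    return Counter(url_pattern(url, depth) for url in urls).most_common()
```

Feeding the exported list through cluster_sizes gives the ranked table the rest of the triage works from, with the largest cluster at the top.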
Pick the largest cluster, run the per-URL diagnostic on three random URLs from it, and confirm they share a cause. They almost always do. Fix the template-level cause. Then run the diagnostic on the next cluster.
In our audit work, we have found that the four causes are not evenly distributed across the bucket. Cause 1 (looks like an error page) typically accounts for 35 to 45 percent of bucket entries, Cause 2 (rendering failure) accounts for 15 to 25 percent, Cause 3 (thin content) accounts for 25 to 35 percent, and Cause 4 (routing bug) accounts for 5 to 15 percent. Your distribution will vary based on your stack, but the takeaway is that the two largest causes (Cause 1 and Cause 3) usually represent more than half the bucket, and fixing them gives you the biggest single recovery.
This is also the workflow where third-party tools have a real advantage over the native UI. Search Console Tools was built partly because the Page Indexing report's filtering and grouping are not designed for bucket sizes above a few hundred. If you are repeatedly running this triage cycle, the time savings on the export, the pattern grouping, and the rendered-HTML diff add up fast.
The Case for Actually Returning a 404
The instinct on a soft 404 is to fix the content so the URL gets indexed. This is sometimes the wrong instinct.
If the URL should not exist, the right fix is to make the server return a real 404 or 410. A real 4xx response removes the URL from the index cleanly, removes it from the soft 404 bucket, and stops Google from spending crawl budget on a page that will never be useful. It also stops the URL from appearing in internal reports as a broken URL that someone might later try to fix.
Cases where a real 404 is the right outcome.
Discontinued products with no replacement. A retailer who removes a SKU that has no related product to redirect to is better off returning 410 (gone) than serving a "this product is no longer available" page that returns 200 and gets soft-404 flagged. The 410 tells Google to forget the URL. The 200 leaves it in the bucket forever.
Old promotional URLs. A landing page for a promotion that ran two years ago, gets no traffic, has no relevance now, and has no current page to consolidate into should return 410 or be removed from the URL space entirely. A blog post archive page for a tag that no longer has any posts should return 410, not a "no posts here yet" page.
Test URLs and staging leaks. These are URLs that were never meant for production crawl, that leaked through the sitemap or a misconfigured robots.txt, and that have no content because they were never supposed to have any. They should return 404. Add a server-level rule that matches the leaked path and returns a proper 404.
User-generated content that was deleted. A forum thread that was removed, a user profile that was deleted, a comment that was moderated out. The page is gone and Google's classifier is correctly noticing that. Return a 410 to confirm and the URL leaves the bucket within a crawl cycle.
The practical rule. Ask whether the URL has a future. If the answer is no, return a 4xx and stop arguing with the classifier. If the answer is yes, fix the content and let the classifier re-evaluate.
Special Case: When the Soft 404 Is on a URL That Used to Rank
This one stings. A page that used to drive organic traffic falls into soft 404 and disappears from the SERPs overnight. The instinct is panic and a hurried content patch.
Slow down. Run the per-URL diagnostic first. In our audit experience, soft 404 on a previously-ranking URL is almost always Cause 2 or Cause 4, not Cause 1 or Cause 3. The page was ranking with sufficient content, which means the content has not suddenly become thin. What has usually changed is either the rendering pipeline (a frontend deploy, a CDN change, a JavaScript bundle update) or the routing (a CMS migration, a slug change, a parameter handler update) such that the URL no longer renders what it used to render.
Step one is to compare the current rendered HTML with an archived version. The Wayback Machine snapshot from a month before the soft 404 first appeared tells you what the page used to look like. Diff it against the current rendered HTML from the URL Inspection tool. The diff tells you whether the content is missing or wrong, and that points at the cause.
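With both versions saved as files, the comparison can be done with Python's difflib. The sketch below assumes you have the archived HTML and the current rendered HTML as strings; it strips tags with a regular expression (a shortcut, not a real HTML parser) and reports only the text lines that were removed or added.

```python
import difflib
import re


def text_lines(html: str) -> list[str]:
    """Reduce HTML to a list of non-empty visible text lines."""
    html = re.sub(r"<(script|style)\b.*?</\1\s*>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", "\n", html)
    return [line.strip() for line in text.splitlines() if line.strip()]


def content_diff(archived_html: str, current_html: str) -> list[str]:
    """Lines prefixed '- ' exist only in the archive; '+ ' only in the current page."""
    delta = difflib.ndiff(text_lines(archived_html), text_lines(current_html))
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

A result dominated by "- " lines points at content that stopped rendering (Cause 2), while paired "- " and "+ " lines with different wording suggest the URL is now rendering something else (Cause 4).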
If you cannot find the cause from the diff, check whether the URL was affected by a recent deploy. Most rendering and routing regressions are deploy-correlated. The CHANGELOG of the frontend repo for the two weeks before the soft 404 appeared in GSC is a much more efficient place to look than the page itself.
Once the underlying issue is fixed, request indexing via URL Inspection. This is one of the cases where the manual reindex request is genuinely helpful, because it short-circuits the normal recrawl cycle. The page is unlikely to recover its old ranking immediately, but the soft 404 status will clear within a recrawl, and the historical ranking signals are still attached to the URL.
What Not to Do
A few patterns we see in soft 404 remediation that do not work.
Adding more text to a short page that is already substantive. Cause 3 is real but it is not "the page has fewer than 1,000 words". A 400-word product page with strong unique content can index. Padding it with boilerplate to hit a word count target makes the page worse and does not move the classifier.
Setting noindex on a soft 404. The bucket is already excluded from the index. Adding noindex does not remove it from the bucket and it does not help. Use noindex when you want a page that is currently indexed to stop being indexed, not as a soft 404 mitigation.
Redirecting every soft 404 URL to the homepage. Google treats a redirect from a URL with no related content to the homepage as a soft 404 itself, just with the redirect attached. The classifier sees through it. Use redirects only when there is a genuinely related target page, and use 410 for everything else.
Submitting the soft 404 URLs in a sitemap to "tell Google they exist". The sitemap is a discovery aid, not a force-index lever. The URL is already discovered, that is why it is in the report. Submitting it again does not change the classification.
Trying to fix the soft 404 by setting a canonical to a working page. The classifier evaluates the URL itself, not just the canonical destination. If the URL is rendering an empty shell or an error-looking page, the canonical tag will not save it. Fix the rendering or return a 4xx.
A Two-Week Action Plan for Sites With a Soft 404 Bucket
Day 1 to 2. Export the full soft 404 list. Cluster by URL pattern. Identify the top three clusters by URL count.
Day 3 to 4. Run the per-URL diagnostic on three URLs from each top cluster. Confirm the cause for each cluster.
Day 5 to 7. Fix the cluster-level cause for cluster one. This is usually a template change, a routing fix, or a content rewrite. Deploy the fix.
Day 8 to 10. Fix the cluster-level cause for clusters two and three.
Day 11. Use the URL Inspection tool to request indexing on five representative URLs per cluster. This is a manual trigger that confirms your fix is rendering correctly before you ask Google to recrawl the rest.
Day 12. Audit the long tail of the bucket. URLs that do not fall into the top three clusters. Decide for each whether the right answer is a content fix or a 410.
Day 13 to 14. Deploy 410 responses for the long-tail URLs you decided to remove. Submit an updated sitemap that no longer lists them. Wait.
Week 4 to 8. The bucket size shrinks as Google recrawls. Most sites see 60 to 80 percent of the bucket clear in 30 days if the cluster fixes were correct. The long tail of stubborn URLs is usually URLs that are still hitting an edge case the cluster fix did not cover, and they are worth a second round of individual diagnostic.
The full recovery cycle is rarely under 30 days and rarely over 90 days. If the bucket is not shrinking after 30 days, the cluster fix did not actually fix the cause and you need to run the diagnostic again on URLs that should have been fixed.
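The 30-day check is simple arithmetic on two counts from the report: the bucket size when the fix shipped and the bucket size now. The sketch below encodes only the rule stated above; the function names are illustrative, and the 30-day default mirrors the text rather than any documented Google threshold.

```python
def cleared_fraction(baseline: int, current: int) -> float:
    """Share of the original bucket that has cleared since the fix shipped."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive URL count")
    return (baseline - current) / baseline


def needs_rediagnosis(baseline: int, current: int, days_since_fix: int, window: int = 30) -> bool:
    """True when the window has passed and the bucket has not shrunk at all."""
    return days_since_fix >= window and current >= baseline


# A bucket that went from 5,000 to 1,500 URLs has cleared 70 percent,
# which sits inside the 60 to 80 percent range quoted above.
share = cleared_fraction(5000, 1500)
```

Recording the count weekly makes a stalled cluster visible before the 30-day mark instead of after it.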
Frequently Asked Questions
What is the difference between a soft 404 and a hard 404 in Google Search Console?
A hard 404 is a page where the server returns HTTP status 404 (or 410) on request. The response code itself tells Google the page does not exist. A soft 404 is a page where the server returns 200 OK but Google's content classifier reads the rendered page as functionally an error page anyway. From an indexing standpoint, both are excluded from the index, but a soft 404 stays in the GSC bucket until Google reverses the classification or you change the server response, whereas a hard 404 is removed once Google has confirmed it.
Why does Google flag a 200-OK page as a soft 404?
The four common causes are: the rendered content looks like a custom error page (with phrases like "not found" or a prominent search box), the JavaScript that loads the main content failed to execute so the rendered HTML is mostly empty, the content is so thin that the classifier scores it at error-page-equivalent, or the URL is rendering content that does not match what the URL implies (a routing or parameter bug). Almost every soft 404 fits one of these four patterns.
Will Google reindex a soft 404 URL once I fix it?
Yes, but on Google's recrawl schedule, not yours. The URL will sit in the soft 404 bucket until Google recrawls and reclassifies it. For high-priority URLs, you can speed this up by using the URL Inspection tool to request indexing manually after the fix is live, which usually prompts a recrawl within hours or days but does not guarantee one. For the rest of the bucket, the normal recrawl cycle takes 14 to 60 days depending on the URL's historical crawl rate.
Should I redirect soft 404 URLs to the homepage?
Almost never. Google treats a redirect from an irrelevant URL to the homepage as a soft 404 itself, and the URL stays in the bucket with the redirect attached. Use redirects only when there is a genuinely related target page that the user would have wanted next. For URLs with no relevant target, return a 410 status to remove the URL from the index cleanly.
Does a soft 404 hurt the rest of my site?
Indirectly. Soft 404 URLs consume crawl budget that could have gone to working URLs, and large soft 404 buckets are correlated with broader quality signals that affect site-wide ranking. A handful of soft 404 entries on a site with thousands of healthy URLs is noise. A soft 404 bucket that is 20 percent of your URL count is a signal to triage immediately. The fix path is the same either way: identify the causes, fix the largest clusters first, and use 410 for the URLs that should not exist.
Can I noindex a soft 404 URL instead of fixing it?
You can, but it does not help. The URL is already excluded from the index by the soft 404 classification. Adding noindex does not remove it from the bucket and it does not affect anything else. If the URL should not exist at all, return a 410. If the URL should exist but should not rank, fix the content first and then add noindex if you actually want it out of the index.
Why does my custom 404 page show up as a soft 404?
Because it is returning 200 OK instead of 404. A friendly custom error page that says "Sorry, the page you're looking for doesn't exist" is still a 200 OK response if your server is configured to serve it without setting the status code. The classifier reads the friendly error message and flags the URL. The fix is to configure your server to return status 404 when serving the custom 404 page. Most frameworks have a built-in way to do this; the fix is one config line, not a content change.
How long does it take for a soft 404 to clear after I fix it?
Anywhere from 24 hours (if you manually request indexing for the URL) to 90 days (for URLs Google does not recrawl often). The median in our audit work is around 21 days for the bulk of a fixed cluster to clear, with a long tail of stubborn URLs that takes 60 to 90 days. If the cluster has not started shrinking after 30 days, the fix did not actually address the cause and you need to re-diagnose.
Can structured data fix a soft 404?
Sometimes, but it is a secondary lever. Adding Product or Article structured data to a thin page does not give the classifier substantive content to score; it gives a stronger signal about what kind of page the URL is meant to be. If the rendered content is genuinely substantive but is being misread by the classifier, structured data can tip the scoring. If the rendered content is genuinely thin or genuinely broken, structured data does not help and may make the mismatch between markup and content more obvious.
Should I use the URL Removals Tool on a soft 404?
No. The Removals Tool is a temporary suppression (about six months), not a real removal, and it does not change the underlying classification. Once the suppression expires, the URL returns to the same soft 404 state. Use the Removals Tool only for emergency suppression of URLs that have ranking but should not (a leaked draft, a private URL that got indexed). For genuine cleanup, return a 410, fix the content, or both.
The Short Version
Soft 404 is Google's verdict that a 200-OK URL is functionally an error page. The four causes are: looks-like-an-error-page, rendering failure, thin content, and routing or parameter bug. The per-URL diagnostic takes fifteen minutes. The bulk triage clusters the bucket by URL pattern and fixes at the template level. The right response for some URLs is to return a real 404 or 410 instead of fighting the classification. Recovery cycles run 30 to 90 days.
If you have a soft 404 bucket and have not run the diagnostic on a sampled cluster yet, that is the move. Start with the largest cluster, run the five-step flow on three URLs, find the cause, fix it at the template level, and watch the bucket shrink. The diagnostic is the leverage. Everything downstream is mechanical.
Want to make the diagnostic faster across your full site? Search Console Tools imports your full GSC data, including the soft 404 bucket, runs the cluster analysis automatically, and flags the rendered-HTML and content-thinness signals so you do not have to inspect URLs one at a time. The full audit usually surfaces the top three clusters in the first ten minutes.