How Long Does it Take for Google to Drop 404 Pages?

Newman

Active member
If your website is under a Google penalty or is ranking poorly and you suspect it's because your site has index bloat, or tons of low quality pages that Google has either indexed or knows about, the removal of those pages can't come fast enough. Unfortunately, these days, it seems to take Google forever to make a change to its index. I suspect it's either because there are simply too many web pages out there on the internet or because they're trying to make it more and more difficult for us to game their system. Back in the early days, we were able to make a change to our websites and then wait just a few days to see if it had any effect. Now, it seems like we've got to wait six months to a year or more. So basically, it's just not worth it to try to "see what works" anymore. We've got to learn what makes a quality website quality and just keep our sites up to snuff. There's no way around that these days.

Here's the scenario: you recently found out that a low quality directory on your website has been crawled by Google. The pages in this directory are all thin, duplicate, or system generated pages and none of them need to be in Google's index. All 20,000 of them have been crawled and they've all got a noindex meta tag on them. Unfortunately, that tag makes no difference because after the crawling of these pages, your rankings began to drop. You suspect that's because the pages all have very low pagerank and Google is counting them towards your overall website quality score. Not a good thing. You've been able to set this entire directory and all the pages inside of it so they return 404 status codes. The question is, how long will it take for Google to drop these pages from its index?

That is the question indeed. I've been searching for the answer for some time and all I've come across are insanely absurd pieces of advice from so-called "SEO consultants." Pretty much everyone knows that the pages have got to be removed from the index, but the strategies to get them out differ. Everyone wants it done fast, but the reality of it is that most likely won't happen. You see, once you cut the pagerank flow from a huge group of pages in a directory, Google no longer sees these pages as valuable. Therefore, it won't crawl them nearly as much, leaving them in limbo for up to a year or more. I've had low quality pages on some of my websites sit in Google's index for over three years. Yes, three years without being crawled! If that doesn't freak you out, nothing will.

Some folks say that you should keep the paths open, but use 301 redirects to tell Google that the old junk page is now part of a good page. They say that, in this case, you should redirect all 20,000 pages to one single page and that will solve your problems. Does that sound right to you? I didn't think so. The sad truth of the matter is that once you set your thin, duplicate, or system generated pages to show a 404 or a 410 status code, you should forget all about them and move on to something else in your life. By blocking them in robots.txt or forwarding them to another page via a 301 redirect, you're merely prolonging the problem. A robots.txt block keeps Googlebot from requesting the pages at all, so it never sees the 404, and a redirect keeps the old URLs alive in Google's eyes.

So to answer the question in the title of this thread, how long does it take Google to drop a 404 page? Well, if it's an important top level page on your site, probably a few days. If it's an interior subcategory page that gets crawled every few weeks, then a few weeks. But if it's a page that was generated among thousands and thousands of other pages because of some system malfunction or website hacking attempt, the unfortunate truth of the matter is that it can take up to a year or so. And my advice to you would be to just get rid of the pages and never look back. Yes, your website rank will drop, but as you work on your site through the months to come, those pages will fall out and drop from Google one by one and then one day, you'll wake up to find that your site is roaring with great rankings like it used to.
 

WendyMay

Active member
My question is, does Google ever drop pages that don't return a 404 error code? I've read that if you have a bunch of pages that you would like to remove, you should put them in a separate sitemap and then submit that sitemap to Google. Or, you should link to the pages and then Google will crawl them faster to find the 404 status code. I've found both of these suggestions to be completely outlandish. Who makes this stuff up? When Google crawls a certain number of pages in the /xx/ directory and those pages return "not found" or "gone" header status codes, it learns that the pages are not there anymore. Its crawling of them slows down substantially, so no amount of stuffing the pages into a sitemap or linking to them will make any difference.

In my case, I moved my entire site to a new directory. Actually, my previous CMS provider went out of business, so I had to install new software in a different directory and 301 redirect my homepage and category pages to the new pages. All other pages, which numbered in the tens of thousands, were deleted. I've been waiting just about a year now for these pages, which are returning a 410 HTTP response status code, to be removed. They're disappearing slowly, but obviously not fast enough for me. And probably not for anyone in my shoes.
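To make that setup concrete, here is a minimal sketch of the response logic in Python. The paths, the `REDIRECTS` map, and the `respond` function are all hypothetical names made up for illustration; the actual rules would live in whatever server or CMS is in use.

```python
from http import HTTPStatus

# Hypothetical layout: the old CMS lived under /old/, the new one under /new/.
OLD_ROOT = "/old/"

# Only the homepage and category pages are forwarded (illustrative entries).
REDIRECTS = {
    "/old/": "/new/",
    "/old/category/widgets/": "/new/category/widgets/",
}


def respond(path):
    """Return (status, location) for a requested path.

    - Pages worth keeping get a 301 to their new home.
    - Everything else under the old directory is 410 Gone.
    - Any other path is assumed to be a live page on the new site.
    """
    if path in REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    if path.startswith(OLD_ROOT):
        return HTTPStatus.GONE, None
    return HTTPStatus.OK, None
```

The order of the checks matters: the redirect map is consulted first, so the handful of forwarded pages never fall through to the blanket 410 for the old directory.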

I've also seen recommendations to use 301 redirects to redirect all of these pages to live ones. That's crazy because, like I said, there are tens of thousands of them. I can't set up any sort of system to redirect them all. I've also seen suggestions to add noindex to the old pages and keep them live. Yeah right. That's just making things worse and they'll never disappear. And finally, I've seen suggestions to add canonical link elements to the old dead pages. That's the worst one of all. I can't even believe anyone would suggest that. And yes, these were all on an SEO consultancy firm's page.

Back to my initial question - does Google ever just drop pages that have such a low pagerank and that aren't linked to anymore? I've got lots of them, so I'm curious about this.
 

Newman

Active member
Back to my initial question - does Google ever just drop pages that have such a low pagerank and that aren't linked to anymore? I've got lots of them, so I'm curious about this.
I think they do. I swear I read this years ago, but I have no idea where. I think maybe Matt Cutts or John Mueller said something to that effect. Basically, if a page is weak enough, it'll get discarded eventually. I don't think you have anything to worry about in your case. All of those pages are getting weaker and weaker by the day and while they may bring your overall site ranking and quality score down in the short term, once they get cleared out, you should see a strong rebound in your search results rankings. And since you forwarded your best, strongest, and most important pages to the new site, you should be in good shape. It'll just seem like forever for Google to drop the bad pages.
 

WendyMay

Active member
It'll just seem like forever for Google to drop the bad pages.
Thank you for the reply. It has already seemed like forever, but the real problem now is that, because Google has been crawling so many 404 pages, it seems to think something is wrong with my site and has lowered its crawl rate. I have looked at my log files and stats and it's not crawling nearly as many pages as it used to. This may be because of the 404 error pages or the 301 redirects. Or, it's not valuing the new site like it used to value the old site. Who knows.
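For anyone who wants to run the same kind of log check, here is a rough sketch in Python. It assumes the server writes the common "combined" access log format, and it identifies Googlebot by user-agent string alone, which can be spoofed, so treat the counts as approximate. The `googlebot_hits` name and the regex are my own, not from any tool.

```python
import re
from collections import Counter

# Matches one line of the "combined" access log format (an assumption about
# the server config). Captures the day, the status code, and the user agent.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (day, status code) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("day"), int(match.group("status")))] += 1
    return hits
```

Feeding it an open log file, for example `googlebot_hits(open("access.log"))` with a hypothetical file name, gives a per-day breakdown, so you can see whether the drop in crawling lines up with a rise in 404, 410, or 301 responses.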
 

Newman

Active member
I just found this on a site somewhere. It's John Mueller talking about how Google treats crawling after a major site change or server move has taken place:

Here’s what happens when Google detects a server change:

- Google’s systems will throttle back.
- Google will make sure the new server can cope with the extra work of Googlebot crawling.
- Googlebot’s crawl speed will eventually increase again at a rate that works best for the server.


I know that you're not dealing with a server migration, but more of a site redesign with different URLs, but I think what I shared above still applies. This is what JohnMu said on Reddit in 2018 about a website redesign and changing the URL structure:

The bigger effect will be from changing a lot of URLs (all pages in those folders) - that always takes time to be reprocessed. I'd avoid changing URLs unless you have a really good reason to do so, and you're sure that they'll remain like that in the long run.

So yeah, it's got a lot to sort out. If everything is set correctly and you've got your 404s and 301s in place, just wait it out. Hopefully there will be brighter days ahead.
 