Google just can't get it up

I run a website with about 4 million pages ( ).
Although the Google spiders are very active and have been pulling
100,000+ pages per day for the last 3 months, few pages show up on
Google.  See .  Google
indexing of this site essentially collapsed in January 2005 when the
number of pages was increased from about 1 million to 4 million.

AskJeeves, on the other hand, indexes 95% of the site.

My current working hypothesis as to why these pages don't show up on
Google centers on Google's repetitive pulling of pages to test
stability and refresh its indexes.  Suppose Google has to be able to
pull the same page twice over a two week period before it posts to the
index.  Suppose also that Google has a maximum pull rate per site.
Also, suppose that Google expires pages after a month.  With more than
4 million pages, Google cannot do repeat pulls fast enough to keep the
pulled pages in the index.
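
To make the arithmetic behind this hypothesis concrete, here is a minimal sketch. It assumes, purely for illustration, that the crawler cycles evenly through every page at a fixed daily rate; the function names are hypothetical, and the 14-day window and 100,000 pulls per day are the figures supposed above, not documented Google behavior.

```python
def revisit_interval_days(total_pages: int, pulls_per_day: int) -> float:
    """Days needed to pull every page once at a fixed rate (assumed even cycling)."""
    return total_pages / pulls_per_day


def can_stay_indexed(total_pages: int, pulls_per_day: int,
                     repeat_window_days: float = 14) -> bool:
    """True if a page can be pulled twice within the supposed repeat window."""
    return revisit_interval_days(total_pages, pulls_per_day) <= repeat_window_days


PULLS_PER_DAY = 100_000  # observed crawl rate quoted in the post

for pages in (1_000_000, 4_000_000):
    interval = revisit_interval_days(pages, PULLS_PER_DAY)
    ok = can_stay_indexed(pages, PULLS_PER_DAY)
    print(f"{pages:>9} pages: revisit every {interval:.0f} days, indexable: {ok}")
```

Under these assumptions a 1 million page site is revisited every 10 days, inside the two-week window, while a 4 million page site is revisited only every 40 days, which is longer than both the repeat window and the supposed one-month expiry.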

Does this make sense to anyone intimately familiar with Google
indexing?  If this hypothesis is correct, is there a way to get Google
to ease the repeatability requirements?

Re: Google just can't get it up

I don't think it is related to the number of links, but rather to the
depth of the links, the strange subject of your site, a bad directory
structure (all files located in the main directory), and bad link
titles that don't really mean anything.

Less batching and more real information on each page would work
better, such as showing the last 100 new items on the front page.

Re: Google just can't get it up

To test this, I have throttled Googlebot back to about 1/3 of the 4
million pages and may throttle it back even further, even though
Google's indexing of Chinese content is starting to rise above the
baseline.   See .
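
The post does not say how the throttling was done. One possible way to expose a stable one-third of the pages to Googlebot is sketched below; the function names, the hash-bucket approach, and the user-agent check are all assumptions made for illustration, not the method actually used on the site.

```python
import hashlib


def in_crawl_subset(url_path: str, buckets: int = 3) -> bool:
    """Deterministically place a path in one of `buckets` groups; keep group 0."""
    digest = hashlib.sha256(url_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0


def should_serve(url_path: str, user_agent: str, buckets: int = 3) -> bool:
    """Serve everything to ordinary visitors; serve Googlebot only the subset."""
    if "googlebot" not in user_agent.lower():
        return True
    return in_crawl_subset(url_path, buckets)
```

Because the bucket depends only on the path, the same pages stay visible to the crawler from one visit to the next, which matters if repeat pulls of the same page are what the index requires. What to return for the excluded pages is left open here.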

If the hypothesis is correct, Google coverage should rise in a few

and breadth of the site.  It just doesn't add these pages to its
indexes.  In contrast, AskJeeves also spiders the full depth and
breadth of the site and does add the pages to its indexes.  The key
difference, according to this hypothesis, is that AskJeeves works on a
two to three month refresh cycle and uses a different verification

Re: Google just can't get it up

The directory structure is a fiction.  The site is almost totally
