- Google speed of spidering and indexing
Re: Google speed of spidering and indexing
And another update, since a week's gone by since registering the domain.
The site http://www.charles-dickens.org has approx. 7350 spiderable pages.
The original 59 pages that were indexed were dropped quite quickly,
leaving only the home page (with a cache this time). That was
Now the 59 pages and a few more are back.
Googlebot's been very busy: since the site went live approx. 1 week ago
the site's had 1588 hits from Googlebot (the one that's responsible for
organic listings, not Google ads).
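
For anyone wanting to count Googlebot hits on their own site, here is a minimal sketch. It assumes the server writes Apache "combined" format logs and that the organic crawler identifies itself with "Googlebot" in the user-agent field. The function name and the sample log lines are made up for illustration and are not from the site above.

```python
import re

# Matches the last quoted field on a line, which in Apache "combined"
# format is the user agent (assumption: combined log format).
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_googlebot_hits(lines):
    """Count log lines whose user-agent field mentions Googlebot.

    Hypothetical helper: other Google crawlers (e.g. the ads crawler)
    use different user-agent strings and are not counted.
    """
    hits = 0
    for line in lines:
        match = UA_PATTERN.search(line)
        if match and "googlebot" in match.group(1).lower():
            hits += 1
    return hits

# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [01/Jan/2000:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [01/Jan/2000:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 1024 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.3 - - [01/Jan/2000:10:00:09 +0000] "GET /robots.txt HTTP/1.1" 200 64 '
    '"-" "Mediapartners-Google/2.1"',
]

print(count_googlebot_hits(sample))
```

In practice the lines would come from the access log itself, e.g. `count_googlebot_hits(open("access.log"))`, where the file name is again only a placeholder.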
An allinurl:charles-dickens.org search gives 1,550 results (it fluctuates
between 1,400 and that figure).
Clicking the "repeat the search with the omitted results included."
link still reports 1,550 results, but only goes as far as 200 pages.
I've seen this a lot lately with my sites, and it suggests the 1,550
results number is how many different pages Google has spidered from
the domain, but hasn't necessarily added to its database.
The number of results actually shown (200 in this case) is the number
of pages in the database at that time.
On another domain with 12,000 pages, the first pages (130) were added on
the 7th May, a few hundred more on the 23rd and the rest on the 26th
May. Currently an allinurl: search shows between 4,500 and 5,500 pages
spidered, with almost 600 pages displayed.
One of my oldest literature sub domains is
http://sherlock-holmes.classic-literature.co.uk/ (started mid Feb). It
consists of just over 400 pages and other than minor changes hasn't
had new pages added since March. Even though the 400 pages have been
online over 3 months only 205 show with an allinurl: search (425 are
indicated as spidered).
Conclusion: Google will spider the contents of a large site over time
(quite quickly as well), but doesn't add the pages all right away;
otherwise there would be many more pages shown with an allinurl:
search. It suggests there is a limit as to how many pages it will add
to the database from a domain/sub domain per month. I'd guess it's a
percentage of total pages spidered plus other factors.
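
To put rough numbers on that guess, here is a small sketch that works out the share of spidered pages actually shown for the three sites mentioned above. The figures are the ones quoted in this post; for the 12,000-page domain it assumes the upper end of the 4,500 to 5,500 range and treats "almost 600" as 600. The function name is only illustrative.

```python
# (pages reported as spidered, pages actually shown), taken from the post.
observations = {
    "charles-dickens.org": (1550, 200),
    # Assumption: upper end of the 4,500-5,500 range, "almost 600" taken as 600.
    "12,000-page domain": (5500, 600),
    "sherlock-holmes.classic-literature.co.uk": (425, 205),
}

def shown_ratio(spidered, shown):
    """Fraction of spidered pages that appear in the displayed results."""
    return shown / spidered

for site, (spidered, shown) in observations.items():
    print(f"{site}: {shown_ratio(spidered, shown):.1%} of spidered pages shown")
```

On these figures the two newer domains come out at roughly 11% to 13%, while the three-month-old sub domain is at about 48%. Three data points are far too few to confirm a rule, but they fit the idea that the share grows with time rather than all pages being added at once.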