September 22, 2006, 3:13 pm
The Google Goal Of Indexing
100 Billion Web Pages
By Danny Wirken
Google's Goal of Quality Search
In their paper 'The Anatomy of a Large-Scale Hypertextual Web Search
Engine' it is very evident that Google's goal has always been to be one
of the best search engines there is in terms of the quality of the
results it gives. Sergey Brin and Lawrence Page, however, knew that in
order to do this, Google needed to be able to store information
efficiently and cost effectively and to have excellent crawling,
indexing, and sorting methods or techniques. Google not only aimed to
give quality results but to produce the results as fast as possible.
Editorial Note: Read Jim Hedger's blog post about Google's Matt Cutts
comments on Domain Hijacking as a Blackhat Technique.
Google started as a high quality search engine and continues to be the
best search engine today. It has managed to stay true to its original
intent to be a search engine that not only crawls and indexes the web
efficiently but also a search engine that produces more satisfying
results in comparison to other existing search engines. To stay true to
the goal of providing the best search results, Google knew right from
the start that it had to be designed so that the search engine could
catch up with the web's growth. According to Brin and Page "In
designing Google we have considered both the rate of growth of the Web
and technological changes. Google is designed to scale well to
extremely large data sets. It makes efficient use of storage space to
store the index". They knew that they needed much space to store an
ever growing index.
Google's index size, which started out as 24 million web pages, was
large for its time and has grown to around 25 billion web pages, still
keeping Google ahead of its competitors. However, Google is a company
that doesn't settle for just beating the competitors. They truly aim to
give their users the best service there is and that means as a search
engine they want to give users access to all or at least most of the
quality information that is available on the web.
Google's New System for Indexing More Pages
As mentioned earlier, Google aims to give access to even more
information and has been devoting time and much effort to realize this
goal. It seems that the new patent entitled 'Multiple Index Based Information Retrieval System', filed by Google employee Anna Patterson, might be the answer to the problem. The patent, filed back in January 2005 and published in May 2006, shows that Google might actually be aiming to expand its index size to as much as 100 billion web pages or even more.
According to the patent, conventional information retrieval systems,
more commonly known as search engines, are able to index only a small
part of the documents available on the Internet. According to
estimates, the existing number of web pages on the Internet as of last
year was around 200 billion; however, Patterson claimed that even the
best search engine (that is Google) was able to index only up to 6 to 8
billion web pages.
The disparity between the number of indexed pages and existing pages
clearly signaled a need for a new breed of information retrieval
system. Conventional information retrieval systems just weren't capable
of doing the job and just wouldn't be able to index enough web pages to give users access to a large enough share of the information available on the web.
The Multiple Index Based Information Retrieval System, however, is up
to the challenge and is Google's answer to the problem. Two
characteristics of the new system make it stand out from conventional systems. One is its "capability to index an extremely large number of documents, on the order of a hundred billion or more". The other is its capability to "index multiple versions or instances of documents for archiving...enabling a user to search for documents within a specific range of dates, and allowing date or version related relevance information to be used in evaluating documents in response to a search query and in organizing search results".
With the new system developed by Patterson, Google now has the ability
to expand its index size to unbelievable proportions as well as improve
document analysis and processing, document annotation, and even the
process of ranking according to contained and anchor phrases.
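The core idea described in the patent, indexing multiple dated versions of a document and restricting queries to a date range, can be illustrated with a toy sketch. This is only a minimal illustration of the concept, not Google's actual implementation; the class and method names are invented for this example.

```python
from collections import defaultdict
from datetime import date

class VersionedIndex:
    """Toy inverted index that keeps every dated version of a document."""

    def __init__(self):
        # term -> set of (doc_id, version_date) postings
        self.postings = defaultdict(set)

    def add_version(self, doc_id, version_date, text):
        # Index one dated version of a document; older versions stay indexed.
        for term in text.lower().split():
            self.postings[term].add((doc_id, version_date))

    def search(self, term, start=date.min, end=date.max):
        # Return (doc_id, version_date) pairs containing the term,
        # restricted to versions dated within [start, end].
        return sorted(
            (doc, d)
            for doc, d in self.postings.get(term.lower(), set())
            if start <= d <= end
        )

idx = VersionedIndex()
idx.add_version("page1", date(2004, 6, 1), "google search engine")
idx.add_version("page1", date(2005, 3, 1), "google web index")
idx.add_version("page2", date(2005, 9, 26), "index size milestone")

# All versions containing "index", then only those dated in 2005:
print(idx.search("index"))
print(idx.search("index", date(2005, 1, 1), date(2005, 12, 31)))
```

Keeping every version in the postings list is what makes date-restricted search and version-aware relevance possible, at the cost of a much larger index, which is presumably why the patent pairs this capability with support for an extremely large document count.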
History of Google's Index Size
Google started out with an index size of around 24 million web pages in
1996. By August of 2000, Google had grown its index more than fortyfold, to approximately one billion web pages. In September of 2003,
Google's front-page boasted an index of 3.3 billion web pages.
Microdoc, however, revealed that the actual number of web pages Google
had indexed during that time was already more than five billion web
pages. In their article 'Google Understates the Size of Its Database',
they emphasized that Google not only specialized in simplicity but also
in understating their power and complexity. Google was still managing
to stay ahead of its competitors and continued to surprise everyone
with what they had up their sleeves.
As Google's index continued to grow, the number on its front page grew impressively large as well before it plateaued at eight billion web pages. This was around the time that Patterson filed the new patent. Then in 2005, with controversies over index size growing, Google decided to stop publishing a count and simply claimed that its index was three times larger than its nearest competitor's. Google also maintained that it was not just the number of indexed pages that mattered but how relevant the returned results were.
Then in September of 2005, as part of Google's 7th anniversary, Anna Patterson, the same software engineer who filed the patent on the Multiple Index Based Information Retrieval System, posted an entry on Google's official blog claiming that the index was now 1,000 times larger than the original. This pegged the index at around 24 billion web pages, about a fourth of Google's goal of indexing 100 billion web pages. It seems, then, that Google must have started using the new system in mid 2005. With the new system in place, we can only wait and see how fast Google will reach the goal of 100 billion web pages in its index. Most likely, though, once Google reaches that goal it will set an even higher one in its continuous drive to improve.
About The Author
Danny Wirken is co-owner of http://www.theinternetone.net an internet
marketing website that primarily focuses on the many aspects,
methodologies and processes that are used in internet marketing.
Re: The Google Goal Of Indexing
I suggest they present search results in two columns, 10 per column, so people can be on top of either the left or the right one :-)
John