June 4, 2007, 8:09 pm
about Google and search and SEO in general. Here are snippets from a
recent article I just read about the new Google algo and what it's all
about. Hope this gives months of interesting things for others to think
about and post about:
Google Signals and Classifiers: (main part of the new Google algo)
As Google compiles its index, it calculates a number it calls PageRank
for each page it finds. This was the key invention of Google’s founders,
Larry Page and Sergey Brin. PageRank tallies how many times other sites
link to a given page. Sites that are more popular, especially with sites
that have high PageRanks themselves, are considered likely to be of
higher quality.
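For anyone who wants to play with the PageRank idea, here is a rough sketch of the calculation. It is the textbook power-iteration version, not Google's production code. The 0.85 damping factor is the value suggested in the original PageRank paper; the function name and the tiny four-page link graph are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a and b link to c, and c links to d.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["d"], "d": []})
```

In this toy graph, page d has only one inbound link, but that link comes from the well-linked page c, so d still ends up ranked above a and b. That is the "especially with sites that have high PageRanks themselves" part.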
Mr. Singhal has developed a far more elaborate system for ranking pages,
which involves more than 200 types of information, or what Google calls
signals. PageRank is but one signal. Some signals are on Web pages like
words, links, images and so on. Some are drawn from the history of how
pages have changed over time. Some signals are data patterns uncovered
in the trillions of searches that Google has handled over the years.
Increasingly, Google is using signals that come from its history of what
individual users have searched for in the past, in order to offer
results that reflect each person’s interests. For example, a search for
dolphins will return different results for a user who is a Miami
football fan than for a user who is a marine biologist. This works only
for users who sign into one of Google’s services, like Gmail.
Once Google corrals its myriad signals, it feeds them into formulas it
calls classifiers that try to infer useful information about the type of
search, in order to send the user to the most helpful pages. Classifiers
can tell, for example, whether someone is searching for a product to
buy, or for information about a place, a company or a person. Google
recently developed a new classifier to identify names of people who
aren’t famous. Another identifies brand names.
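To make the classifier idea concrete, here is a toy version. The article does not say how Google's classifiers work inside, and they are presumably learned from data rather than hand-written. This sketch just looks for cue words, and the cue lists and category names are invented for the example.

```python
# Hypothetical cue words; a real classifier would learn these from data.
PRODUCT_CUES = {"buy", "price", "cheap", "review"}
PLACE_CUES = {"hotel", "map", "directions", "restaurants"}


def classify_query(query):
    """Guess the type of a search from cue words in the query."""
    words = set(query.lower().split())
    if words & PRODUCT_CUES:
        return "product"
    if words & PLACE_CUES:
        return "place"
    return "unknown"


kind = classify_query("buy digital camera")
```

The point is only that the classifier's output (here "product") is one more piece of information the ranking step can use to pick the most helpful pages.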
These signals and classifiers calculate several key measures of a page’s
relevance, including one it calls topicality, a measure of how the topic
of a page relates to the broad category of the user’s query. A page
about President Bush’s speech about Darfur last week at the White House,
for example, would rank high in topicality for Darfur, less so for
George Bush and even less for White House. Google combines all these
measures into a final relevancy score.
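The article does not say how the measures get combined into the final score. One simple way to picture it is a weighted sum, sketched below. The weights, the signal names, and the topicality numbers are all invented; they are chosen only so the ordering matches the Darfur example.

```python
# Hypothetical weights; the article does not reveal how Google weighs its measures.
WEIGHTS = {"topicality": 0.7, "pagerank": 0.3}


def relevancy_score(signals, weights=WEIGHTS):
    """Combine named signal values (each assumed to be in 0..1) into one number."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# The same page (the Darfur speech) scored against three different queries.
# Only topicality changes with the query; the page's PageRank stays the same.
pagerank_signal = 0.5
topicality_by_query = {"Darfur": 0.9, "George Bush": 0.6, "White House": 0.3}

scores = {
    query: relevancy_score({"topicality": topicality, "pagerank": pagerank_signal})
    for query, topicality in topicality_by_query.items()
}
best_first = sorted(scores, key=scores.get, reverse=True)
```

With these made-up numbers the page scores highest for Darfur, lower for George Bush, and lowest for White House.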
The sites with the 10 highest scores win the coveted spots on the first
search page, unless a final check shows that there is not enough
diversity in the results. “If you have a lot of different perspectives on
one page, often that is more helpful than if the page is dominated by
one perspective,” Mr. Cutts says. “If someone types a product, for
example, maybe you want a blog review of it, a manufacturer’s page, a
place to buy it or a comparison shopping site.”
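Here is one way the final diversity check could look. This is a guess at the general shape, not Google's actual method: take pages in score order, but cap how many of any one kind can appear. The page kinds, the cap, and the example URLs are assumptions for illustration.

```python
def first_page(scored_pages, limit=10, max_per_kind=3):
    """Pick the top pages by score while capping how many share one kind.

    scored_pages: list of (url, kind, score) tuples.
    Returns a list of up to `limit` urls.
    """
    ranked = sorted(scored_pages, key=lambda page: page[2], reverse=True)
    picked, skipped, counts = [], [], {}
    for page in ranked:
        if len(picked) == limit:
            break
        kind = page[1]
        if counts.get(kind, 0) < max_per_kind:
            picked.append(page)
            counts[kind] = counts.get(kind, 0) + 1
        else:
            skipped.append(page)
    # If the cap left empty slots, backfill with the best skipped pages.
    picked.extend(skipped[: limit - len(picked)])
    return [url for url, _kind, _score in picked]


# Hypothetical candidates for a product search.
candidates = [
    ("shop-a.example", "store", 0.9),
    ("shop-b.example", "store", 0.8),
    ("shop-c.example", "store", 0.7),
    ("blog.example", "review", 0.6),
    ("maker.example", "manufacturer", 0.5),
]
page = first_page(candidates, limit=4, max_per_kind=2)
```

With four slots and at most two per kind, the third store is bumped so that the blog review and the manufacturer's page both make the first page, even though they scored lower.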
If this wasn’t excruciating enough, Google’s engineers must compensate
for users who are not only fickle, but are also vague about what they
want; often, they type in ambiguous phrases or misspelled words.
Long ago, Google figured out that users who type Brittany Speers, for
example, are really searching for Britney Spears. To tackle such a
problem, it built a system that understands variations of words. So
elegant and powerful is that model that it can look for pages when only
an abbreviation or synonym is typed in.
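A quick way to see the misspelling idea in action is plain string similarity from Python's standard library. To be clear, this is a stand-in: the article only says Google built a system that understands variations of words, not how it works. The list of known queries and the 0.7 cutoff are assumptions.

```python
import difflib

# Hypothetical list of queries the engine already knows to be common.
KNOWN_QUERIES = ["britney spears", "brenda lee", "miami dolphins"]


def suggest(query, known=KNOWN_QUERIES, cutoff=0.7):
    """Return the closest known query, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query.lower(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = suggest("Brittany Speers")
```

Here the misspelled query is close enough to "britney spears" to be suggested, while a query that resembles nothing in the list gets no suggestion at all.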
Mr. Singhal boasts that the query “Brenda Lee bio” returns the official
home page of the singer, even though the home page itself uses the term
“biography,” not “bio.”
But words that seem related sometimes are not related. “We know ‘bio’ is
the same as ‘biography,’ ” Mr. Singhal says. “My grandmother says: ‘Oh,
come on. Isn’t that obvious?’ It’s hard to explain to her that bio means
the same as biography, but ‘apples’ doesn’t mean the same as ‘Apple.’ ”
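The bio/biography case can be sketched as query expansion with a synonym table. The table here is hand-written and tiny, and the matching is deliberately naive. How Google actually decides which words are equivalent is not described in the article.

```python
# Hypothetical synonym table. Note there is no entry tying "apples" to "apple".
SYNONYMS = {"bio": {"biography"}, "biography": {"bio"}}


def expand_query(query):
    """For each query word, return the set of terms that count as a match."""
    return [{word} | SYNONYMS.get(word, set()) for word in query.lower().split()]


def page_matches(page_text, query):
    """True if every query word, or one of its synonyms, appears on the page."""
    page_words = set(page_text.lower().split())
    return all(terms & page_words for terms in expand_query(query))


found = page_matches("Brenda Lee official biography", "Brenda Lee bio")
```

Because only listed pairs are treated as equivalent, "bio" finds a page that says "biography", but a search for "apples" does not match a page that only mentions "Apple".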
In the end, it’s hard to gauge exactly how advanced Google’s techniques
are, because so much of what it and its search rivals do is veiled in
secrecy. In a look at the results, the differences between the leading
search engines are subtle, although Danny Sullivan, a veteran search
specialist and blogger who runs Searchengineland.com, says Google
continues to outpace its competitors.
Yahoo is now developing special search formulas for specific areas of
knowledge, like health. Microsoft has bet on ranking pages with a
mathematical technique known as neural networks, which try to mimic the
way human brains learn information.
Google’s use of signals and classifiers, by contrast, is more rooted in
current academic literature, in part because its leaders come from
academia and research labs. Still, Google has been able to refine and
advance those ideas by using computer and programming resources that no
university can afford.
“People still think that Google is the gold standard of search,” Mr.
Battelle says. “Their secret sauce is how these guys are doing it all in
aggregate. There are 1,000 little tunings they do.”
To read more and see the full article click here: