July 6, 2006 How Google Works...

By David F. Carr
For all the razzle-dazzle surrounding Google, the company must still
work through common business problems such as reporting revenue and
tracking projects. But it sometimes addresses those needs in
unconventional, yet highly efficient, ways. Others are starting to
follow its lead. Here's why.

With his unruly hair dipping across his forehead, Douglas Merrill walks
up to the lectern set up in a ballroom of the Arizona Biltmore Resort
and Spa, looking like a slightly rumpled university professor about to
start a lecture. In fact, he is here on this April morning to talk
about his work as director of internal technology for Google to a crowd
of chief information officers gathered at a breakfast sponsored by
local recruiting firm Phoenix Staffing.

Google, the secretive, extraordinarily successful $6.1
billion global search engine company, is one of the most recognized
brands in the world. Yet it selectively discusses its innovative
information management infrastructure, which is based on one of the
largest distributed computing/grid systems in the world.

Merrill is about to give his audience a rare glimpse into the future
according to Google, and explain the workings of the company and the
computer systems behind it.


For all the razzle-dazzle surrounding Google (everything from the
press it gets for its bring-your-dog-to-work casual workplace to its
stock price, market share, dizzying array of beta product launches and
its death-match competition with Microsoft), it must also solve more
basic issues like billing, collection, reporting revenue, tracking
projects, hiring contractors, recruiting and evaluating employees, and
managing videoconferencing systems. In other words, common business
problems.

But this does not mean that Google solves these problems in a
conventional way, as Merrill is about to explain.

"We're about not ever accepting that the way something has been done in
the past is necessarily the best way to do it today," he says.

Among other things, that means that Google often doesn't deploy
standard business applications on standard hardware. Instead, it may
use the same text parsing technology that drives its search engine to
extract application input from an e-mail, rather than relying on a
conventional user interface based on data entry forms. Instead of
deploying an application to a conventional server, Merrill may deploy
it to a proprietary server-clustering infrastructure that runs across
Google's worldwide data centers.
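To make the idea of e-mail as application input concrete, here is a
minimal sketch in Python. It is not Google's parser, which the company
has not described. The field names (project, hours, date), the sample
message and the function name are all hypothetical, chosen only to show
how labeled lines in a message body could stand in for a data entry form.

```python
import re
from email import message_from_string

# Hypothetical field vocabulary; a real system would define its own.
FIELD_PATTERN = re.compile(
    r"^[ \t]*(project|hours|date)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_fields(raw_email: str) -> dict:
    """Pull labeled 'Name: value' lines out of a plain-text e-mail."""
    message = message_from_string(raw_email)
    body = message.get_payload()  # a str for a simple, non-multipart message
    fields = {"sender": message["From"], "subject": message["Subject"]}
    for name, value in FIELD_PATTERN.findall(body):
        fields[name.lower()] = value
    return fields


# Illustrative message only; the address and contents are made up.
SAMPLE = """From: pat@example.com
Subject: Timesheet

Hi,
Project: Search Appliance rollout
Hours: 6.5
Date: 2006-04-12
Thanks
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

The trade-off in a design like this is that the sender types free text
instead of filling in a form, so the parser has to tolerate whatever
surrounds the labeled lines, such as the greeting and sign-off above.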

Google runs on hundreds of thousands of servers (by one estimate, in
excess of 450,000) racked up in thousands of clusters in dozens of
data centers around the world. It has data centers in Dublin, Ireland;
in Virginia; and in California, where it just acquired the
million-square-foot headquarters it had been leasing. It recently
opened a new center in Atlanta, and is currently building two
football-field-sized centers in The Dalles, Ore.

By having its servers and data centers distributed geographically,
Google delivers faster performance to its worldwide audience, because
the delay on a connection between any two computers on the Internet
depends partly on the distance the signal must travel at the speed of
light, as well as on delays caused by network switches and routers.
And although search is still Google's big
moneymaker, those servers are also running a fast-expanding family of
other applications like Gmail, Blogger, and now even Web-based word
processors and spreadsheets.
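The speed-of-light point can be put in rough numbers. The sketch below
computes a lower bound on round-trip time from distance alone. It
assumes that signals in optical fiber travel at about two-thirds of the
speed of light in a vacuum, a common rule of thumb rather than a figure
from Google. The two distances are illustrative, not actual Google data
center locations, and switch and router delays are left out entirely.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at roughly 2/3 c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring equipment delays."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    # Hypothetical distances: a nearby data center versus a distant one.
    for label, km in [("nearby (100 km)", 100), ("distant (8,000 km)", 8000)]:
        print(f"{label}: at least {min_round_trip_ms(km):.1f} ms")
```

Under these assumptions, a user 8,000 km from a data center waits
roughly 80 milliseconds per round trip before any server has done any
work, while a user 100 km away waits about one millisecond.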

That's why there is so much speculation about Google the
Microsoft-killer, the latest firm nominated to drive everything to the
Web and make the Windows desktop irrelevant. Whether or not you believe
that, it's certainly true that Google and Microsoft are banging heads.
Microsoft expects to make about a $1.5 billion capital investment in
server and data structure infrastructure this year. Google is likely to
spend at least as much to maintain its lead, following an $838 million
investment in 2005.

And at Google, large-scale systems technology is all-important. In
2005, it indexed 8 billion Web pages. Meanwhile, its market share
continues to soar. According to a recent ComScore Networks qSearch
survey, Google's market share for search among U.S. Internet users
reached 43% in April, compared with 28% for Yahoo and 12.9% for The
Microsoft Network (MSN).

And Google's market share is growing; a year ago, it was 36.5%. The
same survey indicates that Americans conducted 6.6 billion searches
online in April, up 4% from the previous month. Google sites led the
pack with 2.9 billion search queries performed, followed by Yahoo sites
(1.9 billion) and MSN-Microsoft (858 million).
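As a quick check, the query counts above can be turned back into
shares. The short Python sketch below uses only the figures quoted from
the survey; the variable and function names are ours.

```python
# April query counts as quoted from the survey, in billions.
TOTAL_SEARCHES = 6.6
QUERIES = {"Google": 2.9, "Yahoo": 1.9, "MSN-Microsoft": 0.858}


def share_percentages(queries: dict, total: float) -> dict:
    """Convert per-site query counts into percentage shares of the total."""
    return {site: count / total * 100 for site, count in queries.items()}


if __name__ == "__main__":
    for site, share in share_percentages(QUERIES, TOTAL_SEARCHES).items():
        print(f"{site}: {share:.1f}%")
```

The results, about 43.9%, 28.8% and 13.0%, sit close to the reported
43%, 28% and 12.9%. The small gaps plausibly come from rounding in the
published query counts, although the survey summary does not say so.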

This growth is driven by an abundance of scalable technology. As Google
noted in its most recent annual report filing with the SEC: "Our
business relies on our software and hardware infrastructure, which
provides substantial computing resources at low cost. We currently use
a combination of off-the-shelf and custom software running on clusters
of commodity computers. Our considerable investment in developing this
infrastructure has produced several key benefits. It simplifies the
storage and processing of large amounts of data, eases the deployment
and operation of large-scale global products and services, and
automates much of the administration of large-scale clusters of
computers."

Google buys, rather than leases, computer equipment for maximum control
over its infrastructure. Google chief executive officer Eric Schmidt
defended that strategy in a May 31 call with financial analysts. "We
believe we get tremendous competitive advantage by essentially building
our own infrastructures," he said.

Google does more than simply buy lots of PC-class servers and stuff
them in racks, Schmidt said: "We're really building what we think of
internally as supercomputers."

Because Google operates at such an extreme scale, it's a system worth
studying, particularly if your organization is pursuing or evaluating
a grid computing strategy, in which high-end computing tasks are
performed by many low-cost computers working in tandem.
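The grid idea itself, splitting one large job into pieces, handing
each piece to a separate worker and merging the partial results, can be
sketched in a few lines of Python. This is an illustration only, not
Google's software. Threads in a single process stand in for separate
low-cost machines, and the word-count task and all function names are
hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(shard):
    """Work done by one worker: count the words in its share of documents."""
    counts = Counter()
    for document in shard:
        counts.update(document.lower().split())
    return counts


def split_into_shards(documents, shard_count):
    """Deal the documents out round-robin into shard_count lists."""
    return [documents[i::shard_count] for i in range(shard_count)]


def distributed_word_count(documents, worker_count=4):
    """Fan the shards out to workers, then merge their partial counts."""
    shards = split_into_shards(documents, worker_count)
    total = Counter()
    # Assumption: threads stand in for the separate machines of a real grid.
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        for partial in pool.map(count_words, shards):
            total.update(partial)
    return total


if __name__ == "__main__":
    docs = ["grid computing", "low cost computing", "grid of computers"]
    print(distributed_word_count(docs).most_common(2))
```

In a real grid the workers would be separate machines, and the hard
parts become moving data to them and coping with the ones that fail,
which this sketch does not attempt.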

Despite boasting about this infrastructure, Google turned down requests
for interviews with its designers, as well as for a follow-up interview
with Merrill. Merrill did answer questions during his presentation in
Phoenix, however, and the division of the company that sells the Google
Search Appliance helped fill in a few blanks.

In general, Google has a split personality when it comes to questions
about its back-end systems. To the media, its answer is, "Sorry, we
don't talk about our infrastructure." Yet, Google engineers crack the
door open wider when addressing computer science audiences, such as
rooms full of graduate students whom it is interested in recruiting. As
a result, sources for this story included technical presentations
available from the University of Washington Web site, as well as other
technical conference presentations, and papers published by Google's
research arm, Google Labs.
