# Traversing Friends Graph - How does Friendster do it?

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on

I'm not sure if this forum is the correct place to post this, but I
couldn't think of any other group. I would really appreciate any help
you could give me.

FINAL GOAL OF MY APPLICATION:
Building a friendster clone for a very large organization (10,000
people). I am using Php/Mysql.

SETUP OF SYSTEM:
(it may help if you are familiar with how friendster works)
Mathematically what I have is a graph, with around 10,000 nodes (each
node representing one person). Each of these nodes have an avg of appx
40 edges coming out from them going to other nodes in the graph.

I have implemented this graph in my database using two tables. One, I
have a Person table which assigns each person a ID. Then I have a table
called "Edge" that defines the edges. It has two cells in every row of
the table (each cell holds the ID of one of the two ends of an edge).

WHAT I WANT TO DO:
Starting at any node, I want to be able to (as quickly as possible)
calculate which nodes are up to 3 degrees of separation away from it.
This would be needed because I want to allow someone to search for
people within 3 degrees of distance away from themselves.

WHAT I HAVE TRIED:
I wrote a recursive function that starts at a certain node and just
traverses all the nodes that are a given distance away. I have done
optimizations to this function like keeping track of which nodes have
already been visited, and not revisiting them. On the site, I was
planning on running this function the first time someone does a search,
and then storing the results in the session so I wouldn't have to run
it again.

The problem is with a test database where each person has about 40
edges going from them to random nodes, it takes about 60 seconds to
calculate the 3rd degree friends. It is taking about 800 sql select
queries.  60 seconds however is WAY too slow...

Can someone explain to me, or take a stab at how friendster (or any
website like it, orkut, linkedin, hi5 etc) does it (their database is
far greater than mine). Friendster not only keeps track of WHO is up to
3 degrees away from you, but they also tell you ALL the paths between
you and any other person within those 3 degrees.

If you can come up with some smart way that I could solve my problem
(like maybe store everyone's 2nd degree friends before hand, or
something). I'm really stuck...any help would be GREATLY appreciated.

Thanks,
Timin

## Re: Traversing Friends Graph - How does Friendster do it?

Timin Uram wrote:

OK, here is an idea...  Imagine a database table:

CREATE TABLE `knows` (
`who` int,
`whom` int,
KEY `who` (`who`),
KEY `whom` (`whom`)
) COMMENT='The Who Knows Whom Table'

So if the person whose data stored in another table under
ID number 1578 knows the person whose data stored in that
same table under ID number 8690, there would be two records
in the table `knows`:

===========
who    whom
===========
1578   8690
8690   1578
===========

Now we need to figure out, say, who knows ID 6231 firsthand:

SELECT who FROM knows WHERE whom=6231;

Then, we can figure out the first degree of separation (i.e.,
who knows someone who knows 6231):

SELECT
k1.who AS who,
k2.who AS through,
k2.whom AS whom
FROM knows AS k1 LEFT JOIN knows AS k2 ON k1.whom=k2.who
WHERE k2.whom=6231;

Hopefully, this will return something like:

=========================
who     through     whom
=========================
4572      9421      6231
=========================

meaning, 4572 knows 6231 through 9421.

Now comes the second degree of separation:

SELECT
k1.who AS who,
k2.who AS through1,
k3.who AS through2,
k3.whom AS whom
FROM (knows AS k1 LEFT JOIN knows AS k2 ON k1.whom=k2.who)
LEFT JOIN knows AS k3 ON k2.whom=k3.who
WHERE k3.whom=6231;

And, finally, voila: the third degree:

SELECT
k1.who AS who,
k2.who AS through1,
k3.who AS through2,
k4.who AS through3,
k4.whom AS whom
FROM ((knows AS k1 LEFT JOIN knows AS k2 ON k1.whom=k2.who)
LEFT JOIN knows AS k3 ON k2.whom=k3.who)
LEFT JOIN knows AS k4 ON k3.whom=k4.who
WHERE k4.whom=6231;

Actually, you can write a function that would generate a query
automatically given the number of degrees of separation...

Cheers,
NC

## Re: Traversing Friends Graph - How does Friendster do it?

Hi NC, Thanks for your response. Thanks for your suggestion, and it
definitely cut down on the time, however, because of multiple joins,
the query still ends up taking about 20 seconds. It also doesn't take
loops into account (but that's ok because we're not infinitely
looping).

Are there any other optimizations possible...or something else, like
caching some information somehow? It seems like these friendster sites
do this pretty fast...is there something else possible?

Timin

NC wrote:

it.

## Re: Traversing Friends Graph - How does Friendster do it?

Timin Uram wrote:

???

A similar query I tested on a shared hosting with a table
of about 7500 records consistently completes in under one
second.  Are you sure you have proper indexes in place?

Cheers,
NC

## Re: Traversing Friends Graph - How does Friendster do it?

NC wrote:

I have a table of about 100,000 edges (200,000 rows), and it is taking
about 20 seconds. I've tried it on two different servers (a local one,
and one on a shared host).

Another thing is that in a friendster type network, there is a
clustering effect where say 40 friends have most of each other in each
other's lists. This leads to a lot of loops (paths such as 8->7->8->9
for a path from 8 to 9). So maybe it's faster when there are completely
random connections...(but I wouldn't thinks so).

I also have an index on both of the primary keys (who and whom)

Is there anything else possible? Or do you there there is something
wrong with my setup?

## Re: Traversing Friends Graph - How does Friendster do it?

You can modify the queries to avoid loops, or at least minimize
them.  Whether or not the added conditions take more time to check
than they do cut down the number of combinations, I don't know.
You will still get many combinations where there are multiple paths
between two people that don't involve loops.

Assuming you are doing a join of 4 instances of the "who knows whom"
table (k1, k2, k3, and k4), for a 3-degrees-of-separation query.
Assuming for the moment that nobody "knows" himself in the table
(and this is somehow enforced), you might want to put in the
conditions:

k1.who != k3.who
k1.who != k4.who and k2.who != k4.who

which prevents the same person showing up in the chain more than once.

Gordon L. Burditt

## Re: Traversing Friends Graph - How does Friendster do it?

Timin Uram wrote:

OK, let's try this:

1. Change your indexing.  Right now, you have a primary
index based on both columns.  Replace it with two non-
unique indexes, one for each column.

2. Rewrite the query like this:

SELECT
k1.who AS who,
k2.who AS through1,
k3.who AS through2,
k4.who AS through3,
k4.whom AS whom
FROM ((knows AS k1 LEFT JOIN knows AS k2 ON k1.whom=k2.who)
LEFT JOIN knows AS k3 ON k2.whom=k3.who)
LEFT JOIN knows AS k4 ON k3.whom=k4.who
WHERE k1.who = 6231
AND k3.who <> 6231
AND k4.who <> 6231
AND k4.whom <> 6231

My first post to this topic said "WHERE k4.whom=6231", which
created a lot of completely unnecessary overhead (my apologies
for not thinking about it sooner).  This way, it runs MUCH
faster...  Also, looping is taken care of by adding extra
WHERE clauses.

Cheers,
NC

## Re: Traversing Friends Graph - How does Friendster do it?

Timin Uram wrote:

Of course there is always something possible.  You can get
better hardware (lots of it) and/or hire Jeremy Zawodny to
advise you on performance tuning and replication (the latter,
by the way, is exactly what Friendster did).

Cheers,
NC

## [OT] Re: Traversing Friends Graph - How does Friendster do it?

<snip>

Surprise. I'm reading J.Z's blog for sometime and might have missed
this news. Couldn't even find that in Google. Just curious, where did
you find this news?

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com    Blog: http://rajeshanbiah.blogspot.com /

## Re: Traversing Friends Graph - How does Friendster do it?

R. Rajesh Jeba Anbiah wrote:

NC> hire Jeremy Zawodny to advise you on performance tuning and
NC> replication (the latter, by the way, is exactly what Friendster
NC> did).

It's not on the blog, it's in the "MySQL Stuff" section:

Consulting

I occasionally (as time permits) help out companies with their
MySQL needs. I specialize in performance tuning and replication
but can handle nearly all aspects of MySQL on Linux/Unix systems.
Here are some of my current and past clients.

Quest Software - Foglight group
Plaxo
CafeDVD
Friendster
Feedster LLC
Technorati
Spam Arrest LLC
Cloudmark
Rackspace
LiveJournal
Eight Days Inc.

Contact me for rates and availability.

http://jeremy.zawodny.com/mysql /

Cheers,
NC

## [OT]Re: Traversing Friends Graph - How does Friendster do it?

NC wrote:

did
<snip>

Thanks. Interesting, I missed it really. Now I wonder, why Yahoo
didn't fire him yet;)

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com    Blog: http://rajeshanbiah.blogspot.com /

## Re: Traversing Friends Graph - How does Friendster do it?

wrote:

Here is what I tried.

Tables
----------

persons
pid    serial        // int4 with autoincrement
name    varchar(32)

lid    serial        // int4 with autoincrement (not
required but I like to have uniq id
in all tables
parent    int4
child    int4

Then I filled the persons table with 10000 random names and then

I can then select all linked persons to used id 1201 with:

SELECT DISTINCT
pid
FROM
WHERE
(parentid IN (
WHERE (pid=childid AND parentid IN (
WHERE (pid=childid AND parentid=1201
))
)) AND pid=childid);

This takes about 1-2 secs on my PostgreSQL server and could be even
faster with indexes in links table.

- Allan Savolainen

## Re: Traversing Friends Graph - How does Friendster do it?

Hey, one thing is that my host has mysql version 4.0 which doesn't
support subqueries, but I can get around that by just doing each of
those queries separately, and then inserting those results in the ()
separated by commas individually.

However, I did try the query on my local server with mysql 4.1, and it
still took about 18 seconds. Here is an image of my table's setup from

Is something wrong with my setup or indexes?

## Re: Traversing Friends Graph - How does Friendster do it?

Timin Uram wrote:

Yes.  Instead of creating two separate indexes for `who`
and `whom`, you created a single index for both fields.
This makes it difficult to join the table to itself on
copy1.who = copy2.whom.

Cheers,
NC

## Re: Traversing Friends Graph - How does Friendster do it?

hey timin,
i read your post. ya i've also been surfing the friendster and hi5. but
right now i'm writing to you to ask that do you have any idea that how
theirs Invitation works. You might be familiar about their invitation
program. Just give your msn/yahoo id and pasword, they just grabs all
the contacts of that id and sends an invitaion.
do you know how it works???
hope listen from you soon.
Sachin

## Re: Traversing Friends Graph - How does Friendster do it?

Hey Sachin, I don't know the exact answer for your query, but I can
guess that they are doing the same thing that 3rd party chatting
programs like trillian and gaim do.

Gaim is open source, so you could possibly look at the code for doing
that there. Friendster or hi5 must have this code running in the
backend.

Also note that gaim and trillian are not authorized to do what they do,
but for now, i think the chatting networks (msn, yahoo, aol) have just
given up trying to fight and are allowing third party chatting programs
on their network.