Click here to get back home

Scalability Code question - PHP vs MySQL

 HomeNewsGroups | Search | About
 comp.lang.php    Post an article   get this group's latest topics as an RSS feed add this group's latest topics to your My MSN content add this group's latest topics to your My Yahoo content
Subject Author Date
Scalability Code question - PHP vs MySQL rich 08-01-2008
Get Chitika Premium
Posted by rich on August 1, 2008, 10:32 am
Please log in for more thread options
I'm having a tough time figuring out which of these two options are
best. This is a matter of processing my data in PHP, vs MySQL.
Usually that's a no brainer, but I have a couple gotchyas here and
would love any and all opinions here. I'm going to make this as short
and simple as I can...

This is for an e-commerce site with very high traffic, and the choice
will probably not be based on speed, but which is more scalable. I
need this to last. So here's my test code.. you may not know all
these functions, but I think they're very straight forward:

// 2 ways of doing this.. 1 query or more?
$start = microtime(true);
$productSql = "SELECT * FROM $searchTemp $productWhere $sort"; //
searchTemp is a large table of denormalized data
$searchResults = $my->returnTableAssoc($productSql,
$selectFromSlave); // this just returns a multidimensional array of
the results

// this is an array_unique for a multidimensional array and will
essentially be like group_by productid
$products = remove_dups($searchResults, 'productid');
// get the other columns of data needed
$brands = array();
$cats = array();
$colors = array();
$years = array();
$bootWidth = array();
$flex = array();
foreach($searchResults as $sr)
{
        $brands[] = $sr['manufacturer'];
        $cats[] = $sr['categoryid'];
        $colors[] = $sr['colorcode'];
        $years[] = $sr['modelYear'];
        $bootWidth[] = $sr['bootWidth'];
        $flex[] = $sr['flexRating'];
}
$brands = array_unique($brands);
$cats = array_unique($cats);
$colors = array_unique($colors);
$years = array_unique($years);
$bootWidth = array_unique($bootWidth);
$flex = array_unique($flex);
$end = microtime(true);
echo "Did first in " . ($end - $start) . " seconds <br>";

// try again - just do a bunch of queries and let mysql do all the
work
$productSql = "SELECT * FROM $searchTemp $productWhere GROUP BY
productid $sort";
$products = $my->returnTableAssoc($productSql, $selectFromSlave);
$productSql = "SELECT distinct manufacturer FROM $searchTemp
$productWhere";
$brands = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct categoryid FROM $searchTemp
$productWhere";
$cats = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct colorcode FROM $searchTemp
$productWhere";
$colors = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct modelYear FROM $searchTemp
$productWhere";
$years = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct bootWidth FROM $searchTemp
$productWhere";
$bootWidth = $my->returnArray($productSql, $selectFromSlave);
$productSql = "SELECT distinct flexRating FROM $searchTemp
$productWhere";
$flex = $my->returnArray($productSql, $selectFromSlave);
$end = microtime(true);
echo "Did second in " . ($end - $start) . " seconds <br>";


So, on my development server, #1 runs in .9 seconds, and #2 runs in
3.7 seconds. However in my live production environment with 2
webservers and 2 database servers, they run at approx 1.1 seconds
each. It's essentially a tie.

Another thing to keep in mind is whichever option I choose, I'll be
using memcache to speed things along also.

So, in short, both run at the same speed, but which one is more
scalable?

Thanks.

Posted by Dale on August 1, 2008, 10:58 am
Please log in for more thread options
> I'm having a tough time figuring out which of these two options are
> best. This is a matter of processing my data in PHP, vs MySQL.
> Usually that's a no brainer, but I have a couple gotchyas here and
> would love any and all opinions here. I'm going to make this as short
> and simple as I can...

<snip>

> So, in short, both run at the same speed, but which one is more
> scalable?

you've got to be kidding, right? if not, you're overlooking a lot of obvious
things. as much as possible, let a db do what it was designed to do. you
know very well your php scenario won't fly and has no chance of scaling!
right?



Posted by rich on August 1, 2008, 11:31 am
Please log in for more thread options
> you've got to be kidding, right? if not, you're overlooking a lot of obvi=
ous
> things. as much as possible, let a db do what it was designed to do. you
> know very well your php scenario won't fly and has no chance of scaling!
> right?

Well yeah, I know to let mysql do the work. But I think this requires
more thought than a textbook answer. This is probably (from a
performance standpoint) the most important spot on the whole site and
I want to make sure this is done right. You could argue that from a
query queue point of view, running the 1 query is way faster. Really
I guess the ONLY question here, who gets to figure out distinct
values.

Also I think memcache would be more useful in the PHP scenario. Can
just store the main query and sort it out from there.

Don't get me wrong though... I'm leaning toward the MySqQL way, I just
think this is important enough to get more opinions on before I charge
ahead.


Posted by Dale on August 1, 2008, 11:51 am
Please log in for more thread options

> you've got to be kidding, right? if not, you're overlooking a lot of
> obvious
> things. as much as possible, let a db do what it was designed to do. you
> know very well your php scenario won't fly and has no chance of scaling!
> right?

Well yeah, I know to let mysql do the work. But I think this requires
more thought than a textbook answer.

== i don't think it does.

This is probably (from a performance standpoint) the most important spot on
the whole site and
I want to make sure this is done right.

== give it its best shot then. :)

You could argue that from a
query queue point of view, running the 1 query is way faster.

== well, that's one point. however, you've not introduced traffic into your
scenario yet. further, you asked about scaling. have you benchmarked the
times not only against mysql but other databases as well?

Really
I guess the ONLY question here, who gets to figure out distinct
values.

== and how are you 'dedup-ing' your data in php? are you unable to translate
that into ansi sql?

Also I think memcache would be more useful in the PHP scenario. Can
just store the main query and sort it out from there.

== php doesn't cache queries. php cannot index either. a db has to send a
resultset to php in order for php to process it anyway. that all goes into
'does this scale better'. i don't care what you do, a db can only get faster
where php will not. so imagine the value of 1 being where the db and php
converge. notice that the db is faster and tops out at 1. php is slower,
with the best time being 1. get the picture?

Don't get me wrong though... I'm leaning toward the MySqQL way, I just
think this is important enough to get more opinions on before I charge
ahead.

== understood. do you have a real-life table and query requirements that you
could post here? finally, in your test table and for your performance
testing, does it contain an amount of data that you expect to have in the
wild?

btw, stay away from variable names like $searchTemp. it is almost
meaningless to anyone that comes behind you. it *appears* to be the search
criteria, so why not name it $searchCriteria and save yourself and others
from having to guess?

cheers



Posted by rich on August 1, 2008, 11:54 am
Please log in for more thread options
> Don't get me wrong though... I'm leaning toward the MySqQL way, I just
> think this is important enough to get more opinions on before I charge
> ahead.

Couple more things if anyone else wants to chime in here...

I ran the test again just now, and I got:

Did first in 1.1379570960999 seconds
Did second in 5.2290420532227 seconds

When I said before the times were tied, I think I was being dumb and
the queries were cached. I'm pretty sure there's no way I can count
on these being cached live because of all the possible combinations.
So again.. option 1 uses far less mysql time.. AND if i stored that
main query in memcache, it'd be even faster. BUT - that sounds like a
lot of webserver memory usage. Not great. Ugh.


Similar ThreadsPosted
Scalability of PHP image galleries January 22, 2006, 3:06 pm
scalability of perl cgi programs that load text files August 21, 2004, 11:33 pm
Newbie PHP question: 'hiding' code on server from View Source January 16, 2007, 11:49 am
MySQL Count() question.. (mysql newb) February 17, 2005, 4:45 am
base64 code mysql February 8, 2005, 3:03 pm
MYSQL Fulltext relevance value - need php code January 27, 2005, 11:12 am
How can I get free source code for PHP and MySql? July 13, 2005, 9:29 am
Image Code Verification with PHP/MySQL? January 12, 2006, 3:50 pm
how to not write password in code for using to mysql? May 27, 2007, 5:15 am
php code, timezones affecting mysql? November 21, 2007, 7:14 pm

Our other projects:

Art Dolls, Fairies and Mermaids - Sunnyfaces.net

Roy's Linux, Programming and Search Engines messages

1-Script XML SitemapXML Sitemap