Data objects vs. speed

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View
I have to create a big web application and I was thinking of using a
data layer. For each entity in the database, I'll define a class that
maps the table structure, having sub-objects for each foreign key,
having insert/delete/update  methods, the usual deal. Yet, I am very
concerned about performance. For example, there are lots of cases when
I may just be needing the employee name. Yet using this model, I will
have to instantiate an entire Employee class, which may have sub-
objects and use lots of SELECT queries. I think this will be an
important performance hit. What do you think? Ever did something like
this? Is it worthy? Is there a better way? Should I give up creating
the data layer?

Re: Data objects vs. speed wrote:
Quoted text here. Click to load it

I do objects in all my more complex sites.  But I generally use business  
objects - ones related to the job at hand.  Any business object may be  
related to a few columns in one table, most columns across multiple  
tables, or anything in between.

And I may even use the same object, but have different fetch methods  
which retrieve only partial data.

For instance, if I have:

   table student
     id int
     name char(30)
     other stuff

   table class
     id int
     name char
     instructor_id int
     other stuff

   table class_student
     class_id int
     student_id int

I might have a class which will retrieve all the student names, the  
instructor name, day(s) and start/s for a single class, the room in  
which it will be held, etc..  Alternatively, it might only retrieve a  
count of the number of students in the class

If the data is getting more complex, I may add a pure data layer under  
the business object layer, where I do have one table per object.  But  
this layer will also have multiple functions to allow me to retrieve  
what I need, when I need it.

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.

Re: Data objects vs. speed

I have just implemented something similar.

You can structure the SELECTs in such a way that they are only run if
the relavent information is requested from the class via a get method.
For instance, if any of the methods requesting user information are
called, a SELECT is run which gets all user information (since all
info is on the same row - might as well get it all). Any further
requests then give up this cached info rather than another SELECT.

However, I share your concern with having to instantiate a large class
for even simple queries. I am a newbie at PHP (C++ coder by trade) and
am only now reading about the performance overheard of function calls
and the fact that PHP will have to parse the entire class code.

Is it worth the trade off? I don't know. I has certainly made the
coding a whole lot easier and more structured. I am now implementing
cacheing certain data in the session, and this is a snip as everything
is accessed via the same class.

I will read this thread with interest.

Re: Data objects vs. speed

lister wrote:
Quoted text here. Click to load it

In a big class where you need lots of data, this would be a performance  
disaster.  Some of my tables have 40+ columns - and making 40+ SELECT  
statements, each fetching one item, is a tremendous waste of time.

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.

Re: Data objects vs. speed

Quoted text here. Click to load it

I think you misunderstand me. There is only one select statement per
table. Yes, I could have 40 columns, and 40 different get*() methods,
but only the first call to one of those methods would result in a
SELECT *. Then all 40 pieces of information would be populated in the
class, and any subsequent get*() calls would simply return the value.

Re: Data objects vs. speed

lister wrote:
Quoted text here. Click to load it

OK, that's better.  But it's also a lot of unnecessary overhead to fetch  
40 columns if you only need 3.

Additionally, you should never use SELECT * - put in the names of the  
columns instead.  For instance, what happens if someone adds a 10Mb BLOB  
field to your table and you SELECT *?

As I indicated above, a better compromise between getting everything and  
getting just one field is to have more than one fetch method to allow  
fetching of multiple columns.

For instance, on one customer I have one table (companies they sell to)  
with 42 columns (and yes, it is normalized).  One fetch might get  
company name and id.  Another might get this plus address and phone  
info.  A third one might get all of this plus current financial info.

And since this is a business object, I could have a method which fetches  
outstanding transactions.  Another can fetch all transactions.  These  
and others like them link to other tables - but that is transparent to  
the program itself.

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.

Re: Data objects vs. speed

rpsetzer wrote:

Quoted text here. Click to load it

The approach I'm using in a current project is along these lines:

    public function load_from_database($keys)
    public function load_from_array($vals)

Such that the class can be populated from either the database, or from an
array. It can be partially populated, so that, for example, you can load
the Employee ID and Employee Name from an array like so:

    $e = new Employee();
    $e->load_from_array(array('id'=>1, 'name'=>'Joe'));

where the other fields like address, telephone number, etc would remain
blank within the object, but if they were asked for:

    print $e->get('phone');

then the Employee object is smart enough to grab all the missing data from
the database and return the appropriate value.

So for example:

    $db = new PDO(...);
    foreach ($db->query('SELECT id, name FROM employees') as $row)
        $e = new Employee();
        print $e->get('name');
        if ($e->get('name')=='Joe')
            print ' '.$e->get('phone');
        print "\n";

will work as expected, printing a list of employees, including phone
numbers for employees called Joe. Yes, that's right -- even when the
initial query didn't select phone numbers!

When you perform the get('phone') call, $e realises that 'phone' is NULL,
so performs its own call to load_from_database, which runs 'SELECT * FROM
employees WHERE id=' and then feeds the results into

Toby A Inkster BSc (Hons) ARCS
Contact Me ~
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!

Re: Data objects vs. speed


Of course it depends on what you need to do and how much processing  
power you have at hand. For some purposes using a relational database  
may itself be a bad idea. But for most apps data objects can work fine,  
if you implement caching. When i developed the first version of  
phpPeanuts i wanted it to be the simpelest thing that could possebly  
work so i did not include caching. But when i tested my first serious  
app with it, caching was the first thing to be added. Without it it  
simply did not perform. With caching the app only fired 1/4th to 1/10th  
of the queries, this solved the performance problem. I implemented  
caching in the framework and i have not yet had to develop an app for  
wchich it was not sufficient.

To implement caching each object type needs to have a key. For each  
object that is retrieved from the database a reference* is added to the  
cache, using its key as key in an associative array. Associative arrays  
are great for caching because they use hashed lookup to quickly retrieve  
  objects. With hashed lookup a there is very little searching, and the  
searching does not increase substantially with large arrays.

When navigating over a '1-n' relationship in the '1' direction you have  
the (now foreign) key, so you first look in the cache to see if the  
object has already been loaded. If it has not, you load it and cache it  
and return a reference. This mechanism is so fast that it is not needed  
to store the reference to a related object in a member variable of the  
relating object. This saves you a big headache with circular references  
that tend to make php unstable.

When navigating over an 1-n relationship in the n direction the chache  
is of little help**. Therefore it is generally a good idea to store each  
array of objects you retrieve in a member variable of the relating object.

BTW, you do not have to implement insert/delete/update methods for each  
object type, with metadata you can implement generic insert, delete and  
update methods so you only need three methods however many types you  
make. Or you could download a framework that does all this ;-).


Henk Verhoeven,

BTW phpPeanuts does not work if you need arbitrary keys. It allways  
needs 'id' to be the key. Simpelest thing...

* in php5 it is not necessary to think about references, but in php4 you  
may get a subtantial performance gain by using variable references here.
** However, if an object is stored in multiple tables but you do not  
know in advance in which ones (like with polymorphism) the cache may  
save you an extra query to retrieve additional data.

Re: Data objects vs. speed

Quoted text here. Click to load it

Having a single class for each database table is a good idea - I have been  
using it for years. The notion of having a subclass for each foreign key is  
a definite no-no. It is totally unnecessary and a waste of time. Nor do you  
need a separate class method for each possible SELECT ... WHERE ... as it is  
possible to have a generic getData($where) method where the $where argument  
is a string which is provided at runtime. This can cover all eventualities.

It does not matter that you have to instantiate the EMPLOYEE class  
containing 40 columns if you only want a single column. In my methodology  
the default is SELECT *, but a specific list of column names can be provided  
for individual queries if required. This removes the need for a different  
class method for each combination of SELECT ... and WHERE ...

One thin that you should notice is that 95% of the code in a database table  
class is common to all database tables, therefore this common code can be  
put into a superclass and inherited by individual table classes. This means  
that each individual table class need contain no more than the following:
(1) database engine name
(2) database name
(3) table name
(4) database structure (column names, primary key names, candidate key  
names, relationships with other tables)
(5) custom business rules

If you are really clever you can put your database APIs in a separate class  
so that you can switch database engines (MySQL, PostgreSQL or Oracle) by  
changing a single line of code.

The bottom line is that if you use OO classes intelligently you can save a  
lot of repetitive coding, must don't waste time trying to create a complex  
class hierarchy - it just ain't worth it.

Tony Marston

Site Timeline