maybe OT: MySQL design question

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View
This is a theoretical design question as to how to address an issue where a  
certain table with ~20 fields will grow rapidly (20,000 records per day) and  
we need to keep all records in the active DB for seven years.  We're talking  
about > 50 million records.

I see two approaches to this:

1.) Single table using raid

2.) One table per day (365x7 =  2555 tables) and build an object to  
interface as if it were a single table.

Which (if any) approach would you think is better where better = fast  
response time and overall maintenance.



Re: maybe OT: MySQL design question

Quoted text here. Click to load it

Common database design principles tell us that the schema should
remain (relatively) unchanged throughout the life of the application.
i.e. You don't add a new table when you add a new customer to your
system (or, in this case, a new day).  So the first answer is the
correct one.

Re: maybe OT: MySQL design question

Ben wrote:

Quoted text here. Click to load it

That's a horrible idea. You'll slow down queries to a crawl. A search for
a particular record, if you don't know the date it was created will result
in (up to) 2555 SQL queries. As each table will be smaller than the
original table, it will, admittedly, not take 2555 times longer than a
single table, but would certainly take in the order of 100 times longer or

I would also recommend switching to a database engine more capable of
handling tables this size. Oracle, SQL Server 2005, PostgreSQL or Sybase
would be far better choices for this sort of data.

For additional speedups, make sure you index any field you think it's
likely you'll want to search on or sort by -- dates and numeric fields are
generally good candidates.

Structure text-related fields such that you will generally be able to do
head searches rather than tail searches. To clarify, it's far faster to
search for "surname LIKE 'Jones%'" than "fullname LIKE '% Jones%'".

I would recommend a RAID 5 array to store the data on, and if you're
planning on doing a lot of read-only queries, look into a second server
and set them up as a master-slave arrangement. Writes go to the master,
and propagate to the slave when the slave's not busy. The slave then
handles all read requests. Most enterprise-level databases include
replication like this.

Toby A Inkster BSc (Hons) ARCS
Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux

Site Timeline