Going for a LARGE Table: Any Tips?

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View
Hi there

I'm developing a large web application. Part of this web application will
be storing numerical chart data in a MySQL table - these numbers will be
already calculated, and are just being stored for reference.  

In this particular table, the stored data will never be deleted or
changed. The only actions performed will be SELECTs and INSERTs.  There
will never be any DELETEs or UPDATEs.

The end-result charts themselves have many rows. Let's say about 40-100
rows per chart. I will be storing thousands of charts.  I am definitely
looking at the possibility of there being millions (upon millions?) of
rows at some time in the future.  I am very aware of relational database
concepts, and one large table truly makes the most sense in this
particular situation.  Believe me, there are foreign keys up the wahoo.

A general question to the experts out there (or just people with
experience  ;))... what kind of things should I 'look out for', or
prepare for, when designing/using a table that is so large?   My previous
MySQL tables in all my years of programming have never really had more
than 40,000 rows.  This table will of course be the largest table in a
database that holds many more tables (about 20 or 30).

Here are things I am concerned/wondering about.  Any advice on them would
be very much appreciated.

1) Table type: all the tables in the database that depend on data being
inserted/updated to other tables are all InnoDB using transactions. This
table, as mentioned, will strictly be INSERTed to and SELECTed from.  
Should I use MyISAM or InnoDB?  Just about all columns will be storing
numbers, possibly a varchar column or two, and no blobs/text columns.  
However, these columns will need to be 'searched' sometimes.

2) Backing up / Restoring: What should I look out for in terms of
dumping/restoring the table?  Anything other than the traditional

3) If I do go with an InnoDB table... I imagine I will probably have to
change that 'ibdata' setting, or at least get familiar/involved with it,
since this table will be so large.  I've seen the term 'multiple
tablespaces' in reference to InnoDB tables.  Is this something I should
be investigating for a table of this size?  What exactly does it mean?

4) Do I need to think about replication of any sort?  Or slave/master
things?  Or can I just use the database as is?  There would never be more
than 50 or so people using the database at once (if that matters).

The system is a P4 (2.8), running Linux RedHat, MySQL 4.1.7 (though I'm
pretty sure I will have it upgraded to MySQL 5 before putting this into
production or even building it - or do I even need to), 1 GB of RAM....
basically our own box being managed/hosted somewhere across the

I know this is a lot of advice to ask.  I'm self-taught, and have been
re-reading some MySQL books in preperation for this project, which will
be my 'largest' to date, and will be quite complex overall (this is a
small but still critical part of the overall project).  The books teach
well, but they don't offer to much advice to specific situations ;)


Re: Going for a LARGE Table: Any Tips?


very interesting question ;-).

What the really best solution is depends strongly on how frequently you
query the differents parts of the data. Is there something like 'historical'
data (you need to query less often and only with SELECTs) and current data
(which you need all the time)? Then it might be an option to set up an
archive table. Assuming that you're using MySQL 5, you can create views to
query historical and current data together, if necessary (although I would
rather query a single table if possible than the view, even if it might seem
more convinient to always use the view).

For the "current" table I would use InnoDB, because locking is done at the
row level, so you'll probably get a better performance when doing SELECTs
and INSERTs simulaniously. For the historical tables, if you do not
necessarily need foreign key constraints for them (otherwise you would be
bound to InnoDB), I would rather choose MyISAM. With MyISAM, you can also
create compressed read only tables (if you're sure that there will be no
further INSERTs) that consume less disk space and can be queried faster.
With MyISAM, you could e.g. create one archive table for every month and
merge them together - and use this merged table in the view to get a
complete data set. This way, the data is split up into smaller pieces, but
you can still query them all together, if needed, and take advantage from
techniques that are preferable for dynamic data and for static data at the
same time.

This approach would need some maintainance work. You would need to create a
new archive table every new month, move the old data to this new archive
table and re-create the merge table to include the new table. A Stored
Procedure could help you to automate this process. But you would also gain
advantage in doing backups, because you would only have to backup the
archive tables once (they won't change anymore) and you can create smaller
and more frequent backups on the current data.

Replication should be an option, if the performance is still not suffient
(although I think that it will work for 50 users without replication). If
you use replication, you'd do best to use the master server to do the
INSERTs and the slave server to do the SELECTs. In that case, you can set
different indexes for the master table and for the slave table. As indexes
can slow down INSERTs, you could set the indexes only on the slave server's
table (except any unique indexes that you need to ensure data integrity, of
course) to speed up the SELECT statements, but not to slow down the INSERTs.
However, if the server's at a *very* high load (which is very unlikely for
50 users), the slave server could be a bit behind the master server. But in
most cases, the replication slave is almost as up to date as the master
server (maybe 1/100 of a second behind or so ;-)).

Well, so I guess, I gave you something to think about and maybe some other
newsgroup users a basis for further discussion ;-).


Site Timeline