Continuously Growing Data

I'm curious about how websites deal with the problem of continuously
growing data. For example, there are many forums that preserve all
posts for many years back, and are continuously receiving and storing
new posts all the time. No website has access to an infinite amount of
storage space, so:

What kind of buffer (i.e. space available) do they like to have, or in
other words, how much time do they have before they would run out of
space based on the rate of data growth and storage space?

What do they do when they are running out of space on a particular
machine; how do they increase their storage space?

Is maintaining this kind of data stream (like a forum) feasible for a
small-scale website, like one that I might want to host from my own
computer (with about 20 GB of space left) or from a cheap web host
(that probably wouldn't offer me any more than something like 60 GB)?



Re: Continuously Growing Data

On 2008-08-15, bgold12 wrote:

     As the average size of a forum post is not likely to be more than
     5 KB, you could store 200,000 posts in 1 GB. How many posts do you
     expect to get?
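
     To make that arithmetic concrete, here is a rough sketch in Python.
     The 5 KB average comes from the estimate above; the figure of 1,000
     posts per day and the function names are purely hypothetical, chosen
     for illustration, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
# Back-of-the-envelope capacity estimate for a forum.
# Assumptions (not measurements): 5 KB average post, decimal units.
KB = 1000
GB = 1000 ** 3
POST_SIZE_BYTES = 5 * KB


def posts_that_fit(free_bytes, post_size=POST_SIZE_BYTES):
    """Number of whole posts that fit in the given free space."""
    return free_bytes // post_size


def days_until_full(free_bytes, posts_per_day, post_size=POST_SIZE_BYTES):
    """Days of runway at a steady posting rate (hypothetical rate)."""
    return free_bytes / (posts_per_day * post_size)


if __name__ == "__main__":
    print(posts_that_fit(1 * GB))           # 200000
    print(days_until_full(20 * GB, 1000))   # 4000.0
```

     Under those assumptions, 20 GB would last about 4,000 days (roughly
     eleven years) at 1,000 posts a day. Attachments, images, database
     indexes and logs would shorten that, so treat it as an upper bound.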

   Chris F.A. Johnson                      <
   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)

Re: Continuously Growing Data

bgold12 wrote:

I've been running a website that gathers data since the early 1990s.
Over time it has had to be upgraded, not because it ever ran out of
space, but because it needed to get faster to cope with increasing
numbers of users and more complex interactions.  One side effect of
those upgrades (about 6) is that nearly every time the newer server came
with more disk space, simply because disk size seems to follow Moore's
law (well, the popular misrepresentation).

Most of the data collected by websites had to be typed by a person
at some time or other, and that puts an upper limit on how fast
it can accumulate. Next time you read something on a website, pick one
word, and try to imagine the person typing it.

Steve Swift

Re: Continuously Growing Data

However you can order it from Amazon et al. Moore's Law still lets
you throw hardware at the problem in most cases. By the time
you need that expanded storage, it'll be cheap enough to afford.
The trick is to plan for this in advance and not do anything that will
prevent you upgrading regularly.

Another question is how to "store" information so that you can
generate small summaries of large datasets, without having to allocate
space to store everything. How does Google count what the most popular
recent searches are, from within the vast number of different
searches?  Every new popular search for "Heath Ledger dead joker" had
to begin somewhere with a single first search, which wasn't
obviously worth tracking at the time. How do you track these
without needing vast storage?  As it happens there's a clever
algorithm. This month's American Scientist magazine has an interesting
article on it.
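
For a flavour of how this can work, here is a minimal Python sketch of
one well-known streaming technique, the Misra-Gries frequent-items
summary. Whether this is the exact algorithm the article describes is
an assumption on my part; the function name and the sample search
terms are made up for illustration. It keeps at most k - 1 counters no
matter how long the stream is, and any item that makes up more than
1/k of the stream is guaranteed to still hold a counter at the end.

```python
def frequent_items(stream, k):
    """Misra-Gries summary: uses at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed
    to be among the returned keys. The stored counts are lower bounds,
    not exact totals.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement them all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream of search queries.
    searches = ["weather", "news", "weather", "joker",
                "weather", "maps", "weather", "news"]
    print(frequent_items(searches, k=3))
```

The survivors are only candidates: an item can hold a counter without
being truly frequent, so exact counts need a second pass over the data
(or some tolerance for approximate answers). The point is that memory
use depends on k, not on the number of distinct searches.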