Huge Data Handling

Hi Guys,

I am trying to edit a bioinformatics package, written in Perl, that was
designed to handle DNA sequences about 500,000 bases long (a string
containing 500,000 characters).

I have to enhance it to handle DNA 100 million bases long...

Each base in the DNA carries this information: the base itself (A, C,
G or T), a quality score (0-99), and a position (1 to length).

There is one main DNA sequence and, on average, 500,000 parts (each at
most 2,000 characters long, carrying the same set of per-base
information)...

The program first creates an alignment, something like:

Main: ACGTACG
Part:  CGAACG
          *

Now, let's say I have to go through each position and find how many
variations are present at a certain position (along with their
original position and quality).

Look at the * position: there is a T-A variation.

Right now they are using hashes, keyed by position, to capture this:

%A, %C, %G, %T

Loop For Main DNA {
    $A{$pos} = $qual;   # this tells me there is an A base at this
                        # position, with some qual, for the main
}

Update the qual by adding the qual of the parts:

Loop For Parts {
    $A{$pos} += $qual;  # for A parts
    $T{$pos} += $qual;  # for T parts
}
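
In full, the accumulation looks roughly like this. It's a
self-contained sketch, with the four hashes folded into one two-level
hash for brevity; @main_base, @main_qual and the @parts field names
are stand-ins I made up, not the package's real structures:

    use strict;
    use warnings;

    my %qual_sum;   # $qual_sum{base}{position} = summed quality

    # main DNA: seed each position with its own base and quality
    my @main_base = qw(A C G T A C G);
    my @main_qual = (30, 40, 25, 33, 50, 28, 41);
    for my $i (0 .. $#main_base) {
        $qual_sum{ $main_base[$i] }{ $i + 1 } = $main_qual[$i];
    }

    # parts: add each aligned base's quality on top
    my @parts = (
        { pos => 4, base => 'A', qual => 15 },   # the T-A variation at *
        { pos => 5, base => 'A', qual => 22 },
    );
    for my $p (@parts) {
        $qual_sum{ $p->{base} }{ $p->{pos} } += $p->{qual};
    }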
But because the dataset is huge, this consumes a lot of memory...

So basically I am trying to figure out a way to store this information
without using much memory.

If you don't understand the above problem, don't worry...

Just tell me how to handle huge data that needs to be accessed
frequently, using the least possible memory.

Thanks in advance

Re: Huge Data Handling

Vishal G wrote:

Oh is that all!?


Re: Huge Data Handling

Vishal G wrote:

perldoc -q "How can I make my Perl program take less memory"
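
The gist of that FAQ entry, applied here: replace the per-position
hashes with flat, packed structures. A minimal sketch using vec() on
preallocated strings; the names and the 32-bit field width are my
choices, not something from your package:

    use strict;
    use warnings;

    # one packed string per base, 32 bits per position, instead of
    # one hash entry per position
    my $len = 100_000_000;    # main DNA length; try smaller first
    my %qual_sum = map { $_ => "\0" x (4 * $len) } qw(A C G T);

    # accumulate $qual for $base at 1-based $pos
    sub add_qual {
        my ($base, $pos, $qual) = @_;
        vec($qual_sum{$base}, $pos - 1, 32) += $qual;
    }

    add_qual('A', 4, 30);
    add_qual('A', 4, 15);

    print vec($qual_sum{'A'}, 3, 32), "\n";    # 45

That's a fixed 16 bytes per position across all four bases, versus on
the order of a hundred bytes for each occupied hash entry; if the
summed quals can never exceed 65,535 you can halve it again with
16-bit fields.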

Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
