Posted by Chris Mattern on May 1, 2008, 12:42 pm
>> Cosmic Cruizer wrote:
>>> I've been able to reduce my dataset by 75%, but it still leaves me
>>> with a file of 47 gigs. I'm trying to find the frequency of each line
>>> using:
>>>
>>> open(TEMP, "< $tempfile") || die "cannot open file $tempfile: $!";
>>> foreach (<TEMP>) {
>>> $seen{$_}++;
>>> }
>>> close(TEMP) || die "cannot close file $tempfile: $!";
>>>
>>> My program keeps aborting after a few minutes because the computer
>>> runs out of memory.
>>
>> This line:
>>
>>> foreach (<TEMP>) {
>>
>> reads the whole file into memory. You should read the file line by
>> line instead by replacing it with:
>>
>> while (<TEMP>) {
>>
>
> <sigh> As both you and Sinan pointed out... I'm using foreach. Everywhere
> else I used the while statement to get me to this point. This solves the
> problem.
>
> Thank you.
Didn't realize your file had so many duplicates (and thus such a small
set of unique lines). If it works, that's great!
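
For anyone finding this later, here is a rough sketch of the line-by-line
version. The sub name count_lines, the lexical filehandle, and the chomp are
my own choices, not from the original script. Note that %seen still holds
every distinct line, so this only fits in memory when the set of unique
lines is small, as it apparently is here.

```perl
use strict;
use warnings;

# Count how often each line occurs, reading one line at a time.
# count_lines() is a hypothetical helper name, not from the original post.
sub count_lines {
    my ($path) = @_;
    my %seen;
    open(my $fh, '<', $path) or die "cannot open file $path: $!";
    # while (<$fh>) reads in scalar context: one line per iteration.
    # foreach (<$fh>) evaluates <$fh> in list context and slurps the
    # whole file into memory before the loop starts.
    while (my $line = <$fh>) {
        chomp $line;        # assumption: trailing newline is not part of the key
        $seen{$line}++;
    }
    close($fh) or die "cannot close file $path: $!";
    return \%seen;
}

# Usage: print each distinct line with its count, most frequent first.
if (@ARGV) {
    my $seen = count_lines($ARGV[0]);
    for my $line (sort { $seen->{$b} <=> $seen->{$a} || $a cmp $b } keys %$seen) {
        print "$seen->{$line}\t$line\n";
    }
}
```

Run it as "perl count.pl yourfile" (the script name is just an example).
Memory use grows with the number of distinct lines, not with the file size.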
--
Christopher Mattern
NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities