
Faster file iteration

Posted by vijay@iavian.com on March 13, 2008, 9:41 am
use strict;

my $file_1 = '1.txt'; # File 1
my $file_2 = '2.txt'; # File 2

if(open(FH1 , $file_1)){
    print "File $file_1 Opened\n";
}else{
    print "Failed to Open file $file_1\n";
    exit;
}

if(open(FH2 , $file_2)){
    print "File $file_2 Opened\n";
}else{
    print "Failed to Open file $file_2\n";
    close FH1;
    exit;
}

while(chomp(my $line_2 = <FH2>)){
    my($dummy21,$file21_no,$file21_date) = split(/\s+/,$line_2);
    next if($file21_no !~ /\d+/);
    my $counter1 = 0;
    my $least_date1 = 0;
    seek(FH1,0,0);
    $least_date1 = date_compare($file21_date);
    while(chomp(my $line_1 = <FH1>)){
        my($d,$file1_no,$file1_date) = split(/;/,$line_1);
        if($file1_no == $file21_no){
            $file1_date =~/(\d\d\d\d)(\d\d)(\d\d)/;
            my $yr1 = $1;
            $file21_date =~/(\d\d\d\d)(\d\d)(\d\d)/;
            if(($yr1 - $1) < 5){
                $counter1++;
            }
        }
    }
    $least_date1 = 0 if($counter1 == 0);
    print "$dummy21\t$file21_no\t$file21_date\t$counter1\t$least_date1\n";
    print FH3 "$dummy21\t$file21_no\t$file21_date\t$counter1\t$least_date1\n";
}

$file_1 has around 12,000,000 records, and it takes 2 minutes to
process a single record from $file_2.

Any suggestions to make it faster?

Posted by Martijn Lievaart on March 13, 2008, 9:47 am
On Thu, 13 Mar 2008 06:41:59 -0700, vijay@iavian.com wrote:

> $file_1 has around 12,000,000 records, and it takes 2 minutes to
> process a single record from $file_2.
>
> Any suggestions to make it faster?

Read file_1 once and store it in an appropriate data structure (a hash
comes to mind). It may still take two minutes to read, but after that,
searching is fast.

It does take some memory, but 12 million records should take less than
100 MB.
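
One way the read-once idea could look is sketched below. It is only a sketch under assumptions: it keeps a per-number count of years rather than whole lines, which is enough for the year test in the original loop, and it reads from a small made-up in-memory string instead of 1.txt. The helper names load_year_counts and count_matches are hypothetical, not from the original post.

```perl
use strict;
use warnings;

# Build $count{$no}{$year} from file 1 in a single pass.
# Assumes ';'-separated fields with the number second and a YYYYMMDD
# date third, as in the split() of the original post.
sub load_year_counts {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $no, $date) = split /;/, $line;
        next unless defined $no   && $no   =~ /^\d+$/;
        next unless defined $date && $date =~ /(\d{4})\d{4}/;
        $count{ $no + 0 }{$1}++;    # "+ 0" mimics the numeric == of the original
    }
    return \%count;
}

# Count file-1 records for $no whose year satisfies ($yr1 - $yr2) < 5,
# the same test the original inner loop applies.
sub count_matches {
    my ($count, $no, $date) = @_;
    my ($year) = $date =~ /(\d{4})\d{4}/ or return 0;
    my $years = $count->{ $no + 0 } or return 0;
    my $total = 0;
    for my $y (keys %$years) {
        $total += $years->{$y} if $y - $year < 5;
    }
    return $total;
}

# Tiny in-memory stand-in for 1.txt (made-up data).
my $file1 = "a;7;20010315\nb;7;20080101\nc;7;20030620\nd;9;19990101\n";
open my $fh1, '<', \$file1 or die "Cannot open in-memory file: $!";
my $count = load_year_counts($fh1);
close $fh1;

print count_matches($count, 7, '20020101'), "\n";    # prints 2
```

Each lookup then touches only the handful of distinct years stored for one number, instead of rescanning all of file 1. Actual memory use depends on how many distinct numbers the real data contains.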

M4

Posted by bugbear on March 13, 2008, 10:52 am
vijay@iavian.com wrote:
>
> $file_1 has around 12,000,000 records, and it takes 2 minutes to
> process a single record from $file_2.
>
> Any suggestions to make it faster?


Are the two files in date-sorted order?

BugBear

Posted by vijay@iavian.com on March 13, 2008, 11:07 am
> vi...@iavian.com wrote:
>
> > $file_1 has around 12,000,000 records, and it takes 2 minutes to
> > process a single record from $file_2.
>
> > Any suggestions to make it faster?
>
> Are the two files in date-sorted order?
>
> BugBear

No, they are not sorted by date, and there is no unique key.

Posted by xhoster on March 13, 2008, 12:19 pm
> use strict;
>
> my $file_1 = '1.txt'; # File 1
> my $file_2 = '2.txt'; # File 2
>
> if(open(FH1 , $file_1)){
> print "File $file_1 Opened\n";
> }else{
> print "Failed to Open file $file_1\n";
> exit;
> }
>
> if(open(FH2 , $file_2)){
> print "File $file_2 Opened\n";
> }else{
> print "Failed to Open file $file_2\n";
> close FH1;
> exit;
> }
>
> while(chomp(my $line_2 = <FH2>)){
> my($dummy21,$file21_no,$file21_date) = split(/\s+/,$line_2);
> next if($file21_no !~ /\d+/);
> my $counter1 = 0;
> my $least_date1 = 0;
> seek(FH1,0,0);
> $least_date1 = date_compare($file21_date);
> while(chomp(my $line_1 = <FH1>)){
> my($d,$file1_no,$file1_date) = split(/;/,$line_1);
> if($file1_no == $file21_no){

You could pre-load file1 into a hash, keyed by $file1_no, where each
value is a list of the lines that have that $file1_no. That way, for
each line in file2, you only need to go through those lines of file1
that already meet the above condition. This by itself should greatly
improve things, unless most of the data shares the same $file1_no or
just a few of them.
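
A rough sketch of that hash of lists follows. The sample data is made up and read from in-memory strings, the sub name index_by_no is hypothetical, and the field layout is assumed from the split() calls in the quoted code.

```perl
use strict;
use warnings;

# Group the lines of file 1 by their number field:
# $by_no{$no} = [ line, line, ... ].
sub index_by_no {
    my ($fh) = @_;
    my %by_no;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $no) = split /;/, $line;
        next unless defined $no && $no =~ /^\d+$/;
        push @{ $by_no{ $no + 0 } }, $line;
    }
    return \%by_no;
}

# Made-up stand-ins for 1.txt and 2.txt.
my $file1 = "a;7;20010315\nb;7;20080101\nc;9;19990101\n";
my $file2 = "x 7 20020101\ny 9 20020101\nz 5 20020101\n";

open my $fh1, '<', \$file1 or die "Cannot open in-memory file 1: $!";
my $by_no = index_by_no($fh1);
close $fh1;

open my $fh2, '<', \$file2 or die "Cannot open in-memory file 2: $!";
my @report;
while (my $line2 = <$fh2>) {
    chomp $line2;
    my ($name, $no2, $date2) = split /\s+/, $line2;
    next unless defined $no2 && $no2 =~ /^\d+$/ && defined $date2;
    my ($yr2) = $date2 =~ /(\d{4})\d{4}/ or next;
    my $counter = 0;
    # Only the lines that share this number are visited, not all of file 1.
    for my $line1 (@{ $by_no->{ $no2 + 0 } || [] }) {
        my (undef, undef, $date1) = split /;/, $line1;
        next unless defined $date1;
        my ($yr1) = $date1 =~ /(\d{4})\d{4}/ or next;
        $counter++ if $yr1 - $yr2 < 5;
    }
    push @report, "$name\t$no2\t$date2\t$counter";
}
close $fh2;

print "$_\n" for @report;
```

Storing whole lines keeps every field available for later checks, at the cost of more memory than storing only the dates.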


> $file1_date =~/(\d\d\d\d)(\d\d)(\d\d)/;
> my $yr1 = $1;
> $file21_date =~/(\d\d\d\d)(\d\d)(\d\d)/;
> if(($yr1 - $1) < 5){
> $counter1++;
> }

And within a given $file1_no hashed list, you could sort by file1_date.
That way, once you meet a non-qualifying date, you could abort the loop
early rather than testing all the rest. (This improvement would probably
be quite small compared to the previous one.)
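
The early abort might look roughly like this. To keep the sketch short it assumes each list already holds just the four-digit years, which is all the quoted test uses. The hash contents are made up and the name count_sorted is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical structure: number => years of its file-1 records (made-up data).
my %years_by_no = (
    7 => [ 2008, 2001, 2012, 2003 ],
    9 => [ 1999 ],
);

# Sort each list once, in ascending order, before any lookups.
@$_ = sort { $a <=> $b } @$_ for values %years_by_no;

# Count the years for $no that satisfy ($yr1 - $yr2) < 5, stopping at the
# first year that fails: in an ascending list, every later year fails too.
sub count_sorted {
    my ($years_by_no, $no, $yr2) = @_;
    my $counter = 0;
    for my $yr1 (@{ $years_by_no->{$no} || [] }) {
        last if $yr1 - $yr2 >= 5;
        $counter++;
    }
    return $counter;
}

print count_sorted(\%years_by_no, 7, 2002), "\n";    # prints 2
```

Because the quoted condition has no lower bound, every year up to the cutoff counts, so the loop only saves the work past the cutoff.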


Xho

