
Reading whole file into memory. Parsing 'C' like file efficiently

 comp.lang.perl.misc
Subject Author Date
Reading whole file into memory. Parsing 'C' like file efficiently n_macpherson 06-17-2008
Posted by Ben Morrow on June 17, 2008, 2:54 pm

Quoth xhoster@gmail.com:
> n_macpherson@sky.com wrote:
>
[slurping a file into an array]
> > I've been away from Perl for a while but I seem to remember there was
> > a module File::Tie which might be suitable.
>
> For 6000 lines of code, you should be a long long way from needing
> Tie::File. In fact, last time I investigated it, the memory overhead for
> Tie::File was so large that, unless your file's lines are very long, much
> longer than one generally finds in a computer program, it provided little
> memory benefit over slurping the file.

One major advantage of Tie::File is that the interface is exactly the
same as a slurped array, so if/when memory does become a problem, you
can simply replace

use File::Slurp qw/read_file/;

my @data = read_file 'name';

with

use Tie::File;

tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";

and leave the rest of the code unchanged.
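
As a rough sketch of what "the rest of the code unchanged" could look like: the helper insert_above below is a hypothetical name, not something from this thread, and the scratch file stands in for the real input. It only uses splice, so the same call works on a plain array and on one tied with Tie::File. One difference worth checking in your own code: by default Tie::File hands back records without the trailing newline, whereas lines from read_file or <$fh> keep it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical helper: insert $text as a new element before index $idx.
# It only uses splice, so it does not care whether @$lines is tied.
sub insert_above {
    my ($lines, $idx, $text) = @_;
    splice @$lines, $idx, 0, $text;
    return scalar @$lines;
}

# Plain in-memory array (elements without newlines, to match Tie::File).
my @plain = ("for(i=0;i<64;i++)", "{", "}");
insert_above(\@plain, 1, "// loop body follows");

# Tied array backed by a scratch file (a stand-in for the real input file).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "for(i=0;i<64;i++)\n{\n}\n";
close $fh or die "can't close '$path': $!";

tie my @tied, 'Tie::File', $path or die "can't tie '$path': $!";
insert_above(\@tied, 1, "// loop body follows");   # this edits the file on disk
untie @tied;
```

Note that with the tied array every change is written through to the file itself, so work on a copy if the original must be preserved.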

Ben

--
Many users now operate their own computers day in and day out on various
applications without ever writing a program. Indeed, many of these users
cannot write new programs for their machines...
-- F.P. Brooks, 'No Silver Bullet', 1987 [ben@morrow.me.uk]

Posted by xhoster on June 17, 2008, 3:48 pm
> Quoth xhoster@gmail.com:
> > n_macpherson@sky.com wrote:
> >
> [slurping a file into an array]
> > > I've been away from Perl for a while but I seem to remember there was
> > > a module File::Tie which might be suitable.
> >
> > For 6000 lines of code, you should be a long long way from needing
> > Tie::File. In fact, last time I investigated it, the memory overhead
> > for Tie::File was so large that, unless your file's lines are very
> > long, much longer than one generally finds in a computer program, it
> > provided little memory benefit over slurping the file.
>
> One major advantage of Tie::File is that the interface is exactly the
> same as a slurped array, so if/when memory does become a problem, you
> can simply replace
>
> use File::Slurp qw/read_file/;
>
> my @data = read_file 'name';

This uses 3 times as much memory as reading in the file in a while loop
and pushing it into the array. It seems like it should only be two times
as much, but it isn't (and it is 1.5 times as much as @data=<$fh> takes). Of
course, most of that excess memory is eligible for later reuse, provided
your program survives and needs it.
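
For reference, the two core-Perl alternatives mentioned above could be written roughly as follows. The sub names and the scratch file are assumptions for illustration; the sketch only shows that both produce the same array and does not measure memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read line by line and push each line onto the array.
sub read_with_push {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!";
    my @data;
    while ( my $line = <$fh> ) {
        push @data, $line;
    }
    close $fh;
    return \@data;
}

# Read everything at once with readline in list context.
sub read_with_list_assign {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open '$path': $!";
    my @data = <$fh>;
    close $fh;
    return \@data;
}

# Scratch file standing in for the real input.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp or die "can't close '$path': $!";

my $pushed   = read_with_push($path);
my $assigned = read_with_list_assign($path);
```

Both need only core Perl, which fits the original request to avoid non-standard modules.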

>
> with
>
> use Tie::File;
>
> tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";
>
> and leave the rest of the code unchanged.

But my lament is that this just doesn't save all that much memory over
an already efficient slurping method, due to the overhead of Tie::File's
internal structures. I checked again on the latest Tie::File, and based on
vague recollections it does seem substantially better than the older one I
played around with, but still the memory overhead is not an insignificant
fraction of what it would be to just slurp a large file of short lines. So
I consider Tie::File to be an emergency measure I'd throw at a program to
keep it limping along while I redesign and rewrite. (Not that there is
anything wrong with that)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

Posted by smallpond on June 17, 2008, 2:56 pm
n_macpherson@sky.com wrote:
> I know there are a number of FAQs which discourage reading whole
> files into memory rather than line by line.
>
> However my problem is as follows.
>
> I am reading a file in a language which looks like (but isn't)
> C. I need to insert comments / documentation at various points in the
> file. However sometimes I don't know what I want to insert until I get
> well past the current line - for example
>
>
> for(i=0;i<64;i++)
> {
> // lots of code
> }
>
> Say my opening brace is on line 95 and my closing brace 195 I want to
> insert a comment
>
> // for loop ends line 195
>
> at line 94 (i.e. immediately above the opening brace). The problem is
> that processing line by line I don't know until I get to line 195 what
> I have to change at line 94, so I have to store lines 94 to 195 in
> memory anyway.
>
> Similarly if I read a function header, I want to insert some
> documentation before the function header
> so I don't believe processing the file line by line is the best
> solution here. As I will be inserting extra lines into the middle of
> an array I think I am going to need a module to do this.
>
> Memory won't be an issue - my largest file will only be 6000 lines.
>
> I've been away from Perl for a while but I seem to remember there was
> a module File::Tie which might be suitable.
>
> I'd be grateful if anyone has any suggestions - the people who will be
> using this don't normally use Perl so I'd like to avoid using any non-
> standard modules if possible
>
> Thanks
>
> Niall

1) Read the file into an array of lines.
2) Build a hash of your inserts, key=line number.
3) Write out the updated file inserting each new line as you get to it.

This way you don't have to modify the array which would change the
line indices.
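
A minimal sketch of steps 2 and 3, using only core Perl. The sub name apply_inserts and the sample data are assumptions for illustration. The inserts hash maps a 1-based line number to an array of lines to emit just above that line.

```perl
use strict;
use warnings;

# Given the original lines and a hash of inserts keyed by 1-based line
# number, return the updated lines. Each insert is emitted just before the
# line it is keyed on; @$lines itself is never modified, so the line
# numbers used as keys stay valid throughout.
sub apply_inserts {
    my ($lines, $inserts) = @_;
    my @out;
    for my $i ( 0 .. $#$lines ) {
        my $line_no = $i + 1;
        push @out, @{ $inserts->{$line_no} } if $inserts->{$line_no};
        push @out, $lines->[$i];
    }
    return \@out;
}

# Step 1 would normally be: open my $fh, '<', $path or die ...; my @lines = <$fh>;
my @lines   = ("for(i=0;i<64;i++)\n", "{\n", "// lots of code\n", "}\n");
my %inserts = ( 2 => ["// for loop ends line 4\n"] );

my $updated = apply_inserts(\@lines, \%inserts);
print @$updated;
```

The line numbers in the inserted comments refer to the original file. If they should refer to the output file instead, they would need adjusting by the number of lines inserted above them.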

** Posted from http://www.teranews.com **

Posted by cartercc on June 17, 2008, 6:26 pm
On Jun 17, 6:49 am, n_macpher...@sky.com wrote:
> Say my opening brace is on line 95 and my closing brace 195 I want to
> insert a comment
>
> // for loop ends line 195
>
> at line 94 (i.e. immediately above the opening brace). The problem is
> that processing line by line I don't know until I get to line 195 what
> I have to change at line 94, so I have to store lines 94 to 195 in
> memory anyway.
>
> Similarly if I read a function header, I want to insert some
> documentation before the function header
> so I don't believe processing the file line by line is the best
> solution here. As I will be inserting extra lines into the middle of
> an array I think I am going to need a module to do this.

I might approach this by matching delimiters. You can certainly match
delimiters and insert comments just above the opening brace. If you
match on key words (for, while, if, else, etc.) and count your lines,
you can create an intermediate file with a comment template just above
the opening brace, and then manually edit for the final program.
Something like this, maybe:

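    use strict;
    use warnings;

    # Assumes INFILE and OUTFILE are already open.
    my @brace_stack;    # line numbers of the braces that are still open

    while ( my $line = <INFILE> ) {
        if ( $line =~ /\{/ ) {
            push @brace_stack, $.;           # remember where the block opened
            print OUTFILE "// COMMENT\n";    # template just above the opening brace
        }
        print OUTFILE $line;
        if ( $line =~ /\}/ ) {
            pop @brace_stack;                # the innermost block is now closed
            print OUTFILE "// COMMENT\n";    # template just below the closing brace
        }
    }
    # scalar(@brace_stack) is the current nesting depth at any point.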

Obviously, your logic would depend on your coding standard. I wrote
something similar in Java and developed a class that would do
something similar. Perl ought to be a lot easier.
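
If the whole file is already in an array, the same brace stack can record where each block ends, which is the piece of information the original question needs for a comment like "// for loop ends line 195". This is only a sketch: match_braces is a hypothetical name, and it assumes at most one brace per line and no braces inside strings or comments.

```perl
use strict;
use warnings;

# Map the line number of each line containing '{' to the line number of the
# line containing its matching '}'. Line numbers are 1-based.
sub match_braces {
    my ($lines) = @_;
    my ( @stack, %closes_at );
    for my $i ( 0 .. $#$lines ) {
        my $line_no = $i + 1;
        push @stack, $line_no if $lines->[$i] =~ /\{/;
        if ( $lines->[$i] =~ /\}/ ) {
            my $opened = pop @stack;
            die "unmatched '}' at line $line_no\n" unless defined $opened;
            $closes_at{$opened} = $line_no;
        }
    }
    die "unmatched '{' at line $stack[-1]\n" if @stack;
    return \%closes_at;
}

my @lines = ( "for(i=0;i<64;i++)\n", "{\n", "  if (x) {\n", "  }\n", "}\n" );
my $closes_at = match_braces( \@lines );

# Report each block, outermost first.
for my $open ( sort { $a <=> $b } keys %$closes_at ) {
    print "block opened at line $open ends at line $closes_at->{$open}\n";
}
```

The resulting hash could then drive whatever insertion step you prefer, for example a hash of inserts keyed by line number.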

CC
