
Reading whole file into memory. Parsing 'C' like file efficiently

Posted by n_macpherson on June 17, 2008, 6:49 am
I know there are a number of FAQs which discourage reading whole
files into memory rather than line by line.

However my problem is as follows.

I am reading a file written in a language which looks like (but isn't)
C. I need to insert comments / documentation at various points in the
file. However, sometimes I don't know what I want to insert until I get
well past the current line - for example


for(i=0;i<64;i++)
{
// lots of code
}

Say my opening brace is on line 95 and my closing brace is on line 195.
I want to insert a comment

// for loop ends line 195

at line 94 (i.e. immediately above the opening brace). The problem is
that, processing line by line, I don't know until I get to line 195 what
I have to change at line 94, so I have to store lines 94 to 195 in
memory anyway.

Similarly, if I read a function header, I want to insert some
documentation before the function header, so I don't believe
processing the file line by line is the best solution here. As I will
be inserting extra lines into the middle of an array, I think I am
going to need a module to do this.

Memory won't be an issue - my largest file will only be 6000 lines.

I've been away from Perl for a while but I seem to remember there was
a module File::Tie which might be suitable.

I'd be grateful if anyone has any suggestions - the people who will be
using this don't normally use Perl, so I'd like to avoid using any
non-standard modules if possible.

Thanks

Niall

Posted by Jürgen Exner on June 17, 2008, 7:12 am
n_macpherson@sky.com wrote:
>Similarly if I read a function header, I want to insert some
>documentation before the function header
>so I don't believe processing the file line by line is the best
>solution here.

Based on what you said I would tend to agree.

Whether that kind of automated annotation is useful is a different
story, though. I doubt it. Take for example

>Say my opening brace is on line 95 and my closing brace 195 I want to
>insert a comment
>// for loop ends line 195

First of all, proper indentation will provide even better guidance as
to where the loop ends. And second, a single block spanning 100 lines is
just plain nuts. A classic rule of thumb used to be that if the code for
a sub doesn't fit on a VT220 screen, then it was too long and you should
think about splitting it. There were two reasons for this:
- you don't want to keep scrolling up and down while thinking about this
sub
- anything much longer becomes too complex for a single sub

Granted, times have changed and typically you can display many more
lines on modern terminals. But the second reason is still very sound.
Many people will probably consider 30-50 lines of code to be the maximum
length of code that can still be easily viewed and recognized without
too much mental scrolling.

>As I will be inserting extra lines into the middle of
>an array I think I am going to need a module to do this.

Why? Sounds like a perfect job for splice().

jue
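
For illustration, here is a minimal sketch of the splice() suggestion, using only core Perl. The helper name insert_comment_above and the sample lines are hypothetical, not taken from the original poster's script. A length of 0 tells splice() to remove nothing and simply insert the new element at the given index.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $comment immediately above the line at
# 0-based index $idx. A length of 0 means nothing is removed.
sub insert_comment_above {
    my ($lines, $idx, $comment) = @_;
    splice @$lines, $idx, 0, $comment;
    return;
}

my @lines = (
    "for(i=0;i<64;i++)\n",
    "{\n",
    "// lots of code\n",
    "}\n",
);

# The opening brace is at index 1. Once the comment is inserted, the
# closing brace sits on line 5 of the modified file.
insert_comment_above(\@lines, 1, "// for loop ends line 5\n");

print @lines;
```

Note that every line below the insertion point moves down by one, so any line number written into the comment has to account for the lines being added.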

Posted by n_macpherson on June 17, 2008, 8:25 am
>
> First of all a proper indentation will provide even better guidance as
> to where the loop ends. And second a single block spanning 100 lines is
> just plain nuts. A classic rule of thumb used to be that if the code for
> a sub doesn't fit on a VT220 screen, then it was too long and you should
> think about splitting it. There were two reasons for this:
> - you don't want to keep scrolling up and down while thinking about this
> sub
> - anything much longer becomes too complex for a single sub
>
> Granted, times have changed and typically you can display many more
> lines on modern terminals. But the second reason is still very sound.
> Many people will probably consider 30-50 lines of code to be the maximum
> length of code that can still be easily viewed and recognized without
> too much mental scrolling.
>

One of the reasons I am writing this script is because we have
introduced coding standards which specify a maximum of 300 lines per
function and 70 lines for a while/if/else/for loop and I need to
highlight places in our scripts where this occurs. I agree 300 lines
for a function is probably too long but in the language concerned
anything less than 200 would be completely impractical unfortunately.
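
A check of this kind could be sketched with a stack of opening-brace line numbers, as below. This is only a sketch under stated assumptions: the helper name long_blocks is hypothetical, the block length is counted from the opening-brace line to the closing-brace line inclusive (the coding standard may count differently), and braces inside strings or comments are not recognised.

```perl
use strict;
use warnings;

# Hypothetical helper: return [open_line, close_line, length] for every
# brace-delimited block longer than $max lines (1-based line numbers).
# Assumption: braces inside strings or comments are not special-cased.
sub long_blocks {
    my ($lines, $max) = @_;
    my (@stack, @found);
    for my $i (0 .. $#$lines) {
        # Visit the braces on this line in order of appearance.
        for my $brace ($lines->[$i] =~ /([{}])/g) {
            if ($brace eq '{') {
                push @stack, $i + 1;          # remember where it opened
            }
            elsif (@stack) {
                my $open = pop @stack;        # innermost open block
                my $len  = ($i + 1) - $open + 1;
                push @found, [$open, $i + 1, $len] if $len > $max;
            }
        }
    }
    return @found;
}

my @code = ("while(x)\n", "{\n", "a;\n", "b;\n", "c;\n", "}\n");
for my $blk (long_blocks(\@code, 3)) {
    my ($open, $close, $len) = @$blk;
    print "block opened on line $open closes on line $close ($len lines)\n";
}
```

Because the whole file is already in an array, both the opening and closing line numbers are available at the moment the closing brace is seen.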

The indentation is a good point - our developers mostly develop on
site, which means a variety of editors (UltraEdit, Visual Studio,
Notepad++, our own proprietary editor) are used. This means
indentation across scripts becomes inconsistent. One of the functions
of the script I am writing will be to make sure the indentation
conforms to the coding standards.

> Why? Sounds like a perfect job for splice().

Yes - I'd forgotten splice() will allow me to insert into the middle
of an array (as I said, I have been away from Perl for a little
while). That should work fine for my purposes.
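
One detail worth keeping in mind when making several insertions with splice(): each insertion shifts the indices of everything below it. A possible way around this, sketched here with a hypothetical helper apply_insertions and made-up sample lines, is to collect the insertions first and apply them from the highest index down, so every index can refer to the original, unmodified array.

```perl
use strict;
use warnings;

# Hypothetical helper: apply insertions given as [index, text] pairs,
# where each index refers to the ORIGINAL array. Processing from the
# highest index down keeps the lower indices valid.
sub apply_insertions {
    my ($lines, @inserts) = @_;
    for my $ins (sort { $b->[0] <=> $a->[0] } @inserts) {
        my ($idx, $text) = @$ins;
        splice @$lines, $idx, 0, $text;
    }
    return;
}

my @lines = ("f()\n", "{\n", "for(;;)\n", "{\n", "}\n", "}\n");

apply_insertions(\@lines,
    [1, "// function body\n"],   # above the first opening brace
    [3, "// for loop\n"],        # above the second opening brace
);

print @lines;
```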

Posted by bugbear on June 17, 2008, 8:17 am
n_macpherson@sky.com wrote:
> I know there are a number of FAQs which discourage reading whole
> files into memory rather than line by line.

Problem-dependent IMHO.

If you're anywhere in the area of processing
log files, booking files, data streams,
line by line is the way, especially where
the unit of processing is the line,
or small number of lines.

But for files known to be small,
where the fundamental unit of processing
isn't the line, reading and processing
the whole thing "as one" is the better way.

BugBear

Posted by xhoster on June 17, 2008, 2:15 pm
n_macpherson@sky.com wrote:
> I know there are a number of FAQs which discourage reading whole
> files into memory rather than line by line.

I hope they discourage you from reading whole files into memory
thoughtlessly and without good reason. It seems like you do have a good
reason to read them into memory, so go ahead and do it. There is even a
module, File::Slurp, to facilitate it.
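
Since File::Slurp is not part of the core distribution, the same effect can be had with plain Perl by reading the filehandle in list context. The sketch below is an assumption-laden illustration, not File::Slurp's own implementation: read_lines is a hypothetical helper, and the temporary file (created with the core File::Temp module) exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into a list, one element per
# line, with line endings kept.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;    # list context reads every remaining line
    close $fh;
    return @lines;
}

# Create a small temporary file so the example can run on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh;

my @lines = read_lines($tmp_name);
print scalar(@lines), " lines read\n";
```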

...
>
> Memory won't be an issue - my largest file will only be 6000 lines.

Those are famous last words :)

I remember many times when I've said "it will only ever be X large" and
then had to eat those words. But of course, I suspect there are many many
more times that my statement held true and it never did get much larger,
but those ones don't force themselves back into your attention the way the
other ones do.

>
> I've been away from Perl for a while but I seem to remember there was
> a module File::Tie which might be suitable.

For 6000 lines of code, you should be a long long way from needing
Tie::File. In fact, last time I investigated it, the memory overhead for
Tie::File was so large that, unless your file's lines are very long, much
longer than one generally finds in a computer program, it provided little
memory benefit over slurping the file.

>
> I'd be grateful if anyone has any suggestions -

Don't worry about this particular problem until it has proven itself
to be an issue (which it probably won't).

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
