FAQ 5.28 How can I read in an entire file all at once?

Posted by PerlFAQ Server on June 8, 2008, 9:03 am
This is an excerpt from the latest version of perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.28: How can I read in an entire file all at once?


You can use the File::Slurp module to do it in one step.

use File::Slurp;

$all_of_it = read_file($filename); # entire file in scalar
@all_lines = read_file($filename); # one line per element

The customary Perl approach for processing all the lines in a file is to
do so one line at a time:

open (INPUT, $file) || die "can't open $file: $!";
while (<INPUT>) {
    chomp;
    # do something with $_
}
close(INPUT) || die "can't close $file: $!";
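
The same loop can be sketched with a lexical filehandle and three-argument open, which avoids a global handle name and keeps characters in $file from being read as an open mode. This is an illustration, not part of the FAQ text; count_nonempty_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count the non-empty lines in a file,
# holding only one line in memory at a time.
sub count_nonempty_lines {
    my ($file) = @_;
    open(my $in, '<', $file) or die "can't open $file: $!";
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        $count++ if length $line;   # "do something" with the line
    }
    close($in) or die "can't close $file: $!";
    return $count;
}
```

Either way, the loop reads and processes one line per iteration.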

This is tremendously more efficient than reading the entire file into
memory as an array of lines and then processing it one element at a
time, which is often--if not almost always--the wrong approach. Whenever
you see someone do this:

@lines = <INPUT>;

you should think long and hard about why you need everything loaded at
once. It's just not a scalable solution. You might also find it more fun
to use the standard Tie::File module, or the DB_File module's $DB_RECNO
bindings, which allow you to tie an array to a file so that accessing an
element of the array actually accesses the corresponding line in the file.
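
A minimal sketch of the Tie::File approach, assuming a plain text file with newline-terminated records. The helper name nth_line is hypothetical, and the read-only mode is a choice made for this example so that a missing file is not created.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: return line $n (0-based) of $file
# without reading the whole file into an array first.
sub nth_line {
    my ($file, $n) = @_;
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "can't tie $file: $!";
    my $line = $lines[$n];   # Tie::File strips the record separator by default
    untie @lines;
    return $line;
}
```

Tie::File reads records on demand, so fetching one element does not load the rest of the file into memory.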

You can read the entire filehandle contents into a scalar.

{
    local(*INPUT, $/);
    open (INPUT, $file) || die "can't open $file: $!";
    $var = <INPUT>;
}

That temporarily undefs your record separator, and will automatically
close the file at block exit. If the file is already open, just use
this:

$var = do { local $/; <INPUT> };
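
As a sketch, the two idioms above can be wrapped in a small subroutine that uses a lexical filehandle. The name slurp is hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entire contents of $file as one string.
sub slurp {
    my ($file) = @_;
    open(my $in, '<', $file) or die "can't open $file: $!";
    local $/;                # undef record separator: <$in> reads to end of file
    my $content = <$in>;
    close($in) or die "can't close $file: $!";
    return $content;
}
```

Because $/ is localized inside the subroutine, the caller's record separator is restored when slurp returns.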

For ordinary files you can also use the read function.

read( INPUT, $var, -s INPUT );

The third argument tests the byte size of the data on the INPUT
filehandle and reads that many bytes into the buffer $var.
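
A hedged sketch of the read approach with the return value checked. It assumes an ordinary file whose size does not change while it is being read; read_whole is a hypothetical name, and the binmode call is an addition for this example so that the byte count from -s matches what read returns.

```perl
use strict;
use warnings;

# Hypothetical helper: read an ordinary file in one read() call.
sub read_whole {
    my ($file) = @_;
    open(my $in, '<', $file) or die "can't open $file: $!";
    binmode($in);                        # work in bytes, matching -s
    my $size = -s $in;                   # size in bytes of the open file
    my $buf  = '';
    my $got  = read($in, $buf, $size);   # returns bytes read, undef on error
    defined $got   or die "can't read $file: $!";
    $got == $size  or die "short read on $file: got $got of $size bytes";
    close($in) or die "can't close $file: $!";
    return $buf;
}
```

This does not work for pipes, sockets, or terminals, where -s cannot report the amount of data that will arrive.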



--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.

Posted by Bill H on June 8, 2008, 6:48 pm
>     The customary Perl approach for processing all the lines in a file is to
>     do so one line at a time:
>
>             open (INPUT, $file)     || die "can't open $file: $!";
>             while (<INPUT>) {
>                     chomp;
>                     # do something with $_
>                     }
>             close(INPUT)            || die "can't close $file: $!";
>

I recently had to rework a program that used the above method in many
places to load in small files (100 lines or so) containing product
information, sales tax, etc. for a simple cart, because a highly placed
security firm that was overseeing the project felt that the above
method caused too much lag on servers and a possible collision with
users. Not wanting to argue, I followed their advice and replaced it
with the @lines = <INPUT> method shown below. Not that it matters now,
but did they have a valid point?

>     This is tremendously more efficient than reading the entire file into
>     memory as an array of lines and then processing it one element at a
>     time, which is often--if not almost always--the wrong approach. Whenever
>     you see someone do this:
>
>             @lines = <INPUT>;
>
>     you should think long and hard about why you need everything loaded at
>     once. It's just not a scalable solution. You might also find it more fun
>     to use the standard Tie::File module, or the DB_File module's $DB_RECNO
>     bindings, which allow you to tie an array to a file so that accessing an
>     element the array actually accesses the corresponding line in the file.

Bill H

Posted by Martijn Lievaart on June 9, 2008, 2:51 am
On Sun, 08 Jun 2008 15:48:22 -0700, Bill H wrote:

>>     The customary Perl approach for processing all the lines in a
>>     file is to do so one line at a time:
>>
>>             open (INPUT, $file)     || die "can't open $file: $!";
>>             while (<INPUT>) {
>>                     chomp;
>>                     # do something with $_
>>                     }
>>             close(INPUT)            || die "can't close $file: $!";
>>
>>
> I recently had to rework a program that used the above method in many
> places to load in small files (100 lines or so) containing product
> information, sales tax, ect. for a simple cart because a high placed
> security firm that was overseeing the project felt that the above method
> caused too much lag on servers and a possible collision with users. Not
> wanting to argue I followed their advice and replaced it with the @lines
> = <INPUT> method shown below. Not that it matters now, but did they have
> a valid point?

I suspect (add in a chomp somewhere) the latter method is somewhat more
efficient, but the difference is dwarfed by the overhead of the I/O, even
for a 100-line file.

Valid point? Nope, difference insignificant.

However, if processing each line takes some time and you want to protect
yourself against the files changing, the second method just makes the
race window somewhat smaller. You'd still need locking in that case.

Valid point? Nope, wrong advice.

But as we don't know all variables involved, the point may have been
valid, though I suspect not.

M4

Posted by nolo contendere on June 9, 2008, 10:11 am
> >     The customary Perl approach for processing all the lines in a file is to
> >     do so one line at a time:
>
> >             open (INPUT, $file)     || die "can't open $file: $!";
> >             while (<INPUT>) {
> >                     chomp;
> >                     # do something with $_
> >                     }
> >             close(INPUT)            || die "can't close $file: $!";
>
> I recently had to rework a program that used the above method in many
> places to load in small files (100 lines or so) containing product
> information, sales tax, ect. for a simple cart because a high placed
> security firm that was overseeing the project felt that the above
> method caused too much lag on servers and a possible collision with
> users. Not wanting to argue I followed their advice and replaced it
> with the @lines = <INPUT> method shown below. Not that it matters now,
> but did they have a valid point?
>
> >     This is tremendously more efficient than reading the entire file into
> >     memory as an array of lines and then processing it one element at a
> >     time, which is often--if not almost always--the wrong approach. Whenever
> >     you see someone do this:
>
> >             @lines = <INPUT>;
>
> >     you should think long and hard about why you need everything loaded at
> >     once. It's just not a scalable solution. You might also find it more fun
> >     to use the standard Tie::File module, or the DB_File module's $DB_RECNO
> >     bindings, which allow you to tie an array to a file so that accessing an
> >     element the array actually accesses the corresponding line in the file.
>

What exactly do you mean by "load in"? And what did *they* mean by
"caused too much lag on servers and a possible collision with users"?

If you're loading into a database, the best method would be to bulk
load if possible, using the bulk loader util of whichever database
you're using. Barring that, another efficient way would be to batch up
the records you're loading, using parameter binding, etc. Was the line-
by-line method slow before because they were loading and committing
every record?

Posted by brian d foy on June 10, 2008, 12:05 pm
In article

> >     The customary Perl approach for processing all the lines in a file is to
> >     do so one line at a time:
> >
> >             open (INPUT, $file)     || die "can't open $file: $!";
> >             while (<INPUT>) {
> >                     chomp;
> >                     # do something with $_
> >                     }
> >             close(INPUT)            || die "can't close $file: $!";
> >
>
> I recently had to rework a program that used the above method in many
> places to load in small files (100 lines or so) containing product
> information, sales tax, ect. for a simple cart because a high placed
> security firm that was overseeing the project felt that the above
> method caused too much lag on servers and a possible collision with
> users. Not wanting to argue I followed their advice and replaced it
> with the @lines = <INPUT> method shown below. Not that it matters now,
> but did they have a valid point?

It depends on their particular point and your particular task. If they
are worried about lag and collisions, they should have told you to get
onto a database server instead of reading from files. :)
