|
Posted by nolo contendere on June 9, 2008, 10:11 am
> >     The customary Perl approach for processing all the lines in a file is to
> >     do so one line at a time:
>
> >             open (INPUT, $file)     || die "can't open $file: $!";
> >             while (<INPUT>) {
> >                     chomp;
> >                     # do something with $_
> >                     }
> >             close(INPUT)            || die "can't close $file: $!";
>
> I recently had to rework a program that used the above method in many
> places to load in small files (100 lines or so) containing product
> information, sales tax, etc. for a simple cart because a highly placed
> security firm that was overseeing the project felt that the above
> method caused too much lag on servers and a possible collision with
> users. Not wanting to argue, I followed their advice and replaced it
> with the @lines = <INPUT> method shown below. Not that it matters now,
> but did they have a valid point?
>
> >     This is tremendously more efficient than reading the entire file into
> >     memory as an array of lines and then processing it one element at a
> >     time, which is often--if not almost always--the wrong approach. Whenever
> >     you see someone do this:
>
> >             @lines = <INPUT>;
>
> >     you should think long and hard about why you need everything loaded at
> >     once. It's just not a scalable solution. You might also find it more fun
> >     to use the standard Tie::File module, or the DB_File module's $DB_RECNO
> >     bindings, which allow you to tie an array to a file so that accessing an
> >     element of the array actually accesses the corresponding line in the file.
>
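For what it's worth, the Tie::File approach mentioned in the quoted FAQ text could look roughly like the sketch below. The temp file and its product records are made up for illustration; only Tie::File and File::Temp (both core modules) are used.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small sample file (hypothetical product data, illustration only).
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "widget,9.99\n", "gadget,19.50\n", "gizmo,4.25\n";
close($fh) or die "can't close $file: $!";

# Tie an array to the file: elements are fetched from disk on demand
# instead of the whole file being loaded up front.
tie my @lines, 'Tie::File', $file or die "can't tie $file: $!";

my $count  = scalar @lines;   # number of records in the file
my $second = $lines[1];       # fetches that one line; trailing newline is stripped

print "$count lines; second is '$second'\n";

untie @lines;
```

Note that Tie::File opens the file read/write by default, so assigning to an element would change the file on disk. For a 100-line file the memory difference against @lines = <INPUT> is probably negligible either way.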
What exactly do you mean by "load in"? And what did *they* mean by
"caused too much lag on servers and a possible collision with users"?
If you're loading into a database, the best method would be to bulk
load if possible, using the bulk loader utility of whichever database
you're using. Barring that, another efficient way would be to batch up
the records you're loading, using parameter binding, etc. Was the line-
by-line method slow before because they were loading and committing
every record?
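
If a database is involved, batching with parameter binding might look something like the sketch below. Everything specific here is an assumption for illustration: it uses DBI with DBD::SQLite and an in-memory database as a stand-in, the products table and sample records are hypothetical, and the batch size is tiny just to show the commit logic.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; the in-memory database stands in
# for whatever database is really being loaded.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do('CREATE TABLE products (name TEXT, price REAL)');  # hypothetical table

# Prepare once; the ? placeholders are bound on every execute.
my $sth = $dbh->prepare('INSERT INTO products (name, price) VALUES (?, ?)');

# Hypothetical sample input, read line by line through an in-memory filehandle.
my $sample = "widget,9.99\ngadget,19.50\ngizmo,4.25\n";
open(my $in, '<', \$sample) or die "can't open sample data: $!";

my $batch_size = 2;   # illustrative; a real batch would be much larger
my $pending    = 0;

while (my $line = <$in>) {
    chomp $line;
    my ($name, $price) = split /,/, $line;
    $sth->execute($name, $price);
    # Commit once per batch rather than once per record.
    if (++$pending >= $batch_size) {
        $dbh->commit;
        $pending = 0;
    }
}
$dbh->commit if $pending;   # flush the final partial batch
close($in);

my ($rows) = $dbh->selectrow_array('SELECT COUNT(*) FROM products');
print "loaded $rows rows\n";
$dbh->disconnect;
```

The input is still read one line at a time; what changes is that the statement is prepared once and commits happen per batch instead of per row.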
|