Parsing Large Files

I recently received a reply to a previous post which is almost the answer I
needed.  The problem is, when I tried it at work, it wouldn't work.  The
reason it didn't work was because we have an early version of Perl at work
which does not support "values" as in:

my(%nid, %id);

foreach (values %id) {
  foreach my $y (keys %$_) {
     foreach (values %{$_->{$y}}) {
        die "duplicate $_!" if exists $nid{$_};
        $nid{$_} = $y;
     }
  }
}

Can this be done in a different way that doesn't use values?  Also, I do
care about the x-values as well, so I don't think the code above will work
as is.  Basically, I'm parsing one file to get an ID given an x,y,z
cartesian coordinate.  This gives me the hash %id which is keyed by
coordinates, whose values are id numbers.  Then I close
this file, and open a large results file which only contains ids with
results.  I want to pull out all values for a unique y into a separate file
and sort them by x ascending.  Each id occurs 21 times in the large file
for a total of 21 subcases.  I want them printed as in:

id    value(subcase1)  value(subcase2) value(subcase3) . . .
id's ordered by x ascending in a file for a unique value of y.

Any help would be greatly appreciated.

Re: Parsing Huge Files



Thursday 09 December 2004 03:04:10 pm



References: 1

> All,
> I have a huge text file (around 700MB) that I am trying to parse.

Whether that is huge depends on your RAM.  I just might be able to throw
that into a big old hash and work with it that way.  Can you?

> I was
> successful, but I don't think I have done it very efficiently.

What kind of efficiency are you concerned with?  CPU, memory, programmer
time, maintainer time?

> Basically
> in my program I have a hash containing some unique id numbers which are
> keyed by cartesian (x,y,z) coordinates.
> #!/usr/bin/perl -w
> use strict;
> my %id;
> $id{...}{...}{...} = 123456;
> $id{...}{...}{...} = 246891;
> $id{...}{...}{...} = 169245;
> $id{...}{...}{...} = 274321;
> # End of example code
> Here is where it is getting difficult for me.  I have 8 possible
> y-values, 420 x-values and 1 z-value

If there is only one possible z-value, what is the point of having a

1*8*420 = 3360 different IDs (assuming all are different)

It seems like you only really care about the Y value of any given ID,
and not the X or Z.
So invert and prune the hash into %nid, keyed by ID with the Y value as
the value (e.g. "-11"), etc.

my %nid;

foreach (values %id) {
  foreach my $y (keys %$_) {
     foreach (values %{$_->{$y}}) {
        die "duplicate $_!" if exists $nid{$_};
        $nid{$_} = $y;
     }
  }
}

> The format of my 700MB file is something like
> #ID             Value
> 123456          200
> 274321          100
> 246891          400
> 169245          600
> 123456          300
> 274321          50
> 246891          600 #each element id occurs 24 times once for each
> subcase

Well then, I assume the subcases must occur in order?

3360 different IDs, times 24 occurrences, = 80640 lines you care about.
Out of 700MB/15char per line = 50 million lines.

You can surely store 80640 lines, no?

> What I want to do is for each unique y value, I want to create a file
> named after each y-value which looks like the following
> element ID      Value(subcase1) Value(subcase2) Value(subcase3) . . .
> 123456          200             300     400
> 246891          400             600     800
> 169245          600             800     999
> So basically, given a fixed y and z value, loop over all x values and get
> element ids which are then searched for in the big file and reformatted
> into individual files for each unique y-value.  Is there a very efficient
> way to do this in Perl?

my %y_stuff; # hash by Y of hash by ID of list of values
while (<>) { chomp;
  my ($id,$v)=split;
  next unless exists $nid{$id};
  push @{$y_stuff{$nid{$id}}{$id}}, $v;
}

Printing this out is left as an exercise.
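
For anyone attempting that exercise, here is one possible sketch, not necessarily what the poster had in mind. It assumes %y_stuff has the shape described above (Y, then ID, then a list of values in subcase order) and that a second hash, here called %x_of (a made-up name, ID to x), was filled while reading the coordinate file so the rows can be ordered by x. The sample data, the tab-separated layout and the "y_<value>.txt" file names are all assumptions for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data standing in for the real structures.
# %x_of maps ID => x; %y_stuff maps y => ID => [values in subcase order].
my %x_of = (123456 => 10, 246891 => 20, 169245 => 10, 274321 => 20);
my %y_stuff = (
    '-11' => { 246891 => [400, 600, 800], 123456 => [200, 300, 400] },
    '-7'  => { 274321 => [100, 50, 25],   169245 => [600, 800, 999] },
);

# Build the output rows for one y: "id<TAB>v1<TAB>v2...", x ascending.
# Ties on x are broken by ID so the order is repeatable.
sub rows_for_y {
    my ($y) = @_;
    my $by_id = $y_stuff{$y};
    my @ids = sort { $x_of{$a} <=> $x_of{$b} || $a <=> $b } keys %$by_id;
    return map { join("\t", $_, @{ $by_id->{$_} }) . "\n" } @ids;
}

# One file per distinct y value, written to the current directory.
foreach my $y (sort { $a <=> $b } keys %y_stuff) {
    my $file = "y_$y.txt";    # naming scheme is an assumption
    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print $out rows_for_y($y);
    close($out) or die "Cannot close $file: $!";
}
```

Only the IDs of interest are ever held in memory, so the size of the big results file should not matter much for this step.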


Re: Parsing

> The reason it didn't work was because we have an early version of
> Perl at work which does not support "values" as in:

The values() function has been in perl since perl version 1. I very
much doubt that is your problem. If you give actual error messages
instead of your misinterpretations of them, maybe someone can help you.
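
That said, if you do want the loop written without values(), and want to keep the x coordinate too, something like the following sketch should be equivalent. It assumes the hash is nested as $id{x}{y}{z} = ID, which is a guess from the code quoted above; the coordinates in the sample data and the name %x_of are invented for the example.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data: $id{x}{y}{z} = ID (coordinates are made up).
my %id;
$id{10}{-11}{0} = 123456;
$id{20}{-11}{0} = 246891;
$id{10}{-7}{0}  = 169245;
$id{20}{-7}{0}  = 274321;

my %nid;     # ID => y, as in the earlier reply
my %x_of;    # ID => x, kept so rows can later be sorted by x

# Walk all three levels by key, so x and y are both known at the innermost level.
foreach my $x (keys %id) {
    foreach my $y (keys %{ $id{$x} }) {
        foreach my $z (keys %{ $id{$x}{$y} }) {
            my $n = $id{$x}{$y}{$z};
            die "duplicate $n!" if exists $nid{$n};
            $nid{$n}  = $y;
            $x_of{$n} = $x;
        }
    }
}

foreach my $n (sort { $a <=> $b } keys %nid) {
    print "$n  x=$x_of{$n}  y=$nid{$n}\n";
}
```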
-- /
    "Rational thought. It's an acquired taste." -- Gunn, Angel: the Series
