How to get offset position from unpack()?



   The unpack() function is very, very useful for me, as I regularly
do a lot of unpacking of non-Perl-created data strings to see what
information they hold.  If I didn't have the use of the unpack()
function, certain tasks would be much more difficult.

   However, there's something I want to do with unpack() that I
haven't figured out how to do:  I'd like to unpack part of a string,
but keep track of where the unpacking ended, so I can resume unpacking
the string (at a later time) where I left off.

   Here's a trivial example:

   Let's say I have a data string that holds lists of strings, like

   " 2 5hello 5world 2 2hi 5there"

The first number (" 2") signifies that the first list holds two
strings.  The next number (" 5") signifies that the first encoded
string is 5 characters long.  The next number (also a " 5") signifies
the same for the next encoded string.

   So I could write a format string for unpack() to be:  "a2/(a2/a)"

   So the lines of code:

      my $dataString = ' 2 5hello 5world extra data';
      my @a = unpack 'a2/(a2/a)', $dataString;
      print "$_\n"  foreach @a;

would output:

   hello
   world

   My question becomes:  What if I want to parse out the extra data
later with a different pack string?  It would be nice if there was a
way to return the current offset somehow with unpack(), so that I
could unpack again with something like this:

      my @b = unpack "\@$offset $newPackString", $dataString;

   Now, I could calculate this offset myself by examining what was
placed in @a, but this gets tricky fast with packstrings that use "Z",
"A", and 'a' (and combinations).

   (Incidentally, C's sscanf() function has a little-known "n" format
character that returns the number of characters consumed.  I'm hoping
that unpack() has a similar feature.)
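   To illustrate why working out the offset from the unpacked values gets tricky, here is a minimal sketch. The record layout is made up for the example: "A" and "Z" both return less than they consume, so adding up the lengths of the results undercounts the bytes that were read.

```perl
use strict;
use warnings;

# Hypothetical record: a 5-byte space-padded field, a 5-byte
# NUL-padded field, and a 3-byte raw field (13 bytes in total).
my $data = "ab   cd\0\0\0xyz";

# 'A5' strips the trailing spaces, 'Z5' stops at the first NUL,
# 'a3' returns its bytes unchanged.
my @fields = unpack 'A5 Z5 a3', $data;

# Add up the lengths of what unpack() handed back.
my $returned = 0;
$returned += length for @fields;

print "returned: $returned bytes, consumed: ", length($data), " bytes\n";
```

   The two numbers differ, so the padding would have to be re-derived from the packstring itself to recover the real offset.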

   I posted a similar question back in 2004, and Anno Siegel responded
with the suggestion of adding "a*" to my first packstring, and then
using the length() of the last element to calculate the offset, like

      my $dataString = ' 2 5hello 5world extra data';
      my @a = unpack 'a2/(a2/a) a*', $dataString;
      my $offset = length($dataString) - length( pop(@a) );
      print "$_\n"  foreach @a;
      my @b = unpack "\@$offset $newPackString", $dataString;

While this approach technically works, repeatedly using "a*" at the
end of a packstring in a continual loop creates an O(n^2) algorithm,
because every pass copies the entire unparsed remainder of the string.
This isn't a problem for short $dataStrings, but is a significant
problem when $dataStrings are long and/or have no limit in length.
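   The quadratic growth can be made visible with a small sketch. The helper name tail_bytes_copied() and the input (many tiny one-string lists) are invented for the illustration; the loop itself is just the "a*" workaround repeated until the string is used up.

```perl
use strict;
use warnings;

# Parse $n back-to-back lists with the "a*" workaround and count how
# many bytes end up in the throwaway tail element along the way.
sub tail_bytes_copied {
    my ($n) = @_;
    my $data   = ' 1 1x' x $n;   # hypothetical input: $n lists of one 1-char string
    my $offset = 0;
    my $copied = 0;
    while ($offset < length $data) {
        my @fields = unpack '@' . $offset . ' a2/(a2/a) a*', $data;
        my $tail   = pop @fields;            # everything not yet parsed
        $copied   += length $tail;
        $offset    = length($data) - length($tail);
    }
    return $copied;
}

printf "%3d lists: %5d tail bytes copied\n", $_, tail_bytes_copied($_)
    for 10, 20, 40;
```

   Each doubling of the input roughly quadruples the number of bytes copied into the tail, which is the O(n^2) behavior described above.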

   I've noticed that Perl 5.10 added lots of convenient new features
to pack() and unpack() (such as the ability to pack floats and doubles
in an endian-ness different than your own), so I'm hoping that
unpack() now has a way to return the $dataString offset.  However,
I've read both "perldoc -f unpack" and "perldoc -f pack" but I can't
seem to find this behavior documented, if it exists at all.

   So does anyone know if I can get unpack() to return an offset?


   -- Jean-Luc

Re: How to get offset position from unpack()?


    ~% perl -E'my $x = "aaa"; say for unpack "a2.", $x'
    aa
    2

Re: How to get offset position from unpack()?


   Wow, thanks!  The '.' character was exactly what I was looking for!
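   Applied to the list format from the original post, resuming with '.' might look like the sketch below. It assumes Perl 5.10 or later and a '.' at the top level of the packstring; the @lists array is just a name chosen for the example.

```perl
use strict;
use warnings;

my $dataString = ' 2 5hello 5world 2 2hi 5there';
my $offset     = 0;
my @lists;

while ($offset < length $dataString) {
    # '@N' jumps to the saved offset; the trailing '.' appends the
    # offset at which unpacking stopped as one extra return value.
    my @a = unpack '@' . $offset . ' a2/(a2/a) .', $dataString;
    $offset = pop @a;        # where to resume on the next pass
    push @lists, [@a];       # the strings of the list just parsed
}

print join(',', @$_), "\n" for @lists;
```

   Nothing is copied except the fields themselves, so the loop avoids the growing tail that "a*" produces. As far as I can tell from the repeat-count rules that "perldoc -f pack" gives for '.', a bare '.' counts from the start of the innermost ()-group while '.*' counts from the start of the string, so '.*' may be the safer spelling if the '.' ever ends up inside a group.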

   (I notice it's new in Perl 5.10, so if I'm working on platforms
that have an older version of Perl I'll just have to use the old "a*"
approach there.)

   I tried searching for "."'s behavior in "perldoc -f unpack",
"perldoc -f pack", and even "perldoc perlpacktut", but I couldn't find
where it mentions that it returns the offset when used with unpack().
Is there a place that explains this with a little more depth?

   Anyway, thanks for your help, Ben.  I really appreciate it.

   -- Jean-Luc

Re: How to get offset position from unpack()?


If there is, I didn't find it either. I just tried 'unpack "."' and
'unpack "@"' since they looked like likely possibilities, and "."
worked.

