# Comparing Lists

I'm starting to come across situations in which I need to compare
two lists item-for-item, and it occurs to me that Perl doesn't
have any easy way to do that. I'm ending up having to compare
one element at a time, like so:

elsif
(
48 == $bytes[ 0] &&  38 == $bytes[ 1] && 178 == $bytes[ 2]
&& 117 == $bytes[ 3] && 142 == $bytes[ 4] && 102 == $bytes[ 5]
&& 207 == $bytes[ 6] &&  17 == $bytes[ 7] && 166 == $bytes[ 8]
&& 217 == $bytes[ 9] &&   0 == $bytes[10] && 170 == $bytes[11]
&&   0 == $bytes[12] &&  98 == $bytes[13] && 206 == $bytes[14]
&& 108 == $bytes[15]
)
{
$new_suffix = '.wma';
}

That seems clumsy, but trying to compare two lists directly doesn't
work in Perl because Perl interprets them as scalar expressions using
the "comma operator", like so:

say "YES" if (7,6,4) == (3,9,4); # compares 4 to 4, so "YES"

Arrayifying those doesn't work either, because of scalar context:

say "YES" if @{[7,6,4]} == @{[3,9,4]}; # compares 3 to 3, so "YES"

Switching to smartmatch *does* work, but YIKES this syntax is weird:

say "YES" if @{[7,6,4]} ~~ @{[3,9,4]}; # prints nothing
say "YES" if @{[7,6,4]} ~~ @{[7,6,4]}; # prints YES

so in my first example above I could do:

elsif ( @bytes[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
~~ @{[48,38,178,117,142,102,207,17,166,217,0,170,0,98,206,108]} )
{
$new_suffix = '.wma';
}

It works, but seems overly complex to me.

Is there a simpler, more-direct way to do list comparisons?

Wait, I can see a couple of simplifications. Replace the indices
with 0..15, and try leaving out the @{}. Probably won't work, though:

elsif ( @bytes[0..15]
~~ [48,38,178,117,142,102,207,17,166,217,0,170,0,98,206,108] )
{
$new_suffix = '.wma';
}

HUH.  That actually works.  Apparently ~~ implicitly dereferences [].

Now, I wonder if there's a way to do "are these lists identical?"
comparisons without the smartmatch operator? (Other than &&ing together
comparisons of individual elements, that is.)

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: Comparing Lists

use List::Util qw(all);

say "Identical" if scalar(@foo) == scalar(@bar)
&& all { $foo[$_] == $bar[$_] } 0..$#foo;

It is a little bit stronger than your original && based code as it also
compares the length of the arrays, but otherwise it should be quite
equivalent to it.

This is probably one of the cases where I would actually like to use the
smartmatch operator, but its many complex rules keep me from adding
smartmatch to my active use of Perl.

//Makholm

## Re: Comparing Lists

On 28/5/2015 7:37 πμ, Robbie Hatley wrote:

You did not make clear what you want, only some nonsensical snippets,
so here is something that makes sense to me:

my @list1 = qw/a b c/;
my @list2 = qw/a D c/;

for(my $i=0; $i<= ($#list1 == $#list2 ? $#list1 : die 'dif size'); $i++)
{ die "no at $i" unless $list1[$i] eq $list2[$i]}

## Re: Comparing Lists

Doing the $#list1 == $#list2 ? $#list1 : die 'dif size' comparison after
every iteration despite its result will (ordinarily) never change seems
wasteful (and also confusing) to me.
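
One way to address that objection is to do the size check once, before the loop. This is only a sketch: the subroutine name `first_mismatch` and its return convention (index of the first difference, or -1) are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the first differing element,
# or -1 if the lists match. Dies if the sizes differ.
sub first_mismatch {
    my ($list1, $list2) = @_;

    # Size check done once, before the loop.
    die "different sizes\n" unless @$list1 == @$list2;

    for my $i (0 .. $#$list1) {
        return $i unless $list1->[$i] eq $list2->[$i];
    }
    return -1;
}

my @list1 = qw/a b c/;
my @list2 = qw/a D c/;
print "first mismatch at index ", first_mismatch(\@list1, \@list2), "\n";
```

Returning the index instead of dying at the first difference leaves it to the caller to decide whether a mismatch is fatal.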

## Re: Comparing Lists

[...]

For some definition of 'works', namely, when what the smart match
operator happens to do for a given pair of arguments happens to be what
you want, cf

perl -e 'print [1,2,3] ~~ [1,[4, sub { 1 },'Terpsichore'], 3], "\n"'

This prints 1 because the 2 from the first array is smart-matched
against all elements of the array in the 2nd position of the 2nd
array. Since the subroutine in the 2nd position of that always returns
1, the 'scalar sub truth' (people who come up with terms like 'sub
truth' without noticing that this is nonsense should never be in charge of
'abstractions' ...) test for this subroutine will succeed no matter what
the left argument happens to be.

It's generally impossible to write a general comparison routine in Perl,
anyway, because depending on what is supposed to be compared how, either
== or eq needs to be used. And both of them can be overloaded. Provided
there's some idea how to deal with the expected elements on both lists,
I'd use something a la

sub cmpl
{
return unless @{$_[0]} == @{$_[1]};
$_[0][$_] == $_[1][$_] or return for 0 .. $#{$_[0]};

return 1;
}

This could be declared with a prototype,

cmpl(\@\@)

then 'literal' arrays could be used as arguments instead of references
to arrays but I don't like the idea that it's impossible to tell how a
subroutine will handle its arguments without knowing its definition.
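
For readers unfamiliar with prototypes, here is a minimal sketch of what the `(\@\@)` variant might look like. The body is adapted from the subroutine above; the sample arrays are invented for illustration.

```perl
use strict;
use warnings;

# The prototype (\@\@) makes Perl pass references to the two arrays
# named at the call site. The declaration must be seen before the
# calls are compiled, or the prototype has no effect.
sub cmpl(\@\@) {
    my ($x, $y) = @_;
    return unless @$x == @$y;
    $x->[$_] == $y->[$_] or return for 0 .. $#$x;
    return 1;
}

my @bytes = (48, 38, 178);
my @sig   = (48, 38, 178);
my @other = (48, 38, 179);

print "match\n"    if cmpl(@bytes, @sig);
print "no match\n" unless cmpl(@bytes, @other);
```

At the call site, `cmpl(@bytes, @sig)` looks as if it flattens both arrays into one list, which is exactly the readability concern raised above.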

## Re: Comparing Lists

[...]

Unrelated remark: I think you should consider getting rid of the
mathematician's folly of eternally asserting that you PROTEST(!!1)
against the fact that = is both used for assignment and returns a value
by dancing the Yoda whenever something mutable is to be compared with
a constant: A question usually asks for attributes of an object, eg 'Is
the car blue' and not whether a certain attribute belongs to the object,
'Is blue the car?'.

## Re: Comparing Lists

On 5/28/2015 5:08 AM, Rainer Weikusat wrote:

LOL! Noticed that, you did? ;-) Yep, that's a personal habit I fostered
back in 2002-2008 when I was doing heavy programming in C and C++.
After getting bit a few times by writing "if (c=3) "
when I actually meant "if (c==3)", I got fed up and started writing
such things back-assward:

if (37 == blue_cars)
{
Thoughts = tell_me_all_your_thoughts_on_God(Bob);
}

Puzzling, it may seem; but works, it does.

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: Comparing Lists

Considering that this has been handed down as "r3@lly ph@t 1ns1d3rz
tr1ck" among generations of people who felt the desire to put their
refusal to learn this in writing, it's not puzzling at all, just
annoying to anyone else. Which is IMHO the whole point of it, especially
considering that it doesn't work as lvalues also need to be compared
with other lvalues.

## Re: Comparing Lists

At 3:22PM on the morning of 5/28/2015, Rainer Weikusat wrote
(regarding the habit of some C, C++, and Perl programmers to write
equality tests as '37 == $blue_cars' instead of '$blue_cars == 37'):

I'll allow as how it can seem "annoying", especially in light of its
violation of natural language patterns.

However, I think that in most cases, annoyance is not the purpose.
The purpose is to avoid hard-to-troubleshoot run-time errors by
replacing them with easy-to-troubleshoot compile-time errors.

For example, the following will cause any C++ compiler to abort
compilation:

if (37 = blue_cars) {std::cout << "Hi!" << std::endl;} // C++ ERROR

Now, it's true that that trick won't work for LVALUES, as in this
(probably buggy) Perl code:

say "Hi!" if $fred = $bob; # VALID Perl code, but probably wrong.

However, saying that the trick is pointless because it won't work
in all cases is like saying that people shouldn't use seatbelts
and airbags because "sometimes they fail". You'd need a better
reason than that to eschew them, I think. You'd need to show
they cause more problems than they solve.

So I think it boils down to this question: "which is more annoying:
puzzling run-time errors, or 'backwards' equality tests?" I'll go
for the backwards tests. I don't mind doing things backwards.

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: Comparing Lists

In my experience these are not hard-to-troubleshoot errors.  They show
up immediately on testing if not before (see below).

C++ compilers.  Perl will also warn about such constructs (if asked).  I think

<snip>
--
Ben.

## Re: Comparing Lists

This sticks out so plainly as obviously wrong that it ought to cause
someone else to 'abort compilation' before ever starting it. = is an
operator symbol without any natural meaning and in the context of a
C-like language, it means 'assignment'. This may have been a bad idea,
at least some people are seriously convinced of that (usually called
Niklaus or Wirth), but the corresponding decision was made almost
half a century ago: This discussion is long over and the people whose
opinion differed didn't prevail. That's a fact of life. Even elementary
school pupils are ruthlessly expected to learn much more complicated
things and considering this, the argument seems pretty unreal to me, not
really distinguishable from - say - arguing about how many angels can
dance on the head of a pin, and just about as useful.

Here's a more interesting one:

say "Tasty!" if "bread" == $foodstuff;

This is almost certainly wrong as it converts the value of $foodstuff to a
number and compares it with 0.
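
A small sketch of that pitfall, assuming `$foodstuff` holds a non-numeric string (the value "gravel" is invented for the example):

```perl
use strict;
use warnings;

my $foodstuff = "gravel";

{
    # Silence the "isn't numeric" warnings so only the results show.
    no warnings 'numeric';

    # Both strings numify to 0, so == reports them as equal.
    print "numeric ==: ", ("bread" == $foodstuff ? "equal" : "different"), "\n";
}

# eq compares the strings themselves.
print "string eq:  ", ("bread" eq $foodstuff ? "equal" : "different"), "\n";
```

With warnings left on, Perl does complain about the non-numeric arguments at run time, but the comparison still succeeds.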

I didn't write anything about 'seatbelts and airbags', not the least
opinion. But picking up your example nevertheless: Assuming someone sits
on an ordinary aeroplane seat, seatbelt fastened, and seat and someone
are then dropped out of a plane bomb bay at an altitude of 10,000 feet,
that someone is going to be killed regardless of the seatbelt. Likewise,
there's no reordering of arguments to comparisons which will cause a
compiler to flag an erroneous assignment when a comparison of two
l-values was intended. The 'failure rate' is 100%, thus, everyone
programming who isn't (somehow) prevented/ prohibited from comparing
l-values must be capable of handling the situation.

[...]

The runtime errors are transient, the intentionally confusing text
forever.

## Re: Comparing Lists

On Wednesday, May 27, 2015 at 9:37:19 PM UTC-7, Robbie Hatley wrote:

Assuming no binary data or memory-busting lists:

say "same" if join("",@list1) eq join("",@list2);

--
Charles DeRykus
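
One more caveat with the join approach, beyond the assumptions already stated: with an empty separator the element boundaries disappear, so different lists can join to the same string. The sketch below uses invented sample lists and assumes that "\0" never occurs in the data.

```perl
use strict;
use warnings;

my @list1 = (1, 23);
my @list2 = (12, 3);

# Without a separator, element boundaries are lost: both join to "123".
print "plain join:     ",
    (join("", @list1) eq join("", @list2) ? "same" : "different"), "\n";

# A separator that cannot occur in the data keeps the boundaries.
# Assumption: "\0" never appears in any element.
print "separator join: ",
    (join("\0", @list1) eq join("\0", @list2) ? "same" : "different"), "\n";
```

For lists of single bytes, as in the original question, every element has the same width once converted to a character, so this ambiguity does not arise there.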

## Re: Comparing Lists

On 5/28/2015 2:10 PM, C.DeRykus wrote:

Hmmm. Yes, that might work.

Or even...

elsif
(
substr($buffer, 0, 16) eq
join '', map chr,
48,38,178,117,142,102,207,17,166,217,0,170,0,98,206,108
)
{
$new_suffix = '.wma';
}

$buffer is the first 50 bytes of a file. What I'm doing here is
looking at the first 50 bytes of each file in a set of files of
unknown type, and using that info to decide what types of files
I'm dealing with, then set the extensions accordingly. This is
useful for files scavenged from browser caches, as those generally
do not have file name extensions such as ".txt", ".jpg", ".mp3",
etc.

I pulled out the ordinals into an array called @bytes, thinking that
it would be easier to compare byte patterns that way. But I now think
that best way to compare lists in Perl is... DON'T.

Since all the numbers on both sides of the comparisons are in the
0-255 range, I can use the join/map/chr construct above to convert
various byte patterns to strings and compare them to same-size
substrings of $buffer.

Of course I could skip the "join '', map chr," and just make strings
out of the byte patterns like so:

"\x3b\x25\xd2\x1e\x6b\xc2\x97\x8e\xa7"

BUT, that would require converting all the numbers from decimal to hex,
and involves lots of ugly \x\x\x\x\x\x\x\.

I like my "join '', map chr," idea better. (Provided that it works;
I haven't tried it yet.)

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: Comparing Lists

Perl handles arbitrary binary data in strings just fine. Including that
it supports building binary strings from numerically represented input
bytes, ie your

join '', map chr, 48,38,178,117,142,102,207,17,166,217,0,170,0,98,206,108

can also be expressed as

pack('C*', 48,38,178,117,142,102,207,17,166,217,0,170,0,98,206,108)

'C' here referring to the C type 'unsigned char'.
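
A quick way to check that the two expressions agree for byte values is to build the string both ways and compare. This is a sketch only; the variable names are invented.

```perl
use strict;
use warnings;

my @sig_bytes = (48, 38, 178, 117, 142, 102, 207, 17,
                 166, 217, 0, 170, 0, 98, 206, 108);

my $via_chr  = join '', map chr, @sig_bytes;   # one character per ordinal
my $via_pack = pack 'C*', @sig_bytes;          # one unsigned char per ordinal

print "same string\n" if $via_chr eq $via_pack;
print "length: ", length($via_pack), "\n";

# unpack with the same template reverses the operation.
my @round_trip = unpack 'C*', $via_pack;
print "round trip ok\n" if "@round_trip" eq "@sig_bytes";
```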

[...]

One of the nice things about perl is that it can run arbitrary Perl code while compiling so the pack('C*', ...) can also be used like this

use constant SIGNATURE => pack('C*', ....);
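
Spelled out for the WMA signature from earlier in the thread, that might look like the sketch below. The constant name `WMA_SIG` and the stand-in `$buffer` are assumptions made for the example.

```perl
use strict;
use warnings;

# Evaluated once, while the program is being compiled.
use constant WMA_SIG => pack('C*', 48, 38, 178, 117, 142, 102, 207, 17,
                                   166, 217, 0, 170, 0, 98, 206, 108);

# Stand-in for the first 50 bytes of a file: the signature plus filler.
my $buffer = WMA_SIG . ("\0" x 34);

my $new_suffix = '.unk';
$new_suffix = '.wma' if substr($buffer, 0, length(WMA_SIG)) eq WMA_SIG;
print "$new_suffix\n";
```

Using `length(WMA_SIG)` instead of a literal 16 keeps the comparison correct if the signature is ever edited.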

## Re: Comparing Lists

On 5/29/2015 5:50 AM, Rainer Weikusat wrote:

I think I saw "pack" in the list of "functions" in the Camel book
but skipped over it, not appreciating the value of "packing" things.
But your "pack" code does look simpler and more direct than join/map/chr.
And in this case I wouldn't need to use "unpack" either, as I'm comparing
fixed patterns to a $buffer containing the first 50 bytes of whatever
file I'm currently looking at.

Interesting, but really just moves the signature definitions up to the
top of the file instead of down in the "set_extension" subroutine where
they're being used. I prefer having them spelled out right inside the
elsifs where the extensions are being set. This script has quite a
few different extensions in it (all of them the types of files you
find in browser caches with no file name extensions to tell you
what kind of files they are). Just a small sample (some of them now

# ====== AVI ======= :

elsif ( 'AVI' eq substr($buffer,8,3) )
{
$new_suffix = '.avi';
}

# ====== FLAC ======= :

elsif ( 'fLaC' eq substr($buffer,0,4) )
{
$new_suffix = '.flac';
}

# ======= FLV ======= :

elsif ( 'FLV' eq substr($buffer,0,3) )
{
$new_suffix = '.flv';
}

# ====== GIF ======= :

elsif ( 'GIF' eq substr($buffer,0,3) )
{
$new_suffix = '.gif';
}

# ====== JPG ======= :

elsif ( pack('C3',255,216,255) eq substr($buffer,0,3) )
{
$new_suffix = '.jpg';
}

# ======= MP4 ======= :

elsif ( 'ftypmp4' eq substr($buffer,4,7) )
{
$new_suffix = '.mp4';
}

# ====== PDF ======= :

elsif ( 'PDF' eq substr($buffer,1,3) )
{
$new_suffix = '.pdf';
}

# ====== PNG ======= :

elsif ( 'PNG' eq substr($buffer,1,3) )
{
$new_suffix = '.png';
}

# ====== RAR ======= :

elsif ( 'Rar' eq substr($buffer,0,3) )
{
$new_suffix = '.rar';
}

# ====== WMA ======= :

elsif
(
pack('C[16]',  48,  38, 178, 117, 142, 102, 207,  17,
166, 217,   0, 170,   0,  98, 206, 108)
eq substr($buffer, 0, 16)
)
{
$new_suffix = '.wma';
}

# ====== DEFAULT ======= :

else
{
$new_suffix = '.unk';
}

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: Comparing Lists

[...]

[...]

[10 repetitions of the same code working with different data]

>    # ====== DEFAULT ======= :
>    else
>    {
>       $new_suffix = '.unk';
>    }

One doesn't have to stop with moving the signatures out of the matching
code, all of the useful data can be collected in a 'signature data
structure' and the redundant data can be dropped, leading to something
like

---------
use constant S_SIG =>    0;
use constant S_OFS =>    1;
use constant S_EXT =>    2;

my @sigs = (
['AVI',    8,    'avi'],
['fLaC',    0,    'flac'],
['FLV',    0,    'flv'],
['GIF',    0,    'gif'],
['ftypmp4',    4,    'mp4'],
['PDF',    1,    'pdf'],
['PNG',    1,    'png'],
['Rar',    0,    'rar' ],
[pack('C*', 48, 38, 178, 117, 142, 102, 207, 17, 166, 217, 0, 170, 0, 98, 206, 108),
0,    'wma'],
[pack('C*', 255, 216, 255),
0,    'jpg']);

sub guess_ext
{
for (@sigs) {
if (substr($_[0], $_->[S_OFS], length($_->[S_SIG])) eq $_->[S_SIG]) {
return '.'.$_->[S_EXT];
}
}

return '.wtf ';
}
---------

[purposely somewhat more verbose than I'd usually write it]

pack('C*', ...)

means 'pack everything which is available' --- it's not only that
computers are better at counting than humans but this also means the
input list can be changed without also changing the item count.
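
To show how such a table-driven routine might be fed, here is a self-contained sketch with a trimmed copy of the table. The `head_bytes` helper, the '.unk' fallback, and the in-memory test data are assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Trimmed copy of the table: [signature, offset, extension].
my @sigs = (
    ['GIF', 0, 'gif'],
    ['PNG', 1, 'png'],
    [pack('C*', 255, 216, 255), 0, 'jpg'],
);

sub guess_ext {
    my ($buffer) = @_;
    for my $sig (@sigs) {
        my ($bytes, $ofs, $ext) = @$sig;
        return ".$ext" if substr($buffer, $ofs, length($bytes)) eq $bytes;
    }
    return '.unk';
}

# Hypothetical helper: read the first 50 bytes of a filehandle in binary mode.
sub head_bytes {
    my ($fh) = @_;
    binmode $fh;
    my $got = read($fh, my $buffer, 50);
    die "read failed: $!" unless defined $got;
    return $buffer;
}

# An in-memory filehandle stands in for a real cache file.
my $fake_jpg = pack('C*', 255, 216, 255, 224) . 'JFIF';
open my $fh, '<', \$fake_jpg or die "cannot open in-memory file: $!";
print guess_ext(head_bytes($fh)), "\n";
close $fh;
```

Note that a buffer shorter than a signature's offset would make `substr` warn, so a real script would probably want to check the buffer length first.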

## Re: Comparing Lists

CD> Assuming no binary data or memory-busting lists:

CD> say "same" if join("",@list1) eq join("",@list2);

Even disregarding the inefficiency, it's annoying to get warnings about
undefined values, and of course, you probably don't want to treat "" the
same as undef... which converting it to a string will do.

Here's a simple version, lightly tested.

Ted

#+begin_src perl
sub differ
{
my @list1 = (undef,2,3);
my @list2 = (undef,2);

return 1 if scalar @list1 != scalar @list2;

my $differ = 0;
while (scalar @list1)
{
my $x = pop @list1;
my $y = pop @list2;

# xor
return 1 if (defined $x && !defined $y) || (!defined $x && defined $y);

# because of the above, we now know $x and $y are either both defined
# or both undefined (the latter case makes them equal)

return 1 if (defined $x && defined $y) && ($x ne $y);   # note this assumes you want string equality
}

return 0;
}
#+end_src

## Re: Comparing Lists

On Wednesday, June 3, 2015 at 6:53:16 AM UTC-7, Ted Zlatanov wrote:

Of course with no "memory-busting lists", efficiency may not be concerning; undef warnings easily quashed; and "" treated same as undef a non-factor.

Or not...

## Re: Comparing Lists

CD> Of course with no "memory-busting lists", efficiency may not be
CD> concerning; undef warnings easily quashed; and "" treated same as
CD> undef a non-factor.

CD> Or not...

I think a correct efficient solution is strictly more useful, so I'll
just point to List::Compare on CPAN now.

Ted

## Re: Comparing Lists

solves the problem" doesn't really answer the original question,
especially not without at least showing an example of how it's supposed
to be used (in this case, it would need just about as much code to be
written as a direct comparison routine in order to execute some part
of hundreds of lines of code hidden in the module). Also, there is
really no 'correct' way to compare two lists in Perl as the choice of
==/ != vs eq/ ne vs 'something completely different', eg $a - $b <
0.0001 for comparing floating point numbers and whether or not undef
needs to be handled and how it is supposed to be handled depends on the
situation.

Eg, a subroutine which expects references to two arrays whose elements
can be compared via eq/ne but which avoids uninitialized value warnings
and distinguishes between undef and an empty string, testing the 'lists'
for equality, could look like this:

sub zl_eq
{
local (*a, *b) = @_;

defined($a[$_]) ? defined($b[$_]) && $a[$_] eq $b[$_] : !defined($b[$_]) or return
for 0 .. $#a;

return 1;
}
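
Since the right element comparison depends on the situation, another option is to let the caller pass it in. The following is a sketch under that assumption; `lists_equal` and the three comparator closures are hypothetical names, and the float tolerance uses `abs`, which the inline expression above leaves out.

```perl
use strict;
use warnings;

# Hypothetical helper: the caller supplies the element comparison.
sub lists_equal {
    my ($same, $x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 unless $same->($x->[$i], $y->[$i]);
    }
    return 1;
}

my $num_eq   = sub { $_[0] == $_[1] };
my $str_eq   = sub { $_[0] eq $_[1] };
my $close_to = sub { abs($_[0] - $_[1]) < 0.0001 };   # tolerance is arbitrary

print "numeric: ", lists_equal($num_eq,   [1, 2, 3],   [1.0, 2, 3]), "\n";
print "string:  ", lists_equal($str_eq,   ['1.0'],     ['1']), "\n";
print "float:   ", lists_equal($close_to, [0.1 + 0.2], [0.3]), "\n";
```

The same pair of lists can compare equal or unequal depending on the comparator, which is the point made above about there being no single 'correct' list comparison.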