# Most efficient way to do set-comparison

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
Hi all,

If set A defined elements a,b,c (a set of permissible program options,
let's say), and we receive from user options in set B.

I want to make sure that user options in B are exactly same as those
defined in A .... nothing more, nothing less and completely
identical....this represents some kinda set operation if I'm not
wrong...

A U B = A
A int B = A

How to do this most efficiently in Perl...?

## Re: Most efficient way to do set-comparison

Many set operations can be implemented quite easy by using hash'es
where the keys are the elements in the sets.

//Makholm

## Re: Most efficient way to do set-comparison

let's say), and we receive from user options in set B.

Yep...I could make it out, but what is the most efficient way....using
any trick in the book of Perl? I could come up with this:

Listing 1
==========
use warnings;
use strict;

my %legalopts = (a => 1, b => 1, c => 1);

my %opts = (a => 1, c => 1); # failure would be that some elements are
missing (i.e. 'b')
# my %opts = (a => 1, b => 2, c => 3, d => 4); # failure would be that
extra options are present (i.e. 'd')

my @del_list = delete @opts{keys %legalopts};

\$num_def = grep map { defined \$_ } @del_list;
die "Some options are missing in opts" unless @del_list == \$num_def;

if (scalar keys %opts)
{
print "Extra elements found in opts: ";
print "\$_ " foreach keys %opts;
print "\n";
die;
}

Pls. tell me if any better way exists...

## Re: Most efficient way to do set-comparison

Krishna Chaitanya wrote:

\$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, c=>1);
@L = sort keys %legalopts; @O = sort keys %opts;
print "@L" eq "@O" ? "Ok" : "Not equal"
'
Not equal
\$

--
Email: http://www.gunnar.cc/cgi-bin/contact.pl

## Re: Most efficient way to do set-comparison

Fails with:

%opts = ('a b'=>1, c=>1);

--
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

## Re: Most efficient way to do set-comparison

Ouch! This should fix that:

\$ perl -le '
%legalopts = (a=>1, b=>1, c=>1); %opts = (a=>1, b=>1);
@L = sort keys %legalopts; @O = sort keys %opts;
print "@L" eq "@O" && @L == @O ? "Ok" : "Not equal"
'
Not equal
\$

OTOH you might object again and ask what happens if:

%legalopts = ("a b"=>1, c=>1); %opts = (a=>1, "b c"=>1);

I give up. This simplified approach is not safe, unless you can preclude
those odd combinations based on the rest of the code.

--
Email: http://www.gunnar.cc/cgi-bin/contact.pl

## Re: Most efficient way to do set-comparison

I hate suggestions like that, because the stringifications of the arrays
can be equal when the arrays aren't.

--
Ruud

## Re: Most efficient way to do set-comparison

You want the "symmetric difference" to be empty.

perldoc -q difference

How do I compute the difference of two arrays?
How do I compute the intersection of two arrays?

--
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

## Re: Most efficient way to do set-comparison

Set::Scalar has an is_equal method, and overloads ==

## Re: Most efficient way to do set-comparison

Krishna Chaitanya wrote:

See List:Util and List::MoreUtils.

Or call this:

sub cmp_set {
my %a; @a{ @ } = (); delete @a{ @ };
my %b; @b{ @ } = (); delete @b{ @ };
if ( %b ) {
return 2 if %a;  # dissimilar
return -1;
}
return %a ? 1 : 0;
}

--
Ruud

## Re: Most efficient way to do set-comparison

wrote:

You seem (in another post under this thread) to want more than just intersection
and equality. There is no real efficient way to do this under any language, but
if you want to report over/under relationships, this may help.

This is one approach. There may be more generalization that could be done,
perhaps combining all the mappings into one array. Certainly though, there
is no one-liner hiding for this problem.

-sln

-----------------------------------------------------------------------------

## options2.pl
##

use strict;
use warnings;

my \$Debug = 1;

my %legalopts = (a => 1, b => 1, c => 1, d => 1);
my %useropts  = (a => 1, c => 1, e => 1, f => 1, g => 1);
#my %useropts  = (a => 1, b => 1, c => 1, d => 1);

sub mapOptions {
my (\$legal, \$user) = @_;
my %all = ();
@all { keys % } = ( 1 ) x ( keys % );
@all { keys %  } = map { !defined \$_ ? 2 : 3 } @all { keys % };
\%all;
}
sub getOptMap {
return map { \$options-> == \$mask ? \$_ : () }  sort keys %;
}

##
my \$all       = mapOptions ( \%legalopts, \%useropts );
my @legalonly = getOptMap ( \$all, 1 );
my @useronly  = getOptMap ( \$all, 2 );
my @common    = getOptMap ( \$all, 3 );

##
if ( \$Debug ) {
print "\nDebug:\n";
print "All                        - ",( sort keys % ),"\n";
print "Legal missing/non-matching - ",@legalonly,", elements = ",
scalar(@legalonly), "\n";
print "User too many/non-matching - ",@useronly,", elements = ",
scalar(@useronly), "\n";
print "Common matching            - ",@common,", elements = ", scalar(@common),
"\n";
}
##
if ( @legalonly || @useronly ) {
print "\nError:\n";
print "Legal options are '",( sort keys %legalopts ),"'\n";
print "Missing options '",@legalonly,"'\n"  if (@legalonly);
print "Non-legal options '",@useronly,"'\n" if (@useronly);
} else {
print "\nSucessfully enterred legal options '",( sort keys %useropts ),"' !\n"
}

__END__

output:
-------------

Debug:
All                        - abcdefg
Legal missing/non-matching - bd, elements = 2
User too many/non-matching - efg, elements = 3
Common matching            - ac, elements = 2

Error:
Legal options are 'abcd'
Missing options 'bd'
Non-legal options 'efg'

## Re: Most efficient way to do set-comparison

On Sat, 14 Mar 2009 21:10:26 GMT, sln@netherlands.com wrote:

[snip]

^^^^^^
or, no need for return statement, the map ... is just fine

-sln

## Re: Most efficient way to do set-comparison

On Sat, 14 Mar 2009 21:10:26 GMT, sln@netherlands.com wrote:

So this is possible. There is getOptions() and getOptionsNoCase().
I found it interresting for my own use and, I thought I would drop
it back into here for anybody elses use. All mappings are now in one array,
and only one function call is needed per comparison of parameter pairs.
Since all the parts are broken out, there is many possibilities available
for error reporting. If the rigid standards are removed (its just
interpretation),
the array of common parameter keys can be silently processed quite easily.
That is of interrest to me.

-sln

======================================

## options3.pl
##

use strict;
use warnings;

# indexes, if fully referenced option is needed
use constant OPT_LEGAL_ONLY => 0;
use constant OPT_USER_ONLY  => 1;
use constant OPT_ALL        => 2;
use constant OPT_COMMON     => 3;

sub getOptions {
my (\$legal, \$user) = @_;
my @options = ([],[],[],[]);
# -- 'user only' and common
my %legalonly = %{ \$legal };
for ( sort keys %{ \$user } ) {
push @{    !exists \$legal->{ \$_ } ? \$options[1] : \$options[3] }, \$_;
delete \$legalonly{ \$_ };
}
# -- legal only
@{ \$options[0] } = sort keys %legalonly;
# -- all
@{ \$options[2] } = sort keys %{ { map { \$_ => 1 } (keys %{ \$legal }, keys %{
\$user }) } };
@options;
}

sub getOptionsNoCase {
my (\$l, \$u) = @_;
my \$legal = { map { lc \$_, 1 } keys %{ \$l } };
my \$user  = { map { lc \$_, 1 } keys %{ \$u } };
getOptions( \$legal, \$user );
}

##

my \$Debug = 1;

my %legalopts = (a => 1, b => 1, c => 1, d => 1);
my %useropts  = (a => 1, c => 1, e => 1, f => 1, g => 1);
#my %useropts  = (a => 1, b => 1, c => 1, d => 1); # try a sucessfull match

my (\$legalonly, \$useronly, \$all, \$common) = getOptions ( \%legalopts, \%useropts
);

if ( \$Debug ) {
print "\nDebug:\n";
print "All                        - ",@\$all,      ", elements = ",
scalar(@\$all),       "\n";
print "Legal missing/non-matching - ",@\$legalonly,", elements = ",
scalar(@\$legalonly), "\n";
print "User too many/non-matching - ",@\$useronly, ", elements = ",
scalar(@\$useronly),  "\n";
print "Common matching            - ",@\$common,   ", elements = ",
scalar(@\$common),    "\n";
}

if ( @\$legalonly || @\$useronly ) {
print "\nError:\n";
print "Legal options are '",( sort keys %legalopts ),"'\n";
print "Missing legal user options '",@\$legalonly,"'\n"  if (@\$legalonly);
print "Non-legal user options '",@\$useronly,"'\n" if (@\$useronly);
} else {
print "\nSucessfully enterred legal options '",( sort keys %useropts ),"' !\n"
}

__END__

output:
-------

Debug:
All                        - abcdefg, elements = 7
Legal missing/non-matching - bd, elements = 2
User too many/non-matching - efg, elements = 3
Common matching            - ac, elements = 2

Error:
Legal options are 'abcd'
Missing legal user options 'bd'
Non-legal user options 'efg'

## Re: Most efficient way to do set-comparison

larger one...namely that of how to design a constructor in the best
way in general for .pm that I write. Some of the common tasks in
constructor would involve:

1. creating hash-ref and blessing it with package name
2. populating keys and values in hash-ref with what the user supplied
(this is where this set-comparison comes of use)
3. doing other "validations" on the values of those user-supplied keys
..
..finally returning the blessed referent.

A major discussion here at my work place is - is it good to put strict
checks in new() and bail out the program if user didn't give any
options to it or gave bad options. I'm of this idea. But several
others feel the new() should be benign and when you call other
methods, it should report problem and bail out.

for designing constructors?

Also, is there a cool way to create the get-set methods for the
tricks or something??

## Re: Most efficient way to do set-comparison

There are many. Look in the Class:: namespace, for example.

Many people interested in OO on Perl are moving to using Moose, so you
might want to give that a look.

Ben

## Re: Most efficient way to do set-comparison

wrote:

To me, it all depends on what your module does.

If its a busy module, usually the majority of instance data is module generated
to a defualt state with helper member subs that can change the state during the
course of operations after initialization. So that file-scoped global arrays and
scalars hold some kind of general static data tha is used by the module to
generate
and track dynamic instance data to be stuffed into the "\$self" blessed hash
reference.

Users may pass in arguments (that they can add/change) to new() but ideally, it
should
not be a requirement at construction time. So the module should have the default
parameters
in \$self if none are provided. So in constructors, the module should set up a
default state
then any args passed in to the constructor should be passed to another method,
along with
the instance reference, that add/change values to the hash. Then that method can
be called
later by users to change state at a time past instantiation.
So as a general rule, you want to set up methods both public and private, that
change state,
during runtime. It is possible that over the course of usage, the \$self hash
gets used quite
frequently and it state sometimes needs to be reset before other operations take
place.

Beyond state that persists between method calls, most of the argument
(parameter) checking
is done within user called methods. Most of the parameters are consumed as local
to the method,
however with complex data, processing sometimes permutes to sub calls to private
helper methods
and area's of \$self are sometimes used for scratch instead of passing around the
data again.

It all depends on how you design it. Its based on your needs. If its a big item,
you may want
to build a reporting truss within it that can be called by methods. There may be
different levels
of errors from fatal to misdemeaner. A good logging system is always recommended.

I would always recommend ways to work around just die'ing if you can avoid it,
report the
error and continue. You never know, your module may be used in automation. But
if you have
to croak, do it for a good reason, otherwise report an error and return from sub
with some
indication of error, but without dying.

For setting/getting \$self values, unless its something very distinct, usually
the way its done
is to have a setVal() kind of method that accepts a hash. Processing and
validation is
done via something like:     while (my (\$name, \$val) = splice (@args, 0, 2))
But there a many many ways to do this, it depends on your needs. You can
absolutely force
compliance any way you want.

Good luck.
-sln

## Re: Most efficient way to do set-comparison

The mention of private methods is interesting...esp. in Perl. Since
there are no C++-like keywords (private, et al), it's a lot more work
to ensure privacy in Perl for OO? Like using file-level scoping,
closures, etc? Would love to know how this is done as an example, so
that I make up my own ways of designing classes in future. Any good
CPAN modules you know that have private method implementation?

One of my biggest problems is how to achieve true-blue data
encapsulation in Perl.....much like in C++/Java. Am looking for the
strong OO-like encapsulation with the beauty and terseness of Perl.
What an amazing combo that would be!

Yep, in 1 of the .pm I am writing, I've resorted to create more \$self
keys "internally" to store some of the results of my calculations
using user input, but which I want the object to "remember" and not be
evaluated again and again (the user doesn't know about these keys as
they are not documented). So, the first-time call to this method would
calculate something, create a key on \$self for it and store it. I know
this won't change with repeated calls to this function, so the
subsequent calls will just examine if this key is defined or not and
return the value if defined.

What is a reporting truss, pls? I've searched but didn't find much on
pls. site examples of good CPAN modules that already do all this? I am
prepared to read every line of them to understand these concepts. Pls.
don't mind if I ask for examples, but they really drive home all of
you guys' points...

Thanks for all of your time, help...means much.

## Re: Most efficient way to do set-comparison

wrote:

There probably is. Perl is not really OO though. You can get private subs
via anonymous subs, but most don't need to do it. Users usually don't muck with
unadvertised methods and it won't do what you think anyway.
You can distinguish the private ones with an underscore _func {}, but it is not
private.

Read up in the perlmod.pod for basic info on OO stuff "Guidelines for Module
Creation":

"Generally anything not exported is still accessible from outside the module
using the
ModuleName::item_name (or \$blessed_ref->method) syntax. By convention you can
underscore on names to indicate informally that they are 'internal' and not for
public use.
(It is actually possible to get private functions by saying: my \$subref = sub {
... }; &\$subref;.
But there's no way to call that directly as a method, because a method must have
a name in the symbol table.)"

[snip]

I'll see if I can find a primitave example for you.

Below is a way of using anonymous sub methods as private (if its really
necessary).

-sln

## Test.pl
# ==================

use strict;
use warnings;

use Ptest;

my \$t = Ptest->new();
\$t->PrintWorld( 'Hello' );

# print Ptest::\$p,"\n";  <- won't work, %p is file-scoped lexical in
Ptest!

__END__

## Ptest.pm
# ==================

use strict;
use warnings;

package Ptest;
our @ISA = qw();

# File scope vars
# -----------------
my %p = ();  # hash of private method refs

# User methods
# -------------
sub new
{
my \$class = shift;
my \$self = {
B => 'World',
C => 'Bye',
};
return bless (\$self, \$class);
}

sub PrintWorld
{
my (\$self, \$str) = @_;
\$p ( \$self, 'Hello' );
}

sub PrintBye
{
my \$self = shift;
print "\$self->!\n";
}

# Private methods
# -----------------

\$p = sub
{
my (\$self, \$A) = @_;
print "\$A \$self->\n";
\$self->PrintBye();
};
1;

## Re: Most efficient way to do set-comparison

Hi sln, could you get your hands on an example yet? Thanks much...