# [perl-python] exercise: partition a list by equivalence

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge(\$pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria. In our case, the input list
comes in the form of pairs, and its members are the union of all
members in the pairs. The criterion for partition in our case is that
of equivalence, specified in the pairs input.

Try to write a Python code for this. In the Python code, the input
should be a list of couples. (for Perlers, sketch out the algorithm on
paper and try to code in Python directly.)

I'll post Perl & Python code tomorrow.

==This is brought to you by the Perl-Python community.==
== http://xahlee.org/perl-python/python.html ==

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

## Re: [perl-python] exercise: partition a list by equivalence

>For those of you unfamiliar with math lingoes, partition a list means
> to create sublists, based on some criteria.

Typical moronic mathematicians with their exclusionary lingoes...why
can't they just say "create sublists based on some criteria" like
normal people?

- alex23

## Re: [perl-python] exercise: partition a list by equivalence

Xah Lee wrote:
> here's another interesting algorithmic exercise, again from part of a
> larger program in the previous series.
>
> Here's the original Perl documentation:
>
> =pod
>
> merge(\$pairings) takes a list of pairs, each pair indicates the
> sameness
> of the two indexes. Returns a partitioned list of same indexes.
>
> For example, if the input is
> merge( [ [1,2], [2,4], [5,6] ] );
>
> that means 1 and 2 are the same. 2 and 4 are the same. Therefore
> 1==2==4. The result returned is
>
> [[4,2,1],[6,5]];
>
> (ordering of the returned list and sublists are not specified.)
>
> =cut

Almost a joke:

from numarray import *

def merge(*pairs):
m, M = min(flattened), max(flattened)

d = M - m + 1
matrix = zeros((d, d), type = Bool)

for x, y in pairs:
X, Y = x - m, y - m

matrix[X, X] = 1
matrix[X, Y] = 1
matrix[Y, X] = 1
matrix[Y, Y] = 1

while True:
next = greater(dot(matrix, matrix), 0)
if alltrue(ravel(next == matrix)):
break
matrix = next

results = []
for i in range(d):
eqls, = nonzero(matrix[i])
if eqls.size():
if i == eqls[0]:
results.append(tuple(x + m for x in eqls))

return results

Disclaimer: I'm not an expert in numarray and suppose the something can
be dramatically imporved.

## [perl-python] exercise: partition a list by equivalence

here's the answer to the partition by equivalence exercise.

-------------------------------------------

# Perl code

sub merge(\$) {
my @pairings = @;

my @interm; # array of hashs

# chop the first value of @pairings into @interm
\$interm[0]=; \$='x';
shift @pairings;

N1: for my \$aPair (@pairings) {
for my \$aGroup (@interm) {
if (exists \$)
='x'; next N1}
if (exists \$)
='x'; next N1}
}
push @interm, ; \$='x';
}

my @fin = shift @interm;

N2: for my \$group (@interm) {
for my \$newcoup (@fin) {
foreach my \$k (keys %\$group) {
if (exists \$) {map { \$='x'}
(keys %\$group); next N2;}
}
}
push @fin, \$group;
}
return map {[keys (%\$_)]} @fin;
}

-----------------------------------------------
# Here's a direct translation of the Perl code above into python:
©def merge(pairings): # pairings is a list of couples. e.g.
[(9,2),(7,6),...]
©    # interm is a list of groups. Each group is a list that hold
©    # equivalent numbers. interm stands for interim result. Each
group
©    # is a dictionary. Keys are numbers, values are all dummy
©    # 'x'. Dictionary is used for ease of dealing with duplicates or
©    # move first pair of pairings into interm as the first group
pairings[0]
©    # go thru pairings. For each pair, check if it is in any group in
©    # interm. If any part of pair is in a group, then add the other
©    # part into that group. If no part of the pair is in any group,
©    # then add this pair into interm as a new group.
break
break
interm[-1][aPair[1]]='x'
©    # now make another pass of the groups in interm, because some
pair
©    # that may connect two groups (i.e. with one element in one
group,
©    # and second element in another group), yet the pair is simply
©    # consumed by a group.
©    # This pass will check if there are any element in any other
©    # group, if so, such two groups will be unioned. In this pass, we
©    # move things from interm into fin. fin==final.
©                    for kk in group.keys(): newcoup[kk]='x';
©    # now turn the dictionaries into lists for return value
©    for group in fin: result.append(group.keys())
-------------------
I wrote this (Perl) program in 2003-09, and now basically forgot all
about the internals. The original Perl code does not have inline
comments, nor public consumable documentation as this is my own
project. In the process of translation and the publication and
algorithm i used as i go thru the lines. I was thinking of a quick
brainless translation word-for-word, but that turned out not possible
as i run into problems.

(While i'm learning Python, i run into frustrations with the Python
Documentation. (although it has far more quality than Perl
documentations). The frustrations with documentations will be appended

The translation problem i run into is this. In Perl, there's a GOTO
construct where in a loop one can say "break XYZ" to jump to a
arbitrary outer loop labeled XYZ. Python has "break" but does not
provide a GOTO jump as in Perl. In the process, i have to actually
figure out (for the first time for me) how loops with GOTO jumps can be
translated to alternative structure. This turned out not to be too
hard. For a GOTO jump to a far outer group, one can use multiple breaks
at the end of each loop, possibly in addiction adding a "else"
clause to the different levels of the loops. (Python language can have
a "else" clause for "for" loops. It is executed when the loop
completes. (as opposed to when a break inside jumped out))

Here is a loop with GOTO, translated into Python without:

N1: for my \$aPair (@pairings) {
for my \$aGroup (@interm) {
if (exists \$)
='x'; next N1}
if (exists \$)
='x'; next N1}
}
push @interm, ; \$='x';
}
break
break
interm[-1][aPair[1]]='x'

Below is another such trans-structure, distinct from the above.

N2: for my \$group (@interm) {
for my \$newcoup (@fin) {
foreach my \$k (keys %\$group) {
if (exists \$) {map { \$='x'}
(keys %\$group); next N2;}
}
}
push @fin, \$group;
}
©                    for kk in group.keys(): newcoup[kk]='x';

The Python version above has not been tested much, but i suppose it is
fine since it is basically a copy of the Perl code. The Perl version is
fairly reliable, as it went thru the gauntlet of special cases testing
and i've used it over the year in the larger program... however no any
formal proof or exhaustive machine testing has been done. Later i might
write some program to auto-test them... but that'd be another day. The
Python version can also use some clean up, or even rewrite as a program
in the Python language proper. Addenda or Errata will be added on this
page.

PS all together there are some 3 or so solutions posted on the
newsgroups. (one by private email) I will have to filter them out and
study them.

Any interesting or important Addenda or Errata will be emailed out
later.
In addition to being archived here:
http://xahlee.org/perl-python/partition_by_equiv.html

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

## Re: exercise: partition a list by equivalence

The GOTO statement from Perl has been messed up.

This block:
©                    for kk in group.keys(): newcoup[kk]='x';

should be:

comlete code is at:
http://xahlee.org/perl-python/partition_by_equiv.html

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

## Re: exercise: partition a list by equivalence

an interesting problem so developed now is to write a function that
generate test cases for the purpose of testing performance. (just for
fun)

the design of this function could be interesting. We want to be able to
give parameters in this function so as to spit out all possible screw
test cases. First of all, a range of n (some millions) numbers. Then, a
fraction that specifies the percentage of these number are equivalent.
1 being all equivalent. 0 being all "distinct" (having only one
equivalent member (since the input comes in pairs)). Then we want to
have a parameter that says something like the sizes of the equivalence
groups. Suppose 50% of number are equal among themselves (i.e. have
more than one equivalent member). 1 would be all in one group. 0 would
mean all partitions have size 3 or 2. (there are more to it here...
since this is a distribution) ... Then, we need to turn this range of
integers into pairs. That's something more to think about.

So with this function at hand, we'll be able to say for sure which code
performs better (and under what type of input)

the Order notion in computing mathematics is fairly useless for finer
details.

PS it is not trivial to design this pair generating function. I don't
know if the above is on the right track, but essentially we want to
categorize the type of inputs according to the mathematical operational
performance aspect of partition by equivalence, and distill them into
parameters.

another func to write is one that canonicalize the output. Such as
sorting. So that all results can be compared simply by = in Python.

failing to design a elaborate pair_gen, we can just have pairs of
random numbers. But exactly what nature is such input... is more to

(in my original application, each number represent a computer file,
there are up to tens of thousands of files, and much less than 0.1% is
same as another, and for files that are same, each equivalent group
number no more than 10 or so.)

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

John Machin wrote:
> FWIW, here's a brief UAT report:
>
> Appears to work: Reinhold, David, Xah...
> Has bug(s): ...

## [perl-python] exercise: partition a list by equivalence

when i try to run the following program, Python complains about some
global name frozenset is not defined. Is set some new facility in
Python 2.4?

©    for x1, x2 in pairings:
©    return [list(aset) for aset in set(sets.itervalues())]

it would be nice if the two working programs do not use some package.
This problem shouldn't need to.

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

## Re: exercise: partition a list by equivalence

Xah Lee wrote:
> when i try to run the following program, Python complains about some
> global name frozenset is not defined. Is set some new facility in
> Python 2.4?

http://www.python.org/doc/2.3/whatsnew /
http://www.python.org/doc/2.4/whatsnew/whatsnew24.html

You must be running 2.3. If you persist in running 2.3, put the
following at the top of the file:

import sys
PY_VERSION = sys.version_info[:2]
if PY_VERSION == (2,3):
from sets import Set as set
from sets import ImmutableSet as frozenset

That will get Reinhold's gadget working under 2.3. The bearophile's
function uses collections.deque which is built-in to 2.4.

> it would be nice if the two working programs do not use some package.
> This problem shouldn't need to.

Look at the newsgroup again. By my count there are apparently-working
versions from SIX people: Reinhold, bear.+, Brian Beck, David Eppstein,
yourself, and me. Only David's uses stuff that's not in the Python 2.4
off-the-shelf distribution.

## Re: exercise: partition a list by equivalence

John Machin:
>Perhaps I'm missing a posting of yours -- what are merge2 and merge4?
What is "this random pairs test"? What is xMax (nPairs isn't hard to
guess), and how are you generating your input data?<

merge2 and this random pairs test comes from the post by Brian Beck.
merge4 is the first in my last post. And merge3 isn't included (nor
tested) because it uses the original (slow) addArcs.
This is the simple test generator taken from his post:

xMax = 3000
nPairs = xMax*5
l = [[randint(0,xMax), randint(0,xMax)] for i in xrange(nPairs)]
For such high number of nPairs (arcs), the resulting graph usuall has
one big connected component, and few tiny debris... To produce a more
divided graph, nPairs have to be much smaller, like xMax*1.5.

>FWIW, my offering is attached. <

I'm sorry, I don't see the attach... (just the different title).

John Machin:
>Only David's uses stuff that's not in the Python 2.4 off-the-shelf
distribution.<

Well, using already mede functions and classes is good, it avoids you
to reinvent things each time. Some of my versions too use my graph
library (for a problem like this I usually don't create stand alone
versions like merge6 and merge7). And my original point was adding one
graph data structure to the standard Python library :-]
I haven't tested the speed of David Eppstein version...

Anyway, this is a simplified version of merge6, that uses an indirect
graph g, instead of the directed graph, and avoids the use of chain:

.. from collections import deque
.
.. def merge7(l):
..     # Undirect graph g creation:
..     g = {}
..     for n1,n2 in l:
..         if n1 not in g: g[n1] =
..         else: g[n1][n2] = 0
..         if n2 not in g: g[n2] =
..         else: g[n2][n1] = 0
..     # Connected components:
..     result = []
..     visited = set()
..     for root in g.iterkeys():
..         if root not in visited:
..             component = [root]
..             Q = deque() # A queue
..             Q.append(root)
..             while Q:
..                 n1 = Q.popleft()
..                 for n2 in g[n1].iterkeys():
..                     if n2 not in visited:
..                         Q.append(n2)
..                         component.append(n2)
..             result.append(component)
..     return result

It's a bit shorter and a little faster than merge6, memory used is
similar (there is only one dictionary, so there is only one
ammortization overhead of the dictionary, but it contains the sum of
both arcs, so the ammortization can be twice as big, so... memory used
can be the same).
(Usually BFS and DFS memory requirements are a little different; this
implementation uses BFS, and I think it uses a bit less memory than
DFS, but for this problem/test the difference isn't significative.)

Bear hugs,
Bearophile

## Re: exercise: partition a list by equivalence

> > it would be nice if the two working programs do not use some package.
> > This problem shouldn't need to.
>
> Look at the newsgroup again. By my count there are apparently-working
> versions from SIX people: Reinhold, bear.+, Brian Beck, David Eppstein,
> yourself, and me. Only David's uses stuff that's not in the Python 2.4
> off-the-shelf distribution.

Where "stuff" means a single 62-line file that I linked to in my post.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein /

## Re: exercise: partition a list by equivalence

I started to collect i believe the 4 or so solutions by different
people... but seems it's gonna take some an hour or more... So far the
only other one i've run and find alright is Reinhold Birkenfeld's
original. Others worth noting i'm aware of is David Epsteinn, improved
versions from Reinhold Birkenfeld, the one using graphs by bearophile
....

since many of you have already collected and tested these... i wonder
if anyone'd be interested to put together all the (working) code in a
single message? (or even a webpage.)

thanks.

Xah
xah@xahlee.org
http://xahlee.org/PageTwo_dir/more.html

## Re: [perl-python] exercise: partition a list by equivalence

Please consider my submission also (Python 2.3-compatible).

-- Paul McGuire

..    import sets
.
..    input = [[1, 2], [3, 4], [2, 3], [4, 5]]
..    input = [[1, 2], [3, 4], [4, 5]]
..    input = [[1, 2],[2,1], [3, 4], [4, 5],[2,2],[2,3],[6,6]]
.
..    def merge(pairings):
..        ret = []
..        for a,b in pairings:
..            s1 = None
..            s2 = None
..            for s in ret:
..                if a in s or b in s:
..                    if s1 is None:
..                        s1 = s
..                    else:
..                        s2 = s
..                        break
..            else:
..                for s in ret:
..                    if a in s:
..                        break
..                    elif b in s:
..                        break
..                else:
..                    ret.append(sets.Set([a,b]))
..                continue
..            ret.remove(s1)
..            ret.remove(s2)
..            ret.append(s1.union(s2))
.
..        return [list(s) for s in ret]
..
..    print merge(input)

## Re: [perl-python] exercise: partition a list by equivalence

A slightly better version, only walks the set of cumulative list of
sets once per pairing.

-- Paul

..    import sets
.
..    input = [[1, 2], [3, 4], [2, 3], [4, 5]]
..    input = [[1, 2], [3, 4], [4, 5]]
..    input = [[1, 2],[2,1], [3, 4], [4, 5],[2,2],[2,3],[6,6]]
.
..def merge(pairings):
..    ret = []
..    for a,b in pairings:
..        aset = None
..        bset = None
..        for s in ret:
..            if not aset and a in s:
..                aset = s
..            if not bset and b in s:
..                bset = s
..            if aset and bset:
..                break
..        else:
..            if aset:
..            elif bset:
..            else:
..                ret.append(sets.Set([a,b]))
..            continue
..        if aset is not bset:
..            ret.remove(aset)
..            ret.remove(bset)
..            ret.append(aset.union(bset))
.
..    return [list(s) for s in ret]
..
..print merge(input)