Multi-field sorting (general case)

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
The goal is to sort by multiple fields. Got an idea from other post.

my-prog 0 1n 2
=> first alphabetically field-0, then numerically field-1, then alphabetically field-3
(1n means field-1 is numerically sorted)
See code below.

How can I generalize this for arbitrary N-fields? For example, for N=4 (0,1,2,3),
my-prog 2 3n 0 1n
etc.

(3 fields, hard-coded case)
==============================================
\$re[\$_] = qr/^(?:\S+\s+)(\S+)/ for 0..2;

for (sort { (\$a=~/\$re[0]/)[0] cmp (\$b=~/\$re[0]/)[0] or
(\$a=~/\$re[1]/)[0] <=> (\$b=~/\$re[1]/)[0] or
(\$a=~/\$re[2]/)[0] cmp (\$b=~/\$re[2]/)[0] } <DATA> ) { print; }

__DATA__
yy      7       perl
xx      101     hello
yy      33      world
xx      33      world
yy      101     hello
yy      7       hello
===============================================

Output:
xx      33      world
xx      101     hello
yy      7       hello
yy      7       perl
yy      33      world
yy      101     hello

James

Re: Multi-field sorting (general case)

Perl's sort() is stable. That means if you have a sorted array A1 and
you sort if again by a new criteria, then those elements which have the
same priority by the new criteria will among themself stay in the old
order as previously established in A1.

Therefore all you have to do is to loop over your criteria, one after
the other, as many as you like, starting with the least significant.

jue

Re: Multi-field sorting (general case)

On 29/1/2015 03:50, James wrote:

use strict;
use warnings;
my \$Sorter = BuildSorter( qr/(\S+)/, 'cmp', '<=>', 'cmp');

for (sort \$Sorter <DATA> )

sub BuildSorter
{
my \$regex = shift;
my @code;
for (my \$i=0; \$i<@_;\$i++) {
push @code, '(\$a=~/\$regex/g)'. "[\$i] \$_[\$i] ". '(\$b=~/\$regex/g)['.\$i.']'}
eval 'sub{'. join(' or ', @code). '}'
}

__DATA__
yy      7       perl
xx      101     hello
yy      33      world
xx      33      world
yy      101     hello
yy      7       hello

Re: Multi-field sorting (general case)

A naive approach similar to the one George posted, just with less
insistence pm doing things backwards for the sake of ... well ... doing
them backwards,

--------------
use constant CMPS_TMPL =>     'sub { my (@fa, @fb); @fa = \$a =~ /(\S+)/g; @fb = \$b =~ /(\S+)/g; %s; }';
use constant CMP_TMPL =>    '\$fa[%d] %s \$fb[%d]';

sub build_cmp
{
my \$spec = \$_[0];
my @cmp;

for (\$spec =~ /(\S+)/g) {
/(\d+)(.*)/;
push(@cmp, sprintf(CMP_TMPL, \$1, \$2 eq 'n' ? '<=>' : 'cmp', \$1));
}

eval(sprintf(CMPS_TMPL, join(' or ', @cmp)));
}

my \$cmp = build_cmp('0 1n 2');

print(sort \$cmp <DATA>);

__DATA__
yy      7       perl
xx      101     hello
yy      33      world
xx      33      world
yy      101     hello
yy      7       hello
----------------

A drawback of this is that sort will run the comparison routine more
than once for every element of the input list and each time it runs, it
splits both string arguments into fields, despite they were already
splitted into field the last time sort was called with a particular
argument. This can be avoided by doing a so-called 'Schwartzian
transform' of the input data, eg,

-----------------
use constant CMP_TMPL =>    '\$a->[1][%d] %s \$b->[1][%d]';

sub build_cmp
{
my @cmp;

for (\$_[0] =~ /(\S+)/g) {
/(\d+)(.*)/;
push(@cmp, sprintf(CMP_TMPL, \$1, \$2 eq 'n' ? '<=>' : 'cmp', \$1));
}

eval(sprintf('sub { %s }', join(' or ', @cmp)));
}

sub field_sort(\$@)
{
my \$cmp = shift;

map { \$_->[0]; } sort \$cmp (map { [\$_, [/(\S+)/g]] } @_);
}

my \$cmp = build_cmp('0 1n 2');
print(field_sort(\$cmp, <DATA>));

__DATA__
yy      7       perl
xx      101     hello
yy      33      world
xx      33      world
yy      101     hello
yy      7       hello
------------------

Re: Multi-field sorting (general case)

* James wrote in comp.lang.perl.misc:

My http://search.cpan.org/perldoc?List::OrderBy module lets you write

my @sorted = order_cmp_by { \$fields[0] }
then_by { \$fields[1] }
then_cmp_by { \$fields[3] } @unsorted;

The `then_` functions return an object that is ultimately used by the
`order_` functions that sorts the elements by first extracting a key
followed by a `sort` call that loops through all "levels" until it can
distinguish two elements.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
D-10243 Berlin · PGP Pub. KeyID: 0xA4357E78 · http://www.bjoernsworld.de
Available for hire in Berlin (early 2015)  · http://www.websitedev.de/

Re: Multi-field sorting (general case)

use Sort::Maker ;

uri