Click here to get back home

Accomodate for poor db design using Sort::Maker?

 HomeNewsGroups | Search | About
 comp.lang.perl.modules    Post an article   get this group's latest topics as an RSS feed add this group's latest topics to your My MSN content add this group's latest topics to your My Yahoo content
Subject Author Date
Accomodate for poor db design using Sort::Maker? DJ Stunks 12-09-2006
Posted by DJ Stunks on December 9, 2006, 12:48 am
Please log in for more thread options


Hi all, topical question here regarding Sort::Maker

I have a poorly designed db table over which I have no control. The
data isn't normalized but for each row I'd like to return the column
numbers which have the highest number of counts, and the highest column
number with any counts at all. I refer to these values as mode and max
respectively, but please refer to my code and it's output below for a
better understanding. (I have provided sample input data and mimicked
the fetchrow_hashref function provided by DBI)

Both sorts seem klunky, and after the discussion about the efficiency
gains given by Sort::Maker in another thread I'd like to take advantage
if I can (the sorts must be performed for each row returned by the
database).

I'm not sure how to pass in the information in the $hashref to
make_sorter. Advice would be greatly appreciated.

TIA,
-jp

Code:
#!/usr/bin/perl

use strict;
use warnings;

print "id\tmode\tmax\n";
while ( my $hashref = fetchrow_hashref() ) {
        my $id = $hashref->;

        # mode = col number with greatest count
        my $mode = ( map { $_->[0] }
         sort { $b->[1] <=> $a->[1] }
         map { [ $_, $hashref-> ] } 1..5 )[0];

        # max = highest col number with any count
        my $max = ( sort { $b <=> $a }
         map { $_ if $hashref-> > 0 } 1..5 )[0];

        $max = 1 if $max eq ''; #all zeros

        print "$id\t$mode\t$max\n";
}

{ my @columns;

sub fetchrow_hashref {

        # silence warning that $line may be equal to 0 below
        no warnings 'misc';
        my $line = <DATA> or return;

        if ( $. == 1 ) {
                @columns = split ' ', substr $line, 1;
                return fetchrow_hashref();
        }

        my %hash;
        @hash{ @columns } = split ' ', $line;

        return \%hash;
} }

__DATA__
#id col1 col2 col3 col4 col5
1 2 8 8 0 0
2 5 0 0 0 0
3 3 2 8 2 5
4 4 3 8 2 0
5 5 7 7 0 0
6 9 5 6 0 0
7 6 4 5 7 0
8 2 3 0 0 0
9 9 2 3 3 5
10 0 4 1 0 0
11 0 0 0 0 0
12 1 1 1 1 1

Output:
id mode max
1 2 3
2 1 1
3 3 5
4 3 4
5 2 3
6 1 3
7 4 4
8 2 2
9 1 5
10 2 3
11 1 1
12 1 5


Posted by John W. Krahn on December 9, 2006, 12:05 pm
Please log in for more thread options


DJ Stunks wrote:
> Hi all, topical question here regarding Sort::Maker
>
> I have a poorly designed db table over which I have no control. The
> data isn't normalized but for each row I'd like to return the column
> numbers which have the highest number of counts, and the highest column
> number with any counts at all. I refer to these values as mode and max
> respectively, but please refer to my code and it's output below for a
> better understanding. (I have provided sample input data and mimicked
> the fetchrow_hashref function provided by DBI)
>
> Both sorts seem klunky, and after the discussion about the efficiency
> gains given by Sort::Maker in another thread I'd like to take advantage
> if I can (the sorts must be performed for each row returned by the
> database).
>
> I'm not sure how to pass in the information in the $hashref to
> make_sorter. Advice would be greatly appreciated.
>
> TIA,
> -jp
>
> Code:
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> print "id\tmode\tmax\n";
> while ( my $hashref = fetchrow_hashref() ) {
>         my $id = $hashref->;
>
>         # mode = col number with greatest count
>         my $mode = ( map { $_->[0] }
>          sort { $b->[1] <=> $a->[1] }
>          map { [ $_, $hashref-> ] } 1..5 )[0];
>
>         # max = highest col number with any count
>         my $max = ( sort { $b <=> $a }
>          map { $_ if $hashref-> > 0 } 1..5 )[0];
>
>         $max = 1 if $max eq ''; #all zeros


If you want efficiency then you should just use loops instead of sorting:


my ( $mode, $curr ) = ( 1, 0 );
for ( 1 .. 5 ) {
if ( $hashref->{ "col$_" } > $curr ) {
$curr = $hashref->{ "col$_" };
$mode = $_;
}
}

my $max = 1;
for ( reverse 1 .. 5 ) {
if ( $hashref->{ "col$_" } > 0 ) {
$max = $_;
last;
}
}



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall

Posted by harryfmudd [AT] comcast [DOT] on December 9, 2006, 2:42 pm
Please log in for more thread options


John W. Krahn wrote:
> DJ Stunks wrote:
>
>>Hi all, topical question here regarding Sort::Maker
>>
>>I have a poorly designed db table over which I have no control. The
>>data isn't normalized but for each row I'd like to return the column
>>numbers which have the highest number of counts, and the highest column
>>number with any counts at all. I refer to these values as mode and max
>>respectively, but please refer to my code and it's output below for a
>>better understanding. (I have provided sample input data and mimicked
>>the fetchrow_hashref function provided by DBI)
>>
>>Both sorts seem klunky, and after the discussion about the efficiency
>>gains given by Sort::Maker in another thread I'd like to take advantage
>>if I can (the sorts must be performed for each row returned by the
>>database).
>>
>>I'm not sure how to pass in the information in the $hashref to
>>make_sorter. Advice would be greatly appreciated.
>>
>>TIA,
>>-jp
>>
>>Code:
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> print "id\tmode\tmax\n";
>> while ( my $hashref = fetchrow_hashref() ) {
>>         my $id = $hashref->;
>>
>>         # mode = col number with greatest count
>>         my $mode = ( map { $_->[0] }
>>          sort { $b->[1] <=> $a->[1] }
>>          map { [ $_, $hashref-> ] } 1..5 )[0];
>>
>>         # max = highest col number with any count
>>         my $max = ( sort { $b <=> $a }
>>          map { $_ if $hashref-> > 0 } 1..5 )[0];
>>
>>         $max = 1 if $max eq ''; #all zeros
>
>
>
> If you want efficiency then you should just use loops instead of sorting:
>
>
> my ( $mode, $curr ) = ( 1, 0 );
> for ( 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > $curr ) {
> $curr = $hashref->{ "col$_" };
> $mode = $_;
> }
> }
>
> my $max = 1;
> for ( reverse 1 .. 5 ) {
> if ( $hashref->{ "col$_" } > 0 ) {
> $max = $_;
> last;
> }
> }
>
>
>
> John

Have you looked at List::Util? It's a standard module in Perl 5.8, and
contains a max function. So the first loop becomes something like

use List::Util qw;

my $mode = max (map } 1..5);

Assuming the only zeros are trailing, the second loop could be

my $max = grep {$_ > 0} map } 1..5

since the value of a list in scalar context is the number of elements in
the list. If you have zeros embedded, this won't give the right answer,
since what it really does is find the number of values > 0, not the
highest index with a value > 0.

Of course, for either of these you might want to make the hashref into a
real list, to avoid mapping twice.

Tom Wyant

Posted by John W. Krahn on December 10, 2006, 2:09 am
Please log in for more thread options


harryfmudd [AT] comcast [DOT] net wrote:
>
> Have you looked at List::Util? It's a standard module in Perl 5.8, and
> contains a max function. So the first loop becomes something like
>
> use List::Util qw;
>
> my $mode = max (map } 1..5);
>
> Assuming the only zeros are trailing, the second loop could be
>
> my $max = grep {$_ > 0} map } 1..5
>
> since the value of a list in scalar context is the number of elements in
> the list. If you have zeros embedded, this won't give the right answer,
> since what it really does is find the number of values > 0, not the
> highest index with a value > 0.
>
> Of course, for either of these you might want to make the hashref into a
> real list, to avoid mapping twice.

The OP's code produces:

id mode max
1 2 3
2 1 1
3 3 5
4 3 4
5 2 3
6 1 3
7 4 4
8 2 2
9 1 5
10 2 3
11 1 1
12 1 5

But your's produces:

id mode max
1 8 3
2 5 1
3 8 5
4 8 4
5 7 3
6 9 3
7 7 4
8 3 2
9 9 5
10 4 2
11 0 0
12 1 5

So neither $mode nor $max is correct for all lines.




John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall

Posted by harryfmudd [AT] comcast [DOT] on December 11, 2006, 12:43 pm
Please log in for more thread options


John W. Krahn wrote:
> harryfmudd [AT] comcast [DOT] net wrote:
>
>>Have you looked at List::Util? It's a standard module in Perl 5.8, and
>>contains a max function. So the first loop becomes something like
>>
>>use List::Util qw;
>>
>>my $mode = max (map } 1..5);
>>
>>Assuming the only zeros are trailing, the second loop could be
>>
>>my $max = grep {$_ > 0} map } 1..5
>>
>>since the value of a list in scalar context is the number of elements in
>>the list. If you have zeros embedded, this won't give the right answer,
>>since what it really does is find the number of values > 0, not the
>>highest index with a value > 0.
>>
>>Of course, for either of these you might want to make the hashref into a
>>real list, to avoid mapping twice.
>
>
> The OP's code produces:
>
> id mode max
> 1 2 3
> 2 1 1
> 3 3 5
> 4 3 4
> 5 2 3
> 6 1 3
> 7 4 4
> 8 2 2
> 9 1 5
> 10 2 3
> 11 1 1
> 12 1 5
>
> But your's produces:
>
> id mode max
> 1 8 3
> 2 5 1
> 3 8 5
> 4 8 4
> 5 7 3
> 6 9 3
> 7 7 4
> 8 3 2
> 9 9 5
> 10 4 2
> 11 0 0
> 12 1 5
>
> So neither $mode nor $max is correct for all lines.
>
>
>
>
> John

My mistake, corrected by DJ Stunks in adjacent post; I was returning the
mode, rather than the index of the mode. Etc.

Tom Wyant

Similar ThreadsPosted
ANNOUNCE: Sort::Maker .02 September 2, 2004, 5:09 am
Sort::Maker : anonymous sub is compiled outside of my module December 5, 2006, 8:25 am
Sort::Maker: style => 'plain' difficulty December 14, 2006, 4:35 am
Sort::Maker: (Notes) The plain and the orcish don't include the "init_code" December 14, 2006, 7:32 am
CGI scripts and modular design? September 28, 2004, 9:41 am
The Month in Perl Software Design: Review of January 2005 February 10, 2005, 10:39 pm
[ANN] Sort::Key 0.02 April 28, 2005, 11:02 am
Win32::OLE::Const 'Microsoft Excel' - sort & freeze pane April 30, 2008, 12:48 am

Our other projects:

Art Dolls, Fairies and Mermaids - Sunnyfaces.net

Roy's Linux, Programming and Search Engines messages

1-Script XML SitemapXML Sitemap