# sorting a hash / 2008-06-01

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on

I want to sort a hash. The hash contains a list of cities and their
temperature and I want the 4 cities with max temp. The problem is that
the city-names are one extra level deep with the state-name coming in-
between. I wondered whether I should build the hash differently. A
different format would be: state_city, with the underbar separating
the state and the city.
\$hash  = 38 ;
\$hash  = 38 ;

my %hash = () ;
\$hash{San Jose}  = 84 ;
\$hash{San Fran}  = 94 ;
\$hash  = 38 ;
\$hash  = 66 ;
\$hash =  72 ;
\$hash =  96 ;
\$hash{Fort Worth} =  62 ;
\$hash =  96 ;
\$hash =  55 ;
\$hash =  55 ;

How do I sort this hash, please?

## Re: sorting a hash / 2008-06-01

dn.perl@gmail.com wrote:

\$ perl -le'
my %hash = (
Calif => {
"San Jose" => { max_temp => 84 },
"San Fran" => { max_temp => 94 },
Cupertino  => { max_temp => 38 },
Fremont    => { max_temp => 66 },
},
Texas => {
Dallas       => { max_temp => 72 },
Austin       => { max_temp => 96 },
"Fort Worth" => { max_temp => 62 },
},
Mass  => {
Boston     => { max_temp => 96 },
Framingham => { max_temp => 55 },
Worcester  => { max_temp => 55 },
},
);

print "City: \$_->[0]  Temperature: \$_->[1]"
for (
sort { \$b->[ 1 ] <=> \$a->[ 1 ] }
map  { my \$hash = \$_; map [ \$_, \$hash->{ \$_ }{ max_temp } ], keys
%\$hash }
values %hash
)[ 0 .. 3 ];
'
City: Austin  Temperature: 96
City: Boston  Temperature: 96
City: San Fran  Temperature: 94
City: San Jose  Temperature: 84

John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall

## Re: sorting a hash / 2008-06-01

I am going to assume you want the cities with the top four maximum
temperatures.

First, note that it might make your life easier later to use standard
abbreviations such as CA for California, TX for Texas. If you need to
also present longer names, you could use a lookup table.

Similarly, I would use the actual identifier for the reporting
temperature measurement station instead of a cutesy abbreviation of the
city name. You could use a lookup table to map station identifiers to
city names.

It is also not nice to post code that you have not tested at all:

C:\Temp> cat s.pl
#!/usr/bin/perl

use strict;
use warnings;

my %hash = () ;
\$hash{San Jose}  = 84 ;
\$hash{San Fran}  = 94 ;
\$hash  = 38 ;
\$hash  = 66 ;
\$hash =  72 ;
\$hash =  96 ;
\$hash{Fort Worth} =  62 ;
\$hash =  96 ;
\$hash =  55 ;
\$hash =  55 ;

C:\Temp> s
Can't locate object method "San" via package "Jose" (perhaps you forgot
to load "Jose"?) at C:\Temp\s.pl line 7.

You need to quote keys that do not consist solely of \w characters.

Second, the data structure is dictated by the problem at hand usually. I
think it is futile to venture whether a different data structure would
be more appropriate using only the information you give in this post. In
general, I do not favor using composite hash keys unless there is a
complelling reason to do so.

However, assuming you are only storing a single attribute for each city,
then I would recommend:

my %maxtemp;

\$maxtemp->{San Jose} = 94;

etc.

By definition, a hash table is unsorted. You may present the contents in
a certain order, but the data structure itself remains unaffected by
that presentational transformation.

That said, a full blown sort seems to be unnecessary here. You said all
you want are the four cities with the highest maximum tempreatures. On
the other hand, simply sorting and picking up the four elements at the
top has a certain appeal as well. Both of these might fail according to
some criteria if there are ties. How do you handle ties?

The following code will find the highest four temperatures and list the
cities with those temperatures. It does not handle the case where there
are fewer than four distinct temperatures in the whole data set.

#!/usr/bin/perl

use strict;
use warnings;

my %maxtemp = () ;
\$maxtemp->{'San Jose'}   = 84;
\$maxtemp->{'San Fran'}   = 94;
\$maxtemp->  = 38;
\$maxtemp->    = 66;
\$maxtemp->     = 72;
\$maxtemp->     = 96;
\$maxtemp->{'Fort Worth'} = 62;
\$maxtemp->      = 96;
\$maxtemp->  = 55;
\$maxtemp->   = 55;

my %cities_by_temp;

for my \$state ( keys %maxtemp ) {
for my \$city ( keys %{ \$maxtemp } ) {
my \$temp = \$maxtemp->;
push @{ \$cities_by_temp }, "\$city, \$state";
}
}

my @highest = (sort { \$b <=> \$a } keys %cities_by_temp)[0 .. 3];

for my \$temp ( @highest ) {
print "\$temp\t", join("\n\t", @} ), "\n";
}

__END__

--
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc /

## Re: sorting a hash / 2008-06-01

On May 30, 5:59 am, "A. Sinan Unur" wrote:

I was asked to list cities with the top 4 maximum temperature, and
there was no word on how to handle ties. But that is a trivial part of
the problem. I suggested a solution (which the client was fine with)
that if a tie makes the list extend beyond 4 where city #4 has temp >
90, list all elements of the tie. If the city #4 has temp <= 90, break
the tie at 4.

Thus :
Austin  105
Miami    93
New York 91
Phoenix 91
Boston   91
Framingham  91    (6 cities listed)
Not listed ===> Dallas  88.

But :
Austin  105
Miami    93
New York 88
Phoenix 88
Not listed ==> Boston   88. Stop after first 4 entries.
Not listed ==> Chicago   88
Not listed ==> Seattle   82

## Re: sorting a hash / 2008-06-01

dn.perl@gmail.com wrote:

Well, I'd rather say it contains three hash references.

This is one sensible way to sort that data structure:

foreach my \$state ( sort keys %hash ) {
print "State: \$state\n";
foreach my \$city ( sort { \$a cmp \$b } keys %{ \$hash } ) {
print "\$city = \$hash\n";
}
print "\n";
}

I'm not sure what you mean by that. Please clarify what's the desired
output.

Probably. This is one idea:

my %hash = (
'San Jose' => {
state => 'Calif',
max_temp => 84,
},
'San Fran' => {
state => 'Calif',
max_temp => 94,
},
);

---------------^^^^^^^^
Hash keys with spaces need to be quoted.

\$hash{'San Jose'}  = 84 ;

--
Email: http://www.gunnar.cc/cgi-bin/contact.pl

## Re: sorting a hash / 2008-06-01

@mid.individual.net:

...

http://en.wikipedia.org/wiki/Athens_ (town),_New_York

http://en.wikipedia.org/wiki/Athens,_Georgia

Sinan

--
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc /

## Re: sorting a hash / 2008-06-01

On Fri, 30 May 2008 13:11:22 UTC, Gunnar Hjalmarsson

Sorry to jump in with another question but I have a very similar
problem. I am processing a consolidated apache2 logfile. I have
multiple virtual hosts. All I care about are the site, the page
served, a counter and the date.

So my hash looks like \$urls Beyond that I have a counter
and date thus:
\$urls[0]++; # count
\$urls[1] = \$date;

This works fine and I can list by site the page, count and date.

foreach \$site ( keys %urls)
{
foreach my \$url (keys %})
{
print "\$site \$url \$urls[0] \$urls[1]\n";
}
}

Putting a sort into the url loop gives me the results sorted by page
as expected. What I cannot figure out is how to do it by count and by
date.

I have tried various ideas I found by google but they all tend to be
similar to this

sub by_count
{
\$urls[0] <=> \$urls[0] or \$a cmp \$b;
}

But this throws lots of "Use of uninitialized value....." errors on
that line and in doing so gets the wrong pages attributed to a site. I
have tried with yet another hash on the end with count & date keys
instead of the array, but it does not help.

I would be grateful for any pointers.

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

*SKIP*

Piece of advice.  The next time you'll would like to I<jump in> consider
Because it wasn't seen.

*SKIP*

Hopefully Uri won't see that.  I<\$a> and I<\$b> are special.  However
only in context of B<sort>.

*SKIP*

If I guessed your problem right way, than:

print "\$site \$_ \$urls[0] \$urls[1]\n"
foreach(
sort { \$urls[0] <=> \$urls[0]; }
sort { \$urls[1] <=> \$urls[1]; }
keys %});

(Sorry for extra(?) parenthesis and semicolons;  I'm not sure yet if
B<perl> would compile my intensions right.)  Please note, that supposes
that dates are unix-epoched;  otherwise you must do the comparision by
himself or pass it some other module.

--
Torvalds' goal for Linux is very simple: World Domination

## Re: sorting a hash / 2008-06-01

EP> *SKIP*

>> Sorry to jump in with another question but I have a very similar
>> problem. I am processing a consolidated apache2 logfile. I have
>> multiple virtual hosts. All I care about are the site, the page
>> served, a counter and the date.

EP> Piece of advice.  The next time you'll would like to I<jump in> consider
EP> Because it wasn't seen.

EP> *SKIP*
>> I have tried various ideas I found by google but they all tend to be
>> similar to this

>> sub by_count
>> {
>> \$urls[0] <=> \$urls[0] or \$a cmp \$b;
>> }

EP> Hopefully Uri won't see that.  I<\$a> and I<\$b> are special.  However
EP> only in context of B<sort>.

i did. my eyes are bleeding!

>> I would be grateful for any pointers.

EP> If I guessed your problem right way, than:

EP> print "\$site \$_ \$urls[0] \$urls[1]\n"
EP>   foreach(
EP>     sort { \$urls[0] <=> \$urls[0]; }
EP>     sort { \$urls[1] <=> \$urls[1]; }
EP>     keys %});

are you (or the OP) trying to do a multilevel sort? it looks like yours
will work but it is unusual to do two sort passes. and it relies on the
sort to be stable (meaning equal keys stay in the same ordering post
sort). perl now uses a stable sort but earlier versions didn't. it is
not something you should depend upon.

and of course Sort::Maker makes this easy. (untested):

use Sort::Maker ;
my \$sorter = make_sorter( 'ST', number => '\$urls[0],
number => '\$urls[1] ) ;

my @sorted = \$sorter->( keys %} ) ;

and i don't know the data structure so there may be ways to improve
that.

uri

--
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

## Re: sorting a hash / 2008-06-01

wrote:

True, but the problem looked so similar.

Sorry I don't understand what you are getting at - apart from an in
joke.

<snip>

No not multi level here, just two ways of presenting the data
depending on which \$site it came from. Thanks for the help guys, but
they are only variations on what I had tried with no luck. However, I
have discovered that here (OS/2) there is a bug in perl (5.8.2). I
don't know yet if it is a bug in perl or the port. I suspect the
latter, but

foreach my \$url (sort [0] <=> \$urls[0] }
keys %})

and

foreach my \$url (sort by_value keys %})

sub by_value
{
\$urls[0] <=> \$urls[0];
}

Give different results. The first works correctly and the second for
some reason yet to be determined gets the *wrong* value of \$site. I
stuck a print \$site in the subroutine. That is where all the errors
came from, it was trying to compare site A's urls against site B's -
No wonder there where a lot of errors :-)

I was going to run my test case on my Solaris box but the darn thing
decided to trash its hard drive :-(

Oh, and the date is text and sortable - YYYY/MM/DD.

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

DS> No not multi level here, just two ways of presenting the data
DS> depending on which \$site it came from. Thanks for the help guys, but
DS> they are only variations on what I had tried with no luck. However, I
DS> have discovered that here (OS/2) there is a bug in perl (5.8.2). I
DS> don't know yet if it is a bug in perl or the port. I suspect the
DS> latter, but

DS> foreach my \$url (sort [0] <=> \$urls[0] }
DS> keys %})

DS> and

DS> foreach my \$url (sort by_value keys %})

DS> sub by_value
DS> {
DS>   \$urls[0] <=> \$urls[0];
DS> }

DS> Give different results. The first works correctly and the second for
DS> some reason yet to be determined gets the *wrong* value of \$site. I
DS> stuck a print \$site in the subroutine. That is where all the errors
DS> came from, it was trying to compare site A's urls against site B's -
DS> No wonder there where a lot of errors :-)

i highly doubt this is a perl bug. my gut feeling is that you have a
scoping problem. the first sort keeps \$a and \$b inside the sort block
and those will be set correctly. if you lexically declared \$a and \$b in your
code before the by_value sub, those will be used and screw up your
sort. or maybe a different \$site is being used because it is in a
different scope. you need to post more code so we can see the
problem. it isn't just with the above code.

uri

--
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

## Re: sorting a hash / 2008-06-01

wrote:

a

I suspect your are right and I have done something *really* stupid -
but I get the same on a Solaris system. \$a & \$b are correct and from
the correct \$site.

==================
#!/usr/bin/perl

use strict;
use warnings;

my \$site;
my %sites;
my %urls;

\$sites++;
\$sites++;
\$sites++;

\$urls[0]++; # count
\$urls[1] = 'yyyy/mm/dd';
\$urls[0]++; # count
\$urls[1] = 'yyyy/mm/dd';
\$urls[0]++; # count
\$urls[1] = 'yyyy/mm/dd';
\$urls[0]++; # count
\$urls[1] = 'yyyy/mm/dd';
\$urls[0]++; # count
\$urls[1] = 'yyyy/mm/dd';

foreach \$site (keys %sites)
{
print "\nSite: \$site\n";

foreach my \$url (sort [0] <=> \$urls[0]
or \$a cmp \$b} keys %})
{
print "\$site \$url \$urls[0] \$urls[1]\n";
}
}

foreach \$site (keys %sites)
{
print "\nSite: \$site\n";

foreach my \$url (sort by_value keys %})
{
print "\$site \$url \$urls[0] \$urls[1]\n";
}
}

sub by_value
{
print "Sort site: >\$site< \$a \$b\n";
\$urls[0] <=> \$urls[0] or \$a cmp \$b;
}

==========

Yeilds:

==============
Site: SSL
SSL url_2 1 yyyy/mm/dd
SSL url_3 1 yyyy/mm/dd

Site: DEEZEE
DEEZEE url_2 2 yyyy/mm/dd
DEEZEE url_1 1 yyyy/mm/dd

Site: Web2

Site: SSL
Use of uninitialized value in concatenation (.) or string at try.pl
line 50.
Sort site: >< url_3 url_2
Use of uninitialized value in hash element at try.pl line 51.
Use of uninitialized value in hash element at try.pl line 51.
Use of uninitialized value in hash element at try.pl line 51.
Use of uninitialized value in numeric comparison (<=>) at try.pl line
51.
Use of uninitialized value in numeric comparison (<=>) at try.pl line
51.
SSL url_2 1 yyyy/mm/dd
SSL url_3 1 yyyy/mm/dd

Site: DEEZEE
Use of uninitialized value in concatenation (.) or string at try.pl
line 50.
Sort site: >< url_1 url_2
Use of uninitialized value in hash element at try.pl line 51.
Use of uninitialized value in hash element at try.pl line 51.
Use of uninitialized value in numeric comparison (<=>) at try.pl line
51.
Use of uninitialized value in numeric comparison (<=>) at try.pl line
51.
DEEZEE url_1 1 yyyy/mm/dd
DEEZEE url_2 2 yyyy/mm/dd

Site: Web2

==========

In this case \$site has no value in the sub. In the real program it had
one of the other sites as a value rather than the correct one.

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

You should declare variables in the tightest scope you can, otherwise
use strict will be less helpful in finding such scoping problems.

...

it would be better to use "foreach my \$site"

The way foreach works, the \$site inside the loop is not the same
as the \$site declared outside the loop.

The \$site used by by_value is the one from outside the foreach loop, not
the one inside the foreach loop.  Usually you'd just pass in \$site as an
argument, but when the sub is called automatically from sort, that doesn't
work.

There are a variety ways around this, but none of them are entirely
satisfactory.  One would be to make the sub an ordinary subroutine,
rather than one made specifically to be used by sort, then invoke
it explicitly rather than implicitly:

foreach my \$url (sort keys %})
...
sub by_value
{
my (\$a,\$b,\$site)=@_;
print "Sort site: >\$site< \$a \$b\n";
\$urls[0] <=> \$urls[0] or \$a cmp \$b;
}

Xho

--
The costs of publication of this article were defrayed in part by the
this fact.

## Re: sorting a hash / 2008-06-01

On Thu, 12 Jun 2008 15:57:37 UTC, xhoster@gmail.com wrote:

Ah Ah. I did not know that. Many thanks for pointing it out, that of
course explains everything.

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

Because that would make it clearer what is happening (see below).

What Xho didn't point out is that Perl has two different types of
scoping: Dynamic scoping and lexical scoping.

Lexical scoping is introduced with "my" and works like in most other
programming languages: The variable is visible until the end of the
block - as seen in the source code. So:

1  #!/usr/local/bin/perl
2  use warnings;
3  use strict;

4  my \$x = 'X';
5  for \$x (qw(a b c d)) {
6      foo();
7  }
8  print "\$x\n";

9  sub foo {
10      print "\$x\n";
11  }

will print

X
X
X
X
X

because the "inner" \$x introduced in line 5 is only visible until the closing
brace in line 7, and both the print in line 8 and the print in line 10
will see the "outer" \$x introduced in line 4.

If you omit the "my" in line 5, that doesn't change, because loop
variables are always implicitely scoped to the loop.

Now, change line 4 to

4  our \$x = 'X';

and run the script again:

a
b
c
d
X

what happened? Now \$x is a global variable, and the implicit scoping in
the loop will use dynamic scoping instead of lexical scoping. The \$x
inside of the loop is still a different from the \$x outside of the loop
(as you can see, the print in line 8 still prints "X"), but "inside of
the loop" now contains subs called from the loop, so the print inside
foo will see the "inner" \$x.

Dynamic scoping is used very little these days because it is confusing
and error-prone: You need to know from where a function is called to
know which variables it is seeing. But it does exist and for some
specialized uses it may even be clearer.

hp

## Re: sorting a hash / 2008-06-01

On Sat, 14 Jun 2008 07:47:54 UTC, "Peter J. Holzer"

<snip>

Ah, so there is a subtle difference between "my \$thing" and "our
\$thing" when declared outside of any code blocks. ie at the "top"
level of a script. I had never seen a condition before where one could
tell the difference. "my \$thing" at that level *appears* to behave in
a global fashion in that everything can "see" it.

Thanks, you learn something new everyday, The trick is not to forget
it :-)

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

>> wrote:
>>
>> my \$site;

>> foreach \$site (keys %sites)

x> The way foreach works, the \$site inside the loop is not the same
x> as the \$site declared outside the loop.

i was smelling a scoping issue since the sort sub used external stuff
and also the loop variable. i didn't spot the extra my \$site. but i was
right in general! :)

x> There are a variety ways around this, but none of them are entirely
x> satisfactory.  One would be to make the sub an ordinary subroutine,
x> rather than one made specifically to be used by sort, then invoke
x> it explicitly rather than implicitly:

or he could use a different name for the for loop variable. since for
variables are localized, it would be seen in the sort sub as he wrote
it. it was the my \$site outside that was being seen by the sort
sub. regardless, it isn't a good idea to use sort subs that refer to
external variables - too dangerous as we have seen here.

uri

--
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

## Re: sorting a hash / 2008-06-01

Thank you both very much for the education. Much appreciated.

--
Regards
Dave Saville

NB Remove nospam. for good email address

## Re: sorting a hash / 2008-06-01

*SKIP*

Me.  That was my guess.

I have unusual habbit of chaining C<map>s and C<grep>s.  That was plain
inertia.

I rarely C<sort> (honestly, I don't at all).  However I've definetely
seen notes on stability of C<sort>.  I was ignorant.  Mea culpa.

*CUT*

--
Torvalds' goal for Linux is very simple: World Domination

## Re: sorting a hash / 2008-06-01

Are you trying to sort all of your records at once, regardless of site?
You can't do that with a simple sort. If you just want to sort the data
by site, here is an example:

#!/usr/local/bin/perl
use strict;
use warnings;

my %urls = (
site1 => {
url1_1 => [ 3, '2008-01-01' ],
url1_2 => [ 2, '2008-02-02' ],
url1_3 => [ 5, '2008-02-03' ],
url1_4 => [ 4, '2008-03-05' ],
},
site2 => {
url2_1 => [ 7, '2008-09-11' ],
url2_2 => [ 4, '2008-03-10' ],
url2_3 => [ 1, '2008-02-24' ],
url2_4 => [ 8, '2008-03-31' ],
}
);

print "Sorted by site, url:\n";
for my \$site ( sort keys %urls ) {
for my \$url ( sort keys %} ) {
my( \$count, \$date ) = @};
print "\$site, \$url, \$count, \$date\n";
}
}

print "Sorted by site, count:\n";
for my \$site ( sort keys %urls ) {
for my \$url (
sort { \$urls->[0] <=> \$urls->[0] }
keys %} )
{
my( \$count, \$date ) = @};
print "\$site, \$url, \$count, \$date\n";
}
}

If you want something else, please post a complete program showing what
you are trying to do. Thanks.

--
Jim Gibson