
sorting a hash / 2008-06-01

Subject Author Date
sorting a hash / 2008-06-01 dn.perl@gmail.com 05-30-2008
Posted by A. Sinan Unur on May 30, 2008, 9:51 am
@mid.individual.net:

> dn.perl@gmail.com wrote:

...

>> The problem is that
>> the city-names are one extra level deep with the state-name coming
>> in-between. I wondered whether I should build the hash differently.
>
> Probably. This is one idea:
>
> my %hash = (
> 'San Jose' => {
> state => 'Calif',
> max_temp => 84,
> },
> 'San Fran' => {
> state => 'Calif',
> max_temp => 94,
> },
> );
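
For illustration, a flat structure like the one quoted above can be sorted by temperature with a single sort block. This is only a minimal sketch using the two entries from the quoted snippet; the name @by_temp is made up here.

```perl
use strict;
use warnings;

# Flat hash keyed by city name, as in the quoted suggestion.
my %hash = (
    'San Jose' => { state => 'Calif', max_temp => 84 },
    'San Fran' => { state => 'Calif', max_temp => 94 },
);

# Hottest city first; ties fall back to alphabetical city name.
my @by_temp = sort {
    $hash{$b}{max_temp} <=> $hash{$a}{max_temp}
        or $a cmp $b
} keys %hash;

print "$_ ($hash{$_}{state}): $hash{$_}{max_temp}\n" for @by_temp;
```

Note that a hash keyed on the city name alone can hold only one entry per name.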

http://en.wikipedia.org/wiki/Athens_(town),_New_York

http://en.wikipedia.org/wiki/Athens,_Georgia

Sinan

--
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Posted by Dave Saville on June 11, 2008, 12:25 pm
On Fri, 30 May 2008 13:11:22 UTC, Gunnar Hjalmarsson

> dn.perl@gmail.com wrote:
> > I want to sort a hash. The hash contains a list of cities and their
> > temperature
>
> Well, I'd rather say it contains three hash references.
>
> This is one sensible way to sort that data structure:
>
> foreach my $state ( sort keys %hash ) {
>     print "State: $state\n";
>     foreach my $city ( sort { $a cmp $b } keys %{ $hash{$state} } ) {
>         print "$city = $hash{$state}{$city}\n";
>     }
>     print "\n";
> }

Sorry to jump in with another question but I have a very similar
problem. I am processing a consolidated apache2 logfile. I have
multiple virtual hosts. All I care about are the site, the page
served, a counter and the date.

So my hash looks like $urls{$site}{$url}. Beyond that I have a counter
and date, thus:
$urls{$site}{$url}[0]++; # count
$urls{$site}{$url}[1] = $date;

This works fine and I can list by site the page, count and date.

foreach $site ( keys %urls)
{
    foreach my $url (keys %{$urls{$site}})
    {
        print "$site $url $urls{$site}{$url}[0] $urls{$site}{$url}[1]\n";
    }
}

Putting a sort into the url loop gives me the results sorted by page
as expected. What I cannot figure out is how to do it by count and by
date.

I have tried various ideas I found by google but they all tend to be
similar to this

sub by_count
{
    $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0] or $a cmp $b;
}

But this throws lots of "Use of uninitialized value....." errors on
that line and in doing so gets the wrong pages attributed to a site. I
have tried with yet another hash on the end with count & date keys
instead of the array, but it does not help.

I would be grateful for any pointers.

--
Regards
Dave Saville

NB Remove nospam. for good email address

Posted by Eric Pozharski on June 11, 2008, 3:56 pm
*SKIP*

> Sorry to jump in with another question but I have a very similar
> problem. I am processing a consolidated apache2 logfile. I have
> multiple virtual hosts. All I care about are the site, the page
> served, a counter and the date.

Piece of advice. The next time you'll would like to I<jump in> consider
stoling the thread. Otherwise your question can be left unanswered.
Because it wasn't seen.

*SKIP*
> I have tried various ideas I found by google but they all tend to be
> similar to this

Forget B<google>, use B<perldoc> instead.

> sub by_count
> {
>     $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0] or $a cmp $b;
> }

Hopefully Uri won't see that. I<$a> and I<$b> are special. However
only in context of B<sort>.

*SKIP*

> I would be grateful for any pointers.

If I guessed your problem right, then:

print "$site $_ $urls{$site}{$_}[0] $urls{$site}{$_}[1]\n"
    foreach(
        sort { $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0]; }
        sort { $urls{$site}{$a}[1] <=> $urls{$site}{$b}[1]; }
        keys %{$urls{$site}});

(Sorry for the extra(?) parentheses and semicolons; I'm not sure yet
whether B<perl> would compile my intentions right.) Please note that this
supposes the dates are Unix epoch values; otherwise you must do the
comparison yourself or hand it to some other module.

--
Torvalds' goal for Linux is very simple: World Domination

Posted by Uri Guttman on June 12, 2008, 12:02 am

EP> *SKIP*

>> Sorry to jump in with another question but I have a very similar
>> problem. I am processing a consolidated apache2 logfile. I have
>> multiple virtual hosts. All I care about are the site, the page
>> served, a counter and the date.

EP> Piece of advice. The next time you'll would like to I<jump in> consider
EP> stoling the thread. Otherwise your question can be left unanswered.
EP> Because it wasn't seen.

EP> *SKIP*
>> I have tried various ideas I found by google but they all tend to be
>> similar to this

EP> Forget B<google>, use B<perldoc> instead.

>> sub by_count
>> {
>>     $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0] or $a cmp $b;
>> }

EP> Hopefully Uri won't see that. I<$a> and I<$b> are special. However
EP> only in context of B<sort>.

i did. my eyes are bleeding!

>> I would be grateful for any pointers.

EP> If I guessed your problem right, then:

EP> print "$site $_ $urls{$site}{$_}[0] $urls{$site}{$_}[1]\n"
EP>     foreach(
EP>         sort { $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0]; }
EP>         sort { $urls{$site}{$a}[1] <=> $urls{$site}{$b}[1]; }
EP>         keys %{$urls{$site}});

are you (or the OP) trying to do a multilevel sort? it looks like yours
will work but it is unusual to do two sort passes. and it relies on the
sort to be stable (meaning equal keys stay in the same ordering post
sort). perl now uses a stable sort but earlier versions didn't. it is
not something you should depend upon.
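
A multilevel sort can also be written as one pass with a single comparison block, which does not depend on sort stability. The following is a minimal sketch, assuming the $urls{$site}{$url} = [ count, date ] layout described earlier in the thread; the site name, the pages and the epoch timestamps are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical sample data: $urls{$site}{$url} = [ count, epoch date ].
my %urls = (
    'example.org' => {
        '/index.html' => [ 12, 1212278400 ],
        '/about.html' => [ 3,  1212105600 ],
        '/news.html'  => [ 12, 1211932800 ],
    },
);

my $site  = 'example.org';
my $pages = $urls{$site};

# Primary key: count. Secondary key: date. Both numeric, ascending.
# The second comparison only runs when the counts are equal.
my @sorted = sort {
    $pages->{$a}[0] <=> $pages->{$b}[0]
        or $pages->{$a}[1] <=> $pages->{$b}[1]
} keys %$pages;

print "$site $_ $pages->{$_}[0] $pages->{$_}[1]\n" for @sorted;
```

Swapping $a and $b in either comparison reverses the order for that key only.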

and of course Sort::Maker makes this easy. (untested):

use Sort::Maker ;
my $sorter = make_sorter( 'ST', number => '$urls{$site}{$_}[0]',
                         number => '$urls{$site}{$_}[1]' ) ;

my @sorted = $sorter->( keys %{$urls{$site}} ) ;

and i don't know the data structure so there may be ways to improve
that.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Posted by Dave Saville on June 12, 2008, 5:12 am
wrote:

>
> EP> *SKIP*
>
> >> Sorry to jump in with another question but I have a very similar
> >> problem. I am processing a consolidated apache2 logfile. I have
> >> multiple virtual hosts. All I care about are the site, the page
> >> served, a counter and the date.
>
> EP> Piece of advice. The next time you'll would like to I<jump in> consider
> EP> stoling the thread. Otherwise your question can be left unanswered.
> EP> Because it wasn't seen.
>

True, but the problem looked so similar.

> EP> *SKIP*
> >> I have tried various ideas I found by google but they all tend to be
> >> similar to this
>
> EP> Forget B<google>, use B<perldoc> instead.
>
> >> sub by_count
> >> {
> >>     $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0] or $a cmp $b;
> >> }
>
> EP> Hopefully Uri won't see that. I<$a> and I<$b> are special. However
> EP> only in context of B<sort>.
>
> i did. my eyes are bleeding!
>

Sorry I don't understand what you are getting at - apart from an in
joke.

> >> I would be grateful for any pointers.
>
> EP> If I guessed your problem right, then:
>
> EP> print "$site $_ $urls{$site}{$_}[0] $urls{$site}{$_}[1]\n"
> EP>     foreach(
> EP>         sort { $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0]; }
> EP>         sort { $urls{$site}{$a}[1] <=> $urls{$site}{$b}[1]; }
> EP>         keys %{$urls{$site}});
>
> are you (or the OP) trying to do a multilevel sort? it looks like yours
> will work but it is unusual to do two sort passes. and it relies on the
> sort to be stable (meaning equal keys stay in the same ordering post
> sort). perl now uses a stable sort but earlier versions didn't. it is
> not something you should depend upon.
<snip>

No not multi level here, just two ways of presenting the data
depending on which $site it came from. Thanks for the help guys, but
they are only variations on what I had tried with no luck. However, I
have discovered that here (OS/2) there is a bug in perl (5.8.2). I
don't know yet if it is a bug in perl or the port. I suspect the
latter, but

foreach my $url (sort { $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0] }
                 keys %{$urls{$site}})

and

foreach my $url (sort by_value keys %{$urls{$site}})

sub by_value
{
    $urls{$site}{$a}[0] <=> $urls{$site}{$b}[0];
}

Give different results. The first works correctly and the second for
some reason yet to be determined gets the *wrong* value of $site. I
stuck a print $site in the subroutine. That is where all the errors
came from, it was trying to compare site A's urls against site B's -
No wonder there were a lot of errors :-)
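
One way to make sure a reusable comparator always works on the intended site is to build it per site, so the page table is handed over explicitly instead of being looked up through an outer $site. This is a minimal sketch, assuming the $urls{$site}{$url} = [ count, date ] layout from earlier in the thread; make_by_count, %order and the sample data are hypothetical, and the sketch makes no claim about what caused the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical sample data for two virtual hosts.
my %urls = (
    'site-a.example' => { '/x' => [ 5, '2008/06/01' ], '/y' => [ 2, '2008/06/02' ] },
    'site-b.example' => { '/x' => [ 1, '2008/06/03' ], '/z' => [ 9, '2008/05/01' ] },
);

# Returns a comparator closed over one site's page table.
# $a and $b are the package variables that sort sets for each comparison.
sub make_by_count {
    my ($pages) = @_;
    return sub { $pages->{$a}[0] <=> $pages->{$b}[0] or $a cmp $b };
}

my %order;
foreach my $site ( sort keys %urls ) {
    my $by_count = make_by_count( $urls{$site} );
    $order{$site} = [ sort $by_count keys %{ $urls{$site} } ];
    print "$site $_ $urls{$site}{$_}[0]\n" for @{ $order{$site} };
}
```

Because each comparator carries its own page table, it cannot end up comparing one site's URLs against another site's counts.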

I was going to run my test case on my Solaris box but the darn thing
decided to trash its hard drive :-(

Oh, and the date is text and sortable - YYYY/MM/DD.
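
With dates stored as zero-padded YYYY/MM/DD text, a string comparison is enough to get chronological order. A minimal sketch, assuming one site's page table in the [ count, date ] layout described earlier; %pages and its contents are made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical page table for one site: url => [ count, 'YYYY/MM/DD' ].
my %pages = (
    '/index.html' => [ 12, '2008/06/01' ],
    '/about.html' => [ 3,  '2008/05/30' ],
    '/news.html'  => [ 7,  '2007/12/25' ],
);

# Zero-padded year/month/day strings sort chronologically with cmp.
# Using <=> here would trigger "isn't numeric" warnings and compare
# only the leading year.
my @by_date = sort {
    $pages{$a}[1] cmp $pages{$b}[1]
        or $a cmp $b
} keys %pages;

print "$_ $pages{$_}[1]\n" for @by_date;
```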





--
Regards
Dave Saville

NB Remove nospam. for good email address
