|
Posted by Ben Bullock on June 19, 2008, 8:10 am
Please log in for more thread options
Following on from the discussion about finding all of a set of
characters in a string, here is a "cockroach race" I've made to see
which solution is faster. I ran this on Perl 5.10, so you might get
different results with some other version.
#!/usr/local/bin/perl
use warnings;
use strict;
use List::MoreUtils qw/all/;
sub bullock
{
my ($s, @chars) = @_;
my $anychar = join '', @chars;
my $matchany = join '.*',("[$anychar]") x @chars;
if ($s =~ /$matchany/) {
my $copy = $s;
for my $c (@chars) {
return 0 unless $copy =~ s/$c//g;
}
return 1;
}
return 0;
}
sub lalli
{
my ($s, @chars) = @_;
return all { $s =~ /$_/ } @chars;
}
sub jackman
{
my ($s, @chars) = @_;
my @re;
for (@chars) {
my $re = $_;
for my $c (@chars) {
next if $c eq $_;
$re .= "(?=.*$c)";
}
push @re, $re;
}
my $re = join '|', @re;
return $s =~ /$re/;
}
sub j_index
{
my ($s, @chars) = @_;
my $matched = 1;
for my $char (@chars) {
return 0 if index($s, $char) == -1;
}
return 1;
}
sub j_b
{
my ($s, @chars) = @_;
my $re;
for my $c (@chars) {
$re .= "(?=.*$c)";
}
return $s =~ /$re/;
}
sub j_b2
{
my ($s, @chars) = @_;
my $re = '^';
for my $c (@chars) {
$re .= "(?=.*$c)";
}
return $s =~ /$re/;
}
use Benchmark qw( cmpthese );
sub comparethem
{
my ($text, $chars, $ret) = @_;
my %chars = map {$_ => 1} split ('', $chars);
my @chars = sort keys %chars;
my $count = 100000;
cmpthese $count,
{
'bullock' => sub {bullock($text,@chars) == $ret or die '1'},
'lalli' => sub {lalli($text,@chars) == $ret or die '2'},
'jackman' => sub {jackman($text,@chars) == $ret or die '3'},
'j_index' => sub {j_index($text,@chars) == $ret or die '4'},
'j_b' => sub {j_b($text,@chars) == $ret or die '5'},
'j_b2' => sub {j_b($text,@chars) == $ret or die '6'},
};
}
comparethem ("naninuneno", 'aiueo', 1);
comparethem ("naninuneni", 'aiueo', 0);
comparethem ("naninuneni", 'abcdefghijklmnopqrstuvwxyz', 0);
comparethem ('abcdefghijklmnopqrstuvwxyz'x100, 'aeiou', 1);
__END__
The clear winner on all four tests is Glenn Jackman's version
using "index":
Rate bullock jackman lalli j_b j_b2 j_index
bullock 20000/s -- -17% -41% -74% -74% -84%
jackman 24213/s 21% -- -29% -68% -68% -80%
lalli 34014/s 70% 40% -- -55% -55% -72%
j_b 75758/s 279% 213% 123% -- -0% -38%
j_b2 75758/s 279% 213% 123% 0% -- -38%
j_index 121951/s 510% 404% 259% 61% 61% --
Rate jackman bullock lalli j_b j_b2 j_index
jackman 21053/s -- -10% -37% -72% -72% -85%
bullock 23419/s 11% -- -30% -69% -69% -83%
lalli 33670/s 60% 44% -- -56% -56% -76%
j_b 75758/s 260% 223% 125% -- -0% -45%
j_b2 75758/s 260% 223% 125% 0% -- -45%
j_index 138889/s 560% 493% 313% 83% 83% --
Rate jackman j_b2 j_b lalli bullock j_index
jackman 1386/s -- -93% -94% -95% -95% -97%
j_b2 20921/s 1409% -- -4% -18% -25% -52%
j_b 21786/s 1472% 4% -- -15% -22% -50%
lalli 25641/s 1750% 23% 18% -- -8% -42%
bullock 27855/s 1910% 33% 28% 9% -- -36%
j_index 43860/s 3064% 110% 101% 71% 57% --
Rate bullock jackman lalli j_b2 j_b j_index
bullock 3557/s -- -78% -86% -89% -89% -96%
jackman 16234/s 356% -- -35% -50% -50% -83%
lalli 25063/s 605% 54% -- -23% -24% -74%
j_b2 32680/s 819% 101% 30% -- -0% -66%
j_b 32787/s 822% 102% 31% 0% -- -66%
j_index 95238/s 2577% 487% 280% 191% 190% --
However, second place is not so clear. Despite what Bart Lateur
thought, there is no difference between the performance of the
anchored and unanchored regexes (j_b and j_b2 above). The solution I
posed initially fails badly on a long input string. The regex solution
posted by Glenn Jackman fails particularly badly on a long list of
characters, because of the O(n^2) size of the regex. However, the
thing I initially posted does miles better with a long input string,
coming a close second.
|
|
Posted by Ben Bullock on June 19, 2008, 8:54 am
Please log in for more thread options
On Thu, 19 Jun 2008 12:10:34 +0000, Ben Bullock wrote:
> 'j_index' => sub {j_index($text,@chars) == $ret or die '4'},
> 'j_b' => sub {j_b($text,@chars) == $ret or die '5'},
> 'j_b2' => sub {j_b($text,@chars) == $ret or die '6'},
^
2
Ooops!
> However, second place is not so clear. Despite what Bart Lateur
> thought, there is no difference between the performance of the
> anchored and unanchored regexes (j_b and j_b2 above).
As Bart Lateur thought, adding the anchor ^ increased the speed quite a
bit, especially with the negative matches (the second and third ones):
Rate j_b j_b2 j_index
j_b 72993/s -- -7% -42%
j_b2 78740/s 8% -- -37%
j_index 125000/s 71% 59% --
Rate j_b j_b2 j_index
j_b 70423/s -- -11% -49%
j_b2 79365/s 13% -- -43%
j_index 138889/s 97% 75% --
Rate j_b j_b2 j_index
j_b 20704/s -- -18% -54%
j_b2 25126/s 21% -- -45%
j_index 45455/s 120% 81% --
Rate j_b j_b2 j_index
j_b 31348/s -- -3% -62%
j_b2 32468/s 4% -- -61%
j_index 83333/s 166% 157% --
|
|
Posted by Paul Lalli on June 19, 2008, 9:54 am
Please log in for more thread options > Following on from the discussion about finding all of a set of
> characters in a string, here is a "cockroach race" I've made to see
> which solution is faster. I ran this on Perl 5.10, so you might get
> different results with some other version.
[...]
> However, second place is not so clear. Despite what Bart Lateur
> thought, there is no difference between the performance of the
> anchored and unanchored regexes (j_b and j_b2 above). The solution I
> posed initially fails badly on a long input string. The regex solution
> posted by Glenn Jackman fails particularly badly on a long list of
> characters, because of the O(n^2) size of the regex. However, the
> thing I initially posted does miles better with a long input string,
> coming a close second.
Added lalli_idx to use index() rather than pattern matches in my
suggestion (and corrected the j_b/j_b2 typo):
sub lalli_idx
{
my ($s, @chars) =3D @_;
return all { index($s, $_) !=3D -1 } @chars;
}
Running on 5.8.4 for solaris:
Rate bullock lalli jackman lalli_idx j_b2
j_b j_index
bullock 14599/s -- -15% -23% -76% -78%
-78% -85%
lalli 17212/s 18% -- -9% -71% -74%
-74% -82%
jackman 19011/s 30% 10% -- -68% -71%
-72% -81%
lalli_idx 60241/s 313% 250% 217% -- -8%
-10% -39%
j_b2 65789/s 351% 282% 246% 9% --
-2% -33%
j_b 67114/s 360% 290% 253% 11% 2%
-- -32%
j_index 98039/s 572% 470% 416% 63% 49%
46% --
Rate jackman bullock lalli j_b lalli_idx
j_b2 j_index
jackman 17452/s -- -2% -13% -67% -72%
-74% -85%
bullock 17857/s 2% -- -11% -66% -71%
-73% -84%
lalli 20040/s 15% 12% -- -62% -67%
-70% -83%
j_b 52910/s 203% 196% 164% -- -14%
-21% -54%
lalli_idx 61350/s 252% 244% 206% 16% --
-9% -47%
j_b2 67114/s 285% 276% 235% 27% 9%
-- -42%
j_index 114943/s 559% 544% 474% 117% 87%
71% --
Rate jackman j_b lalli j_b2 bullock
lalli_idx j_index
jackman 1200/s -- -94% -94% -95% -96%
-97% -98%
j_b 20877/s 1640% -- -2% -6% -37%
-39% -57%
lalli 21322/s 1677% 2% -- -4% -36%
-38% -56%
j_b2 22321/s 1760% 7% 5% -- -33%
-35% -54%
bullock 33223/s 2669% 59% 56% 49% --
-4% -31%
lalli_idx 34483/s 2774% 65% 62% 54% 4%
-- -29%
j_index 48309/s 3926% 131% 127% 116% 45%
40% --
Rate bullock jackman j_b2 j_b lalli
lalli_idx j_index
bullock 2818/s -- -75% -83% -83% -83%
-95% -97%
jackman 11086/s 293% -- -31% -32% -33%
-79% -87%
j_b2 16129/s 472% 45% -- -1% -3%
-70% -81%
j_b 16234/s 476% 46% 1% -- -2%
-70% -81%
lalli 16556/s 488% 49% 3% 2% --
-69% -80%
lalli_idx 53763/s 1808% 385% 233% 231% 225%
-- -37%
j_index 84746/s 2908% 664% 425% 422% 412%
58% --
|
| Similar Threads | Posted | | [regex] grep for chars in any order | June 18, 2008, 3:06 am |
| perlipc doc / another race (?) | June 27, 2005, 1:33 pm |
| Minimizing chance of race conditions | November 15, 2004, 9:17 pm |
| Adding a delimiter inbetween number characters and letter characters | January 12, 2005, 9:25 am |
| What does 'grep -M' do? | April 14, 2007, 8:42 pm |
| Grep Help | November 16, 2007, 12:56 am |
| Grep -v option | November 4, 2004, 3:05 pm |
| using grep and diff | May 10, 2005, 10:13 am |
| Grep in parts of an XML | June 23, 2005, 12:48 am |
| Need help with grep function | April 28, 2006, 11:13 am |
|