# efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

Gentlemen, is this the most efficient way to do this, or should one use
split() and more arrays and loops?

use strict;
use warnings FATAL => 'all';
while (<DATA>) {
    if ( my @a = /BB|BM|MY|SG|TW|US/g ) { print "@a\n" }
}
__DATA__
BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS...
BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR...
BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ...
PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES...

## Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

On Mon, 07 Mar 2011 03:33:33 +0800, jidanni wrote:

Iff your input data is so clean as this, your way is tops.

M4

## Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

jidanni@jidanni.org wrote:

You can answer that yourself by benchmarking (perldoc Bench)
other solutions. The following isn't exactly the same, since
it's looking for the exact values, instead of something
that might contain BB or BM or MY, etc. but looking at
your data, you're possibly after the exact value.

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );
while ( <DATA> )
{
    my @a;
    for my $k ( split( / / ) )
    {
        push( @a, $k ) if $item{ $k };
    }
    print "@a\n" if @a;
}

That's 37% faster, on my machine.
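To make the "exact values" caveat above concrete, here is a small sketch. The input line is hypothetical (the posted data only contains two-letter codes); it shows how the unanchored alternation picks up substrings, while the hash lookup and a word-bounded pattern do not.

```perl
use strict;
use warnings;

# Hypothetical input: "USA" and "BBC" are not wanted,
# but they contain "US" and "BB" as substrings.
my $line = "USA BBC TW\n";

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );

# Unanchored alternation, as in the original post: matches substrings.
my @loose = $line =~ /BB|BM|MY|SG|TW|US/g;

# Hash lookup on whitespace-separated tokens: exact values only.
my @exact = grep { $item{$_} } split ' ', $line;

# Alternation with word boundaries: also exact values only.
my @bounded = $line =~ /\b(?:BB|BM|MY|SG|TW|US)\b/g;

print "loose:   @loose\n";      # loose:   US BB TW
print "exact:   @exact\n";      # exact:   TW
print "bounded: @bounded\n";    # bounded: TW
```

With data as clean as the posted sample, all three give the same answer, so the choice only matters if longer tokens can appear.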

If this is all your code is doing, then it would be good
to experiment a bit.  If the code is doing many other things,
then it won't matter that much.  e.g. A slight
optimization, depending on the input, would be to only
split up the values, if one exists in the line:

next unless /BB|BM|MY|SG|TW|US/;

or maybe using egrep might be better:

egrep 'BB|BM|MY|SG|TW|US' file | script.pl

Of course, if every line has one of those values, then
that's a useless thing to do.

If you're really doing this a lot, or are just playing around
to find out whether something else might be faster, experiment a
bit with different solutions. The Bench module provides
a means to measure different solutions against each other.
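A minimal sketch of such a comparison, using `cmpthese` from Perl's core Benchmark module. The sub names `by_regex` and `by_hash` are made up for this example, and the two sample lines are taken from the original post with the trailing "..." dropped. Timings depend heavily on the machine, the perl version, and the real input, so treat any result as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample lines from the original post (trailing "..." dropped).
my @lines = (
    "BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS\n",
    "PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES\n",
);

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );

# Approach 1: global match with alternation.
sub by_regex {
    my @out;
    for (@lines) {
        my @a = /BB|BM|MY|SG|TW|US/g;
        push @out, "@a" if @a;
    }
    return @out;
}

# Approach 2: split into tokens and look each one up in a hash.
sub by_hash {
    my @out;
    for (@lines) {
        my @a = grep { $item{$_} } split;
        push @out, "@a" if @a;
    }
    return @out;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, { regex => \&by_regex, hash => \&by_hash } );
```

Both subs collect their results instead of printing, so that output cost does not dominate the measurement.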

## Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

JG> You can answer that yourself by benchmarking (perldoc Bench)
JG> other solutions. The following isn't exactly the same, since
JG> it's looking for the exact values, instead of something
JG> that might contain BB or BM or MY, etc. but looking at
JG> your data, you're possibly after the exact value.

it is Benchmark.

JG> my %item = map { $_ => 1 } qw( BB BM MY SG TW US );
JG> while ( <DATA> )
JG> {
JG>     my @a;
JG>     for my $k ( split( / / ) )
JG>     {
JG>         push( @a, $k ) if $item{ $k };
JG>     }
JG>     print "@a\n" if @a;
JG> }

JG> That's 37% faster, on my machine.

i was thinking of a hash as well. alternation in regexes can be slow (some
optimizations have been done recently though).

i wouldn't even do the split. i think it would be faster (i am not in
the mood to benchmark it) to do a grab loop like this (untested). it saves
building up the list of tokens for each line.

while( $line =~ /(\w\w)/g ) {
    next if $item{ $1 } ;
    ...

i forget the boolean test direction so if could be unless.
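For reference, a filled-in sketch of that grab loop. The helper name `wanted_tokens` is made up for this example. The test direction that keeps the wanted tokens is `unless`. Note that `/(\w\w)/g` assumes every token is exactly two word characters, as in the posted data.

```perl
use strict;
use warnings;

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );

# Return the wanted tokens found in one line, without building a token list.
sub wanted_tokens {
    my ($line) = @_;
    my @a;
    # In scalar context, /g resumes from where the previous match ended.
    while ( $line =~ /(\w\w)/g ) {
        next unless $item{$1};    # skip tokens that are not in the hash
        push @a, $1;
    }
    return @a;
}

# Sample lines from the original post (trailing "..." dropped).
for my $line (
    "BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS\n",
    "BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR\n",
) {
    my @a = wanted_tokens($line);
    print "@a\n" if @a;    # prints "BB BM", then "BB"
}
```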

uri

--
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

## Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

J. Gleixner wrote:

Why split( / / ) and not just:

for my $k ( split )

Besides, split( / / ) won't remove the newline.
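A short sketch of the difference, on a hypothetical two-token line. A bare `split` behaves like `split ' ', $_`, which splits on runs of whitespace (newline included) and skips leading whitespace, whereas `/ /` splits only on single literal spaces.

```perl
use strict;
use warnings;

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );
my $line = "US TW\n";    # hypothetical line ending in a wanted token

my @single = split / /, $line;    # one literal space: last field keeps "\n"
my @awk    = split ' ', $line;    # what a bare split does on $_

my @hits_single = grep { $item{$_} } @single;
my @hits_awk    = grep { $item{$_} } @awk;

print scalar(@hits_single), "\n";    # 1  ("TW\n" is not a key in %item)
print scalar(@hits_awk),    "\n";    # 2
```

So with `split( / / )` and no `chomp`, a wanted code in the last position of a line is silently missed by the hash lookup.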

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein

## Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }

OK thanks fellas. (I was doing my own homebrew scan of