# Consecutive Numbers

#### Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: \$!";

my \$runs = 0;
while(<IN>){
chomp;
my \$first = substr(\$_,0,1);
my \$asc = \$first . \$first+1 . \$first+2 . \$first+3;
my \$des = \$first . \$first-1 . \$first-2 . \$first-3;
if (/^(\$asc|\$des)/){
\$runs++
}
}
print "There were \$runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

--
Graham Drabble
If you're interested in what goes on in other groups or want to find
an interesting group to read then check news.groups.reviews for what
others have to say or contribute a review for others to read.

## Re: Consecutive Numbers

Graham Drabble wrote:

Here's my solution.  I don't know if it's better or worse, but it's
certainly different (a couple extra test cases given - 6 'valid' lines
total):

#!/usr/bin/perl
use strict;
use warnings;

my \$rows;

while (<DATA>){
chomp;
my @nums = (split //)[0..3];
\$rows++ if (grep {\$nums[\$_] == \$nums[\$_-1]+1} 1..3) == 3;
\$rows++ if (grep {\$nums[\$_] == \$nums[\$_-1]-1} 1..3) == 3;
}

print "Total of \$rows rows\n";

__DATA__
12345867
23457658
34568765
43215687
13245678
67893131
12335668
13456789
76542109

## Re: Consecutive Numbers

Please don't speak of numbers when you mean digits.  Your example is hard
to understand without that distinction.  That is a waste of time.

Worry about speed when the program runs too slow, not before.

I don't know about shorter, or even faster, but your way of building
a regular expression for each case is certainly clumsy and limited.

Regular expressions are for text, but what you are dealing with are
numbers (despite my stressing that they are digits).  Use arithmetic
operations for numbers, the other way lies, for instance, a famous
Y2K bug.

Your ascending or descending sequences of digits can be considered
arithmetic sequences with a difference of +1 or -1.  That the elements
happen to be single-digit numbers is irrelevant here.  So write a
general routine that recognizes arithmetic sequences (of any length):

sub is_arithmetic_sequence {
my \$delta = shift;
while ( @_ > 1 ) {
\$_ + \$delta == \$_[ 0] or return 0 for shift;
}
return 1;
}

That can deal with ascending and descending sequences, so you don't
have to code for both.

To extract the four-digit sequences from each line, a regex is
appropriate.  So the main loop is

my \$runs = 0;
while ( <DATA> ) {
my @seq = /(\d)(\d)(\d)(\d)/ or next;
\$runs ++ if
is_arithmetic_sequence( -1, @seq) or
is_arithmetic_sequence( 1, @seq);
}
print "There were \$runs runs\n";

That should deal with a few thousand lines with no big problems on
current hardware.  The code gives the right result for your data,
but isn't severely tested.

Anno

## Re: Consecutive Numbers

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote in

I think the OP wanted to see if the first four digits in each line were
consecutive. In light of that, and because I was curious, here is another
way of doing it.

use strict;
use warnings;

my \$runs = 0;
my \$lines = 0;

my \$t = time;

while(my \$line = <DATA>) {
next unless \$line =~ /^(\d\d\d\d)/;
++\$lines;
my @digits = split '', \$1;
if(is_arithmetic_sequence(1, @digits)
|| is_arithmetic_sequence(-1, @digits)) {
++\$runs;
}
}

\$t = time - \$t;

print "There were \$runs runs\n";
print "It took \$t seconds to process \$lines lines\n";

sub is_arithmetic_sequence {
my \$delta = shift;
for my \$d (1 .. \$#_) {
return 0 unless \$delta == \$_[\$d] - \$_[\$d-1];
}
return 1;
}
__DATA__
12345867
23457658

.... 1_211_896 lines ...

C:\Temp> perl md.pl
There were 774520 runs
It took 63 seconds to process 1211896 lines

That's about 20_000 lines per second on a 1Ghz Celeron running Win XP. Not

--
A. Sinan Unur
1usa@llenroc.ude.invalid
(remove '.invalid' and reverse each component for email address)

## Re: Consecutive Numbers

X-Ftn-To: Anno Siegel

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
>    sub is_arithmetic_sequence {
>        my \$delta = shift;
>        while ( @_ > 1 ) {
>            \$_ + \$delta == \$_[ 0] or return 0 for shift;
>        }
>        return 1;
>    }

=== my benchmark ===
>                 Rate       foreach         plain nosub_foreach   nosub_plain
>foreach        5698/s            --          -68%          -91%          -92%
>plain         17913/s          214%            --          -72%          -75%
>nosub_foreach 64412/s         1030%          260%            --          -11%
>nosub_plain   72595/s         1174%          305%           13%            --

I know that foreach is costly, even when using it only to topicalize
scalars. What baffles me here, is that there is far greater difference
between first two benchmarks => (foreach:plain = 1:3.14; your
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?:)
Is Benchmark being misused?

use Benchmark qw(:all);
use strict;

sub is_arithmetic_sequence {
my \$delta = shift;
while ( @_ > 1 ) {
\$_ + \$delta == \$_[ 0] or return 0 for shift;
}
return 1;
}
sub is_arithmetic_sequence2 {
my \$delta = shift;
while ( @_ > 1 ) {
(shift) + \$delta == \$_[ 0] or return 0;
}
return 1;
}

cmpthese(40_000, {
foreach => sub {
my @r = 0..30;
is_arithmetic_sequence(1, @r);
},
plain => sub {
my @r = 0..30;
is_arithmetic_sequence2(1, @r);
},
nosub_foreach => sub {
my @r = 0..30;
my \$delta = 1;
while ( @r > 1 ) {
\$_ + \$delta == \$_[ 0] or return 0 for shift @r;
}
return 1;
},
nosub_plain => sub {
my @r = 0..30;
my \$delta = 1;
while ( @r > 1 ) {
(shift @r) + \$delta == \$_[ 0] or return 0;
}
return 1;
}
});

--
Matija

## Re: Consecutive Numbers

X-Ftn-To: Matija Papec

>is_arithmetic_sequence runs three times faster without foreach) compared to
>next two 1:1.12 ?:)
>Is Benchmark being misused?

I was using wrong arrays in last two subs. ;)
>                 Rate nosub_foreach       foreach   nosub_plain         plain
>foreach        5698/s            5%            --          -64%          -68%
>plain         17841/s          228%          213%           12%            --
>nosub_foreach  5434/s            --           -5%          -66%          -70%
>nosub_plain   15981/s          194%          180%            --          -10%

It seems that foreach is main speed killer.

--
Matija

## Re: Consecutive Numbers

> X-Ftn-To: Anno Siegel
>
> anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
> >    sub is_arithmetic_sequence {
> >        my \$delta = shift;
> >        while ( @_ > 1 ) {
> >            \$_ + \$delta == \$_[ 0] or return 0 for shift;
> >        }
> >        return 1;
> >    }
>
> === my benchmark ===
> >                 Rate       foreach         plain nosub_foreach   nosub_plain
> >foreach        5698/s            --          -68%          -91%          -92%
> >plain         17913/s          214%            --          -72%          -75%
> >nosub_foreach 64412/s         1030%          260%            --          -11%
> >nosub_plain   72595/s         1174%          305%           13%            --
>
> I know that foreach is costly, even when using it only to topicalize
> scalars. What baffles me here, is that there is far greater difference
> between first two benchmarks => (foreach:plain = 1:3.14; your
> is_arithmetic_sequence runs three times faster without foreach) compared to
> next two 1:1.12 ?:)
> Is Benchmark being misused?

Well, you have cleared that up in a followup.  Benchmarking is an art.
It is all to common to find you have benchmarked something that has
little to do with the original problem.  Happens to everyone...

> sub is_arithmetic_sequence {
>     my \$delta = shift;
>     while ( @_ > 1 ) {
>         \$_ + \$delta == \$_[ 0] or return 0 for shift;
>     }
>     return 1;
> }

I was entirely unconcerned with speed when I wrote that.  The one-shot
"for" is there so that shift() and \$_[ 0] don't appear in the same
statement the way they do in the code below.

> sub is_arithmetic_sequence2 {
>     my \$delta = shift;
>     while ( @_ > 1 ) {
>         (shift) + \$delta == \$_[ 0] or return 0;

This is arguably wrong.  The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before \$_[ 0] is evaluated.  The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

>     }
>     return 1;
> }

Anno

## Re: Consecutive Numbers

X-Ftn-To: Anno Siegel

anno4000@lublin.zrz.tu-berlin.de (Anno Siegel) wrote:
>>         \$_ + \$delta == \$_[ 0] or return 0 for shift;
>>     }
>>     return 1;
>> }
>
>I was entirely unconcerned with speed when I wrote that.  The one-shot

Ofcourse, it's just that I find your example interesting in light of
"subroutine calls slow down the program". While such way of thinking is
true, it's probably overestimated as there are greater speed killers like
"foreach".

>> sub is_arithmetic_sequence2 {
>>     my \$delta = shift;
>>     while ( @_ > 1 ) {
>>         (shift) + \$delta == \$_[ 0] or return 0;
>
>This is arguably wrong.  The Perl interpreter is free to evaluate
>subexpressions in any order, so there is no guarantee that shift()
>has happened before \$_[ 0] is evaluated.  The result of the expression
>is undefined, even though there was probably never a Perl interpreter
>that did it the other way 'round.

Ah, so I'm living on the edge? Are there some other common cases where it
comes to undefined results?

--
Matija

## Re: Consecutive Numbers

> X-Ftn-To: Anno Siegel
>
> >> sub is_arithmetic_sequence2 {
> >>     my \$delta = shift;
> >>     while ( @_ > 1 ) {
> >>         (shift) + \$delta == \$_[ 0] or return 0;
> >
> >This is arguably wrong.  The Perl interpreter is free to evaluate
> >subexpressions in any order, so there is no guarantee that shift()
> >has happened before \$_[ 0] is evaluated.  The result of the expression
> >is undefined, even though there was probably never a Perl interpreter
> >that did it the other way 'round.
>
> Ah, so I'm living on the edge? Are there some other common cases where it
> comes to undefined results?

It happens every time an expression has a subexpression that has a
side-effect on other parts of the expression.  The problem is by no
means limited to Perl, though Perl may have more operations with
side-effects than other languages.  The scalar-modifying ++, --, +=,
*= and their ilk are candidates, as are the array-modifying operations
shift, unshift, pop, push and splice.  There are many more.

They can be used freely in any expression as long as the modified
object doesn't appear again in the same expression.  When it does,
there is no way of telling whether it is going to be used in its
original state or after modification.

Only a few languages (Lisp) commit themselves to a defined order of
evaluation of expressions.  Others wisely reserve the freedom of
changing the evaluation order at random.  This is wise because small
changes in the parsing technique (for sake of efficiency) may change
the order of evaluation in hard-to-predict ways.

Btw, the problem is worse in languages that have macro capabilities,
like C.  A macro can hide the fact that a subexpression appears more
than once.  The standard example is

#define SQUARE( x, y) ((x)*(x))
int i = 1;
printf( "%d is not 4\n", SQUARE( ++i));

That looks entirely innocent, yet it prints "6 is not 4".

Anno

## Re: Consecutive Numbers

: I have a file that contains a list of numbers. I'm trying to process
: the file to find out how many rows start with 4 consecutive numbers
: (either ascending or decending). Currently I've got
:
: use strict;
: use warnings;
:
: open (IN, '4bell.txt') or die "Can't open IN: \$!";
:
: my \$runs = 0;
: while(<IN>){
:     chomp;
:     my \$first = substr(\$_,0,1);
:     my \$asc = \$first . \$first+1 . \$first+2 . \$first+3;
:     my \$des = \$first . \$first-1 . \$first-2 . \$first-3;
:     if (/^(\$asc|\$des)/){
:         \$runs++
:     }

The sample data don't reveal it, but the equal precedence of the "+" and
"." operators can create false positive matches.  Try it with an element
starting with "7900".

Even after disambiguating with parentheses, e.g.

my \$asc = \$first . (\$first+1) . (\$first+2) . (\$first+3);

it could still create a false positive match if an element started with,
say, "78910".

: }
: print "There were \$runs runs\n";
:
: IN
: 12345867
: 23457658
: 34568765
: 43215687
: 13245678
:
: Prints
: There were 4 runs
:
: which is correct.

: However I can't help but think there must be a
: shorter solution but can't think of it. Any ideas? The file could
: contain up to 5000 lines.

if(
/^([0-6])(??{(\$1+1) . (\$1+2) . (\$1+3)})/
or
/^([3-9])(??{(\$1-1) . (\$1-2) . (\$1-3)})/
) {
\$runs++;
}

or:

if(
/^([0-6])/ and substr(\$_,1,3) - \$1 x 3 == 123
or
/^([3-9])/ and substr(\$_,1,3) - \$1 x 3 == -123
) {
\$runs++;
}

## Re: Consecutive Numbers

Graham Drabble (graham.drabble@lineone.net) wrote on MMMMCXXI September
MCMXCIII in
;;  I have a file that contains a list of numbers. I'm trying to process
;;  the file to find out how many rows start with 4 consecutive numbers
;;  (either ascending or decending). Currently I've got
;;
;;  use strict;
;;  use warnings;
;;
;;  open (IN, '4bell.txt') or die "Can't open IN: \$!";
;;
;;  my \$runs = 0;
;;  while(<IN>){
;;      chomp;
;;      my \$first = substr(\$_,0,1);
;;      my \$asc = \$first . \$first+1 . \$first+2 . \$first+3;
;;      my \$des = \$first . \$first-1 . \$first-2 . \$first-3;
;;      if (/^(\$asc|\$des)/){
;;          \$runs++
;;      }
;;  }
;;  print "There were \$runs runs\n";
;;
;;  IN
;;  12345867
;;  23457658
;;  34568765
;;  43215687
;;  13245678
;;
;;  Prints
;;  There were 4 runs
;;
;;  which is correct. However I can't help but think there must be a
;;  shorter solution but can't think of it. Any ideas? The file could
;;  contain up to 5000 lines.

The solutions I've seen so far all calculate something inside the loop.
That's not as efficient as doing the work outside of the loop. One should
realise there are only 12 possible starting strings. So, I'd do something
like (untested):

my \$N = 4;
my @c = map {my \$n = \$_; join "" => map {\$n + \$_} 0 .. \$N} 0 .. 10 - \$N + 1;
my @d = map {my \$n = \$_; join "" => map {\$n - \$_} 0 .. \$N} \$N - 1 .. 10;
my \$r = join "|" => @c, @d;

open my \$fh => '4bell.txt' or die "open: \$fh\n";
while (<\$fh>) {\$runs ++ if /\$r/}
print "There were \$runs runs\n";

__END__

Alternatively, only could write the first four lines as:

my \$r = join "|" => qr /0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;

Abigail
--
perl -wle'print"Κυστ αξοτθες Πεςμ Θαγλες"^"\x80"x24'

## Re: Consecutive Numbers

Indeed. This results in an increase from about 20_000 lines per second to

ITYM qw.

Sinan.

--
A. Sinan Unur
1usa@llenroc.ude.invalid
(remove '.invalid' and reverse each component for email address)

## Re: Consecutive Numbers

[A complimentary Cc of this posting was sent to
Graham Drabble

I'm puzzled at all complications: what is wrong with using something like

my \$four = substr \$in, 0, 4;
print 'OK' if length \$four == 4 and
(0 <= index '0123456789', \$four or 0 <= index '9876543210', \$four);

Ilya

## Re: Consecutive Numbers

There's nothing wrong, it apperars to be even a tad faster than Abigail's
regex. (Only goes to show that alternatives in a regex are relatively slow).

When I introduced a general function is_arithmetic_sequence() to solve
the problem, I didn'd have speed in mind, but, on the contrary, that
it is affordable to use a more general utility.  With the original data
size of a few thousand, the difference in speed wouldn't have mattered.

Anno

## Re: Consecutive Numbers

Thanks everyone. I think I'm going to go with this one, partly for
speed and partly because it's simple enough that when I come back to
it I'll know what it does!

Is there any reason that you've used 0<= index rather than index >=0?

--
Graham Drabble
If you're interested in what goes on in other groups or want to find
an interesting group to read then check news.groups.reviews for what
others have to say or contribute a review for others to read.

## Re: Consecutive Numbers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2004-12-13, Graham Drabble scribbled these
curious markings:

It's an extension of a technique used to prevent the all-too-common:

if(\$var = 0) { mumble() }

where

if(\$var == 0) { mumble() }

is what was meant. At least, that's how I understand it. I personally
don't see the point of doing that with relational operators other than
==, but I'm sure that someone can specify a reason.

Best Regards,
Christopher Nehren
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (FreeBSD)

iD8DBQFBvPI4k/lo7zvzJioRAkFFAJ4yzuG6Ln/q2gPwb9G14fGqC7+C+gCgsfYg
jNKKQEOABKaheJAjVJEaICQ=
=RnOJ
-----END PGP SIGNATURE-----

--
I abhor a system designed for the "user", if that word is a coded
pejorative meaning "stupid and unsophisticated".  -- Ken Thompson
If you ask the wrong questions, you get answers like "42" and "God".
Unix is user friendly. However, it isn't idiot friendly.

## Re: Consecutive Numbers

Graham Drabble wrote:

Because of precedence (see perlop.)  "index \$a, \$b >= 0" would require
parentheses "index( \$a, \$b ) >= 0" or it would be interpreted by perl as
"index \$a, ( \$b >= 0 )" while "0 <= index \$a, \$b" does not require parentheses.

John
--
use Perl;
program
fulfillment

## Re: Consecutive Numbers

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Ilya Zakharevich
> [A complimentary Cc of this posting was NOT [per weedlist] sent to
> Graham Drabble
> > I have a file that contains a list of numbers. I'm trying to process
> > the file to find out how many rows start with 4 consecutive digits
> > (either ascending or decending). Currently I've got
>
> I'm puzzled at all complications: what is wrong with using something like
>
>   my \$four = substr \$in, 0, 4;
>   print 'OK' if length \$four == 4 and
>     (0 <= index '0123456789', \$four or 0 <= index '9876543210', \$four);

Or one can try something like

my \$four = substr \$in, 0, 4;
print 'OK' if length \$four == 4 and
0 <= index "0123456789\x9876543210", \$four;

of fewer opcodes to dispatch...

Yours,
Ilya

P.S.  Here I use the fact that utf8 character repertuar is larger than
UTF-8 one.

## Re: Consecutive Numbers

X-Ftn-To: Graham Drabble

#untested
use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: \$!";

my \$runs = 0;
my \$s2 = reverse my \$s1 = join "", 0..9;
while (<IN>) {
my (\$m) = /^(\d)/ or next;
\$runs++ if \$s1 =~ /\$m/ or \$s2 =~ /\$m/;
}
print "There were \$runs runs\n";

--
Matija

## Re: Consecutive Numbers

>I have a file that contains a list of numbers. I'm trying to process
>the file to find out how many rows start with 4 consecutive numbers
>(either ascending or decending). Currently I've got
>
>use strict;
>use warnings;
>
>open (IN, '4bell.txt') or die "Can't open IN: \$!";
>
>my \$runs = 0;
>while(<IN>){
>    chomp;
>    my \$first = substr(\$_,0,1);
>    my \$asc = \$first . \$first+1 . \$first+2 . \$first+3;
>    my \$des = \$first . \$first-1 . \$first-2 . \$first-3;
>    if (/^(\$asc|\$des)/){
>        \$runs++
>    }
>}
>print "There were \$runs runs\n";
>
>IN
>12345867
>23457658
>34568765
>43215687
>13245678
>
>Prints
>There were 4 runs
>
>which is correct. However I can't help but think there must be a
>shorter solution but can't think of it. Any ideas? The file could
>contain up to 5000 lines.

Since this is Perl, a solution involving a hash is always worth considering:

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: \$!";

my @run_list = qw/0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;
my %run_hash;
@run_hash = ();
my \$runs = 0;

while (<IN>) {
\$runs++ if exists \$run_hash;
}

print "There were \$runs runs\n";

Gary Ansok
--
Rule #87: If the thought of something makes me giggle for longer than 15
seconds, I am to assume that I am not allowed to do it.
-- www.skippyslist.com