Reading next line, finding missing number in sequence

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
Hello,
I know this has been discussed before and I've tried some of the
solutions too.  But I've been unsuccessful so far.  I have a simple
text file of numbers, one on each line and just need a script that
will find the missing number.  Example of file:
1
2
3
5

My script:

open (FILE, "FILE.txt");
open (MISSING, ">Missing.txt");

while (<FILE>) {
\$currline = \$_;
\$nxtline=\$currline++ ;
if (\$_ != \$nxtline) {
print MISSING "Missing occurrence is \$_  \n";
}
}

close FILE;
close MISSING;

This doesn't identify that 4 is missing.  Any ideas?
Thank you in advance,
Tara Pillion

Re: Reading next line, finding missing number in sequence

> Hello,
> I know this has been discussed before and I've tried some of the
> solutions too.  But I've been unsuccessful so far.  I have a simple
> text file of numbers, one on each line and just need a script that
> will find the missing number.  Example of file:
> 1
> 2
> 3
> 5
>
> My script:
>
> open (FILE, "FILE.txt");
> open (MISSING, ">Missing.txt");
>
> while (<FILE>) {
>    \$currline = \$_;
>    \$nxtline=\$currline++ ;
>    if (\$_ != \$nxtline) {
>        print MISSING "Missing occurrence is \$_  \n";
>    }
> }
>
> close FILE;
> close MISSING;
>
> This doesn't identify that 4 is missing.  Any ideas?

Yes. You have a logic error.

1. \$currline = \$_;   # this sets \$currline to the value just read
2. \$nxtline = \$currline++; # this sets \$nxtline to the same value
3. if( \$_ != \$nxtline ) {  # this compares \$nxtline to the same value

In other words, \$_ and \$nxtline are always the same.

The postfix increment operator (\$x++) returns the value _before_
incrementing. The prefix operator (++\$x) returns the value _after_
incrementing. However, using the prefix version won't fix your problem,
either. You need to compare each line to one greater than the previous
line (untested):

my \$expected = 1;
while(<FILE>) {
if( \$_ != \$expected ) {
print MISSING "Missing ...";
}
\$expected = \$_ + 1;
}

Re: Reading next line, finding missing number in sequence

wrote:

> my \$expected = 1;
> while(<FILE>) {
>   if( \$_ != \$expected ) {
>     print MISSING "Missing ...";
>   }
>   \$expected = \$_ + 1;
> }

But consider this file:

1
2
3
5
6
7
9
10

This script will print:

Missing 4...
Missing 5...
Missing 6...
Missing 7...
Missing 8...

(Assuming, of course, that you modified it to print the missing number.)

Brian

-----

(\$a='%Q\$yW0se3%qhggfIi')=~s,([f-y]),qq;"\c\$1";,ege,@l=unpack'a5a5a*',\$a;for\$i(
@l){\$\$i.=sprintf"%lx",\$_ for
unpack'C*',\$i;push@n,\$\$i;}\$"=',',\$_="\c`",\$p=eval"
pack'VVN',@n",@b=unpack'C12',\$p;\$m=4054314,\$a=96;(++\$a,\$m>>=1)&1?s@\$@chr\$a-!(\$a
%6-4)*32@e:\$;while\$m;@z=split m
&&;for\$j(@b)

Re: Reading next line, finding missing number in sequence

> wrote:
>
> > my \$expected = 1;
> > while(<FILE>) {
> >   if( \$_ != \$expected ) {
> >     print MISSING "Missing ...";
> >   }
> >   \$expected = \$_ + 1;
> > }
>
> But consider this file:
>
> 1
> 2
> 3
> 5
> 6
> 7
> 9
> 10
>
> This script will print:
>
> Missing 4...
> Missing 5...
> Missing 6...
> Missing 7...
> Missing 8...
>

Hmm... I get only:

Missing 4
Missing 8

which is correct, isn't it?

> (Assuming, of course, that you modified it to print the missing number.)

I modified it thus:

my \$expected = 1;
while(<DATA>) {
if( \$_ != \$expected ) {
print "Missing \$expected\n";
}
\$expected = \$_ + 1;
}

__DATA__
1
2
3
5
6
7
9
10

>
> Brian
>
> -----
>
>
(\$a='%Q\$yW0se3%qhggfIi')=~s,([f-y]),qq;"\c\$1";,ege,@l=unpack'a5a5a*',\$a;for
\$i(
> @l){\$\$i.=sprintf"%lx",\$_ for
> unpack'C*',\$i;push@n,\$\$i;}\$"=',',\$_="\c`",\$p=eval"
>
pack'VVN',@n",@b=unpack'C12',\$p;\$m=4054314,\$a=96;(++\$a,\$m>>=1)&1?s@\$@chr\$a-!
(\$a
> %6-4)*32@e:\$;while\$m;@z=split m
> &&;for\$j(@b)

Re: Reading next line, finding missing number in sequence

Thank you, Brian.  I used your suggestion and modification and it
worked well, except when there were two numbers in a row missing.  For
example, if I had
1
2
5
6
7

It would list 3 as the missing number but not 4.  I can work with this
though.  Thanks a bunch.

Tara

> > I modified it thus:
>
> my \$expected = 1;
> while(<DATA>) {
>   if( \$_ != \$expected ) {
>     print "Missing \$expected\n";
>   }
>  \$expected = \$_ + 1;
> }
>
> __DATA__
> 1
> 2
> 3
> 5
> 6
> 7
> 9
> 10
>
> > Brian

Re: Reading next line, finding missing number in sequence

> Thank you, Brian.  I used your suggestion and modification and it
> worked well, except when there were two numbers in a row missing.  For
> example, if I had
> 1
> 2
> 5
> 6
> 7
>
> It would list 3 as the missing number but not 4.  I can work with this
> though.  Thanks a bunch.
>
> Tara
>
> > > I modified it thus:
> >
> > my \$expected = 1;
> > while(<DATA>) {
> >   if( \$_ != \$expected ) {
> >     print "Missing \$expected\n";
> >   }
> >  \$expected = \$_ + 1;
> > }

Here's an idea

my \$expected = 1;
while(<DATA>) {
for(;\$expected < \$_;++\$expected) {
print "Missing \$expected\n";
}
\$expected=\$_+1;
}

Some of the other solutions posted here, handle this scenario as well,
though.

> >
> > __DATA__
> > 1
> > 2
> > 3
> > 5
> > 6
> > 7
> > 9
> > 10
> >
> > > Brian

Re: Reading next line, finding missing number in sequence

That one works perfectly.
Thank you!

> Here's an idea
>
> my \$expected = 1;
> while(<DATA>) {
>   for(;\$expected < \$_;++\$expected) {
>    print "Missing \$expected\n";
>   }
>  \$expected=\$_+1;
> }
>
> Some of the other solutions posted here, handle this scenario as well,
> though.
>

Re: Reading next line, finding missing number in sequence

Pea said to us:

> Thank you, Brian.  I used your suggestion and modification and it
> worked well, except when there were two numbers in a row missing.
[...]

As a learning experience, Bowsayge created a program that seems
to be able to list the missing numbers from a range:

#!/usr/bin/perl
use strict;
use warnings;

chomp (my @numbers = <DATA>);
s/\D+//g for (@numbers);
@numbers = sort { \$a <=> \$b } @numbers;
my (\$min, \$max) = (\$numbers[0], \$numbers[\$#numbers]);
my %hash = map +(\$_, 1), @numbers;
my @missing = grep !defined(\$hash), (\$min..\$max);

printf "%-20s %s\n", 'numbers', 'missing';
printf "%-20s %s\n", "@numbers", "@missing";

__DATA__
1
6
2e
5
what9fun
7

--
my (@str) = split //,'rroeiaptbJeleuwhet sn.n  ';
my (@ndx, @arr) = qw(15 11 7 23 22 5 13 8 21 0 19 16 14
1 20 9 10 3 4 2 6 24 18 12 17);
\$arr[\$ndx[\$_]] = \$str[\$_] for (@ndx); print @arr, "\n";

Re: Reading next line, finding missing number in sequence

> Pea said to us:
>
> > Thank you, Brian.  I used your suggestion and modification and it
> > worked well, except when there were two numbers in a row missing.
> [...]
>
> As a learning experience, Bowsayge created a program that seems
> to be able to list the missing numbers from a range:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> chomp (my @numbers = <DATA>);
> s/\D+//g for (@numbers);

Ignoring non-digits is an extra feature.  It may be useful, but maybe not.
There is no good reason to bring it in here.

> @numbers = sort { \$a <=> \$b } @numbers;

You are sorting the numbers only to get their minimum and maximum (the
rest of the algorithm doesn't need them sorted).  In general, that is
wasteful, especially when the lists are long.  The standard module
List::Util has functions that find the minimum and maximum in linear
time.  For this example you could dodge the issue and simply assume
the numbers come sorted.

> my (\$min, \$max) = (\$numbers[0], \$numbers[\$#numbers]);

\$numbers[\$#numbers] can be written as \$numbers[-1].

> my %hash = map +(\$_, 1), @numbers;
> my @missing = grep !defined(\$hash), (\$min..\$max);

You have taken care to set the hash values to 1, so defined() is not
necessary.

> printf "%-20s %s\n", 'numbers', 'missing';
> printf "%-20s %s\n", "@numbers", "@missing";

[snip data]

Your code shows a very plausible use of a hash.  In general, a hash
is the structure of choice when the problem can be expressed in terms
of sets.  The set elements get to be the hash keys (or the keys a hash
gets probed for).  The values are of little importance in this application
of hashes.  Your code is a good example.

The sets in this case are the integers in a range, and some (explicitly
given) subset thereof.  The problem is to find the set difference.
With sets of integers it can be of advantage to use arrays for the
representation instead of hashes, especially if the integers are small.
Arrays use substantially less storage and are a little faster than
hashes.  You may find it instructive to re-write your code to use
an array.  You will have to change very little, except for the "map"
line.

Going a step farther in storage conservation, a bit vector could be
used.  It is the most compact way to store a set of small integers.
With @numbers, \$min and \$max being set:

my \$set = '';
vec( \$set, \$_, 1) = 1 for @numbers;
my @missing = grep !vec( \$set, \$_, 1), \$min .. \$max;

Again, there isn't much change from your code to this variant.

Anno

Re: Reading next line, finding missing number in sequence

Anno Siegel wrote:
>>
>>my (\$min, \$max) = (\$numbers[0], \$numbers[\$#numbers]);
>
> \$numbers[\$#numbers] can be written as \$numbers[-1].

And (\$numbers[0], \$numbers[\$#numbers]) can be written as @numbers[0,-1]

John
--
use Perl;
program
fulfillment

Re: Reading next line, finding missing number in sequence

Anno Siegel said to us:

[...]
> Arrays use substantially less storage and are a little faster than
> hashes.  You may find it instructive to re-write your code to use
> an array.

Yes, it works with an array too.

> Going a step farther in storage conservation, a bit vector could be
> used.
[...]
>
>     my \$set = '';
>     vec( \$set, \$_, 1) = 1 for @numbers;
>     my @missing = grep !vec( \$set, \$_, 1), \$min .. \$max;
>
> Again, there isn't much change from your code to this variant.
>
> Anno

Thanks, the vector works like a charm too.

Re: Reading next line, finding missing number in sequence

wrote:

>> On Tue, 10 Aug 2004 16:14:50 -0700, Jim Gibson
>> wrote:
>>
>> > my \$expected = 1;
>> > while(<FILE>) {
>> >   if( \$_ != \$expected ) {
>> >     print MISSING "Missing ...";
>> >   }
>> >   \$expected = \$_ + 1;
>> > }
>>
>> But consider this file:
>>
>> 1
>> 2
>> 3
>> 5
>> 6
>> 7
>> 9
>> 10
>>
>> This script will print:
>>
>> Missing 4...
>> Missing 5...
>> Missing 6...
>> Missing 7...
>> Missing 8...
>>
>
> Hmm... I get only:
>
> Missing 4
> Missing 8
>
> which is correct, isn't it?

Yeah, I guess you're right. Apparently when I looked at that code
yesterday I thought I saw
\$expected++;
instead of
\$expected = \$_ + 1;

So never mind. :)

Brian

--
(\$a='%Q\$yW0se3%qhggfIi')=~s,([f-y]),qq;"\c\$1";,ege,@l=unpack'a5a5a*',\$a;for\$i(
@l){\$\$i.=sprintf"%lx",\$_ for
unpack'C*',\$i;push@n,\$\$i;}\$"=',',\$_="\c`",\$p=eval"
pack'VVN',@n",@b=unpack'C12',\$p;\$m=4054314,\$a=96;(++\$a,\$m>>=1)&1?s@\$@chr\$a-!(\$a
%6-4)*32@e:\$;while\$m;@z=split m
&&;for\$j(@b)

Re: Reading next line, finding missing number in sequence

>>(Assuming, of course, that you modified it to print the missing number.)

I would expect to print out all the missing numbers when the increase
is more than two.  Why limit it to always start at 1?

linux% cat count.pl
#!/usr/bin/perl

my \$expected;
while(<DATA>) {
if (defined \$expected) {
die "Bad data: expected=\$expected, read=\$_" if \$_ < \$expected;
print "Missing ",\$expected++,"\n" while \$expected < \$_;
}
\$expected = \$_ + 1;
}

__DATA__
2
3
5
9
10
linux% perl count.pl
Missing 4
Missing 6
Missing 7
Missing 8

-Joe

Re: Reading next line, finding missing number in sequence

> open (FILE, "FILE.txt");
> open (MISSING, ">Missing.txt");
>
> while (<FILE>) {
>    \$currline = \$_;
>    \$nxtline=\$currline++ ;
>    if (\$_ != \$nxtline) {
>        print MISSING "Missing occurrence is \$_  \n";
>    }
> }
>
> close FILE;
> close MISSING;

In your script, you set \$currline equal to \$_, then set \$nxtline equal to
\$currline, then increment \$currline by one, then compare \$_ to \$nxtline.
Since \$nxtline gets its value from \$currline (before \$currline is
incremented), and \$currline gets its value from \$_, \$_ will always equal
\$nxtline.

What you need to do is write a textual description of how to solve the
problem, then try to translate that into Perl.

Here's an idea, assuming the numbers in the file are in order from least
to greatest, and no number is repeated:

1. Open the file.
2. Read the first line from the file.
3. Set \$prev_line_num to the line number you read in step 2.
4. Read the next line from the file, if there is one.
5. Set \$line_num to the line number you read in step 4.
6. Compare \$line_num to (\$prev_line_num + 1).
7. If they are different, then you're missing all numbers from
(\$prev_line_num + 1) to (\$line_num - 1).
8. Set \$prev_line_num to the value of \$line_num.
9. Go to step 4.

So in Perl:

#####

open FILE, "<FILE.txt" or die "Can't open FILE.TXT: \$!\n";
open MISSING, ">Missing.txt" or die "Can't open Missing.txt: \$!\n";

\$prev_line_num = <FILE>;
while (\$line_num = <FILE>) {
if (\$line_num != \$prev_line_num + 1) {
print MISSING "Missing occurrence is \$_\n" for (\$prev_line_num
+ 1) .. (\$line_num - 1);
}
\$prev_line_num = \$line_num;
}

close FILE;
close MISSING;

#####

In fact, you don't really need that if statement in there; the for loop is
enough, since it will loop 0 times if there are no missing line numbers.
But it's in there for clarity.

Brian

-----

(\$a='%Q\$yW0se3%qhggfIi')=~s,([f-y]),qq;"\c\$1";,ege,@l=unpack'a5a5a*',\$a;for\$i(
@l){\$\$i.=sprintf"%lx",\$_ for
unpack'C*',\$i;push@n,\$\$i;}\$"=',',\$_="\c`",\$p=eval"
pack'VVN',@n",@b=unpack'C12',\$p;\$m=4054314,\$a=96;(++\$a,\$m>>=1)&1?s@\$@chr\$a-!(\$a
%6-4)*32@e:\$;while\$m;@z=split m
&&;for\$j(@b)

Re: Reading next line, finding missing number in sequence

> Hello,
> I know this has been discussed before and I've tried some of the
> solutions too.  But I've been unsuccessful so far.  I have a simple
> text file of numbers, one on each line and just need a script that
> will find the missing number.  Example of file:
> 1
> 2
> 3
> 5
>
> My script:
>

Where are strictures and warnings?

use strict;
use warnings;

> open (FILE, "FILE.txt");

Always check that your file even gets opened:

open(my \$infile, '<', 'FILE.txt') or die "Could not open the input file:
\$!";

> open (MISSING, ">Missing.txt");
>

Once again:

open(my \$outfile, '>', 'Missing.txt') or die "Could not open the output
file: \$!";

> while (<FILE>) {
>    \$currline = \$_;
>    \$nxtline=\$currline++ ;
>    if (\$_ != \$nxtline) {
>        print MISSING "Missing occurrence is \$_  \n";
>    }
> }
>

In the above, you assign the value of the line to \$currline, then add 1 and
assign it to \$nxtline. You then test whether the value on the input line
equals the number you just incremented? This should fail for *all* cases
(i.e., 1+1 != 1, 2+1 != 2, etc.).:

my \$cnt = 1;

while (my \$num = <\$infile>) {

if (\$num != \$cnt) {

print \$outfile "Missing number(s): ";

for ((\$num - (\$num - \$cnt)) .. (\$num - 1)) {
print \$outfile "\$_,";
}

print \$outfile " at line \$.\n";

\$cnt = \$num;
}

\$cnt++;
}

Matt

Re: Reading next line, finding missing number in sequence

On Tue, 10 Aug 2004 19:46:26 -0400, Matt Garrish

>> while (<FILE>) {
>>    \$currline = \$_;
>>    \$nxtline=\$currline++ ;
>>    if (\$_ != \$nxtline) {
>>        print MISSING "Missing occurrence is \$_  \n";
>>    }
>> }
>>
>
> In the above, you assign the value of the line to \$currline, then add 1
> and
> assign it to \$nxtline. You then test whether the value on the input line
> equals the number you just incremented? This should fail for *all* cases
> (i.e., 1+1 != 1, 2+1 != 2, etc.).:

Not quite; \$nxtline will get the value of \$currline before the increment.
So it will *succeed* for all cases.

Brian

--
(\$a='%Q\$yW0se3%qhggfIi')=~s,([f-y]),qq;"\c\$1";,ege,@l=unpack'a5a5a*',\$a;for\$i(
@l){\$\$i.=sprintf"%lx",\$_ for
unpack'C*',\$i;push@n,\$\$i;}\$"=',',\$_="\c`",\$p=eval"
pack'VVN',@n",@b=unpack'C12',\$p;\$m=4054314,\$a=96;(++\$a,\$m>>=1)&1?s@\$@chr\$a-!(\$a
%6-4)*32@e:\$;while\$m;@z=split m
&&;for\$j(@b)

Re: Reading next line, finding missing number in sequence

> On Tue, 10 Aug 2004 19:46:26 -0400, Matt Garrish
>
> >> while (<FILE>) {
> >>    \$currline = \$_;
> >>    \$nxtline=\$currline++ ;
> >>    if (\$_ != \$nxtline) {
> >>        print MISSING "Missing occurrence is \$_  \n";
> >>    }
> >> }
> >>
> >
> > In the above, you assign the value of the line to \$currline, then add 1
> > and
> > assign it to \$nxtline. You then test whether the value on the input line
> > equals the number you just incremented? This should fail for *all* cases
> > (i.e., 1+1 != 1, 2+1 != 2, etc.).:
>
> Not quite; \$nxtline will get the value of \$currline before the increment.
> So it will *succeed* for all cases.
>

First Dr. Seuss and now this. I need a vacation...

Matt

Re: Reading next line, finding missing number in sequence

>
>
>       for ((\$num - (\$num - \$cnt)) .. (\$num - 1)) {
>

That, of course, would be the long way of writing:

for (\$cnt .. (\$num -1)) {

I was trying something else and never looked twice at it when I switched
gears... : )

Matt

Re: Reading next line, finding missing number in sequence

> Hello,
> I know this has been discussed before and I've tried some of the
> solutions too.  But I've been unsuccessful so far.  I have a simple
> text file of numbers, one on each line and just need a script that
> will find the missing number.  Example of file:
> 1
> 2
> 3
> 5
>
> My script:
>
> open (FILE, "FILE.txt");
> open (MISSING, ">Missing.txt");
>
> while (<FILE>) {
>    \$currline = \$_;
>    \$nxtline=\$currline++ ;
>    if (\$_ != \$nxtline) {
>        print MISSING "Missing occurrence is \$_  \n";
>    }
> }
>
> close FILE;
> close MISSING;

my @data = <FILE>;
my @missing = 0 .. \$data[ -1];
@missing[ @data] = ();
print "@{[ grep defined, @missing]}\n";

That reports 0 as missing too, but that's easy to correct.  A variant
delivers the missing elements without intervening undef's:

my @data = <FILE>;
my @missing = 0 .. \$data[ -1];
splice @missing, \$_, 1 for reverse @data;
print "@missing\n";

Anno