# fuzzy problem - Page 3

•  Subject
• Author
• Posted on

## Re: fuzzy problem

"That's how it was supposed to be".

## Re: fuzzy problem

On 7/8/2015 01:21, Rainer Weikusat wrote:

should not.
even if I post a solution was my post never exist.
anyway here is a more generic one pass

#!/usr/bin/perl
use strict; use warnings; use feature 'say';

use Data::Dumper;
my %result;

%result = MaxPatternLength('ab0001111abab', 'ab', '00', '1');
print Dumper \%result;

%result = MaxPatternLength('aaaaa', 'aa', 'a');
print Dumper \%result;

#   string, pattern1, pattern2, ....
sub MaxPatternLength
{
return {} if 0 >= \$#_;
my %hash;
@hash=1;
my @pattern = sort {length \$b <=> length \$a} keys %hash;
my \$reg     = join '|', @pattern;
\$reg        = qr/(?:\$reg)/;
my \$last    = '';
my %max;
my %cur;

map { \$max = \$cur = 0 } @pattern;

while (\$_[0] =~/\$reg/g)
{
++\$cur > \$max and \$max = \$cur;

\$cur = 1 if \$last ne \$&;

\$last = \$&
}
%max
}

## Re: fuzzy problem

No, it should because I intentionally (look that up if you don't
understand the meaning) wrote a subroutine searching for a single (look
that up as well) argument. You - as intentionally - tried to write one
looking for more than one pattern in parallell, however, you could as
well claim that my code is deficient because it doesn't return PI while

sub georgeit
{
log(1868.68078718655) / log(11);
}

[please not that I'm trying to match you 'obfuscate in order to make it
look more thant it actually is' style]

would at least be accurate to four decimal places.

[...]

Leaving all the other issue aside (this will fail miserably if any of
the patterns contains a |), this doesn't work: It should return aa => 2
and a => 5 for the second case, however, since the aa are eating the a,
it returns aa => 2, a => 1. Not to mention that it doesn't return the
longest substring at all.

## Re: fuzzy problem

you have to make some decisions in order to provide solutions to
difficult cases
e.g. if the longest is should be in precedence of the shortest
so everything is correct

also the results are correct.

## Re: fuzzy problem

If 'correct results' are defined as 'whatever the code happens to
return', it must obviously be correct but the term makes no sense
then. Otherwise, you'll have to produce some kind of description what it
is supposed to do which could be used to verify that it actually does
that. For the given problem, which I understood as 'find the longest
substring composed of a repeating pattern of ...' (to determine its
length), your code obviously doesn't work because the patterns passed to
your subroutine are not tested independent of each other.

## Re: fuzzy problem

On 7/8/2015 12:05, Rainer Weikusat wrote:

if you want the lengths you could use the following.
But the number of concurent founds for is much more logical
--------------------------------

#!/usr/bin/perl
use strict; use warnings; use feature 'say';

my %result;

%result = MaxPatternLength('ab0001111a|babab', 'a|b', '00', '1');
foreach (keys %result) { say "\$_ = \$result" }

%result = MaxPatternLength('aaaaa', 'aa', 'a');
foreach (keys %result) { say "\$_ = \$result" }

#   string, pattern1, pattern2, ....
sub MaxPatternLength
{
return if 0 >= \$#_;
my \$last = '';
my %max;
my %cur;
my \$reg;
my %hash;
@hash=1;

foreach (sort {length \$b <=> length \$a} keys %hash) {
\$max = \$cur = 0;
\$reg .= "\Q\$_\E|"
}

chop \$reg;
\$reg = qr/(?:\$reg)/;

while (\$_[0] =~/\$reg/g) {
++\$cur > \$max and \$max = \$cur;
\$cur = 1 if \$last ne \$&;
\$last = \$&
}

map { \$_ , \$max * length \$_ } keys %max
}

## Re: fuzzy problem

[...]

Since this is not related to anything else posted in this thread, I'll
treat it as 'code someone wrote and posted for review'.

[...]

This check belongs into the caller: It's pointless to call the
subroutine unless it's supposed to do something, hence, it shouldn't be
called unless it was supposed to do something.

Dito for eliminating duplicates in this way.

You can use qr// to turn the input into something which can be
interpolated into a regex as sub-pattern, thus enabling (to the best of
my knowledge) the caller to use a regex instead of a plain string. This
still won't play well with assemling an alternatives match in this way
but that's a different conversation and not actually necessary unless
this features is needed for something. Using qr// to 'pre-compile' the
regex in this way is pointless: While the documentation suggests that
this improves the speed of matching if the regex is held in a variables,
past experiments have confirmed that this isn't really the case: perl
doesn't recompile the regex unless the string changes and using this
built-in caching is faster than doing it by hand via qr//.

Also already pointed out: The documentation for \$& states

The use of this variable anywhere in a program imposes a
considerable performance penalty on all regular expression
matches.

I didn't confirm this myself but if it is true, this means \$& ought to
be avoided, especially in general-purpose subroutines, unless it is
known that there are no other regex matches which could be impaired by
it.

I'm also going to be so unkind to point out that both joining the bits
via | and reverse-sorting by length in order to find longer matches in
preference to short ones could have been nicked from past postings of me
(and possibly other people as well since that's not exactly rocket
science).

## Re: fuzzy problem

[...]

Since I was curious, I did now test this. Running the code below with
amp as first argument,

-------
use Benchmark;

if (\$ARGV[0] =~ /amp/) {
eval('print("\$&\n")');
}

timethese(-3,
{
match => sub {
\$ARGV[0] =~ /.*/;
}});
--------

caused the average speed of the regex match (4 runs) drop from
5163837.655/s to 4084284.7625/s, a fall of about 21% (relative to the
higher one), surely nothing to worry about in absolute terms (1.94e-7s
for a match vs 2.45e-7s) but nevertheless quite measurable.

## Re: fuzzy problem

On Sunday, August 9, 2015 at 3:29:18 AM UTC-7, Rainer Weikusat wrote:

Perl introduced  the /p switch to avoid the global speed penalty.
From perlre:

As a workaround for this problem, Perl 5.10.0 introduces
"\$", "\$" and "\$", which are equivalent
to "\$`", \$& and "\$'", except that they are only guaranteed to be
defined after a successful match that was executed with the "/p"
(preserve) modifier.  The use of these variables incurs no global
performance penalty, unlike their punctuation char equivalents, however
at the trade-off that you have to tell perl when you want to use them.
--
Charles DeRykus

## Re: fuzzy problem

Sometimes I use subroutine prototypes , other times I do just a quick
check . do you find also prototypes as a bad idea ?

this is the faster method to remove duplicates from a list. Of course
should not be any, so it can be removed

well yes in general, I will have a look how multiple user input regexs
can be implemented in one pass

Correct. replacing  with   /\$reg/gp   and  \$cur} ... is faster

can I grand the copyright for "+" ?

## Re: fuzzy problem

When (ab-)using prototypes for 'was the correct number of arguments
passed' checks, the compiler will flag non-compliant it encounters while
turning the source code in an optree (a set of optrees, actually). This
is a so-called 'static analysis' which happens once before any program
code is executed. OTOH, the runtime check will occur every time when the
function is called, an added cost guaranteed to be useless almost all of
the time, and it will only find erroneous calls in code which is
actually executed, ie, there might be bugs lingering on rarely used
codepaths which then suddenly bite for a very long time. Unless it is
known that the calling code can cope with this, silently ignoring the
nonsense call is also a very bad strategy as this may lead to the
program working along with bad data.

[...]

The way you framed this suggested some kind of 'revolutionary, novel
idea' in your code but there wasn't any, as I pointed out above. This
approach is not suitable for solving the 'find the longest sequence of
...' problem for multiple ... arguments unless it is guaranteed that no
argument is a prefix of another argument. This was true for the
originally posted problem but something similar to

-------
my \$str = '111100011100101011111111111110101000011';

my \$l;
length() > \$l and \$l = length() for \$str =~ /0+|1+/g;

print("\$l\n");
------

## Re: fuzzy problem

On 8/4/2015 8:37 AM, bubunia2000ster@gmail.com wrote:

Normally I wouldn't respond to a post from 1/2 year ago, but I'm bored,
so what the heck.

Others here seem to be concentrating on pattern-based solutions,
but I tend use a finite state machine to solve a problem like that,
because all we're really interested in is the relationship between
"current character" and "previous character" as we read through
the string from left to right.

With that in mind, I wrote a program that solves the problem by
handling all 9 possible combinations of "current/previous":
zero  / zero
zero  / one
zero  / other
one   / zero
one   / one
one   / other
other / zero
other / one
other / other

Having programmed that, I already hate it; it's too long and plodding
and hard to read, so I'm not even going to post it here.

Instead, I'll post my "Version 2", which does the same thing using
patterns, split, map, and sort:

#! /usr/bin/perl
# "zeros-ones-V2.perl"
use strict;
use warnings;
our \$string = \$ARGV[0];
our @zeros  = split /[^0]+/ , \$string , -1;
our @ones   = split /[^1]+/ , \$string , -1;
our \$max_zer_cnt = (sort map @zeros )[0];
our \$max_one_cnt = (sort map @ones  )[0];
print "Max zeros count = ", \$max_zer_cnt, "\n";
print "Max ones  count = ", \$max_one_cnt, "\n";

Results:

%zeros-ones-V2.perl '1000000111111110101010100000000'
Max zeros count =  8
Max ones  count =  8

%zeros-ones-V2.perl '0030111fji000001   ejsflj asdf 1111001001 '
Max zeros count =  5
Max ones  count =  4

%zeros-ones-V2.perl '1000 011AA11010QGD10    101000001111111111'
Max zeros count =  5
Max ones  count =  10

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: fuzzy problem

On Saturday, 6 February 2016 18:32:49 UTC+5:30, Robbie Hatley  wrote:
[snip]

Actually a regex is in itself a finite state machine, and we can use it to compute our desired results. Also, we need to keep track of just 2 transitions
1->0 and 0->1.

perl -Mstrict -Mwarnings -le '
my \$string = \$ARGV[0] // die "Insufficient args";
local \$b;
my @m = (0) x 2; # holds the max counts
1 while \$string =~ m{
(?{ local \$a = 1 })
(?:(0)|(1))(?:(?(1)0|1)(?{ \$a++ }))*
(?{
*b = \$m[\$1//\$2];
(\$a > \$b) && (\$b = \$a);
})
(?=(?(1)1|0)|\$)
}xg;
print \$string;
print "max overall=", \$b//0;
print "max zeros count=", \$m[0];
print "max ones  count=", \$m[1];
' \$

## Re: fuzzy problem

On 2/6/2016 12:33 PM, sharma__r@hotmail.com wrote:

Hmmm. Sort of.

Aye.

Actually, the OP's problem was (in his words):

> I have a string which contains binaries(1s and 0s).
> The objective is to find the maximum number of consecutive
> 1's/0's in the bitvector using perl.

He said his string contains 0s and 1s, but he never said it
didn't contain any other characters! So if we want to solve
the problem as he stated it, we have to handle "other".
My program does that by making two splits:

First  split uses "other than 0" as separator:
our @zeros  = split /[^0]+/ , \$string , -1;
This yields an array of all clusters of 0.

Second split uses "other than 1" as separator:
our @ones = split /[^1]+/ , \$string , -1;
This yields an array of all clusters of 1.

Thus "other" is handled implicitly, rather than searching
directly for clusters of "other" characters. However, I could
easily also separate out all clusters of "characters other than [01]":
our @others = split /[01]+/ , \$string , -1;
Which yields an array of all clusters of non-0, non-1 characters.
However, the OP's problem as stated doesn't ask for this info.

It would take me a while to analyze that RE. But if it's not
accounting for transitions such as "jeu111" or "000pex"
(strings which the OP's problem does not preclude and your
program does not reject) then it's not going to give correct
results for input strings containing characters other than [01].

Whereas the following (updated) version of my program handles
all mixes of 0, 1, other and gives max cluster lengths for all:

#! /usr/bin/perl
# "zeros-ones-V2.perl"
use feature 'say';
use strict;
use warnings;
our \$string = \$ARGV[0];
our @zeros  = split /[^0]+/ , \$string , -1; # Clusters of 0
our @ones   = split /[^1]+/ , \$string , -1; # Clusters of 1
our @others = split /[01]+/ , \$string , -1; # Clusters of "other"
say "Max zeros  count = ", (sort map @zeros )[0];
say "Max ones   count = ", (sort map @ones  )[0];
say "Max others count = ", (sort map @others)[0];

Results:

%zeros-ones-V2.perl '1000000111111110101010100000000'
Max zeros  count = 8
Max ones   count = 8
Max others count = 0

%zeros-ones-V2.perl '0030111fji000001   ejsflj asdf 1111001001 '
Max zeros  count = 5
Max ones   count = 4
Max others count = 15

%zeros-ones-V2.perl '1000 011AA11010QGD10    101000001111111111'
Max zeros  count = 5
Max ones   count = 10
Max others count = 4

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: fuzzy problem

On Sunday, 7 February 2016 04:08:19 UTC+5:30, Robbie Hatley  wrote:
[snip]

>
> > Also, we need to keep track of just 2 transitions 1->0 and 0->1.
>
> Actually, the OP's problem was (in his words):
>
>  > I have a string which contains binaries(1s and 0s).
>  > The objective is to find the maximum number of consecutive
>  > 1's/0's in the bitvector using perl.
>
> He said his string contains 0s and 1s, but he never said it
> didn't contain any other characters! So if we want to solve
> the problem as he stated it, we have to handle "other".
> My program does that by making two splits:
>

I think you did not run the code I posted. Irrespective of non-1s/0s it ext
racts out islands of 1s and 0s from the string. That's where the regular ex
pressions come in the picture!

While looking at this problem imagine the string to be like a digital clock
waveform and the only points of interest are those where the clock makes a
0->1 or a 1->0 transition. The rest of the details will be filled in by th
e regex. e.g., the the 0->0, 1->1 will be subsumed in the * quantifier of t
he regex.

e.g., 001100011110000000 -> __/''\___/''''\_______

>
> It would take me a while to analyze that RE. But if it's not
> accounting for transitions such as "jeu111" or "000pex"
> (strings which the OP's problem does not preclude and your
> program does not reject) then it's not going to give correct
> results for input strings containing characters other than [01].
>
You need to run my code with these inputs. Like I said earlier, it doesn't
matter for non-1, non-0 inputs, as in a regex when I say /[01]/ the regex e
ngine
jumps over the non-1,non-0 inputs in the string anyway, so that feature is
builtin into the regex engine & we make use of that here.

As you yourself noted the OP did not want the max count of others,
hence I did not generate that.

Here's an updated version of the code to compute the various stats:

perl -Mstrict -Mwarnings -le '

my \$string = \$ARGV[0] // die "[ERROR] Insufficient arguments";

my @m = (0) x 4; # holds the max counts

1 while \$string =~ m{
(?:
(?: # run of 1s/0s
(?{ local \$a = 1 })             # init kount for length run o
f 1s/0s
(?:(0)|(1))(?>(?(1)0|1)(?{ \$a++ }))*  # compute length of 1s/0s
(?{                                   # compute max of run of 1
s/0s
local *b = \$m[\$1//\$2];
\$a > \$b and \$b = \$a;
\$b > \$m[2] and \$m[2] = \$b;
})
) |
(?: # run of others
(?{ local \$a = 1 })         # init kount for length of run of
others
(?:[^01])(?>[^01](?{ \$a++ }))*    # compute length of run of ot
hers
(?{ \$a > \$m[3] and \$m[3] = \$a })  # compute max of run of oth
ers
)
)
(?{ \$m[4] = (\$m[3] > \$m[2]) ? \$m[3] : \$m[2] }) # compute the overal
l max
}xg;

print \$string;
print "max 0s run=",       \$m[0];
print "max 1s run=",       \$m[1];
print "max of 0s/1s run=", \$m[2];
print "max others run=",   \$m[3];
print "max overall=",      \$m[4];

' \$

Results:
% perl -e '...' '1000000111111110101010100000000'
max 0s run=8
max 1s run=8
max of 0s/1s run=8
max others run=0
max overall=8

% perl -e '...'  '0030111fji000001   ejsflj asdf 1111001001 '
max 0s run=5
max 1s run=4
max of 0s/1s run=4
max others run=15
max overall=15

% perl -e '...' '1000 011AA11010QGD10    101000001111111111'
max 0s run=5
max 1s run=10
max of 0s/1s run=10
max others run=4
max overall=10

## Re: fuzzy problem

On 2/7/2016 12:36 AM, sharma__r@hotmail.com wrote:

True. Let's do that. But I'll put it into a file. I don't even
know how to run it with an argument the way it is.

#! /usr/bin/perl
#  /rhe/scripts/test/sharma-zero-one-V1.perl
use v5.022;
use strict;
use warnings;
my \$string = \$ARGV[0] // die "Insufficient args";
local \$b;
my @m = (0) x 2; # holds the max counts
1 while \$string =~ m{
(?{ local \$a = 1 })
(?:(0)|(1))(?:(?(1)0|1)(?{ \$a++ }))*
(?{
*b = \$m[\$1//\$2];
(\$a > \$b) && (\$b = \$a);
})
(?=(?(1)1|0)|\$)
}xg;
say \$string;
say "max overall=", \$b//0;
say "max zeros count=", \$m[0];
say "max ones  count=", \$m[1];

%sharma-zero-one-V1.perl '1000000111111110101010100000000'
1000000111111110101010100000000
max overall=8
max zeros count=8
max ones  count=8

Aragorn@Ketch
/rhe
%sharma-zero-one-V1.perl '0030111fji000001   ejsflj asdf 1111001001 '
0030111fji000001   ejsflj asdf 1111001001
max overall=4
max zeros count=5
max ones  count=4

Aragorn@Ketch
/rhe
%sharma-zero-one-V1.perl '1000 011AA11010QGD10    101000001111111111'
1000 011AA11010QGD10    101000001111111111
max overall=10
max zeros count=5
max ones  count=10

Ok, that all looks correct, except that the "max overall" is wrong for
the second test case. (Should be 5, not 4, right?)

Hmmm. I'm not seeing how that could work, because if you just "skip over"
a character such as 'A' in the center of '11111A11111' it would give you
a string of 10 ones, which isn't correct. And yet your code does work
correctly in that test case:

%sharma-zero-one-V1.perl '100101000011111AAA11111'
100101000011111AAA11111
max overall=5
max zeros count=4
max ones  count=5

Ok, let me file that:

#! /usr/bin/perl
#  /rhe/scripts/test/sharma-zero-one-V2.perl
use v5.022;
use strict;
use warnings;
my \$string = \$ARGV[0] // die "[ERROR] Insufficient arguments";
my @m = (0) x 4; # holds the max counts
1 while \$string =~ m{
(?:
(?: # run of 1s/0s
(?{ local \$a = 1 })             # init kount for length run of 1s/0s
(?:(0)|(1))(?>(?(1)0|1)(?{ \$a++ }))*  # compute length of 1s/0s
(?{                                   # compute max of run of 1s/0s
local *b = \$m[\$1//\$2];
\$a > \$b and \$b = \$a;
\$b > \$m[2] and \$m[2] = \$b;
})
) |
(?: # run of others
(?{ local \$a = 1 })         # init kount for length of run of others
(?:[^01])(?>[^01](?{ \$a++ }))*    # compute length of run of others
(?{ \$a > \$m[3] and \$m[3] = \$a })  # compute max of run of others
)
)
(?{ \$m[4] = (\$m[3] > \$m[2]) ? \$m[3] : \$m[2] }) # compute the overall max
}xg;
say \$string;
say "max 0s run=",       \$m[0];
say "max 1s run=",       \$m[1];
say "max of 0s/1s run=", \$m[2];
say "max others run=",   \$m[3];
say "max overall=",      \$m[4];

Results:

%sharma-zero-one-V2.perl '1000000111111110101010100000000'
1000000111111110101010100000000
max 0s run=8
max 1s run=8
max of 0s/1s run=8
max others run=0
max overall=8

%sharma-zero-one-V2.perl '0030111fji000001   ejsflj asdf 1111001001 '
0030111fji000001   ejsflj asdf 1111001001
max 0s run=5
max 1s run=4
max of 0s/1s run=5
max others run=15
max overall=15

%sharma-zero-one-V2.perl '1000 011AA11010QGD10    101000001111111111'
1000 011AA11010QGD10    101000001111111111
max 0s run=5
max 1s run=10
max of 0s/1s run=10
max others run=4
max overall=10

Ok, that works, and also gets rid of the one error in the first version.

That Regex, however, is something I'm going to have to study another
day. I'll have to look up info on some of those RE techniques you're
using.

--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'
http://www.well.com/user/lonewolf/

## Re: fuzzy problem

On Sunday, 7 February 2016 17:18:02 UTC+5:30, Robbie Hatley  wrote:

>
> Ok, that works, and also gets rid of the one error in the first version.
>
> That Regex, however, is something I'm going to have to study another
> day. I'll have to look up info on some of those RE techniques you're
> using.
>

perl -Mstrict -Mwarnings -le '
my \$string = \$ARGV[0] // die "[ERROR] Insufficient arguments";

my @m = (0) x 5; # holds the max counts

1 while \$string =~ m{
(?:
(?:
(([01])*)
(?{
length(\$1) > \$m[\$2] and \$m[\$2] = length(\$1);
\$m[\$2] > \$m[2] and \$m[2] = \$m[\$2];
})
) |
(?:(.+)(?{ length(\$3) > \$m[3] and \$m[3] = length(\$3) }))
)
(?{ \$m[4] = (\$m[3] > \$m[2]) ? \$m[3] : \$m[2] })
}xg;

print \$string;
print "max 0s run=",       \$m[0];
print "max 1s run=",       \$m[1];
print "max of 0s/1s run=", \$m[2];
print "max others run=",   \$m[3];
print "max overall=",      \$m[4];
' \$

## Re: fuzzy problem

On Sat, 6 Feb 2016 05:02:47 -0800, in article <FZCdnW_idIJscyjLnZ2dnUU7_
8ydnZ2d@giganews.com>, Robbie Hatley wrote:

[snip]

Ugly and almost obfuscated way:

#!/usr/bin/perl

sub max { \$_[0] > \$_[1] ? \$_[0] : \$_[1] }
@x = (shift // die 'No args') =~ /(([01])*)/g;
while ((\$r, \$i) = splice @x, 0, 2) {
\$m = max \$m, length \$r;
}
print "zeros: \$m\nones: \$m";

## Re: fuzzy problem

On Sun, 7 Feb 2016 21:14:19 +0100, in article <MPG.3121c3ef816cf75d9896a7
@news.eternal-september.org>, Wasell wrote:

Anyone up for a round of golf? First attempt:

#!/usr/bin/perl
@_=pop=~/(([01])*)/g;\$m=length>\$m?length:\$m{\$
i}while(\$_,\$i)=splice@_,0,2;print"0s: \$m\n1s: \$m"

Test:
\$ ./runs.pl '11110011111100010'
0s: 3
1s: 6

## Re: fuzzy problem

[snip]

>
> Anyone up for a round of golf? First attempt:
>
> #!/usr/bin/perl
> @_=pop=~/(([01])*)/g;\$m=length>\$m?length:\$m{\$
> i}while(\$_,\$i)=splice@_,0,2;print"0s: \$m\n1s: \$m"
>

#!/usr/bin/perl
%h=%};(length>\$m)&&(\$m=length)while(\$_,\$i)=each%h;print"0s: \$m\n1s: \$m"

The ternary operator is superfluous under non-strictness.
"each" in place of "splice".