Can someone 'splain why this regex won't work both ways?

Thread started by spydox, 04-14-2008
Posted by A. Sinan Unur on April 14, 2008, 5:43 pm
spydox@gmail.com wrote in
news:e6278092-e663-4ea6-8f07-40d65faeb551@f63g2000hsf.googlegroups.com:

[ please do not snip attributions ]

>> > I guess LLR parsing is to blame,
>>
>> I don't look at this as a parsing issue. Rather, it is a "the
>> universe must make sense" kind of issue: The first match does not
>> exist before the first match. That makes sense to me. It may not
>> make sense to you.
>>
>
> To me, like conventional pattern-recognition, of say two tanks
> next to each other, the system should accept it whether the match
> is described either way:
>
> find a tank with another identical tank to it's left
>
> *or*
>
> find a tank with another identical tank to it's right
>
>
> The system should have no *context-sensitivity* where only one of
> the two matches. Sure, internally an algorithm may be scanning L
> to R or R to L or whatever, but the user should not even be
> concerned with that, at least in this case. I still think it gave
> up too soon- it should have tried R to L (backtracking) when L to
> R failed.

What you seem to want is a "match two identical characters"
operator. For this particular case, you can achieve that by using:

=for example

my @strings = qw( 1222345 1233345 );

s/00|11|22|33|44|55|66|77|88|99// for @strings;

print "$_\n" for @strings;

=cut

When you use a character class, every element of that class is
considered equivalent to every other one. So, for example, when you
write

/\d\d/

that does find two characters that are in the same equivalence
class.

The tank analogy works perfectly here because there are no two
identical tanks in the world. Instead, there are equivalence classes
of tanks. Tanks that are the same model, tanks in the same unit etc.

If what you want is to say,

find a tank, then find another tank that is the same
model as the one you just found

well, that is equivalent to /(\d)\1/

J. D. Baldwin gives perfect examples of why /\1(\d)/ does not make
sense: Finding another tank in the same equivalence class as the one
you first found comes after first finding a tank.
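
The ordering argument above can be checked directly. The following is a minimal sketch, not part of the original post; the variable names are made up for illustration, and the test string is the 122345 example mentioned elsewhere in this thread.

```perl
use strict;
use warnings;

my $str = '122345';

# Group 1 captures a digit first, so \1 has something to compare against.
my $capture_then_ref = $str =~ /(\d)\1/ ? "matched '$&'" : 'no match';

# \1 is tried before group 1 has captured anything, so it cannot match.
my $ref_then_capture = $str =~ /\1(\d)/ ? "matched '$&'" : 'no match';

print "/(\\d)\\1/ : $capture_then_ref\n";
print "/\\1(\\d)/ : $ref_then_capture\n";
```

The first pattern should report the pair 22; the second should report no match, because the backreference is consulted before the group it refers to has been set.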

> Just IMHO, thank-you for your thoughts. This area seems just a bit
> gray to me I'd be very interested in Damain or Mark's thoughts.

s/Damain/Damian/

My feeble mind looks at the following:

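#!/usr/bin/perl

use strict;
use warnings;

use 5.010;

# Keep the first digit (\K) and delete one repeat of it.
for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\K\k<tank>// and print "$_\n";
}

# Keep the first digit (\K) and delete every further repeat of it.
for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\K\k<tank>+// and print "$_\n";
}

# Delete a digit together with one repeat of it.
for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\k<tank>// and print "$_\n";
}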

__END__

and thinks that the third one is the most natural (that is, find a
tank, then find another tank in the same equivalence class) compared
to the other ones.

Sinan

--
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Posted by Peter J. Holzer on April 14, 2008, 5:53 pm
>> I don't look at this as a parsing issue. Rather, it is a "the
>> universe must make sense" kind of issue: The first match does not
>> exist before the first match. That makes sense to me. It may not
>> make sense to you.
>>
>
> To me, like conventional pattern-recognition, of say two tanks next to
> each other, the system should accept it whether the match is described
> either way:
>
> find a tank with another identical tank to it's left
>
> *or*
>
> find a tank with another identical tank to it's right
>
>
> The system should have no *context-sensitivity* where only one of the
> two matches. Sure, internally an algorithm may be scanning L to R or R
> to L or whatever, but the user should not even be concerned with that,
> at least in this case. I still think it gave up too soon- it should
> have tried R to L (backtracking) when L to R failed.

Backtracking doesn't mean scanning right to left. Backtracking means to
go back to the last point where you had a choice and try the other
alternative(s).

So, for example if you have a pattern /foo(bar|baz)/, after matching
"foo", you have a choice between trying to match "bar" or "baz". The
regex engine will try to match "bar" first, and if that fails, it will
backtrack to the point before it tried that and then try to match "baz"
instead.

But in a pattern like /\1(a)/ there is no choice: It needs to start by
matching the string in the first capture group, but that hasn't been
defined yet, so it must fail. (Well, it could try all possible strings,
but that would be extremely inefficient).
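
To make the /foo(bar|baz)/ description concrete, here is a small sketch that records which alternative the engine attempts, using embedded code blocks (?{ ... }). The @tried array and the test string are assumptions added for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Each (?{ ... }) block runs when the engine reaches that point in the
# pattern, so @tried records the alternatives in the order they are tried.
our @tried;

my $ok = 'foobaz' =~ /foo((?{ push @tried, 'bar' })bar
                        |(?{ push @tried, 'baz' })baz)/x;

my $captured = $ok ? $1 : '';

print "alternatives tried: @tried\n";
print "captured: $captured\n";
```

After "foo" matches, "bar" is attempted and fails, and the engine returns to the choice point and attempts "baz". At no point does it scan the string from right to left.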

        hp


Posted by Willem on April 14, 2008, 3:11 pm
spydox@gmail.com wrote:
) Understood, and I appreciate the insight. It makes sense.
) Yet, when all else apparently *fails*, in my experience, and I've
) heard MJD and others say this, Perl will "do its best" to match. To
) me, unless it *also* tried backtracking, it gave up too soon..

That's not what backtracking means.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Posted by Ilya Zakharevich on April 14, 2008, 6:03 pm
[A complimentary Cc of this posting was sent to

>
> I'm trying to find a repeated number in a string, like 122345 finds
> 22.
>
> This works:
>
> /(\d)\1/
>
> This doesn't:
>
> /\1(\d)/

This depends on what you mean by "works". It works in the sense that
it does not match (as it should not). I do not find it documented in
perlre, but \3 will fail to match if group 3 did not match "yet".
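
The behaviour described above can be observed with an optional third group. This is only an illustrative sketch; the pattern and the two test strings are invented for the purpose and do not come from the thread.

```perl
use strict;
use warnings;

# Group 3 is optional. When it takes part in the match, \3 repeats it.
my $with_group = 'abccd' =~ /(a)(b)(c)?\3d/ ? 1 : 0;

# When group 3 does not take part, \3 has nothing to refer to and fails,
# so the whole pattern fails on "abd".
my $without_group = 'abd' =~ /(a)(b)(c)?\3d/ ? 1 : 0;

print "abccd: ", ( $with_group    ? 'match' : 'no match' ), "\n";
print "abd:   ", ( $without_group ? 'match' : 'no match' ), "\n";
```

In Perl a backreference to a group that has not matched does not act as an empty string; it simply fails, which is the same reason /\1(\d)/ cannot match.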

Hope this helps,
Ilya

P.S. perl -Mre=debugcolor -wle "q(aa) =~ /\1(a)/"
