Posted by Robbie Hatley on May 3, 2008, 1:31 am
"Dr.Ruud" wrote:
> Robbie Hatley wrote:
>
> > I needed a regex that says "either at the start of a line, OR
> > preceded by some whitespace".
>
> Maybe you are looking for \b?
That won't work. I'm basically looking for an assertion that
says "preceded by some space", where "space" can mean "beginning of
line", as well as whitespace characters such as space, tab, etc.
Basically, "is a separate token, or the prefix of a separate
token".
\b, on the other hand, matches at any word boundary, so the
preceding character could just as well be a symbol.
Say $Regex1 was "[a-z]+\.acme\.com". Say you were looking for
"bare" instances of that, *NOT* preceded by, say, "ftp://".
Say you want to match "i saw it on fred.acme.com today", but
*NOT* "i saw it on ftp://fred.acme.com today". Using \b would
match both.
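To make that concrete, here is a minimal sketch of the \b problem. Only
$Regex1 and the two sample sentences come from this post; the other
variable names and the qr// delimiters are just my choices for the
illustration.

```perl
use strict;
use warnings;

# Regex1 as described above: a lowercase host name under acme.com.
my $Regex1 = qr/[a-z]+\.acme\.com/;

my $bare = 'i saw it on fred.acme.com today';
my $ftp  = 'i saw it on ftp://fred.acme.com today';

# \b only asks for a word/non-word boundary, and "/" followed by "f"
# is exactly that, so the ftp:// case is accepted too.
my $with_b = qr/\b$Regex1/;

for my $text ($bare, $ftp) {
    my $result = $text =~ $with_b ? 'match' : 'no match';
    print "$result: $text\n";
}
```

Both lines report a match, which is the behavior I am trying to avoid.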
> > my $Regex2 = qr;
>
> The alternation was: BOL or whitespace. So why not write that first?
>
> (?:\A|(?<=\s))
Hmmm. What's "\A"?
( ::: looks in camel book ::: )
Ah, start of string. Ok. Let me try that....
( ::: tries it ::: )
COOL! That works!
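For the record, a small sketch of the kind of test I mean. The assertion
is the one quoted above, with $Regex1 appended; the test strings and the
qr// delimiters are assumptions made for the example, not the exact code
from my script.

```perl
use strict;
use warnings;

my $Regex1 = qr/[a-z]+\.acme\.com/;

# "Start of string OR preceded by whitespace", grouped, then Regex1.
my $Regex2 = qr/(?:\A|(?<=\s))$Regex1/;

my @tests = (
    'fred.acme.com is up',                    # at start of string
    'i saw it on fred.acme.com today',        # preceded by a space
    'i saw it on ftp://fred.acme.com today',  # preceded by "/"
);

for my $text (@tests) {
    if ($text =~ /($Regex2)/) {
        print "match ($1): $text\n";
    } else {
        print "no match: $text\n";
    }
}
```

The first two strings match on fred.acme.com and the ftp:// one is
rejected, because "/" is neither the start of the string nor whitespace.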
> Ah, now I see, you just forgot to group it.
It seems to do the trick, all right, though I don't see why.
Why does adding in the one extra grouping make it start working
right? Looks to me like these two regexes should behave the
same, but they don't:
BROKEN: my $Regex2 = qr;
WORKS: my $Regex2 = qr;
The first always matches the empty string at the beginning of
ANY input string. I wonder if the RE engine interprets that
as meaning:
^ *OR* ( (?<=\s) $Regex1 )
(And hence all strings match, and the extent of the match is
always the empty string, which matches the "^".)
That would explain why the extra grouping makes it work right:
( ^ *OR* (?<=\s) ) $Regex1
It's almost as if the alternation operator "|" has lower
precedence than juxtaposition....
(::: reads http://perldoc.perl.org/5.8.8/perlre.html :::)
GULP. It does. DOH. Never mind.
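Here is a minimal sketch of the precedence difference. The two patterns
are reconstructed from the two parses described above (ungrouped versus
grouped), so treat them as an assumption about the general shape rather
than the literal regexes from my script.

```perl
use strict;
use warnings;

my $Regex1 = qr/[a-z]+\.acme\.com/;

# Ungrouped: "|" binds loosest, so this parses as
#     ^   OR   (?<=\s)$Regex1
my $ungrouped = qr/^|(?<=\s)$Regex1/;

# Grouped: the alternation is confined to the assertion, so this is
#     ( ^ OR (?<=\s) )   followed by   $Regex1
my $grouped = qr/(?:^|(?<=\s))$Regex1/;

my $text = 'i saw it on ftp://fred.acme.com today';

for my $pair (['ungrouped', $ungrouped], ['grouped', $grouped]) {
    my ($name, $re) = @$pair;
    if ($text =~ /($re)/) {
        # $-[1] is the offset where capture group 1 starts.
        printf "%s matched '%s' at offset %d\n", $name, $1, $-[1];
    } else {
        printf "%s did not match\n", $name;
    }
}
```

The ungrouped pattern "matches" the empty string at offset 0 via the
bare "^" branch, while the grouped pattern correctly rejects the
ftp:// text.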
--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant