Regex for "at start of line OR preceded by space".

Regex for "at start of line OR preceded by space". Robbie Hatley 04-27-2008
Posted by Dr.Ruud on April 28, 2008, 3:06 am
Robbie Hatley wrote:

> I needed a regex that says "either at the start of a line, OR
> preceded by some whitespace".

Maybe you are looking for \b?


> my $Regex2 = qr;

The alternation was: BOL or whitespace. So why not write that first?

(?:\A|(?<=\s))

Ah, now I see, you just forgot to group it.

--
Affijn, Ruud

"Gewoon is een tijger."

Posted by Robbie Hatley on May 3, 2008, 1:31 am
"Dr.Ruud" wrote:

> Robbie Hatley wrote:
>
> > I needed a regex that says "either at the start of a line, OR
> > preceded by some whitespace".
>
> Maybe you are looking for \b?

That won't work. I'm basically looking for an assertion that
says "preceded by some space", where "space" can mean "beginning of
line", as well as whitespace characters such as space, tab, etc.
Basically, "is a separate token, or the prefix of a separate
token".

\b, on the other hand, matches at any word boundary, so the
character before it could just as well be a symbol.

Say $Regex1 was "[a-z]+\.acme\.com". Say you were looking for
"bare" instances of that, *NOT* preceded by, say, "ftp://".
Say you want to match "i saw it on fred.acme.com today", but
*NOT* "i saw it on ftp://fred.acme.com today". Using \b would
match both.
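
As a rough illustration of that point, here is a minimal sketch comparing \b with the "start of string or preceded by whitespace" assertion on the two sample sentences. The names $with_b, $with_look and @samples are made up for this example and are not from the original script.

```perl
use strict;
use warnings;

my $Regex1 = qr/[a-z]+\.acme\.com/;

# \b only needs a word/non-word transition, so the "/" before "fred" is enough.
my $with_b    = qr/\b$Regex1/;
# Start of string, or a whitespace character immediately before the match.
my $with_look = qr/(?:\A|(?<=\s))$Regex1/;

# Hypothetical sample list, taken from the two sentences quoted above.
my @samples = (
    'i saw it on fred.acme.com today',
    'i saw it on ftp://fred.acme.com today',
);

for my $s (@samples) {
    printf "%-40s  \\b: %-3s  lookbehind: %s\n", $s,
        ($s =~ $with_b    ? 'yes' : 'no'),
        ($s =~ $with_look ? 'yes' : 'no');
}
```

The \b version should report a match for both sentences, while the lookbehind version should only accept the bare host name.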

> > my $Regex2 = qr;
>
> The alternation was: BOL or whitespace. So why not write that first?
>
> (?:\A|(?<=\s))

Hmmm. What's "\A"?

( ::: looks in camel book ::: )

Ah, start of string. Ok. Let me try that....

( ::: tries it ::: )

COOL! That works!
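
For anyone else wondering how \A differs from ^, a small sketch (the string $text and the array names are invented for the example): ^ moves to every line start once /m is in effect, while \A stays pinned to the start of the string.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ matches only at the very start of the string.
my @caret_plain = $text =~ /^(\w+)/g;
# With /m, ^ also matches just after each embedded newline.
my @caret_multi = $text =~ /^(\w+)/mg;
# \A ignores /m and still matches only at the start of the string.
my @anchor_a    = $text =~ /\A(\w+)/mg;

print "plain ^:    @caret_plain\n";
print "^ with /m:  @caret_multi\n";
print "\\A with /m: @anchor_a\n";
```

Note that \s also matches "\n", so in (?:\A|(?<=\s)) the lookbehind already covers a token at the start of any later line.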

> Ah, now I see, you just forgot to group it.

It seems to do the trick, all right, though I don't see why.
Why does adding in the one extra grouping make it start working
right? Looks to me like these two regexes should behave the
same, but they don't:

BROKEN: my $Regex2 = qr;

WORKS: my $Regex2 = qr;

The first always matches the empty string at the beginning of
ANY input string. I wonder if the RE engine interprets that
as meaning:

^ *OR* ( (?<=\s) $Regex1 )

(And hence all strings match, and the extent of the match is
always the empty string, which matches the "^".)

That would explain why the extra grouping makes it work right:

( ^ *OR* (?<=\s) ) $Regex1

It's almost as if the alternation operator "|" has lower
precedence than juxtaposition....

(::: reads http://perldoc.perl.org/5.8.8/perlre.html :::)

GULP. It does. DOH. Never mind.
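
The qr patterns in the BROKEN/WORKS lines above did not survive in this archived copy, so the following is only a guess at their shape, reconstructed from the description of how they behave. $broken, $works and the helper show() are names invented for this sketch.

```perl
use strict;
use warnings;

my $Regex1 = qr/[a-z]+\.acme\.com/;

# Assumed reconstruction: "|" splits the WHOLE pattern, so this reads as
#   ^   OR   (?<=\s)$Regex1
my $broken = qr/^|(?<=\s)$Regex1/;
# Grouping confines the alternation to the two assertions:
#   ( ^ OR (?<=\s) )  followed by  $Regex1
my $works  = qr/(?:^|(?<=\s))$Regex1/;

# Hypothetical helper: report what matched and where.
sub show {
    my ($label, $re, $str) = @_;
    if ($str =~ $re) {
        my ($start, $end) = ($-[0], $+[0]);
        my $matched = substr($str, $start, $end - $start);
        return "$label: matched '$matched' at offset $start";
    }
    return "$label: no match";
}

for my $s ('i saw it on fred.acme.com today',
           'i saw it on ftp://fred.acme.com today') {
    print "$s\n";
    print "  ", show('broken', $broken, $s), "\n";
    print "  ", show('works',  $works,  $s), "\n";
}
```

With the ungrouped form, the first alternative "^" succeeds on its own at offset 0 with an empty match, so $Regex1 is never even tried.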

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant