handling hyphens (-) in word boundary matching

Hi Perl gurus,

I am trying to replace the substring "feat-ha" with 1 in a string, but
it's failing because of the hyphen (-). Please help me handle it.
I have pasted my code snippet below.


use strict;

my $string = qq((feat-bgp-mpls-vpn  AND feat-ha-sso  AND span));

my $pattern = qq(feat-ha);

$string =~ s/\b$pattern\b/1/g;

print $string;

output : (feat-bgp-mpls-vpn  AND 1-sso AND span)

In the above program I am trying to replace the word "feat-ha" with 1
only if it matches as an exact word in the string "(feat-bgp-mpls-vpn  AND
feat-ha-sso  AND span)", but it replaces feat-ha-sso with 1-sso, which
shouldn't happen. Please help me.

Thanks in Advance..


Re: handling hyphens (-) in word boundary matching

Use a negative lookahead assertion with a positive char class:

   $string =~ s/\b$pattern(?![\w-])/1/g;

or use a positive lookahead assertion with a negated char class:

   $string =~ s/\b$pattern(?=[^\w-])/1/g;

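If $pattern might occur at the end of the string, the negative
lookahead already handles that, since it succeeds when nothing follows.
The positive lookahead needs an explicit alternative for the end of
the string:

   $string =~ s/\b$pattern(?=[^\w-]|$)/1/g;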

You will likely need to do something similar to the beginning of your pattern.

Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"
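
As a minimal sketch of doing "something similar" at the beginning of the
pattern, assuming a hyphen should count as part of a word on both sides:
the lookbehind (?<![\w-]) and the \Q...\E quoting are additions that do
not appear in the reply above, and replace_token is a hypothetical helper
name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $pattern with $replacement only where it is
# neither preceded nor followed by a word character or a hyphen.
sub replace_token {
    my ($string, $pattern, $replacement) = @_;
    # \Q...\E quotes any regex metacharacters that $pattern may contain.
    $string =~ s/(?<![\w-])\Q$pattern\E(?![\w-])/$replacement/g;
    return $string;
}

my $string = 'feat-bgp-mpls-vpn AND feat-ha-sso AND feat-ha';
print replace_token($string, 'feat-ha', 1), "\n";
# prints: feat-bgp-mpls-vpn AND feat-ha-sso AND 1
```

Both assertions are negative, so they also succeed at the very start and
the very end of the string without any extra alternation.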

Re: handling hyphens (-) in word boundary matching

From perlretut:
An anchor useful in basic regexps is the word anchor  \b.
This matches a boundary between a word character and a non-word character \w\W
or \W\w

\d is a digit and represents [0-9]

\s is a whitespace character and represents [\ \t\r\n\f]

\w is a word character (alphanumeric or _) and represents [0-9a-zA-Z_]

\D is a negated \d; it represents any character but a digit [^0-9]

\S is a negated \s; it represents any non-whitespace character [^\s]

\W is a negated \w; it represents any non-word character [^\w]

The period '.' matches any character but "\n"


So feat-ha-sso matches because between 'a' and '-' is a word boundary, going from
\w to \W.
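
To see this concretely, here is a small sketch comparing plain \b with a
lookahead that treats the hyphen as a word character. The test string is
just the token from the question, and the variable names are made up for
the example.

```perl
use strict;
use warnings;

my $string = 'feat-ha-sso';

# '-' is not in \w, so \b matches between 'a' and '-'.
my $plain = $string =~ /\bfeat-ha\b/ ? 'match' : 'no match';

# Treating '-' as part of the word rejects that same position.
my $strict = $string =~ /\bfeat-ha(?![\w-])/ ? 'match' : 'no match';

print "plain \\b: $plain\n";          # prints: plain \b: match
print "with lookahead: $strict\n";    # prints: with lookahead: no match
```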

Visually, it appears you want this ->  s/(\s)$pattern(\s)/${1}1$2/g
if, say, you don't want to use extended pattern constructs.

Possibly ->  s/\b$pattern([^\w-]|$)/1$1/g

But doing it this way limits the variability of $pattern, since the leading \b
only behaves as intended when the first character of the pattern is a word
character.

Still though, it's better to be specific. Using just \b, a word boundary, as
the delimiter might yield more than you want (or less than you want). It all
depends upon what data you expect to be fed to it.

Oh, I guess you could force a prevailing rule of \b boundaries, test the pattern,
pick pre/post delimiter character classes, then put it all together in a regular
expression.
Generally, \b is not something reliable, given the variable nature of the
possibilities it could match with complex source text.
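
The idea of testing the pattern first and then picking the delimiters
could be sketched roughly as follows. This is only one possible reading of
that suggestion: build_token_regex is a hypothetical helper, and treating
[\w-] as the set of "token" characters is an assumption taken from the
question, not a general rule.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex for $pattern whose delimiters depend
# on the pattern's own first and last characters.
sub build_token_regex {
    my ($pattern) = @_;
    # Assumption: a token is made of \w characters plus the hyphen.
    my $token_char = '[\w-]';
    # Only guard an edge of the pattern that is itself a token character.
    my $before = $pattern =~ /\A$token_char/ ? "(?<!$token_char)" : '';
    my $after  = $pattern =~ /$token_char\z/ ? "(?!$token_char)"  : '';
    # \Q...\E quotes regex metacharacters inside $pattern.
    return qr/$before\Q$pattern\E$after/;
}

my $string = '(feat-bgp-mpls-vpn  AND feat-ha-sso  AND feat-ha)';
my $re     = build_token_regex('feat-ha');
(my $result = $string) =~ s/$re/1/g;
print "$result\n";
# prints: (feat-bgp-mpls-vpn  AND feat-ha-sso  AND 1)
```

A pattern that starts or ends with a non-token character, such as
"(span)", gets no guard on that side, so it is matched literally there.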

