
Variables interpolated in character classes?

Posted by Robbie Hatley on March 16, 2008, 1:08 am

I just wrote a Perl program to linkify text files with http URLs,
by generating an HTML file with the same content, but with the
URLs embedded in a and p elements. Here's an edited-for-brevity
version:

# (snip code here for printing opening lines of HTML file)
while (<>)
{
   # regex for recognizing URLs:
   my $Regex = qr{(s?https?://[[:alnum:];/?:@=&#%$_.+!*'(),-]+)};

   # wrap URLs in "a" and "p" elements, and put them on their own lines:
   s{$Regex}{\n<p><a href="\1">\1</a></p>\n}g;

   # Print the edited line:
   print ($_);
}
# (snip code here for printing closing lines of HTML file)


To my surprise, I was getting error messages like this:

illegal [] range error i-b in "cgi-bin"

Huh??? There's no "cgi-bin" in the regex!!!

Then I realized, the regex contains "$_", which was embedding
the entire line of text to be searched inside the regex!

I had thought that character classes removed the special
meanings of all characters, with the exception of:
^ (inverts class; but only when first char.)
- (character range; but only if not first or last char.)
\ (for escaping ^ and -)

I got the program to work by replacing "$_" with "\$_",
and by moving the declaration of $Regex to the top of the program
to avoid recompiling it on every iteration.
But I'm still puzzled as to why I have to escape the $.
Don't character classes prevent variable interpolation?
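
For reference, the working version described above might look roughly like the sketch below. The linkify helper and the sample URL are hypothetical names added for illustration, the brace delimiters on qr are an assumption, and the replacement uses $1 rather than \1.

```perl
use strict;
use warnings;

# Compiled once, outside any loop. The backslash in "\$_" stops Perl from
# interpolating $_, so the class contains a literal "$" and a literal "_".
my $Regex = qr{(s?https?://[[:alnum:];/?:@=&#%\$_.+!*'(),-]+)};

# Hypothetical helper (not part of the original program): wraps every URL
# in "a" and "p" elements and puts it on its own line.
sub linkify {
    my ($line) = @_;
    $line =~ s{$Regex}{\n<p><a href="$1">$1</a></p>\n}g;
    return $line;
}

print linkify("see http://example.com/cgi-bin/foo.pl for details\n");
```

With the dollar sign escaped, a line containing "cgi-bin" no longer ends up inside the character class, so the range error should not occur.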

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant



Posted by John W. Krahn on March 16, 2008, 1:18 am
Robbie Hatley wrote:
> I had thought that character classes removed the special
> meanings of all characters, with the exception of:
> ^ (inverts class; but only when first char.)
> - (character range; but only if not first or last char.)
> \ (for escaping ^ and -)
>
> I got the program to work by replacing "$_" with "\$_",
> and by moving the declaration of $Regex to the top of the program
> to avoid recompiling it on every iteration.
> But I'm still puzzled as to why I have to escape the $.
> Don't character classes prevent variable interpolation?

Read the "Gory details of parsing quoted constructs" section of perlop.pod:

perldoc perlop

The gist of it is that qr//, m// and s/// are first interpolated as
double quoted strings before they are processed by the regular
expression engine. For example:

$ perl -le'$x = "abc"; @y = "D".."F"; $z = qr/[-($x)~{@y}^]/; print $z'
(?-xism:[-(abc)~{D E F}^])


You can avoid interpolation by using single quote delimiters instead:

my $Regex = qr'(s?https?://[[:alnum:];/?:@=&#%$_.+!*\'(),-]+)';
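
A small sketch of the single-quote behaviour, assuming a made-up value for $_ and a made-up test string: with ' as the delimiter the pattern reaches the regex engine untouched, so "$_" inside the class is just a dollar sign and an underscore.

```perl
use strict;
use warnings;

# This value would produce an invalid range if it were interpolated
# into the character class below.
$_ = "cgi-bin";

# Single-quote delimiters: no variable interpolation is performed,
# so the class contains only "$" and "_".
my $re = qr'([$_]+)';

# The captured text is the literal two characters "$" and "_".
my ($hit) = 'a$_b' =~ $re;
print "$hit\n";
```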


Also the back-references \1, \2, etc. should only be used *inside* a
regular expression, but not in a character class. Outside of a regular
expression you should use $1, $2, etc. instead. So:

s{$Regex}{\n<p><a href="\1">\1</a></p>\n}g;

Should be:

s{$Regex}{\n<p><a href="$1">$1</a></p>\n}g;
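
To make the distinction concrete, here is a minimal sketch with made-up strings: \1 is used inside a pattern to require a repeated word, and $1 is used in the replacement part of a substitution. The simplified URL pattern is an assumption for illustration only.

```perl
use strict;
use warnings;

# \1 inside the pattern: matches only if the captured word is repeated.
my $has_repeat = "the the cat" =~ /\b(\w+) \1\b/ ? 1 : 0;

# $1 in the replacement: refers to what the first group captured.
my $text = "go to http://example.com now";
(my $linked = $text) =~ s{(https?://\S+)}{<a href="$1">$1</a>}g;

print "$has_repeat\n$linked\n";
```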




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Posted by Abigail on March 16, 2008, 6:07 am
_
Robbie Hatley (lonewolf@well.com) wrote on VCCCXI September MCMXCIII in
`' I had thought that character classes removed the special
`' meanings of all characters, with the exception of:
`' ^ (inverts class; but only when first char.)
`' - (character range; but only if not first or last char.)
`' \ (for escaping ^ and -)

It does.

However, interpolation goes first. So, first $_ is interpolated, then
any [] parsing is done. If it then finds a $, it's just a dollar sign.
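
The ordering can be checked with a short sketch, assuming a made-up value for $_. The unescaped pattern is compiled inside eval so the range error is trapped instead of ending the script.

```perl
use strict;
use warnings;

$_ = "cgi-bin";

# Unescaped: $_ is interpolated first, so the engine sees [cgi-bin],
# and "i-b" is an invalid range. eval traps the resulting error.
my $bad = eval { qr/[$_]/ };
print "unescaped failed: ", (defined $bad ? "no" : "yes"), "\n";

# Escaped: nothing is interpolated, the engine sees [\$_],
# which is a class containing "$" and "_".
my $good = qr/[\$_]/;
print "escaped matches dollar: ", ('$' =~ $good ? "yes" : "no"), "\n";
```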

`' I got the program to work by replacing "$_" with "\$_",
`' and by moving the declaration of $Regex to the top of the program
`' to avoid recompiling it on every iteration.
`' But I'm still puzzled as to why I have to escape the $.
`' Don't character classes prevent variable interpolation?

Nope.


Abigail
--
perl -we '$| = 1; $_ = "Just another Perl Hacker\n"; print
substr $_ => 0, 1 => "" while $_ && sleep 1 => 1'
