Posted by John W. Krahn on March 16, 2008, 1:18 am
Robbie Hatley wrote:
> I just wrote a Perl program to linkify text files with http URLs,
> by generating an html file with the same content, but with the
> URLs embedded in a and p elements. Here's an edited-for-brevity
> version:
>
> # (snip code here for printing opening lines of HTML file)
> while (<>)
> {
> # regex for recognizing URLs:
> my $Regex = qr{(s?https?://[[:alnum:];/?:@=&#%$_.+!*'(),-]+)};
>
> # wrap URLs in "a" and "p" elements, and put them on their own lines:
> s{$Regex}{\n<p><a href="\1">\1</a></p>\n}g;
>
> # Print the edited line:
> print ($_);
> }
> # (snip code here for printing closing lines of HTML file)
>
>
> To my surprise, I was getting error messages like this:
>
> illegal [] range error i-b in "cgi-bin"
>
> Huh??? There's no "cgi-bin" in the regex!!!
>
> Then I realized, the regex contains "$_", which was embedding
> the entire line of text to be searched inside the regex!
>
> I had thought that character classes removed the special
> meanings of all characters, with the exception of:
> ^ (inverts class; but only when first char.)
> - (character range; but only if not first or last char.)
> \ (for escaping ^ and -)
>
> I got the program to work by replacing "$_" with "\$_",
> and by moving the declaration of $Regex to top of program
> to prevent having to recompile it every iteration.
> But I'm still puzzled as to why I have to escape the $.
> Don't character classes prevent variable interpolation?
Read the "Gory details of parsing quoted constructs" section of perlop.pod:
perldoc perlop
The gist of it is that qr//, m// and s/// are first interpolated as
double quoted strings before they are processed by the regular
expression engine. For example:
$ perl -le'$x = "abc"; @y = "D".."F"; $z = qr/[-($x)~{@y}^]/; print $z'
(?-xism:[-(abc)~{D E F}^])
You can avoid interpolation by using single quote delimiters instead:
my $Regex = qr'(s?https?://[[:alnum:];/?:@=&#%$_.+!*\'(),-]+)';
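As a rough sketch of the two options side by side (the sample line and the variable names $escaped and $quoted are made up for illustration, not taken from your program), both of these should leave "$_" as two literal characters inside the class:

```perl
use strict;
use warnings;

# Hypothetical sample line; the URL is made up for illustration.
my $line = 'see http://example.com/cgi-bin/foo?a=1&b=$_ for details';

# Brace delimiters interpolate, so the "$" is escaped to keep "$_" literal.
my $escaped = qr{(s?https?://[[:alnum:];/?:@=&#%\$_.+!*'(),-]+)};

# Single-quote delimiters suppress interpolation, so no escape is needed
# for "$", only for the delimiter character itself.
my $quoted = qr'(s?https?://[[:alnum:];/?:@=&#%$_.+!*\'(),-]+)';

for my $re ($escaped, $quoted) {
    my ($url) = $line =~ $re;   # list context returns the captured group
    print "$url\n";
}
```

Either form also lets you compile the pattern once, outside the loop.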
Also the back-references \1, \2, etc. should only be used *inside* a
regular expression, but not in a character class. Outside of a regular
expression you should use $1, $2, etc. instead. So:
s{$Regex}{\n<p><a href="\1">\1</a></p>\n}g;
Should be:
s{$Regex}{\n<p><a href="$1">$1</a></p>\n}g;
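Here is a minimal, self-contained sketch of that substitution applied to one line of text. The linkify helper and the sample input are hypothetical, and the pattern is the brace-delimited form with the "$" escaped, so treat it as an illustration rather than a drop-in replacement for your loop:

```perl
use strict;
use warnings;

# Compiled once, outside any loop; "\$" keeps "$_" from being interpolated.
my $Regex = qr{(s?https?://[[:alnum:];/?:@=&#%\$_.+!*'(),-]+)};

# Hypothetical helper: puts each URL on its own line, wrapped in
# "p" and "a" elements, using $1 in the replacement part.
sub linkify {
    my ($text) = @_;
    $text =~ s{$Regex}{\n<p><a href="$1">$1</a></p>\n}g;
    return $text;
}

# Made-up sample input for illustration.
print linkify('Docs at http://example.com/cgi-bin/help today'), "\n";
```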
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall