Clickable link conversion regex?


Can anyone suggest a way to wrap bare URLs in <a href> tags?

open(my $fh, '<', 'urls.txt') or die $!;

while (my $line = <$fh>) {
    $line =~ s[...]    # match http or https instances
              [...]s;  # replace with enclosing hrefs
    print $line;
}

The input may contain one or more URLs per line.

Each URL begins with either http:// or https://, but not necessarily as the
first string on a line.

Each URL ends at either the end of the line or the next whitespace character.

For example, the input file would look like:

---------- urls.txt ------- /

bla plus a string not part of the URL


If an http or https string is already preceded by the closing ">" of an
HTML tag, such as:
<a href= </a>
... then it should be excluded with no replacement.

Two conditions exist in the input file:

The 'http' or 'https' bit will always begin at the first character on a new
line or have a whitespace character immediately before it, like:

line w/ whitespace before
hello also w/ a whitespace before

The match and replace output on the above three lines would then be:

 <a href= </a> line w/ whitespace before
<a href= </a>
hello <a href= </a> also w/ a whitespace before
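
To make the rules above concrete, here is a minimal sketch of one possible substitution. It assumes that "start of line or preceded by whitespace" is enough to skip URLs already inside an anchor, since those are preceded by a quote or by ">". The names linkify and escape_html and the example.com URL are placeholders, not part of the original code.

```perl
use strict;
use warnings;

# Minimal HTML escaping so that characters such as & or " inside a URL
# cannot break the generated markup.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# linkify() is a hypothetical helper name used only for this sketch.
sub linkify {
    my ($line) = @_;
    $line =~ s{
        (?<!\S)          # start of line, or preceded by whitespace
        (https?://\S+)   # scheme, then everything up to whitespace or end of line
    }{
        my $url = escape_html($1);
        qq{<a href="$url">$url</a>};
    }gex;
    return $line;
}

# Prints: hello <a href="http://example.com/page">http://example.com/page</a> also w/ a whitespace before
print linkify("hello http://example.com/page also w/ a whitespace before\n");
```

The lookbehind (?<!\S) succeeds both at the very start of the line and after whitespace, so neither case needs its own branch. A URL that follows "href=" plus a space would still be converted, so this remains a best guess.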

If something is written as http://bla, which as in this sentence isn't a
link, it would inadvertently be converted into a link, but that should be a
rare occurrence. In other words, without additional validity checking, the
regex is a best-guess procedure. For a stricter procedure, each match could
perhaps be checked with the is_web_uri($...) function from
Data::Validate::URI, which validates http and https URIs specifically. That
said, any example that illustrates the basic search-and-replace concept would
be much appreciated, even if it's only a best-guess URL type of procedure.
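
The stricter variant might look roughly like the sketch below, where each candidate is passed to a validator before it is wrapped. It assumes Data::Validate::URI may or may not be installed and falls back to accepting every candidate if it is not. The name linkify_checked, its $is_valid parameter and the example.com URL are hypothetical.

```perl
use strict;
use warnings;

# linkify_checked() and its $is_valid parameter are hypothetical names.
# $is_valid is a code reference that returns true for URLs worth linking.
# HTML escaping is left out here to keep the sketch short.
sub linkify_checked {
    my ($line, $is_valid) = @_;
    $line =~ s{(?<!\S)(https?://\S+)}{
        my $url = $1;
        $is_valid->($url) ? qq{<a href="$url">$url</a>} : $url;
    }ge;
    return $line;
}

# Use Data::Validate::URI when it is installed; otherwise accept every
# candidate, which is the same best-guess behaviour as the plain regex.
my $have_module = eval { require Data::Validate::URI; 1 };
my $is_valid = $have_module
    ? sub { defined Data::Validate::URI::is_web_uri($_[0]) }
    : sub { 1 };

print linkify_checked("see http://example.com/ for details\n", $is_valid);
```

Passing the check in as a code reference keeps the substitution itself unchanged, so the validator can be swapped or dropped without touching the regex.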

Many thanks for any bright ideas!

