Simple problem

I am a newbie to Perl and its regular expressions. I would like to know
the regular expression that would allow me to match anything in a
string except for "textarea" or "script" within angle brackets.

string = "abracadabra<this should match><textarea><script><textarea

In the above string it should match
<this should match>
<textarea abracadabra>
and not match
<script> and, of course, the beginning abracadabra

Thanks in advance,

Re: Simple problem

Your subject "Simple problem" is at once vacuous and annoying.  If you
can't solve it, don't tell us it's simple!

This isn't simple, it's impossible.

A regex match must be definite.  It isn't enough to decide whether
a match occurs; it must also say where it occurs and how many characters
are involved.  So where in the string "abracadabra" is the place where
/<textarea>|<script>/ isn't contained, and for how many characters
isn't it contained?
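
To make "definite" concrete: every successful match comes with a start offset and a length, which Perl exposes through the built-in arrays @- and @+. A minimal sketch, assuming a shortened sample string (the helper name match_span is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return (offset, length) of the first match of
# $re in $str, or an empty list if nothing matches.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    # $-[0] is where the whole match starts, $+[0] is just past its end.
    return ($-[0], $+[0] - $-[0]);
}

# Sample string assumed for this sketch.
my $string = 'abracadabra<this should match><textarea><script>';
my ($offset, $length) = match_span($string, qr/<textarea>|<script>/);
print "matched at offset $offset, length $length\n";
# prints: matched at offset 30, length 10
```

A "non-match" has no such offset and length to report, which is the point being made here.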

The answer is, your regex would have to match every substring that
isn't exactly one of "<textarea>" or "<script>".  That is a lot
of substrings, including strings like "match><textar", "textarea>",
and so on.  A single regex simply won't do it.

You could first extract the parts that are enclosed in <> and
then de-select the unwanted matches, as in

    print "$_\n" for grep ! /<textarea>|<script>/, /(<.*?>)/g;

but that isn't a single regex, and it will break easily on more
complicated HTML.
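
Spelled out as a standalone script, the extract-then-filter idea might look like the following. This is only a sketch: the sample string is an assumption pieced together from the matches the original poster listed, and the sub name wanted_tags is invented here.

```perl
use strict;
use warnings;

# Hypothetical helper: pull out every <...> chunk, then drop the unwanted ones.
sub wanted_tags {
    my ($str) = @_;
    my @chunks = $str =~ /(<.*?>)/g;                 # step 1: extract
    return grep { !/<textarea>|<script>/ } @chunks;  # step 2: de-select
}

# Assumed sample input, not the poster's exact string.
my $string = 'abracadabra<this should match><textarea><script><textarea abracadabra>';
print "$_\n" for wanted_tags($string);
# prints:
# <this should match>
# <textarea abracadabra>
```

It shares the weakness of the one-liner: a ">" inside an attribute value, a comment, or a script body will throw the extraction off.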

What you really should do is get yourself an HTML parser from CPAN
and use that.


Re: Simple problem

Jayashree wrote:

     perldoc perlrequick

Gunnar Hjalmarsson

Re: Simple problem

Jayashree wrote:

What do you mean by "should match"?
Usually you match an entire line (or parts thereof) and extract the
interesting portions. So: what exactly do you want to do with the line?
If you are interested in the various parts, you might want to have a
look at "split":

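    #! /usr/bin/perl
    use warnings;
    use strict;

    # The sample string was cut off in the original post; it is closed here as-is.
    my $string = "abracadabra<this should match><textarea><script><textarea>";
    # Split on either unwanted tag; the pieces in between end up in @f.
    my @f = split(/<textarea>|<script>/, $string);
    print join("\n", @f), "\n";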

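Note that this will extract the empty string between "<textarea>" and
"<script>". You can avoid this if you use

    my @f = split(/(?:<textarea>|<script>)+/, $string);

The group has to be non-capturing, because split also returns whatever
a capturing group matched.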
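
The variants can be compared side by side. Again just a sketch: the sample string is shortened and given a made-up "tail" so every field is visible, and show_fields is a hypothetical helper that brackets each field.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each field in [] so empty fields are visible.
sub show_fields { return join ' ', map { "[$_]" } @_ }

# Assumed sample input for this sketch.
my $string = 'abracadabra<this should match><textarea><script>tail';

# One separator at a time: an empty field appears between adjacent separators.
my @single = split /<textarea>|<script>/, $string;

# Non-capturing group with +: a run of separators counts as one separator.
my @run = split /(?:<textarea>|<script>)+/, $string;

# Capturing group with +: the last captured separator comes back as a field.
my @captured = split /(<textarea>|<script>)+/, $string;

print show_fields(@single) . "\n";    # [abracadabra<this should match>] [] [tail]
print show_fields(@run) . "\n";       # [abracadabra<this should match>] [tail]
print show_fields(@captured) . "\n";  # [abracadabra<this should match>] [<script>] [tail]
```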


Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
                        -- T.  Pratchett