Simple problem

I am new to Perl and its regular expressions. I would like to know
the regular expression that would let me match anything in a
string except "textarea" or "script" within angle brackets.

string = "abracadabra<this should match><textarea><script><textarea

In the above string it should match
<this should match>
<textarea abracadabra>
and not match
<script> and, of course, the beginning abracadabra

Thanks in advance,

Re: Simple problem

Your subject "Simple problem" is at once vacuous and annoying.  If you
can't solve it, don't tell us it's simple!

This isn't simple, it's impossible.

A regex match must be definite.  It isn't enough to decide whether
a match occurs, but it must say where it occurs and how many characters
are involved.  So where in the string "abracadabra" is the place where
/<textarea>|<script>/ isn't contained, and for how many characters
isn't it contained?
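
To make the point about definiteness concrete: a successful match always
comes with a start offset and a length, which Perl exposes through the
built-in @- and @+ arrays. A minimal sketch, assuming a made-up sample
string; the helper name match_span is hypothetical and not from the thread:

```perl
use strict;
use warnings;

# Return (offset, length) of the first match of $re in $str,
# or an empty list if there is no match.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0] - $-[0]);    # start offset, matched length
}

# Sample string assumed for illustration only.
my $sample = "abracadabra<script>";
my ($offset, $length) = match_span($sample, qr/<textarea>|<script>/);
print "matched at offset $offset, length $length\n";
```

There is no comparable offset and length to report for "the place where
the pattern is not", which is why the question as posed has no single answer.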

The answer is, your regex would have to match every substring that
isn't exactly one of "<textarea>" or "<script>".  That is a lot
of substrings, including strings like "match><textar", "textarea>",
and so on.  A single regex simply won't do it.

You could first extract the parts that are enclosed in <> and
then de-select the unwanted matches, as in

    print "$_\n" for grep ! /<textarea>|<script>/, /(<.*?>)/g;

but that isn't a single regex, and it will break easily on more
complicated HTML.
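
Spelled out as a complete script, the two-step approach might look like
the sketch below. The sample string is an assumption, pieced together from
the tags the original poster lists as expected matches, and the helper
name wanted_tags is made up for illustration:

```perl
use strict;
use warnings;

# Step 1: pull out every <...> chunk.
# Step 2: drop the chunks that are exactly <textarea> or <script>.
sub wanted_tags {
    my ($string) = @_;
    my @tags = $string =~ /(<.*?>)/g;    # non-greedy, so one tag per match
    return grep { !/^<(?:textarea|script)>$/ } @tags;
}

# Assumed sample input, not the poster's exact (truncated) string.
my $string = "abracadabra<this should match><textarea><script><textarea abracadabra>";
print "$_\n" for wanted_tags($string);
```

The filter is anchored with ^ and $ here so that it only rejects a whole
extracted tag; that is a choice made for this sketch. A tag such as
"<textarea abracadabra>" passes, because it is not exactly "<textarea>".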

What you really should do is get yourself an HTML parser from CPAN
and use that.


Re: Simple problem

Jayashree wrote:
     perldoc perlrequick

Gunnar Hjalmarsson

Re: Simple problem

Jayashree wrote:
What do you mean by "should match"?
Usually you match an entire line (or parts thereof) and extract the
interesting portions. So: what exactly do you want to do with the line?
If you are interested in the various parts, you might want to have a
look at "split":

#! /usr/bin/perl
use warnings;
use strict;

my $string = "abracadabra<this should match><textarea><script><textarea>";
my @f = split(/<textarea>|<script>/, $string);
print join("\n", @f);

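Note that this can yield an empty string between "<textarea>" and
"<script>" (split only discards empty fields at the end of the list).
You can avoid this with a non-capturing group; a capturing group would
make split return the separators as well:
my @f = split(/(?:<textarea>|<script>)+/, $string);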
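
For comparison, here is a small sketch of how the different split patterns
behave. The sample string is assumed for illustration (it ends in a tag
that is not a separator), and the variable names are made up. Keep in mind
that parentheses in a split pattern capture, so the captured separator is
returned as an extra field:

```perl
use strict;
use warnings;

# Assumed sample input for illustration.
my $string = "abracadabra<this should match><textarea><script><textarea abracadabra>";

# One separator at a time: an empty field shows up between
# <textarea> and <script>.
my @single = split /<textarea>|<script>/, $string;

# Runs of separators collapsed with a non-capturing group: no empty field.
my @runs = split /(?:<textarea>|<script>)+/, $string;

# Capturing group: split also returns the text captured for each run.
my @captured = split /(<textarea>|<script>)+/, $string;

for my $case ([single => \@single], [runs => \@runs], [captured => \@captured]) {
    my ($name, $fields) = @$case;
    print "$name: ", scalar(@$fields), " fields: ",
          join(" | ", map { "'$_'" } @$fields), "\n";
}
```

If the separators themselves are of no interest, the non-capturing form is
the one to use.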


Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize
                        -- T.  Pratchett

