FAQ 6.11 Can I use Perl regular expressions to match balanced text?


This message is one of several periodic postings to comp.lang.perl.misc
intended to make it easier for Perl programmers to find answers to
common questions. The core of this message represents an excerpt
from the documentation provided with Perl.


6.11: Can I use Perl regular expressions to match balanced text?

    Historically, Perl regular expressions were not capable of matching
    balanced text. More recent versions of Perl, including 5.6.1, have
    experimental features that make this possible. Look at the
    documentation for the (??{ }) construct in recent perlre manual pages
    to see an example of matching balanced parentheses. Be sure to take
    special notice of the warnings in the manual before making use of
    this feature.
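    A minimal sketch of the (??{ }) approach, modeled on the balanced
    parentheses example in perlre. The variable name $balanced and the
    helper first_balanced() are made up for illustration. The pattern
    refers to itself through the code block, so each nested "(...)" is
    matched by a recursive call. It only understands "(" and ")"; it
    knows nothing about quotes or escapes.

```perl
use strict;
use warnings;

# $balanced is a package variable (declared with "our") so that the
# (??{ $balanced }) code block can look it up at match time, after the
# assignment below has completed.
our $balanced;
$balanced = qr{
    \(                      # an opening paren
    (?:
        (?> [^()]+ )        # a run of non-parens, never backtracked into
      |
        (??{ $balanced })   # or a nested balanced group (recursion)
    )*
    \)                      # the matching closing paren
}x;

# Hypothetical helper: return the first balanced parenthesized group
# found in $s, or undef if there is none.
sub first_balanced {
    my ($s) = @_;
    return $s =~ /($balanced)/ ? $1 : undef;
}

print first_balanced('f(x, g(y)) + 1'), "\n";
```

    On Perl 5.10 and later, the same thing can be written without a code
    block by using a recursive group reference, as in
    qr{ ( \( (?: [^()]++ | (?1) )* \) ) }x, where (?1) re-enters the
    first capturing group.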

    CPAN contains many modules that can be useful for matching text
    depending on the context. Damian Conway provides some useful patterns in
    Regexp::Common. The module Text::Balanced provides a general solution to
    this problem.
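    As a sketch of the Text::Balanced route (a core module since Perl
    5.8), extract_bracketed() pulls one balanced group off the front of a
    string. Its third argument is a prefix pattern to skip first; the
    pattern '[^(]*' used here, which skips everything up to the first
    "(", is an illustrative choice, not a default. In list context the
    function returns the extracted text, the remainder, and the skipped
    prefix.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);   # core module since Perl 5.8

my $text = 'call(a, (b + c)) and more';

# Skip anything up to the first "(" (the prefix pattern), then extract
# one balanced "()" group, including any groups nested inside it.
my ($extracted, $remainder, $prefix) =
    extract_bracketed($text, '()', '[^(]*');

print "prefix:    '$prefix'\n";
print "extracted: '$extracted'\n";
print "remainder: '$remainder'\n";
```

    Regexp::Common offers a regex-only alternative through its balanced
    patterns, for example $RE{balanced}{-parens=>'()'}, which can be
    interpolated directly into a match.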

    One of the common applications of balanced text matching is working with
    XML and HTML. There are many modules available that support these needs.
    Two examples are HTML::Parser and XML::Parser. There are many others.
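    With a real parser, nesting comes from start and end tag events
    rather than from a regular expression. A small sketch with
    HTML::Parser (from CPAN, not a core module) that records start tags
    and tracks nesting depth. The sample markup is made up, and the depth
    counting assumes every start tag has a matching end tag, which is not
    true of void elements such as <br> in real HTML.

```perl
use strict;
use warnings;
use HTML::Parser;    # from CPAN; not a core module

my $html = '<div><p>One <b>two</b></p><p>three</p></div>';

my @tags;            # start tags in document order
my $depth     = 0;   # current nesting depth
my $max_depth = 0;   # deepest nesting seen so far

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub {
        my ($tag) = @_;
        push @tags, $tag;
        $depth++;
        $max_depth = $depth if $depth > $max_depth;
    }, 'tagname' ],
    # Assumes each start tag is closed; void elements would skew $depth.
    end_h       => [ sub { $depth-- }, 'tagname' ],
);
$p->parse($html);
$p->eof;             # flush any buffered text

print "tags: @tags\n";
print "max depth: $max_depth\n";
```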

    An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and
    possibly nested single chars, like "`" and "'", "{" and "}", or "(" and
    ")" can be found in
    http://www.cpan.org/authors/id/TOMC/scripts/pull_quotes.gz .
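    That script is not reproduced here. As a rough illustration of the
    underlying idea only (this is not Tom Christiansen's code), a depth
    counter is enough to pull out top-level spans for a pair of distinct
    single-character delimiters such as "`" and "'". The helper name
    balanced_spans() is hypothetical, and it ignores escapes, quoting,
    and unclosed openers at the end of the text.

```perl
use strict;
use warnings;

# Hypothetical helper: return every top-level span of $text that starts
# with $open and ends with its matching $close, with nesting allowed.
sub balanced_spans {
    my ($text, $open, $close) = @_;
    my (@spans, $start);
    my $depth = 0;
    for my $i (0 .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if ($c eq $open) {
            $start = $i if $depth == 0;    # remember outermost opener
            $depth++;
        }
        elsif ($c eq $close && $depth > 0) {
            $depth--;
            # Back at depth 0: the outermost pair just closed.
            push @spans, substr($text, $start, $i - $start + 1)
                if $depth == 0;
        }
    }
    return @spans;    # an unclosed trailing span is silently dropped
}

print "$_\n" for balanced_spans("say `hello `world'' and `bye'", '`', "'");
```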

    The C::Scan module from CPAN also contains such subs for internal use,
    but they are undocumented.


Documents such as this have been called "Answers to Frequently
Asked Questions" or FAQ for short.  They represent an important
part of the Usenet tradition.  They serve to reduce the volume of
redundant traffic on a news group by providing quality answers to
questions that keep coming up.

If you are somehow irritated by seeing these postings you are free
to ignore them or add the sender to your killfile.  If you find
errors or other problems with these postings please send corrections
or comments to the posting email address or to the maintainers as
directed in the perlfaq manual page.

Note that the FAQ text posted by this server may have been modified
from that distributed in the stable Perl release.  It may have been
edited to reflect the additions, changes and corrections provided
by respondents, reviewers, and critics to previous postings of
these FAQ. The complete text of these FAQ is available on request.

The perlfaq manual page contains the following copyright notice.


    Copyright (c) 1997-2002 Tom Christiansen and Nathan
    Torkington, and other contributors as noted. All rights
    reserved.

This posting is provided in the hope that it will be useful but
does not represent a commitment or contract of any kind on the part
of the contributors, authors or their agents.
