FAQ 6.19 What good is "\G" in a regular expression?


This message is one of several periodic postings to comp.lang.perl.misc
intended to make it easier for perl programmers to find answers to
common questions. The core of this message represents an excerpt
from the documentation provided with Perl.


6.19: What good is "\G" in a regular expression?

    You use the "\G" anchor to start the next match on the same string where
    the last match left off. The regular expression engine cannot skip over
    any characters to find the next match with this anchor, so "\G" is
    similar to the beginning of string anchor, "^". The "\G" anchor is
    typically used with the "g" flag. It uses the value of pos() as the
    position to start the next match. As the match operator makes successive
    matches, it updates pos() with the position of the next character past
    the last match (or the first character of the next match, depending on
    how you like to look at it). Each string has its own pos() value.
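
    The way pos() advances can be checked directly. The following is a
    minimal sketch, not part of the original FAQ text; the variable names
    ($first, $second, and so on) are made up for illustration.

```perl
use strict;
use warnings;

my $first  = "1122a44";
my $second = "99";

# A successful scalar-context match with the "g" flag advances pos()
# for the string that was matched, and only for that string.
my $matched = $first =~ m/\d\d/g;
my $after_one = pos($first);      # offset just past "11"

$matched = $first =~ m/\d\d/g;
my $after_two = pos($first);      # offset just past "22"

# $second has not been matched against, so its pos() is still undefined.
my $untouched = defined pos($second) ? "defined" : "undefined";

print "after first match:  $after_one\n";    # 2
print "after second match: $after_two\n";    # 4
print "pos of \$second:     $untouched\n";   # undefined
```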

    Suppose you want to match all of the consecutive pairs of digits in a
    string like "1122a44" and stop matching when you encounter non-digits.
    You want to match 11 and 22, but the letter "a" shows up between 22 and
    44 and you want to stop at "a". Simply matching pairs of digits skips
    over the "a" and still matches 44.

            $_ = "1122a44";
            my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )

    If you use the \G anchor, you force the match after 22 to start with the
    "a". The regular expression cannot match there since it does not find a
    digit, so the next match fails and the match operator returns the pairs
    it already found.

            $_ = "1122a44";
            my @pairs = m/\G(\d\d)/g; # qw( 11 22 )

    You can also use the "\G" anchor in scalar context. You still need the
    "g" flag.

            $_ = "1122a44";
            while( m/\G(\d\d)/g ) {
                    print "Found $1\n";
            }

    After the match fails at the letter "a", perl resets pos() and the next
    match on the same string starts at the beginning.

            $_ = "1122a44";
            while( m/\G(\d\d)/g ) {
                    print "Found $1\n";
            }

            print "Found $1 after while" if m/(\d\d)/g; # finds "11"

    You can disable pos() resets on fail with the "c" flag. Subsequent
    matches start where the last successful match ended (the value of pos())
    even if a match on the same string has failed in the meantime. In this
    case, the match after the while() loop starts at the "a" (where the last
    match stopped), and since it does not use any anchor it can skip over
    the "a" to find "44".

            $_ = "1122a44";
            while( m/\G(\d\d)/gc ) {
                    print "Found $1\n";
            }

            print "Found $1 after while" if m/(\d\d)/g; # finds "44"
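
    The effect of the "c" flag can also be seen by inspecting pos() after
    each loop gives up. This is a small sketch rather than part of the
    original FAQ text, and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "1122a44";

# Without "c": the failed match at "a" resets pos() to undef.
1 while $text =~ m/\G\d\d/g;
my $without_c = defined pos($text) ? pos($text) : "undef";

# With "c": the failed match leaves pos() just past "22".
1 while $text =~ m/\G\d\d/gc;
my $with_c = defined pos($text) ? pos($text) : "undef";

print "pos() after /g loop:  $without_c\n";   # undef
print "pos() after /gc loop: $with_c\n";      # 4
```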

    Typically you use the "\G" anchor with the "c" flag when you want to try
    a different match if one fails, such as in a tokenizer. Jeffrey Friedl
    offers this example which works in 5.004 or later.

        while (<>) {
          PARSER: {
               m/ \G( \d+\b    )/gcx   && do { print "number: $1\n";  redo; };
               m/ \G( \w+      )/gcx   && do { print "word:   $1\n";  redo; };
               m/ \G( \s+      )/gcx   && do { print "space:  $1\n";  redo; };
               m/ \G( [^\w\d]+ )/gcx   && do { print "other:  $1\n";  redo; };
          }
        }

    For each line, the PARSER loop first tries to match a series of digits
    followed by a word boundary. This match has to start at the place the
    last match left off (or the beginning of the string on the first match).
    Since "m/ \G( \d+\b )/gcx" uses the "c" flag, if the string does not
    match that regular expression, perl does not reset pos() and the next
    match starts at the same position to try a different pattern.
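
    The same approach can be wrapped in a subroutine that works on a string
    instead of on input lines. The sketch below is an illustration only:
    the tokenize() helper and its list of [type, text] pairs are
    hypothetical names, not something the FAQ or Perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: split one string into [type, text] pairs using the
# same four \G-anchored patterns as the PARSER block.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    pos($line) = 0;    # start scanning at the beginning of the string
    PARSER: {
        $line =~ m/ \G( \d+\b    )/gcx && do { push @tokens, [ number => $1 ]; redo PARSER; };
        $line =~ m/ \G( \w+      )/gcx && do { push @tokens, [ word   => $1 ]; redo PARSER; };
        $line =~ m/ \G( \s+      )/gcx && do { push @tokens, [ space  => $1 ]; redo PARSER; };
        $line =~ m/ \G( [^\w\d]+ )/gcx && do { push @tokens, [ other  => $1 ]; redo PARSER; };
    }
    # Falling out of the block means no pattern matched at pos($line).
    return @tokens;
}

my @tokens = tokenize("abc 42, x9");
print "$_->[0]: '$_->[1]'\n" for @tokens;
```

    For the input "abc 42, x9" this yields a word, a space, a number, an
    "other" token for the comma and the space after it, and a final word.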


Documents such as this have been called "Answers to Frequently
Asked Questions" or FAQ for short.  They represent an important
part of the Usenet tradition.  They serve to reduce the volume of
redundant traffic on a news group by providing quality answers to
questions that keep coming up.

If you are somehow irritated by seeing these postings you are free
to ignore them or add the sender to your killfile.  If you find
errors or other problems with these postings please send corrections
or comments to the posting email address or to the maintainers as
directed in the perlfaq manual page.

Note that the FAQ text posted by this server may have been modified
from that distributed in the stable Perl release.  It may have been
edited to reflect the additions, changes and corrections provided
by respondents, reviewers, and critics to previous postings of
these FAQ. The complete text of these FAQ is available on request.

The perlfaq manual page contains the following copyright notice.


    Copyright (c) 1997-2002 Tom Christiansen and Nathan
    Torkington, and other contributors as noted. All rights

This posting is provided in the hope that it will be useful but
does not represent a commitment or contract of any kind on the part
of the contributors, authors or their agents.
