How to debug a regex with (?DEFINE)?

I'm trying to extract the nested namespaces in the following code, but
the code only extracts the inner namespace. It is very hard for me
to see what is wrong. Does anybody know any tricks for debugging a
regex like the following? Thanks!

~/linux/test/perl/man/perlre/(?/(/DEFINE$ cat
#!/usr/bin/env perl

use strict;
use warnings;

my $text=<<'EOF';
namespace A {
  namespace B {
  }
}
EOF

# Build pattern that matches only namespaces...
my $namespace_pattern = qr{
    ((?&namespace))        # Match and capture (possibly nested)

    # Define each component...
          \b [A-Za-z_]\w* \b

            \b namespace \b

        # Namespace is keyword + name + block...
            (?&namespace_keyword) \s+ (?&namespace_token) \s*


                (?: (?&block) | . )*?

my ($extracted) = $text =~ $namespace_pattern;

print "text = $text\n";
print "extracted = $extracted\n";


Re: How to debug a regex with (?DEFINE)?

The best way to debug something like this is to run it under 'use re
"debug"'. The output is rather verbose and a little arcane, so it takes
a bit of patience to pick through it, but it tells you *exactly* what
the regex engine is doing and where it fails.
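
As a minimal sketch of how to switch this on (the sample text and the small pattern here are only assumptions for illustration, not the pattern from the question), the pragma can be limited to one block so that only the regex of interest is traced:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input, assumed for illustration.
my $text = "namespace A {\n  namespace B {\n  }\n}\n";

my $name;
{
    # 'use re "debug"' is lexically scoped, so only regexes compiled
    # inside this block are traced; the trace is written to STDERR.
    use re 'debug';
    ($name) = $text =~ / \b namespace \s+ (\w+) /x;
}

print "first namespace name = $name\n";
```

Since the trace goes to STDERR, redirecting it (for example with 2> trace.txt in the shell) keeps it apart from the script's normal output.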

In this case the important bit (reformatted slightly) is

  29 <e B {> <%n  }%n}%n>    | 62:  OPEN5 'namespace_body'(64)
  29 <e B {> <%n  }%n}%n>    | 64:  BRANCH(72)
  29 <e B {> <%n  }%n}%n>    | 65:    STAR(67)
                                      SPACE can match 3 times out of
  32 <e B {%n  > <}%n}%n>    | 67:      GOSUB4[-26](70)
  32 <e B {%n  > <}%n}%n>    | 41:        OPEN4 'namespace'(43)
  32 <e B {%n  > <}%n}%n>    | 43:        GOSUB3[-12](46)
  32 <e B {%n  > <}%n}%n>    | 31:          OPEN3 'namespace_keyword'(33)
  32 <e B {%n  > <}%n}%n>    | 33:          BOUND(34)

[snip some attempts to claw back spaces from the \s*; using (?>) or \s*+
might be a good idea...]
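
To show what (?>) and \s*+ change, here is a small sketch on a made-up input (not taken from the trace): a plain \s* gives characters back when a later part of the pattern needs them, while the possessive and atomic forms never do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $input = "   x";   # three spaces, then 'x'

# Plain \s* grabs all three spaces, then gives one back so that \s can match.
my $backtracking = $input =~ /^\s*\sx/ ? 1 : 0;

# Possessive \s*+ keeps all three spaces, so the following \s has nothing left.
my $possessive = $input =~ /^\s*+\sx/ ? 1 : 0;

# The atomic group (?>\s*) behaves the same way as the possessive form.
my $atomic = $input =~ /^(?>\s*)\sx/ ? 1 : 0;

print "backtracking=$backtracking possessive=$possessive atomic=$atomic\n";
```

In a pattern where nothing after the whitespace can ever match whitespace, the possessive form only removes pointless retries, which also makes the debug trace shorter.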

  29 <e B {> <%n  }%n}%n>    | 72:  BRANCH(76)
  29 <e B {> <%n  }%n}%n>    | 73:    GOSUB6[+5](76)
  29 <e B {> <%n  }%n}%n>    | 78:      OPEN6 'block'(80)
  29 <e B {> <%n  }%n}%n>    | 80:      EXACT <{>(82)
                                    BRANCH failed...

Here it's matched as far as 'namespace B {' and it's trying to match
(?&namespace_body), but that requires either a sub-namespace or a
(?<block>) (with explicit braces) so it doesn't match. You need to add
an empty case here:

            | \s*

though I suspect that you actually want to allow whitespace around
blocks, so what you want is

        (?: (?&namespace)
        |   (?&block)
        |   # empty
        )

or perhaps

    (?<namespace>
        (?&namespace_keyword) \s+ (?&namespace_token) \s*
        \{ \s* (?: (?&namespace_body) \s* )* \}
    )

    (?<namespace_body>
        (?&namespace) | (?&block)
    )

to allow multiple blocks-or-namespaces within one namespace.
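
Putting that second variant into a complete script might look like the sketch below. The (?(DEFINE) ...) wrapper, the component definitions and the sample text are assumptions reconstructed from the fragments and the trace shown in this thread, not the original script. The block definition is left in its original, permissive form.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input, assumed from the positions shown in the debug trace.
my $text = <<'EOF';
namespace A {
  namespace B {
  }
}
EOF

my $namespace_pattern = qr{
    ( (?&namespace) )      # capture the outermost namespace

    (?(DEFINE)
        (?<namespace_token>   \b [A-Za-z_]\w* \b )
        (?<namespace_keyword> \b namespace \b )

        # keyword, name, then braces around zero or more body items
        (?<namespace>
            (?&namespace_keyword) \s+ (?&namespace_token) \s*
            \{ \s* (?: (?&namespace_body) \s* )* \}
        )

        # a body item is a nested namespace or a plain block
        (?<namespace_body> (?&namespace) | (?&block) )

        # permissive block, as in the original question
        (?<block> \{ (?: (?&block) | . )*? \} )
    )
}xms;

# In list context the match returns every capture group; the first is $1.
my ($extracted) = $text =~ $namespace_pattern;

print "extracted = $extracted\n";
```

Because the body items sit inside ( ... )*, an empty namespace such as B needs no separate empty alternative, and the outer match now spans both namespaces.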

You want to be careful about using .*? to mean 'match anything
until...'. This will correctly match good input, but it is too
permissive; for instance, this

    namespace A { { { } }

will match, despite not having balanced braces. You want

    (?: (?&block) | [^\{] )*

to prevent that, and if you also want to forbid namespaces within blocks
you need

    (?: (?&block)
    |   (?! (?&namespace_keyword) | \{ ) .
    )*
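
The difference between the permissive and the stricter block can be checked with a small comparison. This is only a sketch: make_pattern is a hypothetical helper, the pattern is anchored to the whole string and simplified compared with the one in the question, and both variants use a greedy quantifier, which does not change whether a match exists.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# make_pattern is a hypothetical helper for this comparison only. It builds
# a whole-string namespace matcher around a "filler" atom, i.e. whatever
# may appear inside a block besides a nested block.
sub make_pattern {
    my ($filler) = @_;
    return qr{
        \A \s* (?&namespace) \s* \z

        (?(DEFINE)
            (?<namespace>
                \b namespace \b \s+ \b [A-Za-z_]\w* \b \s*
                \{ \s* (?: (?: (?&namespace) | (?&block) ) \s* )* \}
            )
            (?<block> \{ (?: (?&block) | $filler )* \} )
        )
    }xms;
}

my $lenient = make_pattern('.');        # filler from the original pattern
my $strict  = make_pattern('[^\{]');    # filler suggested in the reply

my $unbalanced = 'namespace A { { { } }';
my $balanced   = 'namespace A { { { } } }';

for my $input ($unbalanced, $balanced) {
    printf "%-26s lenient=%s strict=%s\n", "'$input'",
        ($input =~ $lenient ? 'match' : 'no match'),
        ($input =~ $strict  ? 'match' : 'no match');
}
```

The unbalanced input should only get past the lenient variant, while the balanced one should get past both. Note that [^\{] still lets a stray } be consumed as filler, so an input with one closing brace too many would slip through; a class that excludes both braces would be stricter, though that goes beyond what the reply suggests.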

