find which subgroups don't match in regex

Subject Author Date
find which subgroups don't match in regex Shoryuken 07-17-2008
Posted by Shoryuken on July 17, 2008, 3:39 pm
Hello gents, here's something that has been confusing me for a while:

$regex="(\w+)\s([0-9]+)";

$a="Tom 1990"; # it's a match
$b="Jack xyz"; # not a match, because $2 doesn't match

But here's my question: how exactly do I inform users which subgroup
failed to match (i.e. $2 is the problem, $1 is fine, etc.)?

For a regex match, is there a way to find which subgroups don't
match?

Thanks in advance.

Posted by Ben Morrow on July 17, 2008, 4:02 pm

> Hello gents, here's the thing been confusing me for a while:
>
> $regex="(\w+)\s([0-9]+)";
>
> $a="Tom 1990"; # it's a match
> $b="Jack xyz"; # not a match, because of $2 doesn't match ... but
> here's my question, exactly how to inform the users of this unmatched
> subgroup? (i.e. $2 is the problem, $1 is fine, etc.)
>
> For a regex matching, is there a way to find which subgroups don't
> match?

You can use /gc and \G to match one piece at a time, without losing your
place; something like

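use strict;
use warnings;

# Pieces of the pattern, tried in order from where the previous one stopped.
my @matches = qw/ \w+ \s [0-9]+ /;
my $string  = 'Jack xyz';

for my $match (@matches) {
    # /c keeps pos($string) when a match fails; \G anchors at that position.
    next if $string =~ /\G$match/gc;
    print "$match failed at position ", pos($string) // 0, "\n";
    last;    # the later pieces cannot be tested meaningfully after a failure
}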
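
As a rough sketch of how the /gc and \G idea could be packaged for reporting back to users, the piece-by-piece loop can be wrapped in a sub. The name first_failure and its return convention are hypothetical, not something from this thread. Here piece 1 corresponds to $1, piece 2 to the separator, and piece 3 to $2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based index of the first piece that
# fails and the offset where it failed, or an empty list if all pieces match.
sub first_failure {
    my ($string, @pieces) = @_;
    pos($string) = 0;    # make \G start at the beginning of the string
    for my $i (0 .. $#pieces) {
        my $piece = $pieces[$i];
        next if $string =~ /\G$piece/gc;    # /c keeps pos() on failure
        return ($i + 1, pos($string));
    }
    return;
}

my @pieces = (qr/\w+/, qr/\s/, qr/[0-9]+/);
for my $input ('Tom 1990', 'Jack xyz') {
    if (my ($n, $at) = first_failure($input, @pieces)) {
        print "$input: piece $n failed at offset $at\n";
    }
    else {
        print "$input: all pieces matched\n";
    }
}
```

Note that this only reports the first piece that cannot match at the current position; it does not backtrack into earlier pieces the way the combined regex would.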

Ben

--
Outside of a dog, a book is a man's best friend.
Inside of a dog, it's too dark to read.
ben@morrow.me.uk Groucho Marx

Posted by Shoryuken on July 18, 2008, 1:53 pm
>
> > Hello gents, here's the thing been confusing me for a while:
>
> > $regex="(\w+)\s([0-9]+)";
>
> > $a="Tom 1990"; # it's a match
> > $b="Jack xyz"; # not a match, because of $2 doesn't match ... but
> > here's my question, exactly how to inform the users of this unmatched
> > subgroup? (i.e. $2 is the problem, $1 is fine, etc.)
>
> > For a regex matching, is there a way to find which subgroups don't
> > match?
>
> You can use /gc and \G to match one piece at a time, without losing your
> place; something like
>
> my @matches = qw/ \w+ \s [0-9]+ /;
> my $string = 'Jack xyz';
>
> for my $match (@matches) {
> $string =~ /\G$match/gc
> or print "$match failed at position " . pos $string;
> }
>
> Ben
>
> --
> Outside of a dog, a book is a man's best friend.
> Inside of a dog, it's too dark to read.
> b...@morrow.me.uk Groucho Marx

This is a great idea, thanks!

And thanks to the others for the good input, too!

Posted by Leon Timmermans on July 17, 2008, 4:07 pm
On Thu, 17 Jul 2008 12:39:58 -0700, Shoryuken wrote:

> Hello gents, here's the thing been confusing me for a while:
>
> $regex="(\w+)\s([0-9]+)";
>

Regular expressions aren't strings in Perl, so please don't write them
as strings; there is absolutely no reason to do so. [0-9] can also be
written more compactly as \d, and you could consider anchoring the
regexp to the beginning and the end of the string.
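
For illustration, a minimal sketch of those suggestions, assuming the same two inputs as the original post. In a double-quoted string Perl drops the backslash from escapes it does not recognize, so "\w" and "\s" end up as plain "w" and "s"; qr// avoids that. The variable names are only for this example.

```perl
use strict;
use warnings;

# qr// compiles the pattern once and keeps \w, \s and \d intact.
# ^ and $ anchor it to the beginning and end of the string.
my $regex = qr/^(\w+)\s(\d+)$/;

my @report;
for my $input ('Tom 1990', 'Jack xyz') {
    if (my ($name, $number) = $input =~ $regex) {
        push @report, "$input: name=$name number=$number";
    }
    else {
        push @report, "$input: no match";
    }
}
print "$_\n" for @report;
```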

> $a="Tom 1990"; # it's a match
> $b="Jack xyz"; # not a match, because of $2 doesn't match ... but here's
> my question, exactly how to inform the users of this unmatched subgroup?
> (i.e. $2 is the problem, $1 is fine, etc.)
>

In this case, you could match against /\w+\s/. If that is present, then
the absence of the number is the problem.
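
A small sketch of that check, assuming anchored patterns and a hypothetical diagnose sub whose name and messages are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: try the full pattern first, then only the prefix.
sub diagnose {
    my ($input) = @_;
    return 'ok' if $input =~ /^\w+\s\d+$/;
    # The prefix is present, so whatever follows it is not a valid number.
    return 'the number is the problem' if $input =~ /^\w+\s/;
    return 'the problem is before the number';
}

print "$_: ", diagnose($_), "\n" for 'Tom 1990', 'Jack xyz', '1990';
```

This only works because the pattern is simple enough to split by hand into a prefix and a remainder.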

> For a regex matching, is there a way to find which subgroups don't
> match?
>

In the general case, no. That's because they fail all of the time, until
they succeed. There is no definitive moment of failure.
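
One way to observe this, as a rough sketch: Perl's (?{ ... }) construct runs embedded code each time the engine reaches that point in the pattern, so a counter placed after the first group shows how often that group matched on the way to the overall failure. The counter name is hypothetical, and the exact count depends on the regex engine's optimizations, so none is claimed here.

```perl
use strict;
use warnings;

our $attempts = 0;    # package variable so the embedded code block can update it

# The code block runs every time the engine gets past the first group,
# including after backtracking and at later starting positions.
my $matched = 'Jack xyz' =~ /(\w+)(?{ $attempts++ })\s([0-9]+)/;

print "matched: ", ($matched ? 'yes' : 'no'), "\n";
print "first group matched $attempts time(s) before the overall failure\n";
```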

Leon Timmermans

Posted by xhoster on July 17, 2008, 4:15 pm
> Hello gents, here's the thing been confusing me for a while:
>
> $regex="(\w+)\s([0-9]+)";
>
> $a="Tom 1990"; # it's a match
> $b="Jack xyz"; # not a match, because of $2 doesn't match ... but
> here's my question, exactly how to inform the users of this unmatched
> subgroup? (i.e. $2 is the problem, $1 is fine, etc.)
>
> For a regex matching, is there a way to find which subgroups don't
> match?

There isn't a built-in way. You'd have to build it yourself, and that
will probably be non-trivial, as it would pretty much have to be an expert
system in your exact context, not just some standard Perl feature.

For example, whose "fault" is it that this doesn't match:

"1990 Tom" =~ /(\w+)\s(\d+)/;

Both subgroups will match individually, just not when put together.
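
A quick check of that claim, written as a small sketch with variable names chosen only for this example:

```perl
use strict;
use warnings;

my $input = '1990 Tom';

# Test the whole pattern, then each subgroup on its own.
my $whole  = $input =~ /(\w+)\s(\d+)/ ? 'matches' : 'fails';
my $first  = $input =~ /(\w+)/        ? 'matches' : 'fails';
my $second = $input =~ /(\d+)/        ? 'matches' : 'fails';

print "whole pattern:      $whole\n";
print "first group alone:  $first\n";
print "second group alone: $second\n";
```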

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
