Click here to get back home

[regex] grep for chars in any order

 HomeNewsGroups | Search | About
 comp.lang.perl.misc    Post an article   get this group's latest topics as an RSS feed add this group's latest topics to your My MSN content add this group's latest topics to your My Yahoo content
Subject Author Date
[regex] grep for chars in any order viki 06-18-2008
Posted by viki on June 18, 2008, 3:06 am
Please log in for more thread options
How can I build regex that matches all characters of the string $STR
in any order with .* added between any two characters: ?
And without generating all N! transpositions (where N is length of
$STR) ?
Example.
For $STR "abc", I want to match equivalent to:
/(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

Generating all transpositions is not feasible for larger legths of
$STR.
/[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
What is good solution ?

Thanks
vkm

Posted by Ben Bullock on June 18, 2008, 5:47 am
Please log in for more thread options
On Wed, 18 Jun 2008 00:06:12 -0700, viki wrote:

> How can I build regex that matches all characters of the string $STR
> in any order with .* added between any two characters: ?
> And without generating all N! transpositions (where N is length of
> $STR) ?
> Example.
> For $STR "abc", I want to match equivalent to:
> /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
>
> Generating all transpositions is not feasible for larger legths of
> $STR.
> /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
> What is good solution ?

I don't think a single regular expression can do that, because there is
some logic involved which doesn't fit the regular expression mentality -
you have to work out what the first character matched was, then change
the second character to match depending on that, and so on.

I would use the easy and fast method and then remove the false positives
by checking that the matched string contained all n characters, perhaps
using s or tr n times and dropping out of the loop if one of my
substitutions failed.

The following seems to work, although I haven't tested it extensively:

#!/usr/local/bin/perl
use warnings;
use strict;

sub matches
{
my ($s, $STR) = @_;
my %chars = map {$_ => 1} split ('', $STR);
my @chars = sort keys %chars;
my $anychar = join '', @chars;
my $matchany = join '.*',map "[$anychar]", @chars; # there's a better
way
if ($s =~ /$matchany/) {
        my $copy = $s;
        for my $c (@chars) {
         return unless $copy =~ s/$c//g;
        }
        return 1;
}
return;
}

print "OK\n" if (matches('naninuneno','aeiou'));
print "OK\n" if (matches('naninunene','aeiou'));

Posted by Glenn Jackman on June 18, 2008, 7:04 am
Please log in for more thread options
At 2008-06-18 03:06AM, "viki" wrote:
> How can I build regex that matches all characters of the string $STR
> in any order with .* added between any two characters: ?
> And without generating all N! transpositions (where N is length of
> $STR) ?
> Example.
> For $STR "abc", I want to match equivalent to:
> /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

print 'matched' if $STR =~ /a(?=.*b)(?=.*c)/
or $STR =~ /b(?=.*a)(?=.*c)/
or $STR =~ /c(?=.*a)(?=.*b)/
or
print 'matched' if $STR =~ /a(?=.*b)(?=.*c)|b(?=.*a)(?=.*c)|c(?=.*a)(?=.*b)/

So you don't have to create all the combinations but you do need all the
permutations (if I have my terminology correct)

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

Posted by Ben Bullock on June 18, 2008, 9:01 am
Please log in for more thread options
On Wed, 18 Jun 2008 11:04:21 +0000, Glenn Jackman wrote:

> print 'matched' if $STR =~ /a(?=.*b)(?=.*c)|b(?=.*a)(?=.*c)|c(?=.*a)
(?=.*b)/

Great! That is better than the solution I posted. But I have an
improvement:

/(?=.*a)(?=.*b)(?=.*c)/

without any actual matching string also works, reducing the regex length
from O(n^2) to O(n), where n is the number of characters.

> So you don't have to create all the combinations but you do need all the
> permutations (if I have my terminology correct)

You mean that you need all the combinations of initial characters, but
not all the permutations (which would be O(n!)).


Posted by Glenn Jackman on June 18, 2008, 12:04 pm
Please log in for more thread options
At 2008-06-18 09:01AM, "Ben Bullock" wrote:
> Great! That is better than the solution I posted. But I have an
> improvement:
>
> /(?=.*a)(?=.*b)(?=.*c)/

Nice. I wonder (without bothering to benchmark it) if anchoring the
expression would be an optimization:

/^(?=.*a)(?=.*b)(?=.*c)/

--
Glenn Jackman
Write a wise saying and your name will live forever. -- Anonymous

Similar ThreadsPosted
cockroach race: grep for characters in any order June 19, 2008, 8:10 am
regex for chars 192 to 255 February 29, 2008, 5:45 am
Regex for special chars.. April 18, 2006, 10:10 am
regex: how to %hash2 = grep %hash1 January 17, 2005, 7:51 am
Matching Multiple Patters In A Regex In Any Order September 26, 2005, 4:35 pm
Delete nonprinting chars September 6, 2004, 10:04 pm
Permuting using any number of given chars May 17, 2005, 10:51 pm
parsing UTF-8 chars out of POST data September 8, 2004, 11:00 am
Matching escaped delimiter chars November 28, 2005, 9:21 pm
Match a number of repeated chars, but NO MORE. December 2, 2005, 2:16 am

Our other projects:

Art Dolls, Fairies and Mermaids - Sunnyfaces.net

Roy's Linux, Programming and Search Engines messages

1-Script XML SitemapXML Sitemap