
[regex] grep for chars in any order

Newsgroup: comp.lang.perl.misc
Thread started by viki on June 18, 2008
Posted by Mario D'Alessio on June 18, 2008, 2:20 pm

> How can I build a regex that matches all characters of the string $STR
> in any order, with .* between any two characters,
> without generating all N! permutations (where N is the length of
> $STR)?
> Example:
> For $STR "abc", I want a match equivalent to:
> /(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/
>
> Generating all permutations is not feasible for longer values of
> $STR.
> /[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
> What is a good solution?
>
> Thanks
> vkm
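To make the false positive concrete, here is a small Perl sketch that builds the question's /[abc].*[abc].*[abc]/ from $STR and tries it on a few words. The helper name naive_match is hypothetical, and quotemeta() is used on the assumption that $STR might contain characters that are special inside a character class.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the naive pattern [abc].*[abc].*[abc]
# from $str and reports whether $word matches it (1 or 0).
sub naive_match {
    my ($word, $str) = @_;
    my $class = '[' . quotemeta($str) . ']';          # e.g. [abc]
    my $naive = join '.*', ($class) x length $str;    # e.g. [abc].*[abc].*[abc]
    return $word =~ /$naive/s ? 1 : 0;
}

for my $word (qw(cab back baby)) {
    printf "%s: %s\n", $word, naive_match($word, 'abc') ? 'matches' : 'no match';
}
```

All three words match, although 'baby' contains no 'c': each [abc] slot can be filled by any of the letters, including one that was already used.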

The way I see the solution, you can have any of the $STR characters,
followed by .*, followed by another of any of the $STR characters:

/[$STR].*[$STR]/

Or am I missing something?

Mario



Posted by Mario D'Alessio on June 18, 2008, 2:22 pm
Ignore my post. I realize my mistake. I missed the
part about the regex matching ALL of the characters.

Mario




Posted by jl_post@hotmail.com on June 18, 2008, 7:16 pm


Dear Viki,

If you don't mind using several regular expressions (one for each
letter), you can easily write:

/a/ and /b/ and /c/

You can even put it in a Perl grep() statement (which I presume is
what you intend to use it for) like this:

my @firstList = ('cab', 'back', 'cat', 'crab', 'dog', 'baby');
my @secondList = grep { /a/ and /b/ and /c/ } @firstList;

In this way, @secondList would contain 'cab', 'back', and 'crab',
but not 'baby' (which would have been a false positive in your
previous example).
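The same grep() can be generalized to an arbitrary $STR by testing each of its characters in turn. This is a minimal sketch assuming plain character containment is the goal; contains_all is a hypothetical name, and index() stands in for /a/, /b/, /c/ so that regex metacharacters in $STR need no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word contains every character of $str
# at least once (one index() test per character, like /a/ and /b/ and /c/).
sub contains_all {
    my ($word, $str) = @_;
    for my $ch (split //, $str) {
        return 0 if index($word, $ch) < 0;
    }
    return 1;
}

my $STR        = 'abc';
my @firstList  = ('cab', 'back', 'cat', 'crab', 'dog', 'baby');
my @secondList = grep { contains_all($_, $STR) } @firstList;
print "@secondList\n";    # cab back crab
```

Like /a/ and /b/ and /c/, this only requires each distinct character once, so a $STR with repeated letters such as 'aab' would also accept 'ab'.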

Of course, this approach uses one regular expression for each
letter that you're looking for (instead of a single regular
expression), but depending on how you're writing your code, that may
be acceptable.
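If a single regular expression is preferred, the per-letter tests can be folded into one pattern of zero-width lookaheads anchored at the start of the string. This technique was not proposed in the thread; the sketch assumes the pattern may be built at run time, and build_any_order_re is a hypothetical name. Each distinct character gets one lookahead with a {k} count, so repeated letters in $STR must also be repeated in the target. That should accept the same strings as the N! alternation, because a string contains some ordering of $STR as a subsequence exactly when it has at least as many copies of each character, while the pattern grows only with the number of distinct characters.

```perl
use strict;
use warnings;

# Hypothetical helper: one anchored lookahead per distinct character of $str.
# A character that occurs k times in $str must occur at least k times in the
# target, e.g. 'aab' gives ^(?=(?:.*?a){2})(?=(?:.*?b){1}) with the /s flag.
sub build_any_order_re {
    my ($str) = @_;
    my %count;
    $count{$_}++ for split //, $str;
    my $body = join '', map {
        my $n = $count{$_};
        '(?=(?:.*?' . quotemeta($_) . '){' . $n . '})';
    } sort keys %count;
    return qr/^$body/s;    # /s lets . match newlines too
}

my $any_abc = build_any_order_re('abc');
my @hits = grep { /$any_abc/ } ('cab', 'back', 'cat', 'crab', 'dog', 'baby');
print "@hits\n";    # cab back crab
```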

I hope this helps, Viki.

-- Jean-Luc

Posted by John W. Krahn on June 19, 2008, 12:57 pm

I haven't tested this, but it may do what you want:

( Assuming the data you are searching is in $data )

$data =~ s/[^\Q$STR\E]+//g;
print "matched!\n"
    if join( '', sort split //, $data ) eq join( '', sort split //, $STR );
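Since the code above is untested, here is a runnable sketch of the same filter-and-sort idea wrapped in a sub. The name sorted_letters_match is hypothetical; the sub works on a copy so the caller's data is not modified, and it spells out the sort comparison block for clarity.

```perl
use strict;
use warnings;

# Filter-and-sort check as a sub (the name is hypothetical).
# It works on a copy, so the caller's data is left unchanged.
sub sorted_letters_match {
    my ($data, $str) = @_;
    (my $kept = $data) =~ s/[^\Q$str\E]+//g;    # drop characters not in $str
    my $have = join '', sort { $a cmp $b } split //, $kept;
    my $want = join '', sort { $a cmp $b } split //, $str;
    return $have eq $want ? 1 : 0;
}

print sorted_letters_match('xcxaxb', 'abc'), "\n";    # 1
print sorted_letters_match('abcabc', 'ab'),  "\n";    # 0
```

The second call returns 0 even though 'abcabc' contains both letters; this is the false negative Ben Bullock points out in his reply.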



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Posted by Ben Bullock on June 19, 2008, 7:17 pm
On Thu, 19 Jun 2008 16:57:15 +0000, John W. Krahn wrote:

> ( Assuming the data you are searching is in $data )
>
> $data =~ s/[^\Q$STR\E]+//g;
> print "matched!\n" if join( '', sort split //, $data ) eq join( '',
> sort split //, $STR );

This fails (gives a false negative) if $data = "abcabc" and $STR = "ab",
because the result of the first "join" is "aabb" and the second "join" is
"ab". You need to do some kind of unique sort.
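A unique sort would fix this particular case, but it would also accept 'ab' when $STR is 'aab', which the original N! alternation rejects. Below is a sketch of an alternative (not from the thread) that keeps the multiplicity: count how many of each character $STR needs and check that $data supplies at least that many. contains_letters_of is a hypothetical name.

```perl
use strict;
use warnings;

# Multiset check: every character of $str must occur in $data
# at least as many times as it occurs in $str.
sub contains_letters_of {
    my ($data, $str) = @_;
    my %need;
    $need{$_}++ for split //, $str;                              # required counts
    $need{$_}-- for grep { exists $need{$_} } split //, $data;   # consume supply
    return (grep { $_ > 0 } values %need) ? 0 : 1;               # anything still missing?
}

print contains_letters_of('abcabc', 'ab'), "\n";    # 1
print contains_letters_of('ab', 'aab'), "\n";       # 0
```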
