Posted by A. Sinan Unur on May 1, 2008, 10:16 pm
benkasminbullock@gmail.com (Ben Bullock) wrote in
>
>>> If the strings to swap are longer than a single character,
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> s/A/unlikely/g;
>>> s/C/A/g;
>>> s/unlikely/C/g;
>>> s/G/unlikely/g;
>>> s/U/G/g;
>>> s/unlikely/U/g;
>>>
>>> where "unlikely" is a string which is unlikely to occur in your
>>> data.
>>
>> A simple lookup table driven solution would obviate the need to make
>> assumptions about the unlikeliness of a given character as well as
>> getting rid of the multiple substitutions.
>
> And a simple tr/// based solution would obviate the need for you to
> write a lookup table solution. But if the strings to swap are longer
> than a single character, the lookup table solution is going to be
> somewhat complex.
Granted.
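For the single-character case, here is a minimal sketch of the tr/// approach mentioned above (the sample strings are borrowed from the lookup-table example later in this thread). tr/// maps each character in the search list to the character at the same position in the replacement list in one pass, so no placeholder string is needed; it cannot, however, swap multi-character strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Swap A<->C and G<->U in a single pass. tr/// works on $_, which
# aliases each element of @strings, so the array is modified in place.
my @strings = qw( ACGU GUACCGU );
tr/ACGU/CAUG/ for @strings;
print "@strings\n";    # prints "CAUG UGCAAUG"
```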
> Here is an example of a badly-written lookup table solution:
>
<snipped for brevity>
>
> The problem here is that the writer has put the same data, the list of
> stuff to swap, in three different places. Maybe that kind of clumsy
> solution is OK for an example program,
and that was the spirit in which those lines were written.
> but for the real world it's not. If one uses a lookup table, then the
> swapping data should only be in exactly one place:
>
> my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
> %subst = (%subst, reverse %subst);
> my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
> my @strings = qw( ACGU GUACCGU );
> s/($substkeys)/$subst{$1}/g for @strings;
>
> If one uses the original solution proposed above, then as the list of data
> to swap changes (and since the strings consist of more than one
> character, remember), bugs will occur if the programmer is not
> extremely careful about updating both parts of the list of stuff to
> swap and the left hand side of the substitution.
>
> So I don't recommend a lookup table, unless one knows what one is
> doing.
Well, if one uses the solution you proposed above and the list of data
to swap changes to
my %subst = qw( A|C C|A G|U U|G );
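there will be issues with the way you build the search string: each | inside a key is then treated as regex alternation rather than as a literal character, so the keys need to be escaped with quotemeta before they are joined.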
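To make the problem concrete, here is a small sketch (the variable names are illustrative, not from the thread) that builds the pattern from these keys both without and with escaping, and prints what each version matches in the string A|C.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys that contain the regex metacharacter '|'.
my %subst = qw( A|C C|A G|U U|G );
my $str   = 'A|C';

# Unescaped: every '|' inside a key becomes an alternation operator,
# so the pattern is just "A|C|C|A|G|U|U|G", i.e. single letters.
my $naive = join '|', sort keys %subst;
my ($naive_match) = $str =~ /($naive)/;
print "naive pattern matched '$naive_match'\n";    # prints 'A'

# Escaped with quotemeta: the '|' inside each key is matched literally.
my $safe = join '|', map { quotemeta($_) } sort keys %subst;
my ($safe_match) = $str =~ /($safe)/;
print "escaped pattern matched '$safe_match'\n";   # prints 'A|C'
```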
So:
#!/usr/bin/perl
use strict;
use warnings;
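# Pairs to swap; the keys contain regex metacharacters ('|' and '$').
my %replace = qw( A|C C|A G|U U|G A$A Z$Z);
# Add the reverse mappings so each pair swaps in both directions.
%replace = (%replace, reverse %replace);
# \Q...\E escapes each key so its metacharacters match literally.
my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);
my @strings = qw( A|C G|U G|UA|CC|AG|U Z$Z A$A );
print "Before:\t@strings\n";
# Replace each match with its partner from the lookup table.
s/($search)/$replace{$1}/g for @strings;
print "After:\t@strings\n";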
__END__
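One further wrinkle, not raised above: when the keys differ in length and one is a prefix of another, Perl's alternation takes the first alternative that matches at a given position rather than the longest, and the order returned by keys %replace is unspecified. A minimal sketch, assuming a hypothetical table with A and AB as keys, that sorts the keys longest-first before joining them:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table in which "A" is a prefix of "AB" (and "C" of "CD").
my %replace = ( 'A' => 'C', 'AB' => 'CD' );
# Add the reverse mappings so each pair swaps in both directions.
%replace = (%replace, reverse %replace);

# Alternation is tried left to right and the first alternative that
# matches wins, so put longer keys first; ties are broken alphabetically
# to make the pattern the same regardless of hash order.
my $search = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b } keys %replace;

my @strings = qw( ABC CDA AB );
s/($search)/$replace{$1}/g for @strings;
print "@strings\n";    # prints "CDA ABC CD"
```

Without the sort, if A happened to come before AB in the pattern, ABC would become CBA instead of CDA.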
--
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/