Anyone care to explain this one?

So, on my machine this gives me

$ echo "abc" | perl -pe 'tr/a-z/a-m/cd'
abck

From reading the man pages it seems to me it should have deleted the
complement of a-z unless a character is in the replacement list, but
where did the "k" come from?

Even more perplexing is

$ echo "abc123op" | perl -pe 'tr/a-z/0-k/cd'

My LC_COLLATE is "C", LANG is "en_US.utf8", and I'm running Perl

Re: Anyone care to explain this one?

echo "abc" generates 4 characters including the newline. So you have the
equivalent of

perl -e '$_ = "abc\n" ; tr/a-z/a-m/cd; print'

Notice that your "abck" was not followed by a newline on perl's stdout! The
result when I ran it actually looked like this:

$ echo "abc" | perl -pe 'tr/a-z/a-m/cd'
abck$

with the shell prompt glued to the k.

Why k? Well, what is the complement of the set a-z? It's the set of all
characters that aren't a-z. The first character that's not in a-z is "\0".
The next is "\1", then "\2", etc. So the equivalent to your original
operation is:

perl -e '$_ = "abc\n" ; tr/\0-`{-\377/a-m/d; print'

(I'm not sure \377 is the proper upper limit in this age of large charsets,
but you get the idea.) The "`" is the character before "a" in ASCII, and the
"{" is the character after "z".

So what happened? The 13 replacement characters a-m were matched up against
the first 13 characters in the search list:

  \0 \1 \2 \3 \4 \5 \6 \7 \10 \11 \12 \13 \14
   a  b  c  d  e  f  g  h   i   j   k   l   m

\12 (octal 12, decimal 10) is also known as \n, the newline character. So it
got translated to k. The "abc" input characters didn't match anything in the
search list (they belong to the a-z set that was complemented out) so they
pass through unchanged. If you had provided any input characters that were
neither a-z nor \0-\14 they would have been matched and removed because of
the /d modifier.

I don't know if it would ever be a good idea to use the /c modifier, the /d
modifier, and a non-empty replacement list all in a single tr/// operation.
Having explained in detail what it did and why, it seems like even if that's
what you wanted to do, you should find a less obfuscated way to do it.

Just like above, this means a-z pass through unchanged, but this time the
replacement list is much longer. "0" is "\x30" and "k" is "\x6b" in ASCII, so
you have input characters "\0" through "\x3b" being translated to 0-k. This
happens to include 0-9 ("\x30" through "\x39") being translated to "\x60"
through "\x69" (a backtick, then a-i). And the newline became a colon this
time. The complete translation table you've asked for is:

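\0 \1 \2 \3 \4 \5 \6 \7 \10 \11 \12 \13 \14 \15 \16 \17
 0  1  2  3  4  5  6  7   8   9   :   ;   <   =   >   ?

\20 \21 \22 \23 \24 \25 \26 \27 \30 \31 \32 \33 \34 \35 \36 \37
  @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O

SPC ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ;
  P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k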

and anything in the input that's neither a-z nor \0-\x3b would be deleted,
but once again you didn't include any of those.

Alan Curry
