pack 'C3U*' not same as pack 'C3(xC)*'

I have a small card game. The clients are Java applets and the
server is written in C, mostly forwarding data from applet to applet.

The message format is:

    byte 1:             Number of Unicode chars (see below)
    byte 2:             Player number
    byte 3:             Event id
    up to 510 bytes:    A Java Unicode string

Now I'm trying to rewrite my C server in Perl, because that way
it's easier to add features (syslog, auth against an SQL db, etc.)

I have trouble understanding what the best pack format for my
messages would be. I have read "perldoc -f pack" numerous times and
also the many O'Reilly books I have, but the best I've come up with is

   pack "C3(xC)*", length $ascii_str, $num, $id, unpack "C*",

for the cases when I need to send an ASCII string (like an IP address
string) from the server to the Java applet and thus have to pad the
upper byte of each character with a zero (that's why the "x" above).

I wonder why pack "C3U*" doesn't do the same. Here is a demo:

    # perl -e '$str=pack "C3(xC)*", 4, 0, 14, unpack "C*", "test"; \
        print join " ", unpack "C*", $str'

    4 0 14 0 116 0 101 0 115 0 116

    # perl -e '$str=pack "C3U*", 4, 0, 14, unpack "C*", "test"; \
        print join " ", unpack "C*", $str'

    4 0 14 116 101 115 116

As you can see, the padding zeros are missing in the second output.
But why? Doesn't "perldoc -f pack" say:

    If you don't want this [UTF8] to happen, you can
    begin your pattern with "C0" (or anything else) to force
    Perl not to UTF8 encode your string, and then follow
    this with a "U*" somewhere in your pattern.


PS: I also wonder whether there are nicer ways to pass Java strings
    to Perl. "perldoc -f pack" mentions "n/..." for Java strings,
    but doesn't elaborate. Is it "n/U*"?

Re: pack 'C3U*' not same as pack 'C3(xC)*'

Alexander Farber wrote:
> [...]

I'd be tempted to use XML as the data format, in fact, I'd probably use


Re: pack 'C3U*' not same as pack 'C3(xC)*'

Your "Java unicode string" is presumably in (big-endian) UCS-2, which
is the representation used internally by Java.  This is not how perl
normally encodes Unicode strings.

Your pack "C3(xC)*" template is indeed a perfectly good way to convert
ASCII (or ISO Latin 1) text to UCS-2.  If you want to handle characters
above 255 as well, may I suggest something like:

  pack "C3n*", length($string), $num, $id, unpack "U*", $string;
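
To make the suggestion above concrete, here is a minimal round-trip sketch. The helper names encode_msg and decode_msg are made up for illustration, and the layout is assumed to be exactly the one described in the question (count, player number, event id, then one big-endian 16-bit value per character). Characters above U+FFFF would not fit into a single "n" and are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of any module.
# Layout assumed from the question: count, player number, event id,
# then one big-endian 16-bit value ("n") per character.
sub encode_msg {
    my ($num, $id, $string) = @_;
    my @chars = unpack "U*", $string;          # code points of the string
    return pack "C3n*", scalar(@chars), $num, $id, @chars;
}

sub decode_msg {
    my ($msg) = @_;
    my ($len, $num, $id, @chars) = unpack "C3n*", $msg;
    # Rebuild a Perl character string from the first $len code points.
    return ($num, $id, pack("U*", @chars[0 .. $len - 1]));
}

my $msg = encode_msg(0, 14, "test");
print join(" ", unpack "C*", $msg), "\n";   # 4 0 14 0 116 0 101 0 115 0 116

my ($num, $id, $text) = decode_msg($msg);
print "$num $id $text\n";                   # 0 14 test
```

For plain ASCII input this produces the same bytes as the "C3(xC)*" template, so the applet side should not need to change.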

pack "C3U*" gives a different result because "U*" encodes the
characters in UTF-8, not in UCS-2.  UTF-8 is a variable-length format
which encodes ASCII characters in a single byte and other characters
in 2 or more bytes.  So if your original string only contains ASCII
characters, it makes no difference whether you use "U*" or "C*":
either way each character becomes one byte, with no zero padding.
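
The difference is easy to see by dumping the bytes side by side. This is only a sketch: it uses the core Encode module to produce the UTF-8 bytes explicitly rather than relying on pack's "U" modes, and byte_list is a made-up helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as decimal byte values.
sub byte_list { return join " ", unpack "C*", $_[0] }

for my $cp (0x74, 0xE9, 0x20AC) {            # 't', e-acute, euro sign
    my $utf8 = encode("UTF-8", chr $cp);      # 1 to 3 bytes for these
    my $ucs2 = pack "n", $cp;                 # always 2 bytes
    printf "U+%04X  UTF-8: %s  UCS-2: %s\n",
        $cp, byte_list($utf8), byte_list($ucs2);
}
```

This prints:

    U+0074  UTF-8: 116  UCS-2: 0 116
    U+00E9  UTF-8: 195 169  UCS-2: 0 233
    U+20AC  UTF-8: 226 130 172  UCS-2: 32 172

Only the UCS-2 column has the fixed two-byte shape the applet expects.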

UTF-8 is also the format used by perl to store Unicode strings
internally, although perl hides this fact reasonably well -- in
theory, at least.  As perl's Unicode support matures, practice is
gradually starting to approach theory here.

For more information, try googling for UTF-8 and UCS-2.

Ilmari Karonen
