utf8, length and syswrite are killing me




I have a Russian card game at
http://apps.facebook.com/video-preferans /
which I've recently moved from using urlencoded data
to XML data in UTF-8. Since then it often hangs
for the users, and I suspect that my subroutine:

sub enqueue {
        my $child    = shift;
        my $data     = shift;
        my $fh       = $child->;
        my $response = $child->;

        # flash.net.Socket.readUTF() expects 16-bit prefix in network
        my $prefix   = pack 'n', length $data;

        # append to the end of the outgoing queue
        push @, $prefix . $data;

packs the wrong number of bytes for Cyrillic messages.

I'm using perl v5.10.0 on OpenBSD 4.5 and
"perldoc -tf length" suggests using

But when I put the line:

use Encode::Encoding;
        my $prefix   = pack 'n', length(Encoding::encode_utf8($data));

then it borks with

Undefined subroutine &Encoding::encode_utf8 called at Child.pm line

Any help please?

I should also mention that when users chat
in Russian, my server just passes their Cyrillic
messages around (with sysread - poll - syswrite).

But for the Cyrillic words in my own program (I "use utf8;")
I have to call utf8::encode($cyrillic_word) before I can
write them out with syswrite, or it dies ("wide char").

I've tried moving utf8::encode($data) into the
enqueue subroutine above, but it doesn't allow me to
(maybe because parts of $data are not utf8??)


Re: utf8, length and syswrite are killing me

On Wed, 17 Feb 2010 10:28:59 -0800 (PST), "A. Farber"

If '$data' is still a Perl character string,
I would encode() it to UTF-8 octets and then
push @outarray, pack ('n a*', length($octets), $octets);
There are a couple of different ways to do it, but basically
you want the length of the encoded data, not the length
of the Perl string (if it has Perl character semantics).
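
The "Undefined subroutine &Encoding::encode_utf8" error in the question most likely comes from the package name: encode_utf8 is a function of the Encode module, and "use Encode::Encoding;" does not create an Encoding package. Here is a minimal sketch of a framing helper under that reading. The name frame_message is hypothetical, and the 65535 check is my own addition, since a 'n' prefix only holds 16 bits.

```perl
use strict;
use warnings;
use Encode ();    # provides Encode::encode_utf8; nothing is imported

# frame_message is a hypothetical helper name, not from the original code.
# It takes a Perl character string and returns length prefix + UTF-8 octets.
sub frame_message {
    my ($chars) = @_;
    my $octets = Encode::encode_utf8($chars);

    # Assumption: refuse payloads that do not fit into a 16-bit prefix.
    die "message too long for 16-bit prefix\n" if length($octets) > 0xFFFF;

    return pack 'n a*', length($octets), $octets;
}

# A 6-character Cyrillic word written with escapes: 12 UTF-8 octets
my $frame = frame_message("\x{41f}\x{440}\x{438}\x{432}\x{435}\x{442}");
printf "frame is %d bytes, prefix says %d\n",
    length($frame), unpack('n', $frame);
```

The prefix counts the 12 encoded octets, not the 6 characters, which is what the receiving side needs.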

You really don't want to push '$prefix . $data' if $data is
not yet encoded as UTF-8. If it is already encoded as UTF-8, then
the length would be correct because it's already bytes (octets),
not character semantics.
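
One possible reason why encoding inside enqueue misbehaves is that $data mixes already-encoded octets (the relayed chat messages) with character strings (the program's own words). The sketch below uses made-up sample strings to show what happens when such a mix is encoded as a whole: the octets get encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $chars = "\x{434}\x{430}";                        # 2 Cyrillic characters
my $bytes = encode_utf8("\x{43d}\x{435}\x{442}");    # 3 characters, already 6 octets

# Wrong: the octets are treated as Latin-1 characters and encoded again.
my $double = encode_utf8($chars . $bytes);

# Right: bring everything to characters first, then encode exactly once.
my $single = encode_utf8($chars . decode_utf8($bytes));

printf "double-encoded: %d bytes, encoded once: %d bytes\n",
    length($double), length($single);
```

So it is safer to decide on one point where data is decoded on input and one point where it is encoded on output, and to keep the two kinds of strings apart in between.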

You should read the Unicode docs: perluniintro, perlunicode, unicode, etc.
Each has links that take you to the others.

Below are some examples of a couple of ways to do it. See what works
for you.


use strict;
use warnings;
use Encode;

binmode (STDOUT, ':encoding(UTF-8)');

my $perlstring = "This is a string <\x>...";
my $utf8octets = encode('UTF-8', $perlstring);
my $packd_string = pack('n', length($utf8octets));
my $unpackd_string = unpack('n', $packd_string);
print "** Perl string : '$perlstring', length = ", length($perlstring),"\n\n";
print "UTF-8 octets: '$utf8octets', length = ", length($utf8octets),"\n\n";
print "Packed length of encoded string is $unpackd_string\n\n";

my $len_plus_octets = $packd_string . $utf8octets;
print "Length.UTF-8 octets: '$len_plus_octets'\n\n";

my $packd_all = pack ('n a*', length($utf8octets), $utf8octets);
print "Packed all   : '$packd_all', length = ",length($packd_all),"\n\n";

my ($len,$octets) = unpack ('n a*',  $packd_all);
print "Unpacked all : '$octets', length = ",length($octets),"\n";
print "             :  read packed length = $len\n\n";
my $decoded_string = decode('UTF-8', $octets);
print "** Perl string : '$decoded_string', length = ", length($decoded_string), "\n\n";
if ($decoded_string eq $perlstring) {
    print "** Perl strings are equal.\n";
}
else {
    print "** Perl strings are not equal.\n";
}

Output:

** Perl string : 'This is a string <G>...', length = 23

UTF-8 octets: 'This is a string <+-->...', length = 25

Packed length of encoded string is 25

Length.UTF-8 octets: ' ?This is a string <+-->...'

Packed all   : ' ?This is a string <+-->...', length = 27

Unpacked all : 'This is a string <+-->...', length = 25
             :  read packed length = 25

** Perl string : 'This is a string <G>...', length = 23

** Perl strings are equal.
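
The unpack step above assumes that a whole frame is already in hand. With sysread, a frame can arrive in pieces, so here is a rough sketch of pulling complete frames out of a receive buffer. The name next_message and the buffer handling are my assumptions, not part of the original server.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# next_message is a hypothetical helper: it removes one complete frame from
# the buffer (passed by reference) and returns the decoded string, or
# nothing if the buffer does not yet hold a whole frame.
sub next_message {
    my ($bufref) = @_;
    return if length($$bufref) < 2;             # prefix incomplete
    my $len = unpack 'n', $$bufref;
    return if length($$bufref) < 2 + $len;      # payload incomplete
    my $octets = substr $$bufref, 2, $len;
    substr $$bufref, 0, 2 + $len, '';           # drop the consumed frame
    return decode_utf8($octets);
}

my $payload = encode_utf8("\x{43f}\x{430}\x{441}");    # 3 characters, 6 octets
my $buffer  = pack('n a*', length($payload), $payload) . "\x00";  # frame + 1 stray byte

my $msg = next_message(\$buffer);
printf "got %d characters, %d byte(s) left in buffer\n",
    length($msg), length($buffer);
```

The leftover byte stays in the buffer until the rest of the next frame arrives.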

Re: utf8, length and syswrite are killing me

Thank you! I've ended up with encode($data), and after that
length() gives me the number of bytes for the syswrite (I hope).
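
For what it's worth, here is a small sketch of the writing side once the data is encoded. It assumes a blocking handle, and write_all is a hypothetical name. syswrite may write fewer bytes than asked for, so the loop continues from the returned offset.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# write_all is a hypothetical helper for a blocking handle: syswrite may
# write fewer bytes than requested, so loop until everything is sent.
sub write_all {
    my ($fh, $octets) = @_;
    my $offset = 0;
    while ($offset < length $octets) {
        my $written = syswrite $fh, $octets, length($octets) - $offset, $offset;
        die "syswrite failed: $!\n" unless defined $written;
        $offset += $written;
    }
    return $offset;
}

pipe(my $reader, my $writer) or die "pipe failed: $!\n";

my $octets = encode_utf8("\x{43c}\x{438}\x{437}\x{435}\x{440}");   # 5 characters, 10 octets
my $sent   = write_all($writer, pack('n a*', length($octets), $octets));
close $writer;

sysread $reader, my $received, 1024;
printf "sent %d bytes, received %d bytes\n", $sent, length($received);
```

With a non-blocking socket driven by poll, an undefined return with EAGAIN is not a fatal error, so the die would have to be replaced by keeping the unsent remainder in the outgoing queue.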
