XML::LibXML UTF-8 toString() -vs- nodeValue()

I need to send data across the network and I'm confused by the
UTF-8ness of the values returned by toString() and nodeValue().

I know that toString() will give me what I need (octets, regardless of
the underlying encoding), yet I can't understand how the character is
represented in the output of each method.

For example (note that the mangled character is the opening single
quote):

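use strict;
use warnings;
use XML::LibXML;
use Encode;

# Parse the file named on the command line and take the first <title> element.
my $parser = XML::LibXML->new;
my $dom = $parser->parse_file(shift);
my $node = ($dom->getElementsByTagName('title'))[0];

print $dom->actualEncoding, "\n";
# nodeValue() returns the text as a Perl character string.
print 'is utf-8: ' . Encode::is_utf8($node->firstChild->nodeValue,1) . "\n";
print "node value\n";
print $node->firstChild->nodeValue, "\n";
print "to string\n";
# toString(0,1): no formatting, serialized in the document's encoding.
my $txt = $node->firstChild->toString(0,1);
print $txt, "\n";
print 'is utf-8: ' . Encode::is_utf8($txt,1) . "\n";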


is utf-8: 1
txt content
Wide character in print at ./utf8-lib-xml.pl line 18.
to string
is utf-8:

Why is the output of toString() no longer flagged as UTF-8?
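
For what it's worth, the same flag change can be reproduced with Encode
alone, without any XML. This is only a minimal sketch: the sample text is
made up (U+2018 stands in for the mangled quote), $chars stands in for what
nodeValue() returns, and $octets stands in for what toString(0,1) returns,
assuming (as I read the XML::LibXML docs) that the second argument asks for
bytes in the document's encoding. It suggests that is_utf8() reports Perl's
internal character-string flag, not whether the bytes are valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Assumed sample text: a character string, like nodeValue() returns.
my $chars = "\x{2018}title";

# The same text encoded to UTF-8 octets, like toString(0,1) is assumed to return.
my $octets = encode('UTF-8', $chars);

# The flag is on for the character string and off for the octet string.
printf "chars:  flag=%d length=%d\n", (is_utf8($chars)  ? 1 : 0), length $chars;
printf "octets: flag=%d length=%d\n", (is_utf8($octets) ? 1 : 0), length $octets;
```

The octet string is longer because length() counts bytes there, while it
counts characters in the flagged string.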

And, since the wide character has been broken down into octets, how
does the receiving end (or even my terminal) know that it is composed
of 2 octets when it's interpreted?
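
As far as I understand UTF-8, the first octet of each sequence carries the
length in its high bits, so a decoder can regroup the octets without any
extra information. Here is a rough sketch of that idea; utf8_seq_length()
is a hypothetical helper written for illustration (not part of Encode or
XML::LibXML), and the sample character U+00E9 is an assumption chosen
because it encodes to 2 octets.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: sequence length implied by a UTF-8 lead byte.
sub utf8_seq_length {
    my ($byte) = @_;
    return 1 if $byte < 0x80;             # 0xxxxxxx: plain ASCII
    return 2 if ($byte & 0xE0) == 0xC0;   # 110xxxxx
    return 3 if ($byte & 0xF0) == 0xE0;   # 1110xxxx
    return 4 if ($byte & 0xF8) == 0xF0;   # 11110xxx
    return 0;                             # continuation or invalid lead byte
}

# Sender side: one character (assumed sample) becomes a run of octets.
my $octets = encode('UTF-8', "\x{e9}");
my @bytes  = unpack 'C*', $octets;
printf "lead byte 0x%02X announces %d octets\n",
    $bytes[0], utf8_seq_length($bytes[0]);

# Receiver side: decoding turns the octets back into a single character.
my $decoded = decode('UTF-8', $octets);
printf "decoded length: %d character(s)\n", length $decoded;
```

In practice the receiver would just call decode() (or read through an
:encoding(UTF-8) layer) rather than inspect lead bytes by hand.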

On the surface it seems as if I'd be breaking the UTF-8.

Is the toString() method the preferred way to send the value of a
text node across the network?
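
For comparison, here are the two ways I know of to turn a character string
such as the one from nodeValue() into octets on output: encode it explicitly
before printing, or put an :encoding(UTF-8) layer on the handle. This is a
sketch under assumptions: the sample text is made up, and in-memory
filehandles stand in for the real socket. Either approach should also avoid
the "Wide character in print" warning.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed sample text standing in for the value returned by nodeValue().
my $chars = "\x{2018}quoted\x{2019}";

# Option 1: encode explicitly, then write raw octets.
open my $raw_fh, '>:raw', \my $raw_buf or die "open failed: $!";
print {$raw_fh} encode('UTF-8', $chars);
close $raw_fh;

# Option 2: let an I/O layer encode the characters on the way out.
open my $layer_fh, '>:encoding(UTF-8)', \my $layer_buf or die "open failed: $!";
print {$layer_fh} $chars;
close $layer_fh;

# Both buffers hold the same octets, and decoding restores the text.
printf "same octets: %s\n", ($raw_buf eq $layer_buf ? 'yes' : 'no');
printf "round trip ok: %s\n",
    (decode('UTF-8', $raw_buf) eq $chars ? 'yes' : 'no');
```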
