April 7, 2009, 11:07 pm
UTF-8ness of the values returned by toString() and nodeValue().
I know that toString() will give me what I need (octets, regardless of
the underlying encoding), yet I can't understand how the character is
represented in the output of each method.
For example (note that the mangled char is the starting single char
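use Encode;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $dom = $parser->parse_file(shift);
# getElementsByTagName returns a list; take the first <title> element.
my ($node) = $dom->getElementsByTagName('title');
# is_utf8 reports whether Perl's internal UTF-8 flag is set on the string.
print 'is utf-8: ' . Encode::is_utf8($node->firstChild->nodeValue,1) . "\n";
print "node value\n";
print "to string\n";
# A true second argument asks toString to serialize in the document encoding.
my $txt = $node->firstChild->toString(0,1);
print 'is utf-8: ' . Encode::is_utf8($txt,1) . "\n";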
is utf-8: 1
Wide character in print at ./utf8-lib-xml.pl line 18.
Why is the output of toString() no longer UTF-8?
And, since the wide char has been broken down into octets, how does
one know that it's composed of 2 octets when it's interpreted on the
receiving end (or even in my terminal)?
On the surface it seems as if I'd be breaking the UTF-8.
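To make the character-versus-octet distinction concrete, here is a minimal sketch that uses only the core Encode module. The sample character U+0161 is an assumption chosen for illustration, not the character from the original file, and no XML::LibXML call is involved. It shows that a string with the UTF-8 flag set has length 1, that encoding it yields 2 octets with the flag off, and that the lead octet's high bits (110xxxxx) are what tell a decoder that one more octet follows.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# Hypothetical sample character (U+0161), standing in for the one in the file.
my $chars  = "\x{161}";                 # character string: 1 character
my $octets = encode('UTF-8', $chars);   # byte string: the UTF-8 octets
my $back   = decode('UTF-8', $octets);  # what a receiver would do

printf "chars:  length=%d flag=%d\n",
    length($chars), is_utf8($chars) ? 1 : 0;
printf "octets: length=%d flag=%d hex=%s\n",
    length($octets), is_utf8($octets) ? 1 : 0, unpack('H*', $octets);

# In UTF-8 a lead octet of the form 110xxxxx starts a two-octet sequence.
my $lead = ord(substr($octets, 0, 1));
my $is_two_octet_lead = (($lead & 0xE0) == 0xC0) ? 1 : 0;
printf "lead octet 0x%02X starts a 2-octet sequence: %d\n",
    $lead, $is_two_octet_lead;

printf "round trip ok: %d\n", ($back eq $chars) ? 1 : 0;
```

Under these assumptions, a cleared flag on the octet string does not mean the data stopped being UTF-8. It only means Perl is treating the value as raw bytes rather than as decoded characters.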
Is the toString() method the preferred way to send the value of a
TextNode across the network?
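For comparison, here is a hedged sketch of two common ways to put a character string on the wire as UTF-8 octets. It is not a statement about which XML::LibXML method is preferred. The in-memory filehandles stand in for a socket, and $text is a hypothetical stand-in for a decoded value such as the one nodeValue returns. Printing a character string to a handle with no encoding layer is the usual trigger for a "Wide character in print" warning, so both options make the encoding step explicit.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical character string, standing in for a decoded text node value.
my $text = "\x{161}abc";

# Option 1: encode explicitly and write raw octets.
my $wire1 = '';
open(my $out1, '>:raw', \$wire1) or die "open: $!";
print {$out1} encode('UTF-8', $text);
close $out1;

# Option 2: let an I/O layer encode the characters on output.
my $wire2 = '';
open(my $out2, '>:encoding(UTF-8)', \$wire2) or die "open: $!";
print {$out2} $text;
close $out2;

# Both buffers should now hold the same octets.
print "same octets\n" if $wire1 eq $wire2;
print unpack('H*', $wire1), "\n";
```

Either way, the receiver gets plain octets and has to decode them as UTF-8 to get characters back.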