CJK Unified unicode translator

Does anyone know of a translator, preferably one implemented in Perl,
that will translate CJK Unified code points to their respective
language-specific code points?  If I understand the concept correctly,
CJK Unified code points render as glyphs that are basically the same
in Chinese, Japanese, and Korean.

My problem is that the target application I'm working with can't render
the CJK Unified code point because it is expecting, and can only handle,
JIS code points, but some of the data being fed to it is in CJK Unified
code points.

My search through CPAN didn't show anything obvious, at least to me.
Any pointers or suggestions would be appreciated.

d underscore roesler at agilent dot com

Re: CJK Unified unicode translator

Dennis Roesler wrote:

Have you looked at the Encode module? It might be as simple as opening
an input file specifying the CJK encoding, an output file specifying
JIS, and reading and writing. See "Encoding via PerlIO" for this
particular slant on things.
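
Before wiring up the file handles, it may help to check which JIS-related encoding names the installed Encode actually recognizes. The following is only a sketch: the pattern used to filter the list is an arbitrary choice for illustration, and the names printed depend on the Encode version installed.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# List the encoding names known to Encode that look JIS-related.
# The filter pattern is an illustrative assumption, not an official list.
my @jp = grep { /jis|euc-jp|cp932/i } Encode->encodings(':all');
print "$_\n" for @jp;

# Resolve a name or alias to the canonical name Encode uses for PerlIO layers.
my $enc = find_encoding('Shift_JIS')
    or die "This Encode does not know Shift_JIS\n";
print "Shift_JIS resolves to ", $enc->name, "\n";
```

Whatever canonical name comes back is the one to put inside `:encoding(...)` when opening the output file.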

Tom Wyant

Re: CJK Unified unicode translator

harryfmudd [AT] comcast [DOT] net wrote:

I found Unicode::Unihan after more research introduced me to the term Unihan :-(.

I've looked at this, but there doesn't seem to be an encoding that is
CJK Unified specific.

I tried the following, adapted from an example in the Encode docs, but
when I write the data out it complains about the CJK characters that
have no shiftjis mapping.  I drop the XML declaration line and rewrite it
with the encoding as Shift_JIS, but besides the above errors, XML::Simple
complains that it can't find the Shift_JIS encoding and won't parse the file.

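use strict;
use warnings;
use Encode;

# Assumption: the input and output paths come from the command line.
my ($infile, $outfile) = @ARGV;

open my $in,  "<:encoding(UTF-8)",    $infile  or die "In $infile: $!";
open my $out, ">:encoding(shiftjis)", $outfile or die "Out $outfile: $!";

# Discard the original XML declaration and write a Shift_JIS one instead.
my $fline = <$in>;
print $out qq~<?xml version="1.0" encoding="Shift_JIS"?>\n~;

# Copy the rest; the output layer does the conversion to Shift_JIS.
while (<$in>) { print $out $_ }
close $out or die "Close $outfile: $!";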
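
To see exactly which characters trigger those complaints, one option is to pass a code reference as the CHECK argument to Encode's `encode`, which Encode calls with the ordinal of each character it cannot map. This is a rough sketch, not a fix: the helper name `unmappable_chars` and the sample string are made up for illustration, and it only reports the offending code points without choosing a replacement for them.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the distinct code points in $text that
# Encode cannot map to shiftjis.
sub unmappable_chars {
    my ($text) = @_;
    my %seen;
    # The callback receives the ordinal of each unmappable character and
    # must return the octets to emit in its place; here a plain '?'.
    encode('shiftjis', $text, sub { $seen{ $_[0] } = 1; return '?' });
    return sort { $a <=> $b } keys %seen;
}

# Sample data (an assumption): three kanji used in Japanese, followed by
# U+4EEC, a simplified Chinese form.
my $sample = "\x{65E5}\x{672C}\x{8A9E}\x{4EEC}";
printf "U+%04X\n", $_ for unmappable_chars($sample);
```

The code points it reports are the ones that would need a substitute, for example via a Unicode::Unihan lookup, before the data can be written as Shift_JIS.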

I could change the workflow and have XML::Simple handle the UTF-8 file
and then use Encode's from_to function, or Unicode::Unihan, to do the

d underscore roesler at agilent dot com