Posted by RedGrittyBrick on February 28, 2008, 4:47 am
maria wrote:
> On Wed, 27 Feb 2008 22:45:02 -0500, "John W. Kennedy"
>
>> maria wrote:
>>> I am using a CGI program to read XML files and extract their various
>>> items. Somehow, my program converts the apostrophe "’" to ...
>>> "\â\€\™". How do I program my CGI program to convert "’" to
>>> an apostrophe, "'"? Is there a little CGI code that will convert
>>> all these different strings (including dagger, ellipsis,
>>> euro symbol, double quote, etc.) to their ASCII equivalents?
>>> Thank you very much.
>>>
>>> maria
>> You have a serious misunderstanding that is much too complicated to
>> explain here. Learn about Unicode.
>
> The whole modern world is filled with people who feel compelled to
> respond to other people's messages when they have absolutely nothing
> to say.
>
Oh dear. Replying to perceived rudeness with more rudeness just puts off
potential helpers.
John's reply *did* contain something useful to you.
AIUI John is pointing out that "\â\€\™" is your Unicode apostrophe
encoded in UTF-8 but displayed using an incorrect single-byte encoding
such as Windows-1252 (often loosely called Latin-1).
Unicode code point U+2019 is represented in UTF-8 as the byte sequence e2
80 99 (shown here in hexadecimal). That same byte sequence, when
interpreted as Windows-1252, is the three characters â€™ (a circumflex,
euro, trademark).
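To make the byte-level picture concrete, here is a minimal sketch. It is in Python purely for illustration (the same bytes are involved whatever language your CGI program uses), and the variable names are my own, not anything from your program.

```python
import unicodedata

# U+2019 RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe.
apostrophe = "\u2019"

# Encoded as UTF-8 it becomes three bytes.
utf8_bytes = apostrophe.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # e2 80 99

# Decoding those bytes with the wrong encoding yields three characters.
garbled = utf8_bytes.decode("cp1252")
for ch in garbled:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Decoding with the right encoding recovers the original character.
restored = utf8_bytes.decode("utf-8")
print(restored == apostrophe)
```

So the data is most likely fine; what matters is that the bytes are decoded as UTF-8 when read and that the output is labelled with the encoding it really uses.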
You can learn more about Perl's handling of Unicode by typing the
command `perldoc perlunicode`
It's a while since I've read the posting guidelines for this newsgroup
but I'm pretty sure they suggest you include a short example program
that demonstrates your problem. That would make it easier for people to
help you identify what you are doing wrong.