UTF-8 garbage characters



Pierre Goiffon, Oct 6 2004, 4:29 am
Newsgroups: comp.infosystems.www.authoring.html

I saw this old post and decided that I did not understand it.

Suppose I have a form on a webpage and that form has a UTF-8 charset
header. Suppose there is also a textarea in that form, and a submit
button. Suppose I write something in Microsoft Word and use lots of
strange characters, then I copy and paste it into the textarea and hit
the submit button. At the other end, receiving the form, is a PHP
script which takes that text and makes it a webpage, with a UTF-8
charset header.

If I understand what Pierre Goiffon is saying, then it sounds as if no
garbage characters will appear on that page, no matter how many strange
characters I used in the Word document. It sounds to me as if he is
saying that everything will magically get transformed into a character
that makes sense in UTF-8.

Am I missing something? Surely that is not how it works?

Re: UTF-8 garbage characters

On 27 May 2005 12:19:25 -0700,
lkrubner@geocities.com posted:


That is what the user's system *should* have done: made whatever conversions
were necessary as it cut and pasted, and sent the data properly encoded,
with the recipient then handling it however they do.

However, *some* computers do not do that.  If you copy data from one
application that was using Windows-1252 encoding into another that was
using UTF-8, the cut-and-paste function doesn't translate it.

It should, because it is the only intermediary, and only it (that
computer) knows the two different encodings being used.
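To make the mismatch concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself contains no code). It assumes the "strange characters" are Word's curly quotes, en dash and apostrophe, held as Windows-1252 bytes, and compares pasting those bytes untranslated into a UTF-8 context with translating them first.

```python
# Assumption: Word's "smart" punctuation, stored as Windows-1252 (cp1252) bytes.
text = "\u201cQuoted\u201d \u2013 it\u2019s"   # curly quotes, en dash, apostrophe

cp1252_bytes = text.encode("cp1252")   # what a Windows-1252 application holds
utf8_bytes = text.encode("utf-8")      # what a UTF-8 form should submit

# Untranslated paste: the cp1252 bytes are read as if they were UTF-8.
# Each of 0x93, 0x94, 0x96, 0x92 is invalid on its own in UTF-8,
# so each becomes the replacement character U+FFFD.
garbled = cp1252_bytes.decode("utf-8", errors="replace")

# Translated paste: decode with the source encoding, re-encode as UTF-8.
translated = cp1252_bytes.decode("cp1252").encode("utf-8")

print(ascii(garbled))            # escaped so it prints on any terminal
print(translated == utf8_bytes)  # the translated bytes match proper UTF-8
```

The translation step is trivial once the source encoding is known, which is the point made above: only the machine doing the copy and paste knows both encodings.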


Re: UTF-8 garbage characters

Thanks. That's good to know.

Re: UTF-8 garbage characters

   at 12:19 PM, lkrubner@geocities.com said:


No. He is saying that garbage characters will only appear if you input
garbage characters. He said nothing about whether a cut-and-paste from
m$ word works correctly. If the result of the paste is to place the
correct characters in the form, then you will not get garbage
characters. If the effect of the paste is to put garbage characters in
the form, then the problem is with microsoft, not with the use of

Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel

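Shmuel's point, that the receiving end only reproduces what it is given, can be sketched the same way. Python stands in for the PHP script in the question, and the function name decode_submission is hypothetical. Strict UTF-8 decoding can reject untranslated Windows-1252 bytes, which are usually (though not always) invalid UTF-8, but text that was already garbled before it was encoded, such as the familiar "â€œ", is perfectly valid UTF-8 and passes straight through.

```python
def decode_submission(raw: bytes) -> str:
    """Hypothetical receiving-side check: return the form text only if it is valid UTF-8."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"submission is not valid UTF-8 at byte {exc.start}") from exc


sample = "\u201cQuoted\u201d"                     # curly-quoted word
good = sample.encode("utf-8")                     # correctly converted paste
bad = sample.encode("cp1252")                     # untranslated Windows-1252 bytes
mojibake = "\u00e2\u20ac\u0153".encode("utf-8")   # garbage text, validly encoded

print(decode_submission(good) == sample)                   # valid input comes back unchanged
print(decode_submission(mojibake) == "\u00e2\u20ac\u0153")  # garbage in, garbage out
try:
    decode_submission(bad)
except ValueError as err:
    print(err)                                             # invalid bytes are caught
```

A check like this can only detect bytes that are not valid UTF-8; it cannot tell whether the characters were the ones the user meant.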
