I saw this old post and decided that I did not understand it.

Suppose I have a form on a webpage and that form has a UTF-8 charset
header. Suppose there is also a textarea in that form, and a submit
button. Suppose I write something in Microsoft Word and use lots of
strange characters, then I copy and paste it into the textarea and hit
the submit button. At the other end, receiving the form, is a PHP
script which takes that text and makes it a webpage, with a UTF-8
charset header.

If I understand what Pierre Goiffon is saying, then it sounds as if no
garbage characters will appear on that page, no matter how many strange
characters I used in the Word document. It sounds to me as if he is
saying that everything will magically get transformed into a character
that makes sense in UTF-8.

Am I missing something? Surely that is not how it works?

On 27 May 2005 12:19:25 -0700,
lkrubner@geocities.com posted:


That is what the user's system *should* have done: make any necessary
conversions during the cut and paste, and send the data properly encoded.
The recipient then handles it however it does.

However, *some* computers do not do that.  If you copy data from one
application that was using Windows-1252 encoding into something else that
was using UTF-8, the cut-and-paste function doesn't translate it.

It should, because it alone is there as an intermediary, and only it (that
computer) knows the two different encodings being used.
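
To make the difference concrete, here is a minimal sketch in Python of a
paste that translates versus one that does not. The function names and the
sample text are made up for illustration, and real clipboards do not
necessarily work this way internally; it only shows what happens when
Windows-1252 bytes are read as if they were UTF-8.

```python
# Hypothetical illustration: function names and sample text are invented.

def paste_without_translation(text: str) -> str:
    """Copy Windows-1252 bytes verbatim and let the target read them as UTF-8."""
    raw = text.encode("cp1252")                    # bytes as the source app stored them
    return raw.decode("utf-8", errors="replace")   # target misreads them as UTF-8


def paste_with_translation(text: str) -> str:
    """Decode with the source encoding first, then re-encode for the target."""
    raw = text.encode("cp1252")                    # bytes as the source app stored them
    characters = raw.decode("cp1252")              # intermediary knows the source encoding
    utf8_bytes = characters.encode("utf-8")        # what the UTF-8 form should receive
    return utf8_bytes.decode("utf-8")


sample = "\u201cquoted\u201d"                      # curly quotes, typical of Word text
broken = paste_without_translation(sample)         # quotes become U+FFFD replacement marks
fixed = paste_with_translation(sample)             # quotes survive intact
```

The curly quotes are single bytes (0x93 and 0x94) in Windows-1252, and
neither is a valid UTF-8 sequence on its own, which is why the untranslated
paste ends up with replacement characters.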

No. He is saying that garbage characters will only appear if you input
garbage characters. He said nothing about whether a cut-and-paste from
m$ word works correctly. If the result of the paste is to place the
correct characters in the form, then you will not get garbage
characters. If the effect of the paste is to put garbage characters in
the form, then the problem is with Microsoft, not with the use of

