|
Posted by lawrence on August 11, 2004, 9:38 am
I was told in another newsgroup (about XML, I was wondering how to
control user input) that most modern browsers empower the designer to
cast the user created input to a particular character encoding. This
arose in answer to my question about how to control user input. I had
complained that I had users who wrote articles in Microsoft Word or
WordPerfect and then input that to the web through a textarea box on a
form I'd created.
I've run google searches on this and I get tons of info but none to
the point. Can anyone here give me pointers on converting form input
to a particular character encoding?
|
|
Posted by Kris on August 11, 2004, 8:20 pm
lkrubner@geocities.com (lawrence) wrote:
> I was told in another newsgroup (about XML, I was wondering how to
> control user input)
Authors on the WWW cannot control user input.
> that most modern browsers empower the designer to
> cast the user created input to a particular character encoding. This
> arose in answer to my question about how to control user input. I had
> complained that I had users who wrote articles in Microsoft Word or
> WordPerfect and then input that to the web through a textarea box on a
> form I'd created.
>
> I've run google searches on this and I get tons of info but none to
> the point. Can anyone here give me pointers on converting form input
> to a particular character encoding?
Serve the page that holds your form in the character encoding you
want. Declare it in the HTTP Content-Type header first, and add the meta
'content-type' element as a supplement.
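A minimal sketch of that advice, using Python's standard http.server purely for illustration. The form markup, the /submit action, the field name and the port are all made-up placeholders, not anything from the original poster's site. The same charset is declared twice: once in the HTTP header and once in the meta element.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical form page; the meta element repeats the charset that the
# HTTP header declares, as a fallback (e.g. for pages saved to disk).
FORM_PAGE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Article submission</title>
</head>
<body>
<form method="post" action="/submit">
<p><textarea name="article" rows="20" cols="70"></textarea></p>
<p><input type="submit" value="Send"></p>
</form>
</body>
</html>
"""

CONTENT_TYPE = "text/html; charset=utf-8"


def encoded_form_page():
    """Return the page as bytes in the encoding the headers announce."""
    return FORM_PAGE.encode("utf-8")


class FormHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = encoded_form_page()
        self.send_response(200)
        # The HTTP header is the primary declaration of the encoding.
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve the form on localhost (the port is an arbitrary choice)."""
    HTTPServer(("localhost", port), FormHandler).serve_forever()
```

Calling run() and opening http://localhost:8000/ shows the form. Most current browsers will then submit the textarea contents in the page's encoding, but that is browser behaviour, not something the server can enforce.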
Why not use utf-8? I hear utf-16 has problems on the Web. Perhaps I am
misinformed, so take the comments of better-informed people more
seriously.
--
Kris
|
|
Posted by Pierre Goiffon on August 12, 2004, 11:42 am
> I was told in another newsgroup (about XML, I was wondering how to
> control user input) that most modern browsers empower the designer to
> cast the user created input to a particular character encoding. This
> arose in answer to my question about how to control user input. I had
> complained that I had users who wrote articles in Microsoft Word or
> WordPerfect and then input that to the web through a textarea box on a
> form I'd created.
This is a vast subject. You will need to read a fair amount of documentation
to get the concepts straight, and then read material that addresses your
particular problem.
First, go to the W3C:
http://www.w3.org/TR/html401/charset.html
Also look at the internationalization section:
http://www.w3.org/International/articles/
Then you could read some of the many complementary documents, for example:
http://www.cs.tut.fi/~jkorpela/chars/index.html
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
Then you will notice that theory and practice are, as is commonly observed on
the web, not quite identical. In particular, strange things happen when the
user does very common things, like inserting a euro sign in IE6 on Windows, in
a form that was sent specifying a UTF-8 charset. I haven't done much testing
with Office on Windows, but I suspect similar strange behavior occurs with a
simple copy/paste... If anyone has experience with that, I
would be very pleased to hear from her/him!
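To illustrate why the euro sign is such a common trouble spot, here is a small Python sketch. It does not reproduce the IE6 behavior described above. It only shows that the same character has different byte representations in the encodings a Windows user is likely to meet, and none at all in ISO-8859-1. The helper name euro_bytes is made up for this example.

```python
EURO = "\u20ac"  # the euro sign


def euro_bytes(encoding):
    """Return the euro sign encoded in `encoding`, or None if the
    encoding has no code for it."""
    try:
        return EURO.encode(encoding)
    except UnicodeEncodeError:
        return None


for name in ("utf-8", "cp1252", "iso-8859-15", "iso-8859-1"):
    print(name, euro_bytes(name))

# Bytes written as UTF-8 but read back as windows-1252 turn into
# three unrelated characters instead of one euro sign.
misread = EURO.encode("utf-8").decode("cp1252")
print(repr(misread))
```

So a server that assumes one encoding while the browser submits another will either fail to decode the input or silently store the wrong characters.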
Oh, by the way, using UTF-16 in a web context is not recommended, even if
your document contains a lot of non-Latin characters... You should use a
common 8-bit charset (ISO Latin-1 or Latin-9, for example, depending on the
main language you use), or UTF-8 if you really need it. But make
your choice knowing exactly what all the consequences are!
|
|
Posted by Alan J. Flavell on August 14, 2004, 1:34 am
On Wed, 11 Aug 2004, lawrence wrote:
> I was told in another newsgroup (about XML, I was wondering how to
> control user input) that most modern browsers empower the designer to
> cast the user created input to a particular character encoding.
If you have a captive browser population, then that might be feasible;
but in a WWW context this is rarely the case - you have to make the
best use of what you get.
> This arose in answer to my question about how to control user input.
With respect: in a WWW context you do better to pay attention to the
options that are open to you to interpret what you've been sent, since
in the final analysis you can't literally "control" anything. In your
server-side process, you need to be able to cope with (meaning:
either accept if you can, or gracefully refuse if you can't) just
about anything that a client will send to you, including what will be
sent by malicious or just plain broken clients.
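One way to read "accept if you can, or gracefully refuse if you can't" is sketched below in Python. It assumes the form page was served as UTF-8 and that the submission is expected back in UTF-8. The function name, the size limit and the control-character rule are illustrative choices, not part of any standard.

```python
def decode_submission(raw, max_bytes=100_000):
    """Strictly decode submitted bytes as UTF-8.

    Returns (text, None) on success or (None, reason) on refusal,
    so the caller can show the user a polite error page.
    """
    if len(raw) > max_bytes:
        return None, "submission too large"
    try:
        # errors="strict" refuses malformed input instead of guessing.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return None, "not valid UTF-8 at byte %d" % exc.start
    # Refuse control characters other than tab, LF and CR
    # (an arbitrary policy chosen for this sketch).
    if any(ord(ch) < 32 and ch not in "\t\n\r" for ch in text):
        return None, "unexpected control character"
    return text, None


text, error = decode_submission("caf\u00e9 \u20ac5".encode("utf-8"))
print(text, error)

# The same word sent as ISO-8859-1 bytes is refused, not mangled.
text, error = decode_submission(b"caf\xe9")
print(text, error)
```

Refusing outright is the conservative choice. A more forgiving server could try a fallback such as windows-1252 after the strict decode fails, at the risk of guessing wrong.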
> I had complained that I had users who wrote articles in Microsoft
> Word or WordPerfect and then input that to the web through a
> textarea box on a form I'd created.
I think you'll need to be more specific. In Word alone, I've seen so
many variations (including Mac Word users who had created Mac-coded
characters which didn't exist in the Windows encoding!) that I could
write a whole thesis on the topic.
> I've run google searches on this and I get tons of info but none to
> the point. Can anyone here give me pointers on converting form input
> to a particular character encoding?
I see that you've already been pointed to my tutorial-ish page at
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
My impression is that your best chance with modern browsers is to send
the form page with utf-8 encoding and to expect the form submission to
come back in utf-8 encoding. But if you need to deal with NN4.* this
will go horribly wrong, and some other browsers with limited scope
will do that too. Content negotiation unfortunately doesn't help
here, since NN4 *claims* to support utf-8 - and, as far as display of
web pages is concerned, that's sort-of true; but when it comes to form
submission, it goes desperately wrong.
There may be other browsers (e.g. WebTV) to worry about, but at least
they don't make Accept-charset claims which they're unable to fulfil.
My web page cited above is certainly incomplete. And there are some
more-recent notes on this topic at the W3C, I think. Feel free to
share your experiences and see if we can improve this area of
coverage, if not in the software, then at least in documentation and
tutorials, OK?
btw I don't know any reason to favour utf-16 encodings - every browser
which I'm aware of supporting utf-16 can also support utf-8, which
seems to me to be better supported (and advertised as supported via
accept-charset) in general. So I'd go for utf-8 if it's advertised by
the client, except where it's known not to work (NN4.*).
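For what it's worth, checking whether a client advertises utf-8 could look roughly like the Python sketch below. It is a simplified reading of the Accept-Charset rules in RFC 2616, and the function name is made up. As noted above, it cannot catch a browser such as NN4.* that advertises utf-8 but mishandles it on form submission; that needs a separate exception list.

```python
def accepts_utf8(accept_charset):
    """Return True if an Accept-Charset header value allows utf-8.

    Simplified: a missing header means any charset is acceptable
    (RFC 2616); an empty value is treated the same way here, which
    is a choice made for this sketch.
    """
    if accept_charset is None or not accept_charset.strip():
        return True
    qvalues = {}
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        qvalues[name] = q
    if "utf-8" in qvalues:
        return qvalues["utf-8"] > 0
    # Not listed explicitly: fall back to the wildcard, if any.
    return qvalues.get("*", 0.0) > 0


print(accepts_utf8("iso-8859-1, utf-8;q=0.7, *;q=0.5"))
print(accepts_utf8("iso-8859-1"))
```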
good luck
|
|
Posted by Lachlan Hunt on August 14, 2004, 8:29 am
Alan J. Flavell wrote:
> btw I don't know any reason to favour utf-16 encodings - every browser
> which I'm aware of supporting utf-16 can also support utf-8, which
> seems to me to be better supported (and advertised as supported via
> accept-charset) in general.
UTF-16 is better for documents written in a language where the majority
of characters used would be more than 2 bytes in UTF-8. So, for
documents that mostly use ASCII characters, with the occasional
punctuation (such as an em-dash —), dingbat ☺, or other symbol outside
the ASCII range, then UTF-8 is better. AFAIK, the main reason to choose
one over the other on the web is file size, and UA support.
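The size trade-off is easy to check. The Python sketch below compares byte counts for a few sample strings; the samples are arbitrary, and "utf-16-le" is used so that the 2-byte byte order mark does not distort the comparison for such short strings.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for `text`, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII": "hello",
    "em-dash": "\u2014",
    "dingbat": "\u263a",
    "Cyrillic": "\u043f\u0440\u0438\u0432\u0435\u0442",
    "Japanese": "\u65e5\u672c\u8a9e",
}

for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print("%-9s utf-8: %2d bytes   utf-16: %2d bytes" % (label, utf8, utf16))
```

ASCII text doubles in size under UTF-16, characters that need two bytes in UTF-8 (such as Cyrillic) break even, and characters that need three bytes in UTF-8 (such as the em-dash or Japanese) shrink to two.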
> So I'd go for utf-8 if it's advertised by
> the client, except where it's known not to work (NN4.*).
Anyone who hasn't upgraded from NN4 will have difficulty with more than
just character encodings, so I wouldn't consider that a problem worth
worrying about.
--
Lachlan Hunt
http://www.lachy.id.au/
Please direct all spam to abuse@127.0.0.1
Thank you.