HTML entities from input fields

comp.infosystems.www.authoring.html
Posted by chernyshevsky on January 11, 2006, 9:25 am

How do I force IE to encode characters outside of the current code-page
as HTML entities? Right now, when I enter some Cyrillic text into an
ISO-8859-1 form, the text submitted ends up being CP1251. If I enter
some Polish letters, the text is CP1252. This behavior is too weird! I
need IE to do things one way and one way only.


Posted by Jukka K. Korpela on January 11, 2006, 6:28 pm

chernyshevsky@hotmail.com wrote:

> How do I force

You don't, on the WWW.

Please don't crosspost pointlessly. Either your question is about HTML
authoring for the WWW and therefore belongs to c.i.w.a.h., or it is not and
does not belong here (there). Please make up your mind so that others won't
need to do that for you. I'm guessing this is about WWW authoring, so I set
followups to c.i.w.a.h.

> IE to encode characters outside of the current code-page
> as HTML entities?

You cannot force such grossly incorrect behavior. You just need to be
prepared to get form data encoded that way, from IE and perhaps other
browsers as well.

What is your _original_ question, as opposed to an assumed solution that
itself aims at forcing browsers to misbehave?

> Right now, when I enter some Cyrillic text into an
> ISO-8859-1 form, the text submitted ends up being CP1251. If I enter
> some Polish letters, the text is CP1252. This behavior is too weird!

It's weird too, but maybe not technically incorrect (the specs are fuzzy).

> I need IE to do things one way and one way only.

You can't. You just need to live with it.

If you wish to be prepared to get arbitrary character data (as a form
designer should be, right?), make the page containing the form UTF-8 encoded.
Browsers will then send the data in UTF-8 format (though of course, some old
browsers may fail to do this - but there is little hope with them anyway).
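
A minimal sketch of the receiving side of this advice, in Python. It assumes the form page was served as UTF-8 and that the body arrives URL-encoded; the function name `decode_form` and the windows-1252 fallback are illustrative assumptions, not something the thread prescribes. Strict UTF-8 decoding rejects most legacy-encoded input, so a failure is a reasonable hint that an old browser sent something else.

```python
from urllib.parse import parse_qs


def decode_form(body: str, fallback: str = "windows-1252") -> dict:
    """Decode a URL-encoded form body, preferring UTF-8.

    `fallback` is a guess for data that is not valid UTF-8
    (hypothetical choice; pick whatever fits your audience).
    """
    try:
        # errors="strict" makes invalid UTF-8 raise instead of being
        # silently replaced, so bad input is detected.
        return parse_qs(body, keep_blank_values=True,
                        encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not UTF-8: decode with the fallback and replace anything
        # that still cannot be decoded.
        return parse_qs(body, keep_blank_values=True,
                        encoding=fallback, errors="replace")


utf8_result = decode_form("q=%D0%B6")   # UTF-8 bytes for a Cyrillic letter
legacy_result = decode_form("q=%E9")    # a single byte, not valid UTF-8
```

The fallback is only a heuristic: a byte sequence that is not valid UTF-8 does not say which legacy encoding was actually used.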

The usual tutorial in matters like this is Alan's
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html


--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html


Posted by Alan J. Flavell on January 11, 2006, 7:00 pm

On Wed, 11 Jan 2006, Jukka K. Korpela wrote:

> chernyshevsky@hotmail.com wrote:
>
> > IE to encode characters outside of the current code-page
> > as HTML entities?
>
> You cannot force such grossly incorrect behavior.

Well, in as much as behaviour in this situation isn't defined, it's
hard to say that it's "incorrect", but it's certainly
counter-intuitive, and I'd rate it as distinctly sub-optimal, because
the results are ambiguous.
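
The ambiguity can be illustrated with a short Python sketch. It uses Python's `xmlcharrefreplace` error handler as a stand-in for a browser that substitutes numeric character references for characters the form's encoding cannot represent; the helper name `submit_as_latin1` is hypothetical, and this is a simulation, not a claim about what any particular browser version sends.

```python
import html


def submit_as_latin1(text: str) -> bytes:
    """Simulate submitting `text` through an ISO-8859-1 form, replacing
    unencodable characters with numeric character references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# The user typed a Cyrillic letter (U+0436, decimal 1078).
typed_character = submit_as_latin1("\u0436")

# The user literally typed the seven characters "&#1078;".
typed_reference = submit_as_latin1("&#1078;")

# Both submissions produce identical bytes, so the server cannot
# tell which of the two the user meant.
indistinguishable = typed_character == typed_reference

# Unescaping the received data turns both inputs into the letter,
# which is wrong for the user who typed the reference literally.
decoded = html.unescape(typed_reference.decode("iso-8859-1"))
```

Whichever way the server interprets the received bytes, one of the two users gets back something other than what they entered.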

But over and above that, I would criticise the specification writers
for failing to grasp the fact that you can't prevent users from
submitting whatever characters they care to paste into the submission
fields: they should have made some kind of unambiguous provision for
what ought to happen in that case. I've seen at least two unambiguous
ways that it *could* be done (or "could have been done" if the
spec-writers had got there first) - but it seems too late to remedy
that now.

> You just need to be prepared to get form data encoded that way,
> from IE and perhaps other browsers as well.

That's the reality of it, indeed.

> If you wish to be prepared to get arbitrary character data (as a
> form designer should be, right?), make the page containing the form
> UTF-8 encoded. Browsers will then send the data in UTF-8 format
> (though of course, some old browsers may fail to do this - but there
> is little hope with them anyway).

Based on the observation that search services like Google have been
doing this for a couple of years already, it seems that they, at
least, rate this as practically feasible nowadays. Though they might
still have a fallback if they detect that NN4.* is calling (NN4.*
versions are quite capable of *rendering* utf-8, generally speaking:
and they indicate that capability in their Accept-charset header, but
when it comes to submitting utf-8 data, they get it horribly wrong).

Thanks for the cite. I think it says everything else I could want to
add on the topic...

cheers
