Posted by Simon on June 24, 2008, 2:00 pm
> Scripsit Simon:
>
> > I'm working on a team that is planning to add Welsh language support
> > to a large existing IT system which is partially web-based and
> > English-language-only so far.
>
> Do you plan to add other languages later? Is this about names only or
> also about prose texts? After all, ISO-8859-1 is insufficient even for
> normal English prose; think about dashes and proper quotation marks.
>
> > I've heard that 2 characters in Welsh
> > (w-circumflex and y-circumflex) are not supported in our default
> > ISO-8859-1 character set,
>
> Right. They are included in ISO-8859-14 (a.k.a. ISO Latin 8, or
> "Celtic"), but that's not a feasible option on the WWW (IE does not
> recognize that encoding).
>
> > so a partial move to Unicode for internal
> > storage of text might be required.
>
> That might be easy, or it might be extremely complicated. But that's
> really beyond the scope of these groups. As far as WWW authoring is
> concerned, Unicode - specifically UTF-8 - is a good option, but you
> could keep using ISO-8859-1 and represent those letters using character
> references like &#373; for w with circumflex. But you might have to deal
> with the encoding problem of the data bases involved, for example, and
> with data entry.
>
> > I haven't yet found a Welsh-language website that uses these 2
> > characters, so are they actually used much in Welsh?
>
> I don't know Welsh, but I expect those characters to be so rare that
> using some clumsy notation like character references for them wouldn't
> be a major problem.
>
> > Is not supporting them likely to cause problems?
>
> Some people might say that it is tolerable to omit the circumflex, but
> it may be distinctive (i.e. the only difference between otherwise
> identical words, though the context usually resolves the issue). And in
> 2008, I think it is inappropriate to add support for languages to IT
> systems without supporting them properly, with all the characters needed
> for their correct writing.
>
> --
> Jukka K. Korpela ("Yucca")
> http://www.cs.tut.fi/~jkorpela/
>
Thanks for your reply.
Unfortunately, multilingual support has not really been a priority in the
system design up to now, although it has always been a possible future
requirement. The system is a complex mixture of databases, Windows
applications and web applications. I believe all the databases and
programming languages we use already support Unicode, so I would aim to
use that support rather than character references, which would be
clumsy, as you say.
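For anyone weighing the same options, here is a minimal Python sketch of the points above. It is only an illustration, not code from our system. The sample word "dŵr" ("water"), the helper names `encodable` and `normalize_for_storage`, and the assumption that data entry might deliver "w" plus a combining circumflex instead of the single precomposed letter are my own for the example. It checks which encodings can hold w-circumflex, shows the character-reference fallback for an ISO-8859-1 page, and normalizes input to NFC before storage.

```python
import html
import unicodedata

# Sample text: the Welsh word "dŵr" ("water"); U+0175 is w with circumflex.
TEXT = "d\u0175r"

def encodable(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

def normalize_for_storage(text: str) -> str:
    """Hypothetical input filter: convert to Unicode Normalization Form C, so a
    decomposed 'w' + COMBINING CIRCUMFLEX ACCENT becomes the single code point U+0175."""
    return unicodedata.normalize("NFC", text)

if __name__ == "__main__":
    # ISO-8859-1 has no w/y with circumflex; ISO-8859-14 and UTF-8 do.
    for enc in ("iso-8859-1", "iso-8859-14", "utf-8"):
        print(enc, encodable(TEXT, enc))

    # Fallback for an ISO-8859-1 page: emit numeric character references.
    print(TEXT.encode("iso-8859-1", errors="xmlcharrefreplace"))  # b'd&#373;r'
    # A browser (or html.unescape) turns the reference back into the letter.
    print(html.unescape("d&#373;r") == TEXT)                      # True

    # In UTF-8 the letter is simply two bytes.
    print(TEXT.encode("utf-8"))                                   # b'd\xc5\xb5r'

    # Data entry may produce the decomposed form; normalize before storing.
    decomposed = "dw\u0302r"
    print(decomposed == TEXT, normalize_for_storage(decomposed) == TEXT)  # False True
```

Normalizing at the point of entry is one way to keep string comparisons and database lookups consistent; which layer should do it in a mixed database/Windows/web system is a design decision I haven't settled.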