Charsets on multi-language website


I recently discovered that the web server I use has started to specify
Latin-1 as the default charset, with the result that my Greek, Russian,
Persian, etc. pages failed to display properly.  I had previously used
<META ... charset ...> tags in the page headers, which worked for a
time -- presumably because the server didn't originally specify a
default charset.

My learning curve over the last few days has been quite steep: thank
you, Alan, Jukka et al (how are things, Al?) for your useful & clearly
expressed postings on this topic.

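I had assumed -- erroneously -- that charset/encoding instructions
acted something like CSS, with specifications on a webpage overriding
any centrally-specified default.  In fact it is the other way round: a
charset sent by the server in the HTTP Content-Type header takes
precedence over a <META> tag in the page.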
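
For anyone who wants that precedence spelled out, here is a rough
sketch in Python of how a browser settles on a charset: the HTTP
Content-Type header first, the page's <META> tag only as a fallback.
The function name effective_charset is my own invention, and the
sketch ignores other inputs a real browser considers (such as a
byte-order mark), so treat it as an illustration only.

```python
from email.message import Message


def effective_charset(content_type_header, meta_charset):
    """Return the charset a browser would roughly end up using.

    content_type_header: value of the HTTP Content-Type header,
                         e.g. "text/html; charset=ISO-8859-1"
    meta_charset:        charset named in the page's <META> tag, or None
    """
    # Let the standard library parse the header parameters for us.
    msg = Message()
    msg["Content-Type"] = content_type_header
    header_charset = msg.get_content_charset()  # lowercased, or None

    # The HTTP header wins; the <META> tag is only consulted
    # when the server did not name a charset at all.
    if header_charset:
        return header_charset
    return meta_charset.lower() if meta_charset else None


# Server forces Latin-1: the Greek <META> declaration is ignored.
print(effective_charset("text/html; charset=ISO-8859-1", "Windows-1253"))
# Server names no charset: the <META> declaration is used.
print(effective_charset("text/html", "Windows-1253"))
```

This is why the <META> tags stopped working the moment the server
began sending a default charset.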

FWIW, & in the hope that it may be useful for someone in the same
position, here is the (Apache) .htaccess file I finally came up with:

# Default for every .htm file
AddCharset UTF-8 .htm
# Greek
<Files ~ "^g(reek|s|c).+\.htm$">
AddCharset Windows-1253 .htm
</Files>
# Romanian
<Files ~ "^ro.+\.htm$">
AddCharset Windows-1250 .htm
</Files>
# Russian
<Files ~ "^ru?s.+\.htm$">
AddCharset Windows-1251 .htm
</Files>
# Turkish
<Files ~ "^t(ur|s).+\.htm$">
AddCharset Windows-1254 .htm
</Files>

It looks a bit messy, & if I were starting from scratch I would have
organized the files into language folders.  But the file may be of
interest as a sort of template.  Briefly, for the benefit of anyone
unfamiliar with the format:

1. I start by making UTF-8 the default encoding.

2. I specify the encodings for Greek, Romanian, Russian and Turkish, in
that order.

3. I use regular expressions to cover the file names for each language
(of course these should have been rationalized, but I didn't want to
have to rewrite hundreds of links!).
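
If you want to check which charset a given file name would get before
uploading the .htaccess file, the rules can be mimicked in a few lines
of Python.  This is only a sketch: the file names in the example are
made up, charset_for is a name I chose, and it assumes (as I understand
Apache's merging of <Files> sections) that a later matching section
overrides an earlier one.

```python
import re

# Same patterns as in the .htaccess file, in the same order.
RULES = [
    (r"^g(reek|s|c).+\.htm$", "Windows-1253"),  # Greek
    (r"^ro.+\.htm$", "Windows-1250"),           # Romanian
    (r"^ru?s.+\.htm$", "Windows-1251"),         # Russian
    (r"^t(ur|s).+\.htm$", "Windows-1254"),      # Turkish
]
DEFAULT_CHARSET = "UTF-8"


def charset_for(filename):
    """Return the charset the rules above would assign to filename."""
    chosen = DEFAULT_CHARSET
    for pattern, charset in RULES:
        # Assumption: the last matching section wins.
        if re.search(pattern, filename):
            chosen = charset
    return chosen


# Hypothetical file names, purely for illustration.
for name in ["greek_alphabet.htm", "romanian.htm", "rus_verbs.htm",
             "turkish.htm", "persian.htm"]:
    print(name, "->", charset_for(name))
```

Anything that matches none of the patterns (the Persian page here)
falls through to the UTF-8 default.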

HTH someone ...


ScriptMaster language resources (Chinese/Modern & Classical
