i18n - how best to provide multilingual content

I have a small, embedded app that uses a webserver to serve up pages
showing status, etc.

Right now all the pages are hard-coded in English.  We need to provide
multi-lingual support.

All of the pages are PHP generated.  Ideally, I'd like for the PHP
backend to serve up the language based on a) the user's locale and, if
that is not set, b) its own locale.

The PHP backend creates the pages on the fly from XML templates, so it
wouldn't be that hard for us to change the language.

But... I don't know the best way to do that.  What is the current 'state
of the art' for language on demand in web content?


Re: i18n - how best to provide multilingual content

CptDondo wrote:

Do you mean automatic translation; or do you mean serving up the best
choice of human-written translations?

Automatic translations are rubbish -- they are laughably bad, and will
present an entirely unprofessional image. Do not even consider using them,
except on a site that's intended to be ridiculed.

That's not to say that they're completely useless -- tools like Babelfish
are useful for the *visitor* if they find a foreign site that they would
like to read -- you can usually get the gist of it. But for the author,
they are rubbish.

For human-written translations, assuming you have got good translators,
the situation is much better. Catering for a visitor in their own language
shows that you're willing to make the extra effort to do business with them.

Many companies will offer entirely different sites for each language. If
you have the resources to manage such a layout, it is often the best
choice because:

    1. It allows URLs to be tailored to the language. e.g.
       which should help with multi-lingual search engine optimisation.

    2. It allows for a different information focus in each language.
       For example, I was once told by a translator that translating
       technical manuals between cultures involves so much more than
       word-for-word translation. People of different cultures expect
       to find different things in their documentation. Americans expect
       the manual to be a tour-de-force of the product's unique features,
       virtually an advertisement for the product; Western Europeans
       expect a fairly dry step-by-step explanation of how to use the
       product to accomplish different aims; Eastern Europeans expect
       information on how to repair the product when it breaks, as in
       their experience, these things inevitably do.

    3. It allows you to take baby-steps. Say, you've decided you want
       to expand into the German market, but you're not sure how much
       business you'll do there, so don't want to invest a lot of money
       having your entire site translated into German. You may want to
       just create a single page site in German, with basic information
       about your company, explain that the site's German translation is
       still pending, that there is more information on the English
       version of the site, and provide the telephone extension for Gunther,
       who works in your New York office, but was born and raised in Munich.
       As your German sales take off, you then plough back some of the money
       into improving the German site. Perhaps one day, the German market
       will be so important to you that you open an office in Berlin, and
       allow them to maintain the German site directly.

The other approach with human-written translations is to have a single
site available in multiple languages. For example, you ask/detect a user's
preferred language, and then when they go to:


a PHP script serves up the information in the correct language. If a
translation is not available for that particular page (say, it's a new
product, so the translators haven't finished with it yet), then you just
serve up the English page. This is a reasonably good method, but it
doesn't have advantages #1 and #2 above. It kind of has #3, but your
baby-steps look a little silly because they end up as a mixture of, in the
above example, German and English. This method can ease maintenance though.

Always be careful not to let the translated versions of the site fall too
far behind the English version.

Toby A Inkster BSc (Hons) ARCS
Contact Me  ~ http://tobyinkster.co.uk/contact

Re: i18n - how best to provide multilingual content

CptDondo wrote:

For serious non-XSLT work, consider JSP instead of PHP. The i18n tools
are vastly better. Read the O'Reilly Java internationalization book,
just for a guide to web i18n.

Make the selection completely user-selectable, with cookie persistence,
with the methods you describe setting the default. It works just the
same by default, but it's more flexible for casual users finding
themselves using other people's computers. It's a real nuisance

XML or XSLT?  If you structure the data model reasonably well, it's
not hard to extract text strings stored in groups for each function,
one for each language. It's easier to manage translation and
deployment, though, if the text strings are grouped by language into
separate files and identified by a short identifier.  The XSLT document()
function is especially handy.

Re: i18n - how best to provide multilingual content

Andy Dingley wrote:

Check, the order in which I determine language:
- Explicitly set (by a GET variable, or a pseudo one like /en/ or /de/ etc.
handled by a URL rewrite)
- Cookie
- Accept-Language in the HTTP request header
- Geo-IP info (there are free databases available, which are mostly
accurate enough to determine the country most of the time)
- System default

After determining the language the cookie will be sent/overwritten with the
current choice.
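
As a rough sketch of that order in PHP (not the poster's actual code): the
`lang` parameter and cookie name, the list of supported languages, and the
function name are all assumptions made for illustration. Geo-IP is passed in
as an optional guess because it needs an external database, and q-values in
the Accept-Language header are ignored here.

```php
<?php
// Hypothetical list of languages the site has translations for.
const SUPPORTED_LANGS = ['en', 'de', 'fr'];
const DEFAULT_LANG = 'en';

/**
 * Return the first supported language found, checking the sources in
 * the order listed above. $geoGuess is whatever a Geo-IP lookup suggested.
 */
function resolveLanguage(array $get, array $cookie, array $server, ?string $geoGuess = null): string
{
    $candidates = [
        $get['lang'] ?? null,      // 1. explicit choice, e.g. ?lang=de or a rewritten /de/
        $cookie['lang'] ?? null,   // 2. choice remembered from an earlier visit
    ];
    // 3. Accept-Language: take the primary subtag of each entry, in the order sent.
    foreach (explode(',', $server['HTTP_ACCEPT_LANGUAGE'] ?? '') as $entry) {
        if (preg_match('/^\s*([a-zA-Z]{2,3})\b/', $entry, $m)) {
            $candidates[] = $m[1];
        }
    }
    $candidates[] = $geoGuess;     // 4. Geo-IP guess, if any

    foreach ($candidates as $lang) {
        if (is_string($lang) && in_array(strtolower($lang), SUPPORTED_LANGS, true)) {
            return strtolower($lang);
        }
    }
    return DEFAULT_LANG;           // 5. system default
}
```

A page would call `resolveLanguage($_GET, $_COOKIE, $_SERVER)` before sending
any output and then store the result with something like
`setcookie('lang', $lang, time() + 30 * 86400, '/')`.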

Rik Wasmus

Re: i18n - how best to provide multilingual content

On Sat, 20 Jan 2007 14:27:37 +0100, Rik wrote:

Thanks.  I'll probably do something like that - I've thought about
using the 'HTTP-Accept-Language' var from the header.  I just don't know
how many people actually set those correctly.
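
For reference, PHP exposes that header as `$_SERVER['HTTP_ACCEPT_LANGUAGE']`,
with values such as `de-DE,de;q=0.9,en;q=0.8`. A minimal sketch of parsing it
into a preference list follows; the function name is made up for this example,
and the parsing is deliberately lenient rather than a full implementation of
the HTTP grammar.

```php
<?php
/**
 * Parse an Accept-Language header value into lower-cased language tags,
 * ordered by descending q-value. Entries without a q parameter count as
 * q=1; entries with q=0 and the "*" wildcard are dropped.
 */
function parseAcceptLanguage(string $header): array
{
    $entries = [];
    foreach (explode(',', $header) as $position => $entry) {
        $parts = explode(';', $entry);
        $tag = strtolower(trim($parts[0]));
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9]*\.?[0-9]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q > 0) {
            $entries[] = [$tag, $q, $position];
        }
    }
    // Highest q first; entries with equal q keep the order the browser sent.
    usort($entries, function (array $a, array $b): int {
        return [$b[1], $a[2]] <=> [$a[1], $b[2]];
    });
    return array_column($entries, 0);
}
```

The first tag in the returned list that the application has a translation for
would then be the language to serve.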

I guess I didn't phrase my question accurately enough; it has been a long

I have XML templates that define item labels in a form.  The XML has
various tags that provide nav info and so on.  This is on an embedded
system, with only a small number of phrases that would need translation; I
probably have fewer than 200 phrases, mostly one or two words.

I have XML templates of the following form:

<item id="myname" value="" index="5" type="text">My Name</item>

The PHP backend reads that line, and creates a form entry for myname, with
the label "My Name".  What I want to do is to replace the English "My
Name" with the appropriate words in the user's language.

I'm thinking of a mechanism similar to the .po files, where the PHP
backend looks up the text in a translation file.  Or even something like

<item id="myname" value="" index="5" type="text" text="My Name"/>

and the PHP backend would look up the text value for "My Name" in a lookup
table for the user's language.
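
A minimal sketch of that lookup-table idea, assuming the `text` attribute form
of the template shown above. The table contents, the function names, and the
generated HTML are invented for illustration; the real backend's output is not
shown in this thread. It relies on PHP's SimpleXML extension.

```php
<?php
// Hypothetical per-language lookup tables, keyed by the English text.
const LABEL_TABLE = [
    'de' => ['My Name' => 'Mein Name'],
    'fr' => ['My Name' => 'Mon nom'],
];

/** Return the label for $text in $lang, falling back to the English text. */
function translateLabel(string $lang, string $text): string
{
    return LABEL_TABLE[$lang][$text] ?? $text;
}

/** Build one labelled form field from a single <item .../> element. */
function renderItem(string $itemXml, string $lang): string
{
    $item = simplexml_load_string($itemXml);
    if ($item === false) {
        throw new InvalidArgumentException('Template line is not valid XML');
    }
    // Attribute values are escaped before being placed into HTML.
    $id    = htmlspecialchars((string) $item['id']);
    $type  = htmlspecialchars((string) $item['type']);
    $value = htmlspecialchars((string) $item['value']);
    $label = htmlspecialchars(translateLabel($lang, (string) $item['text']));

    return "<label for=\"$id\">$label</label> "
         . "<input type=\"$type\" id=\"$id\" name=\"$id\" value=\"$value\">";
}
```

Because a missing entry falls back to the English text, an incomplete
translation table still produces a usable form.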

(Aside:  I guess I failed to use Google correctly yesterday.... PHP has
support for gettext!  <http://us3.php.net/gettext>  So that's how I think I
will go...)
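
A hedged sketch of what the gettext route could look like. The domain name
`status` and the `locale` directory are assumptions for this example. gettext
looks for compiled catalogs at `<dir>/<locale>/LC_MESSAGES/<domain>.mo`, and
the chosen locale generally has to be installed on the system, which may need
checking on an embedded box.

```php
<?php
/**
 * Select the message catalog for $locale (e.g. "de_DE.UTF-8").
 * Returns false when the gettext extension is not loaded.
 */
function useLocale(string $locale, string $domain = 'status', string $dir = __DIR__ . '/locale'): bool
{
    if (!function_exists('gettext')) {
        return false;
    }
    putenv("LC_ALL=$locale");
    setlocale(LC_ALL, $locale);                 // returns false if the OS lacks this locale
    bindtextdomain($domain, $dir);              // where the .mo files live
    bind_textdomain_codeset($domain, 'UTF-8');  // encoding of returned strings
    textdomain($domain);                        // make it the default domain
    return true;
}

/** Translate $text, or return it unchanged when no translation is found. */
function label(string $text): string
{
    return function_exists('gettext') ? gettext($text) : $text;
}
```

With this in place the backend would call `useLocale()` once per request and
wrap each template string in `label()`. A string with no catalog entry comes
back as the original English text.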


Re: i18n - how best to provide multilingual content

Captain Dondo wrote:

Not that many set it themselves; however, most browsers set it during
installation to the most probable language (based on the install-language
choice or, for instance, the OS locale).

Check, with a limited number of phrases that would be my choice. It is a lot
harder to maintain when translating entire pages/documents, though.
Rik Wasmus

Re: i18n - how best to provide multilingual content

Andy Dingley wrote:

I have seen such files for the Gnome2 application desktop icons. They have only
one short line in each language, the application description, but those files
are big. Think how large the files will become if you have 20-30 languages and
have to replace the big file each time a language is updated or added. It's
easier IMHO to handle files that each hold only one language; in the
server-side script it's easy to select the right language file and to use a
backup if a translation is missing.



Re: i18n - how best to provide multilingual content

If by XML templates you mean structured content in different languages,
then all you need is a presentational template in Unicode.
The problem usually arises with non-European languages, which probably would
not fit into a European page layout.
As to language selection, you might want to offer an explicit selection
of the language in the menu, because the language detected automatically is
not always what a visitor wants.

Re: i18n - how best to provide multilingual content

        Like you said, go with gettext().  We just finished a
fairly large app that was to be multilingual.   We used gettext
for the small stuff like "Login here".  In cases where there
were larger blocks of text we would set a variable $defLang
based on the language the user was using, and in the code,


whenever we needed it.  There was a root TEXT_DIR, with sub
dirs for each locale.  File names were the same and it made
it easy for the end users to update each file for each language.
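
The exact code used there is not shown, so the following is only a sketch of
one way to resolve such a per-locale file. The function name, the example file
names, and the fallback to English when a locale has no copy yet are
assumptions, not part of the setup described above.

```php
<?php
/**
 * Return the path of a text block for $lang under $textDir, falling back
 * to $fallback when that language has no copy of the file yet.
 * Returns null if neither file exists.
 */
function textBlockPath(string $textDir, string $lang, string $file, string $fallback = 'en'): ?string
{
    foreach ([$lang, $fallback] as $candidate) {
        // basename() keeps a crafted value from escaping the text directory.
        $path = $textDir . '/' . basename($candidate) . '/' . basename($file);
        if (is_file($path)) {
            return $path;
        }
    }
    return null;
}
```

A page could then do `$path = textBlockPath(TEXT_DIR, $defLang, 'welcome.html');`
and pass the result to `readfile()` or `include` when it is not null.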

For graphic buttons it was a similar approach:

<img src="<?= BUTTON_DIR . "/$defLang/login.gif" ?>" ......>

        It worked easily for us. Probably the biggest thing we
did for the end user was create a simple PHP page that would
scan all the source files for gettext and put up a tabular
display of each phrase and the translation, if any, in each
of the target languages, and they could enter the translations
right there.  They did not have to deal with the raw message

        Hope this helps.        
John Murtari                              Software Workshop Inc.
jmurtari@following domain 315.635-1968(x-211)  "TheBook.Com" (TM)
http://thebook.com/
