Storing multiple character set types (or a representation of em) in a table column

I'm currently building an online tool designed to help users learn a
foreign language. Its aim is to let users enter their own vocabulary
and then quiz themselves on it. This involves entering a word in their
native language and then the foreign-language equivalent. It's a kind
of flashcard system, but restricted to the words that the user has
previously entered - perhaps specific to the textbook they're
currently learning.

One of the things I want is to support multiple character sets
(e.g. alphanumeric, Japanese, Chinese...). I don't want the system to
be specific to a single character set (I'm currently learning Japanese
and want to use the system myself, so I want to be able to support
English and Japanese character sets). When designing my database table
columns, though, I have to specify which character set the text is in,
as I don't know any better way. I understand why this is, but this is
the first time I've ever wanted a single text field to support
multiple character sets - or at least some representation of them, so
I can turn them back into the original format when required.

What's the best way to do this? One idea is to store the text in
standard abc123... characters and specify in another field what
character set it is. So, when inserting, my script will detect which
character set the text is in, take a note of it, encode the text to an
abc123... representation and then do the INSERT. Alongside the
abc123... entry, another field will state the original character set,
so when I need the data I can decode the abc123... representation back
to its original form.

Anyway, I've posted this in the PHP forum because the technique above
doesn't require me to do anything different in the database; instead
the encoding and decoding are handled in the PHP script... if that's
the best technique? How might this be done in PHP? If anyone has any
suggestions I'd really appreciate it if you could reply.


Re: Storing multiple character set types (or a representation of em) in a table column

On 25/02/2011 14:27, bizt wrote:

I think you are confusing alphabets with computer character sets. You
need to support many alphabets but I don't think you want to deal with
more than one encoding; it'd be crazy and unnecessary. Just pick a
Unicode encoding (UTF-8 is a popular option but it's not the only one)
and make your life easier.
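
To make the suggestion above concrete, here is a minimal sketch. It is
written in Python with an in-memory SQLite database purely for
illustration (the thread is about PHP and does not name a database
engine), and the table name `vocab` and its column names are
hypothetical. The point is only that one Unicode text column can hold
English, Japanese and Chinese words side by side, with no extra
"which character set is this" field.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table: both columns are ordinary Unicode text columns.
conn.execute("CREATE TABLE vocab (native_word TEXT, foreign_word TEXT)")

# English, Japanese and Chinese words go into the same column unchanged.
pairs = [("dog", "犬"), ("thank you", "ありがとう"), ("hello", "你好")]
conn.executemany("INSERT INTO vocab VALUES (?, ?)", pairs)

# Reading the rows back returns the original strings; no decoding step
# and no separate character-set column are needed.
rows = conn.execute(
    "SELECT native_word, foreign_word FROM vocab ORDER BY rowid"
).fetchall()

for native_word, foreign_word in rows:
    print(native_word, "->", foreign_word)
```

In a PHP application the same idea would mean using one Unicode
encoding for the connection, the tables and the pages, rather than
converting the text in the script.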

Not all database engines handle encodings the same way. E.g., in Oracle
you must use the same encoding all around your app.

If you are talking about storing different encodings *in the same
column*, well, it can be done (you just need to store it as binary data)
but you won't be able to use any of the text handling features of your
DB engine. For instance, a search for "foo" will never find "Foo".
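
As a rough illustration of that limitation, the sketch below again uses
Python and an in-memory SQLite database as stand-ins (other engines
differ in the details; the table and column names are hypothetical).
It stores the same word once as text and once as raw bytes, then runs
a case-insensitive search against each column. It also shows that one
word encoded in two different encodings produces different bytes, so a
byte-wise search across a mixed-encoding column cannot match them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the same word stored as text and as binary data.
conn.execute("CREATE TABLE vocab (word_text TEXT, word_blob BLOB)")
conn.execute(
    "INSERT INTO vocab VALUES (?, ?)",
    ("Foo", "Foo".encode("utf-8")),
)

# Case-insensitive search on the text column: the engine can apply
# a collation because it knows the value is text.
text_hits = conn.execute(
    "SELECT COUNT(*) FROM vocab WHERE word_text = ? COLLATE NOCASE",
    ("foo",),
).fetchone()[0]

# The same search on the binary column: SQLite compares a BLOB with a
# text value as different kinds of data, so nothing matches.
blob_hits = conn.execute(
    "SELECT COUNT(*) FROM vocab WHERE word_blob = ? COLLATE NOCASE",
    ("foo",),
).fetchone()[0]

# One word, two encodings, two different byte sequences.
utf8_bytes = "犬".encode("utf-8")
sjis_bytes = "犬".encode("shift_jis")

print("text column matches:", text_hits)
print("binary column matches:", blob_hits)
print("same bytes in both encodings:", utf8_bytes == sjis_bytes)
```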

I suggest you read the famous "The Absolute Minimum Every Software
Developer Absolutely, Positively Must Know About Unicode and Character
Sets (No Excuses!)" article by Joel Spolsky.

Then give a second thought to your specs.

-- - Álvaro G. Vicario - Burgos, Spain

Re: Storing multiple character set types (or a representation of em) in a table column

Yeah, I'm not really sure what the best approach is, and I need to read
up a bit more on this kind of thing. Thanks for pointing me in the right
direction; I'll certainly check that link out. Cheers