Posted on February 25, 2011, 1:27 pm
I'm currently building an online tool designed to help users learn a
foreign language. Its aim is to let users enter their own
vocabulary and then quiz themselves. This involves entering their
native word and then the foreign-language equivalent. It's a kind of
flashcard system, but restricted to the words that the user has
previously entered, perhaps specific to the textbook they're
One of the things I want is to support multiple character
sets (e.g. alphanumeric, Japanese, Chinese...). I don't want
the system to be specific to a single character set. I'm currently
learning Japanese and want to use the system myself, so I want it to
support both English and Japanese character sets. When designing my
database table columns, though, I have to specify which character set
the text is in, as far as I know. I understand why this is, but
this is the first time I've ever wanted a single text field to
support multiple character sets, or at least some representation of
them that I can turn back into the original format when required.
What's the best way to do this? One idea is to store
the text in standard abc123... characters and record in another field
which character set it is. So, when inserting, my script will
detect which character set the text is in, note it, encode the text to an
abc123... representation and then do the INSERT. Alongside the abc123...
entry, another field will state the original character set, so when I
need the data I'll decode the abc123... representation to its
Anyway, I've posted this in the PHP forum because the technique above
doesn't require me to do anything different in the database; instead, the
encoding and decoding are handled in the PHP script. Is that
the best technique? How might this be done in PHP? If anyone has
any suggestions, I'd really appreciate a reply.
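For illustration, a minimal sketch of the scheme described above, written in Python only for brevity (in PHP, `bin2hex()` and `hex2bin()` do the equivalent hex step). It assumes hex digits as the "abc123..." representation and that the caller already knows the charset, since automatic detection is only a heuristic; `to_storage` and `from_storage` are hypothetical names.

```python
# Hypothetical helpers sketching the "abc123... plus charset column" scheme.
# Assumption: the caller passes the charset in; auto-detection is left out.

def to_storage(text, charset):
    """Return (ascii_repr, charset), ready for two columns of an INSERT."""
    raw = text.encode(charset)      # text -> bytes in the original charset
    return raw.hex(), charset       # hex output uses only 0-9 and a-f


def from_storage(ascii_repr, charset):
    """Undo to_storage(): hex digits -> bytes -> text in the saved charset."""
    return bytes.fromhex(ascii_repr).decode(charset)


for word, charset in [("dog", "ascii"), ("日本語", "shift_jis")]:
    stored = to_storage(word, charset)
    print(stored, "->", from_storage(*stored))
```

The round trip works, but the database only ever sees hex digits, so it cannot search, sort or compare the stored words as text.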
February 25, 2011, 1:38 pm
Re: Storing multiple character set types (or a representation of em) in a table column
I think you are confusing alphabets with computer character sets. You
need to support many alphabets but I don't think you want to deal with
more than one encoding; it'd be crazy and unnecessary. Just pick a
Unicode encoding (UTF-8 is a popular option but it's not the only one)
and make your life easier.
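A minimal sketch of the single-encoding approach, assuming SQLite (through Python's standard `sqlite3` module) purely as a stand-in for whatever database the tool uses; the table and column names are hypothetical. English and Japanese words share the same TEXT columns with no per-row charset field, and a lookup on the Japanese text works directly.

```python
import sqlite3

# In-memory SQLite database; new SQLite databases store TEXT as UTF-8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vocab (native TEXT, foreign_word TEXT)")

# English and Japanese go into the same columns; no charset column needed.
pairs = [("dog", "犬"), ("book", "本"), ("water", "水")]
conn.executemany("INSERT INTO vocab VALUES (?, ?)", pairs)

# Ordinary text queries work on the Japanese column as well.
row = conn.execute(
    "SELECT native FROM vocab WHERE foreign_word = ?", ("本",)
).fetchone()
print(row[0])
```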
Not all database engines handle encodings the same way. E.g., in Oracle
you must use the same encoding throughout your app.
If you are talking about storing different encodings *in the same
column*, well, it can be done (you just need to store it as binary data)
but you won't be able to use any of the text handling features of your
DB engine. For instance, a search for "foo" will never find "Foo".
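To make the point concrete, a small sketch, again assuming SQLite through Python's `sqlite3` as a stand-in database (the table is hypothetical). The same words go into a case-insensitive TEXT column and into a BLOB column. BLOB comparisons work on raw bytes, so neither a case-insensitive match nor a match across encodings succeeds. SQLite's `NOCASE` only folds ASCII letters, which is enough here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# as_text: real text with case-insensitive collation.
# as_blob: raw bytes whose encoding the database knows nothing about.
conn.execute("CREATE TABLE words (as_text TEXT COLLATE NOCASE, as_blob BLOB)")
conn.execute("INSERT INTO words VALUES (?, ?)", ("Foo", "Foo".encode("utf-8")))
conn.execute("INSERT INTO words VALUES (?, ?)",
             ("日本語", "日本語".encode("shift_jis")))


def count(sql, param):
    """Run a COUNT(*) query with one bound parameter."""
    return conn.execute(sql, (param,)).fetchone()[0]


# The text column uses its NOCASE collation, so "foo" finds "Foo".
text_hits = count("SELECT COUNT(*) FROM words WHERE as_text = ?", "foo")
# The blob column compares bytes: b"foo" != b"Foo".
blob_hits = count("SELECT COUNT(*) FROM words WHERE as_blob = ?",
                  "foo".encode("utf-8"))
# Same Japanese word, different encoding, different bytes: no match.
jp_hits = count("SELECT COUNT(*) FROM words WHERE as_blob = ?",
                "日本語".encode("utf-8"))
print(text_hits, blob_hits, jp_hits)  # prints: 1 0 0
```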
I suggest you read the famous "The Absolute Minimum Every Software
Developer Absolutely, Positively Must Know About Unicode and Character
Sets (No Excuses!)" article:
Then give a second thought to your specs.
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My web programming site: http://borrame.com
-- My satin-finish humor site: http://www.demogracia.com