- Posted on
June 13, 2006, 4:33 pm
This post is essentially a reply to a previous post/thread
here on this mailing.database.myodbc group titled:
MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
[This version has a couple of subtle edits from the original I posted
on mailing.database.myodbc - I'm cross-posting here on this
topic/subject related newsgroup]
I was wondering if anybody has experienced the same
challenges I'm experiencing, which I'll describe shortly. Once
resolved, some fascinating and powerful multi-lingual
apps incorporating non-English/Latin character sets can be
realized by many developers.
I have a Unicode utf8 English - Arabic - Hebrew - Greek (and
several other languages) database in Microsoft Excel. I KNOW
that it is Unicode utf8 data because MySQL tells me it
recognizes the encoding as such but not in the context I want.
Allow me to explain ...
I can search the Unicode utf8 encoding with no problem in
Excel. While in Excel I highlight a complete word or a
partial string of an Arabic word and copy it to the clipboard
(i.e. memory). I then do a find, and the result is as
successful as if it were an English string.
MySQL 5.0 is supposed to handle Unicode utf8.
I created a MySQL database I named: languages
CREATE DATABASE languages ;
and I ran the following command in MySQL:
ALTER DATABASE languages DEFAULT CHARACTER SET utf8;
No problem (so far): MySQL seemingly recognized utf8 and
accepted it. My understanding is that with the ALTER command
the tables I create in languages will be utf8.
I then created a table I named mainlang, which denotes that it
will be the main table for my languages.
mysql>CREATE TABLE mainlang
->primary key (langNumID, colB)
Again so far no problem: Table successfully created.
My third column 'colC' is where the Unicode data
will be stored.
I now attempt to import the database from my
Excel file into my MySQL database as follows:
mysql>load data infile 'c:\arabicdictionary.csv'
->into table mainlang
->fields terminated by ','
->lines terminated by '\n'
->(langNumID, colB, colC);
ERROR 1406 (22001): Data too long for 'colC' at row 1
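One thing that might be worth ruling out at this point is the encoding
of the exported file itself. Excel's "Unicode Text" export is UTF-16
with a byte-order mark rather than utf8, and a plain CSV export
typically uses the system code page. The Python sketch below is only an
illustration under those assumptions: guess_encoding is a hypothetical
helper (not part of MySQL or Excel), and it only inspects raw bytes, so
its answer is a hint rather than a diagnosis of ERROR 1406.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Return a rough label for the encoding of raw file bytes.

    Hypothetical helper for illustration only.
    """
    # A byte-order mark is the most reliable hint, so check for one first.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # No BOM: see whether the bytes decode cleanly as UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8 (possibly a legacy code page)"
    return "utf-8 (or plain ASCII)"


# Example with in-memory bytes; for a real file use open(path, "rb").read().
sample = "1,4994,سلام\n"
print(guess_encoding(sample.encode("utf-8")))
print(guess_encoding(sample.encode("utf-16")))
print(guess_encoding(sample.encode("cp1256")))
```

If the file turns out not to be utf8, re-exporting or converting it
before the load may be a simpler path than changing the database
character set.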
So what to do? I did a search and found that other
people seemingly had the same problem, and someone suggested:
ALTER DATABASE languages DEFAULT CHARACTER SET cp1250;
I dropped mainlang, recreated it, redid the load and
lo and behold ... it seemed to work. No Data too long
error occurred, and when I ran the following query:
mysql>select langNumID, colB, colC
->from mainlang
->where colB = '4994';
I see langNumID (colA) has a correct numeric value, colB a
correct numeric value (4994), and colC a string of
unintelligible characters with diacritical marks,
umlauts, etc., which I know is the cp1250 encoding
interpretation of the Unicode utf8 data, which is
similarly unintelligible in its own right.
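The effect described above, utf8 bytes being displayed as cp1250
characters, can be reproduced outside MySQL. The Python sketch below is
only an illustration: the sample word is an arbitrary Arabic string
chosen for the example, not data from the actual spreadsheet.

```python
# Arabic text encoded as UTF-8: each of these letters becomes two bytes.
original = "سلام"
utf8_bytes = original.encode("utf-8")

# Reading those same bytes as cp1250 yields one Central European
# character per byte, which is the kind of garbage seen in the column.
garbled = utf8_bytes.decode("cp1250")

# The underlying bytes are unchanged, so the process can be reversed.
recovered = garbled.encode("cp1250").decode("utf-8")

print(len(original), len(utf8_bytes), len(garbled))
print(garbled)
print(recovered == original)
```

Some byte values have no cp1250 mapping in Python's codec, so other
strings may raise UnicodeDecodeError here instead of producing garbled
text.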
What I try next is to copy the obscure colC
cp1250 character string to the clipboard/memory
and then tweak the original select statement
as follows to see if I can search on the
(now) cp1250 character string:
mysql>select langNumID, colB, colC
->from mainlang
->where colC = 'paste of the cp1250 character string';
The console would not allow a paste unless I pressed
the escape key. On running this select command
I got an empty set (no match).
My questions are:
Has anyone been successful creating a Unicode utf8
MySQL database that accepts Arabic?
If yes, how did you get around or not encounter the
Data too long issue?
Have you tried the cp1250 (or cp1251 - same mechanics,
same results) workaround as I have? Are you
able to search the cp1250 character string (my colC)?
If yes, how did you successfully manage to do it?
Lastly, if I take the cp1250 encoded string and paste
it into Excel ... I can string search the cp1250
encoding with no problem.
Also, here's how I know my Unicode utf-8 data is
correct apart from my own manual cross-referencing
and being recognized by MySQL in some respect:
When I copy the Unicode utf8 encoding and try to
paste it into the select command to see what would
happen I get the following error:
ERROR 1257 (HY000): Illegal mix of collations
(cp1250_general_ci, IMPLICIT) and
(utf8_general_ci, COERCIBLE) for operation '='
So what I have here is a situation where MySQL
recognizes the Unicode utf8 encoding, but not
when it comes to populating a table!
Go Figure ...
Any insight or solution anyone wishes to share would
be GREATLY appreciated. I promise that if I find a solution
I will share it.
Thank you Very Much, Shukran Jiddan, Todah Rabah,
Muchas Gracias ...
jrs_14618 at yahoo.com
Re: MySQL 5.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
No idea, Joel. Why don't you try asking in a mysql database newsgroup - such as
comp.databases.mysql. This newsgroup is for PHP programming.
Remove the "x" from my email address
JDS Computer Training Corp.