MySQL 5.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

Hello All,

This post is essentially a reply to a previous post/thread
here on this mailing.database.myodbc group titled:

MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode

[This version has a couple of subtle edits from the original I posted
on mailing.database.myodbc. I'm cross-posting here on this
topic/subject-related newsgroup.]

I was wondering if anybody has experienced the same issues and
challenges I'm experiencing, which I'll describe shortly.  Once
resolved, some fascinating and powerful multilingual
apps incorporating non-English/Latin character sets can be
realized by many developers.

I have a Unicode utf8 English - Arabic - Hebrew - Greek (and
several other languages) database in Microsoft Excel.  I KNOW
that it is Unicode utf8 data because MySQL tells me it
recognizes the encoding as such, but not in the context I want.

Allow me to explain ...

I can search the Unicode utf8 encoding with no problem in
Excel.  While in Excel I highlight a complete word or a
partial string of an Arabic word and copy it to the clipboard
(i.e. memory).  I then do a find, and it succeeds just as
it would for an English string.

MySQL 5.0 is supposed to handle Unicode utf8.

I created a MySQL database I named: languages


and I implemented the following command on a MySQL
command prompt:


No problem (so far) MySQL seemingly recognized utf8 and
accepted it.  My understanding is with the ALTER command
the tables I create against languages will be utf8.

I now created a table I named mainlang which denotes it
will be the main table for my languages.

mysql>CREATE TABLE mainlang (
->langNumID varchar(30),
->colB varchar(30),
->colC varchar(30),
->primary key (langNumID, colB)
->);

Again so far no problem: Table successfully created.
My third column 'colC' is where the Unicode data
will be stored.

I now attempt to import the database from my
Excel file into my MySQL database as follows:

mysql>load data infile 'c:\arabicdictionary.csv'
->into table mainlang
->fields terminated by ','
->lines terminated by '\n'
->(langNumID, colB, colC);
ERROR 1406 (22001): Data too long for 'colC' at row 1
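
The error message alone does not say whether the bytes in the file are really UTF-8. Excel's plain CSV export commonly writes the system code page rather than UTF-8, and its "Unicode Text" export writes UTF-16, so it may be worth checking the file before loading it. A rough Python sketch of such a check (the function name sniff_encoding and the file path in the usage comment are hypothetical, and the check is only a heuristic):

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Rough guess at a text file's encoding from its raw bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8 (possibly a legacy code page)"
    return "utf-8 (or plain ASCII)"


# Usage with a real file (the path is an assumption for illustration):
# with open("arabicdictionary.csv", "rb") as f:
#     print(sniff_encoding(f.read()))
```

If the result is anything other than UTF-8, the file and the table's character set do not agree, which is one plausible source of load errors on the Arabic column.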

So what to do?  I did a search and found other
people seemingly had the same problem and someone


I dropped mainlang, recreated it, redid the load and
lo and behold ... it seemed to work.  No Data too long
error occurred, and when I did the following query:

mysql>select langNumID, colB, colC
->from mainlang
->where colB = '4994';

I see colA (langNumID) has a correct numeric value, colB a
correct numeric value (4994), and colC a string of
unintelligible characters with diacritical marks,
umlauts, etc., which I know is the cp1250 encoding
interpretation of the Unicode utf8 data, which is
similarly unintelligible in its own regard.
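
The garbled colC output is consistent with UTF-8 bytes being read as cp1250. A minimal Python sketch of that effect, assuming a sample word (the Arabic word "salam", chosen only for illustration and not taken from the dictionary file):

```python
# Illustration only: the sample word is an assumption, not data from the post.
word = "\u0633\u0644\u0627\u0645"  # the Arabic word "salam", 4 letters

utf8_bytes = word.encode("utf-8")       # each Arabic letter takes 2 bytes
mojibake = utf8_bytes.decode("cp1250")  # the same bytes, read as cp1250

# Character count, byte count, and length of the misread string.
print(len(word), len(utf8_bytes), len(mojibake))

# The bytes are intact, so for this word the misreading is reversible.
restored = mojibake.encode("cp1250").decode("utf-8")
print(restored == word)
```

Each Arabic letter turns into two accented Latin letters or symbols, which matches the kind of output described above. Other words may contain bytes that cp1250 leaves undefined, in which case the strict decode here would raise an error instead.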

What I try next is to copy the obscure colC
cp1250 character string to the clipboard/memory
and then tweak the original
select statement to see if I can search on the
(now) cp1250 character string:

mysql>select langNumID, colB, colC
->from mainlang
->where colc = 'paste of the cp1250 character string';

The computer would not allow a paste unless I pressed
the escape key.  On initiating this select command
I got an empty set (no match).

My questions are:

Has anyone been successful creating a Unicode utf8
MySQL database that accepts Arabic?

If yes, how did you get around or not encounter the
Data too long issue?

Have you tried the cp1250 (or cp1251 - same mechanics,
same results) workaround as I have?  Are you
able to search the cp1250 character string (my colC)?
If yes, how did you successfully manage to do it?

Lastly, if I take the cp1250 encoded string and paste
it into Excel ... I can string search the cp1250
encoding with no problem.

Also, here's how I know my Unicode utf-8 data is
correct apart from my own manual cross-referencing
and being recognized by MySQL in some respect:

When I copy the Unicode utf8 encoding and try to
paste it into the select command to see what would
happen I get the following error:

ERROR 1257 (HY000): Illegal mix of collations
(cp1250_general_ci, IMPLICIT) and
(utf8_general_ci, COERCIBLE) for operation '='

So what I have here is a situation where MySQL
is recognizing Unicode utf8 encoding but not
when it comes to populating a table!
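
One possible reading of ERROR 1257: the column is declared cp1250 while the pasted literal arrives as utf8, and cp1250 has no Arabic letters, so the literal cannot be converted to the column's character set. The Python sketch below only illustrates that representability gap. It does not reproduce MySQL's coercion rules, and the helper name can_encode and the sample word are assumptions for illustration.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


arabic = "\u0633\u0644\u0627\u0645"  # sample word, an assumption for illustration

print(can_encode("4994", "cp1250"))  # ASCII digits exist in cp1250
print(can_encode(arabic, "cp1250"))  # Arabic letters do not
print(can_encode(arabic, "utf-8"))   # UTF-8 covers all of Unicode
```

This would also fit the earlier observation that the query on colB = '4994' works while a comparison against an Arabic literal does not.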

Go Figure ...

Any insight or solution anyone wishes to share would
be GREATLY appreciated.  I promise if I find a solution
I will share it.

Thank you Very Much, Shukran Jiddan, Todah Rabah,
Muchas Gracias ...

Joel S
(585) 255-0997
jrs_14618 at

Re: MySQL 5.0, FULL-TEXT Indexing and Search Arabic Data, Unicode wrote:

No idea, Joel.  Why don't you try asking in a MySQL database newsgroup, such as
comp.databases.mysql?  This newsgroup is for PHP programming.

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
