php5 mssql utf-8 problem

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View
hi all

since nearly 2 days i fight with mssql and utf-8 as i need to store and
retrieve arabic characters.

i tried the com approach

i tried as well the odbtp library, odbc and obviously the mssql native
extension.. but i never succeded so far to store utf-8 data in the
mssql server (column type nvarchar). if anybody has an idea, an
approach or thinks he knows what i might have done wrong.. it is all
highly appreciated.

ou yes the content type is specified as follows in the document

<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

cheers ralphie

Re: php5 mssql utf-8 problem

I was fighting with this problem today(i tested romanian deutch and
russian characters).
The main requiremnts are:
1. Your document must be UTF-8
2. Your database table or column must be UTF-8
3. Before calling any queries you should execute the following query

SET NAMES 'utf8'

this will set conection to mysql in utf-8 mode if it wasn't set by the
default MySql config.
4. Just Insert your data in the table and then you will be able to
select it.

Hope this will help.

P.S. I tested all this stuff on PHP 5 and Mysql 4.1 so if your config
is different sorry.

Re: php5 mssql utf-8 problem

thanks alex i will try to do something similar tomorrow.. but am not
really hopeful, because i use microsoft sql server.. and the connection
between php and mssql seems to be realy unfriendly ot utf-8 :-(

Re: php5 mssql utf-8 problem

ralphie wrote:
Quoted text here. Click to load it

At work, we also have to deal with the Arabic script on a daily basis
and we also happen to use MSSQL. I have found is that there's no good
way to make use of the nvarchar, ntext column type. Maybe there's a
secret knob somewhere. I didn't find it in any event. So what we do is
just store the text as UTF-8 in regular varchar and text columns.

Varchar columns are a bit problematic, since you risk chopping a
character in half when truncation happens. Retrieving varchar columns
wider than 255 is also somewhat tricky using the extension (COM on the
other hand, leaks memory). So we usually use a text column when  there
is no natural limit on text length.

MSSQL will sometimes complain when you put a UTF-8 string into a query
because it doesn't think that what you have is valid text. What you
have to do is just CONVERT() or CAST() to force it to accept the

Re: php5 mssql utf-8 problem

thanks chung, i'm really happy to find a mate fighting the same

you suggest using normal varchar or text fields. how to store utf-8 in
them? well i know if i insert it through php it works and it turns out
in some wired characters (at least displayed like that in the
enterprise manager). the nice thing is if i fetch it again via php it
comes back as arabic text. that so far is cool. the problem arises if i
want to import data via the enterprise manager, this one would convert
the arabic signs to ?? only. if i want to copy past the data into it,
it does not accept it at all in normal fields but displays it correctly
in the n-types..  the ohter thing i wonder about is the sorting and
stuff.. if the data are stored in this wired (sorry don't know the name
for what that is, just some symbols) format i wonder about the
collation (sorting, etc) stuff? would it still work fine?

any further advise is greatly appreciated!

Re: php5 mssql utf-8 problem

ralphie wrote:
Quoted text here. Click to load it

Yeah, that's the problem with storing it as UTF-8: the text will show
up as garbage in Enterprise Manager and Query Analyser. Collation will
also be weird. If you set the column to do binary sort, the result
would be somewhat reasonable, as the alef will still come before the
alef maksura encoded in UTF-8. The digits come after the letters
though, which might not be what you'd want. The order would be
incorrect too if you need to use pe or gaf.

The n- columns offer the best solution. I would give ADODB another
shot. COM is problematic in PHP4. Support might have improved in PHP5.
In theory, if you pass CP_UTF8 as codepage to COM(), you'd get UTF-8
text out of the resultsets. I remember that when I tried it though,
nothing happened. This was a couple years ago. Maybe newer versions of
ADODB would work better.

Re: php5 mssql utf-8 problem

after another day of attempts without success (i think i succeded to
write utf-8 into mssql, showed only ??? though but this is probably
because of enterprise manager not beeing able to properly display) i
give it up (never got proper utf-8 out of the db, only straight after
import but that was mostlikely not even utf-8.. whatever i'm tired and
will join your way to use the "strange sign" method and write manually
an import script as it works only if i stay whithin php.. whatever
thanks for walking along with me and your inputs chung.

Re: php5 mssql utf-8 problem

ralphie wrote:
Quoted text here. Click to load it

Yeah, the lack of support is rather disappointing. One of these days
I'm going to sit down and right an improved version of the mssql

Re: php5 mssql utf-8 problem

if you get to this point let me know me at yeah? maybe its
easier to improve the odbtp though as it seems to be a pretty straight
forward thing, but still didnt help me.. well the neatest is definetly
if the native extension works anyway..  or we just need to leave
mssql.. but probably its bound to customers in your case as well..

Re: php5 mssql utf-8 problem

ralphie wrote:
Quoted text here. Click to load it

Well, since things are working the way they're now, I won't start
working on it anytime soon. My thinking is to develop something with
identical API to the current mssql extension, but takes to the server
using OLE-DB instead of the old DB library.

Re: php5 mssql utf-8 problem

i see.. eventhough i'd rather have proper utf8 in the database what?
whatever somethimes you have to cut down your expectation to reality :-)

Re: php5 mssql utf-8 problem

ralphie wrote:
Quoted text here. Click to load it

Ok, after playing around with this a bit more I have somewhat of a
solution. It's a ugly, nasty hack and quite limited. Basically, since
the mssql extension won't return the data as UTF-8, we will ask MSSQL
to return the text as binary, and then we convert it to UTF-8 with

To insert some text, we first convert from UTF-8 to UCS-2, little

$ucs2 = iconv("UTF-8", "UCS-2LE", $utf8);

Then we put it into binary representation:

$a = unpack('H*hex', $s);
$hex = '0x' . $a['hex'];

Now stick that into a INSERT statement. MSSQL will do an implicit
conversion to nvarchar if that's the type of the column:

mssql_query("INSERT INTO table (nvarchar_col) VALUES($hex)");

Retrieval is a bit tricky, as the mssql extension is too dumb to get
varbinary bigger than 255. It handles image columns with no problem
however. So what we do is cast from nvarchar to varbinary, then from
there to image:

mssql_query("SELECT CAST(CAST(nvarchar_col AS varbinary(8000)) AS
image) AS col FROM table");

The data comes out in UCS-2, so we have to convert it to UTF-8:

$utf8 = iconv("UCS-2LE", "UTF-8", $ucs2);

It's not a great solution but seems to work in my tests. Conversion is
only possible between nvarchar and varbinary. You can't cast a ntext
column to image. So you're limited to nvarchar's 4000 maximum size.

Site Timeline