Advice needed; php5, utf-8, mb_*


I am converting a large PHP application, which also uses a few
third-party PHP libraries, to UTF-8.

And now, of course, I have problems with string functions that are not
multi-byte safe, especially in those third-party libraries.

My first, optimistic attempt was to automatically override the "ordinary"
string functions with their multi-byte versions
 (.htaccess: php_value mbstring.func_overload   7).

But that didn't work out. For example, the phpmailer class failed.
Interestingly enough, it SEEMS that it works just fine with "ordinary"
string functions and UTF-8. But those bugs (which can show up
when using "ordinary" string functions with multi-byte characters)
are not easy to catch, especially when my primary language uses mostly
1-byte characters.

So, for now the only thing I can do is go through all the third-party
libraries, try to figure out which string functions should work
on bytes and which should work on characters, and replace them accordingly.

For example, when strlen is supposed to return the length in bytes I
should leave it as is, but when it's supposed to return the number of
characters I should replace it with mb_strlen ... and so on for all
multi-byte unsafe string functions.
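
To make the byte/character distinction concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and mbstring.func_overload is off; the sample string is only an illustration.

```php
<?php
// "café" written with explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$text = "caf\xC3\xA9";

$byteCount = strlen($text);              // counts bytes
$charCount = mb_strlen($text, 'UTF-8');  // counts characters

// Cutting by bytes can split a multi-byte character in half;
// cutting by characters keeps the string valid UTF-8.
$byteCut = substr($text, 0, 4);
$charCut = mb_substr($text, 0, 4, 'UTF-8');

var_dump($byteCount, $charCount);
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```

A call that feeds something like a Content-Length header wants the byte count; a call that truncates text for display wants the character count.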

The problem is that it is not easy to find out which functions should
be replaced and which should not. I also have to repeat the process each
time a new version is released.

So, does anybody have any ideas how these problems can be solved more


Re: Advice needed; php5, utf-8, mb_* wrote:

Perhaps donate some of your work to the open source projects?  I suspect
phpmailer would appreciate your efforts, for instance.

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.

Re: Advice needed; php5, utf-8, mb_*


So, in your opinion the only solution is to analyze the PHP code and try
to figure out what the author's intentions were? To analyze, for example,
whether strlen is supposed to return the number of bytes or the number of
characters? If it is supposed to return the number of bytes, leave it as is;
otherwise rename it to mb_strlen?

This is a major undertaking for someone not familiar with the inner workings
of this class. Also, I am not sure that phpmailer would really
appreciate this. MB is just an extension, and there is a possibility that
it is not even installed on all shared hosting servers. Also, for most
westerners iso-8859-1 is good enough, and multi-byte functions are
slower than ordinary functions. PHP6 is around the corner, and I am not
really sure how many people would benefit from this. If I decide to add
full multi-byte support to phpmailer I will release it, but I cannot
promise that it will be in sync with the current version of the original
phpmailer class.
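
The concern that the extension may be missing could be handled with a guarded wrapper. This is only a sketch: char_length is a hypothetical helper name, not part of phpmailer or PHP, and the fallback assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper: count characters in a UTF-8 string,
// with or without the mbstring extension.
function char_length($text)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }

    // Fallback: count code points with a UTF-8 aware regular expression.
    // preg_match_all() returns false when $text is not valid UTF-8.
    $count = preg_match_all('/./su', $text, $matches);

    // Assumption: for invalid input, fall back to the byte count.
    return $count === false ? strlen($text) : $count;
}

var_dump(char_length("caf\xC3\xA9"));
```

A library could route every "number of characters" call through a wrapper like this and keep plain strlen where bytes are meant.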

And again, this is not about the phpmailer class. I am interested in other
people's opinions and experiences with multi-byte string functions and
third-party libraries, and in what the best thing to do is.

Re: Advice needed; php5, utf-8, mb_* wrote:

No, work with them to come out with a multibyte version of the code.

If you're going to change the code yourself anyway to make it work, why  
not work with the people who developed the product to make it work right  
and make it available to those who need it?

Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.

Re: Advice needed; php5, utf-8, mb_* wrote:


Setting a php_value via .htaccess for Apache will only work if PHP runs as
a module. Is your PHP used as a module? For the lighttpd web server it also
works for CGI. -v please
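
One way to check both points from inside a script is sketched below. It only reports what the running PHP says about itself; the variable names are illustrative.

```php
<?php
// Which server API is PHP running under (module, CGI, CLI, ...)?
$sapi = php_sapi_name();

// Current value of the overload setting. ini_get() returns false
// when the running PHP build does not know this setting at all.
$overload = ini_get('mbstring.func_overload');

echo "SAPI: $sapi\n";

if ($overload === false) {
    echo "mbstring.func_overload is not available here\n";
} else {
    // Bit values: 1 = mail(), 2 = string functions, 4 = regex functions.
    printf("mbstring.func_overload = %d\n", (int) $overload);
}
```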


Failed? Didn't work? Nice description for the problem!

  _(_p>   Ulf [Kado] Kadner
  \<_)    Mitglied der Freizeitvögel? ;-)

Re: Advice needed; php5, utf-8, mb_*


There is no need for a detailed description of the problem with
I am not asking how to port phpmailer to a full UTF-8 PHP application.
It was just one real-world example.

I am asking for more general advice on how to "attack" this problem
with multi-byte strings and third-party libraries. Phpmailer is not
the only class I am using.

Setting php_value works just fine. When func_overload is on, the
multi-byte versions are used instead of the "ordinary" string functions.

(I can describe the problem with phpmailer when func_overload is on, but
this is not really relevant for this thread.

I am sending both html and txt versions of the message in UTF-8.
Func_overload is on (so PHP uses mb_strlen instead of strlen, and so
on for all string functions which are multi-byte unsafe).

In this scenario the html is sent like some sort of attachment, but since
func_overload is on, strlen does not return the number of bytes but the
number of characters (which are not equal when using UTF-8), so attachments
are not properly sent. So instead of viewing only the html or only the txt
message, I can see (in my e-mail client) both versions (html and txt) one
below the other, with some damaged headers.

Also, there is a possibility that strlen is not the only function which
causes this kind of behavior in phpmailer.

When func_overload is off and "ordinary" string functions are used, it
SEEMS that phpmailer works fine, but I didn't test enough to be sure. )
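
For code that must count bytes even when func_overload is on, one known approach is to ask mbstring for the length in the '8bit' encoding. A minimal sketch follows; byte_length is a hypothetical helper name, not something phpmailer provides.

```php
<?php
// Hypothetical helper: byte count of a string, regardless of whether
// strlen() has been overloaded by mbstring.func_overload.
function byte_length($data)
{
    if (function_exists('mb_strlen')) {
        // '8bit' treats every byte as one character.
        return mb_strlen($data, '8bit');
    }
    // Without mbstring, strlen() cannot be overloaded and counts bytes.
    return strlen($data);
}

var_dump(byte_length("caf\xC3\xA9"));
```

Replacing strlen with a helper like this in the places where a message part's size is computed would keep those sizes in bytes under either setting.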

Re: Advice needed; php5, utf-8, mb_*

working_boy wrote:


Personally I'm of the stick-my-head-in-the-sand-and-hope-PHP-6-fixes-this
school of thought.

Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 54 days, 19:46.]

                  Fake Steve is Dead; Long Live Fake Bob!
