shell_exec does not pass UTF-8 as is

Hello all,

Let's consider the following call:

$result = shell_exec("$path_to_command $arguments");

The character encoding used for the string inside $result is obviously
the one used by the operating system. I tested this under two different
OSes, namely Windows XP and Debian Linux.

Under WinXP, the e acute character (é) is present in the string as a
single byte whose value is 0x82, as expected, because the WinXP command
line uses code page 850.
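For the Windows case, converting the captured bytes only needs you to name the source charset. A minimal sketch, assuming the iconv extension is enabled and the platform's iconv recognises the name "CP850" (glibc's does); the helper name `cp850_to_utf8` is hypothetical:

```php
<?php
// Hypothetical helper: convert output captured from a CP850 console
// (e.g. the Windows XP command prompt) into UTF-8.
// Assumes the iconv extension is enabled and knows the name "CP850".
function cp850_to_utf8(string $bytes): string
{
    $utf8 = iconv('CP850', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert from CP850');
    }
    return $utf8;
}

// 0x82 is é in CP850; in UTF-8 the same character is the two bytes C3 A9.
echo bin2hex(cp850_to_utf8("\x82")), "\n";
```

The charset is passed in by hand because nothing in the string returned by shell_exec() records it.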

Under Debian I would have expected to receive the UTF-8 byte sequence for
that same character, as the shell is configured to use UTF-8. However, I
get this kind of string:


Clearly, these are some sort of escape codes that match the expected
UTF-8 values, but the actual bytes are not there. As a result, it's not
possible to convert this string from UTF-8 to any other encoding.

Is there a setting that I should turn on somewhere to have those
sequences automatically decoded?
If not, is there a function that can be called that will convert those
for me?

Thanks in advance

Re: shell_exec does not pass UTF-8 as is

On Thu, 08 Apr 2010 21:42:25 +0200, OBones wrote:

Nope. Especially not once it's getting into shell_exec(). That'll take
whatever you feed it, basically at a binary level, and it won't know
anything about charsets that you don't explicitly tell it.

That depends on what you want to convert it to. And what it actually IS
that the shell is handing back. And that will very likely vary quite a
bit as long as you're using variant characters. That is, characters that
have different code points from one charset to another. (Some could
argue that that's pretty much *all* characters since you COULD BE
running your PHP interpreter on an EBCDIC machine, but we'll assume for
the sake of discussion that we're dealing with at least ASCII.)

Yes, Java is so bulletproofed that to a C programmer it feels like
being in a straightjacket, but it's a really comfy and warm
straightjacket, and the world would be a safer place if everyone was
straightjacketed most of the time.  -- Mark 'Kamikaze' Hughes

Re: shell_exec does not pass UTF-8 as is

On 08/04/2010 21:42, OBones wrote:

Strings in PHP contain no info at all about the character encoding they
are using. Perhaps *you* know what charset it is, but PHP doesn't.
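To illustrate: the same character é is a different byte sequence in each charset, and a PHP string holds only the bytes. About the only thing PHP can check without being told is whether the bytes happen to be well-formed UTF-8. The sketch below uses the `//u` PCRE idiom for that; the helper name `looks_like_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// The same character, é, as raw bytes in three charsets.
$samples = [
    'CP850'      => "\x82",
    'ISO-8859-1' => "\xE9",
    'UTF-8'      => "\xC3\xA9",
];

foreach ($samples as $charset => $bytes) {
    // strlen() counts bytes, and nothing inside $bytes names its charset.
    printf("%-10s %-4s %d byte(s), valid UTF-8: %s\n",
        $charset, bin2hex($bytes), strlen($bytes),
        looks_like_utf8($bytes) ? 'yes' : 'no');
}
```

Passing the check does not prove the text was meant as UTF-8: plain ASCII, for instance, passes while being valid in almost every charset.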

How could a third party tool (a command shell) get enough information to
do such conversion? And why would it want to? It's a hard and pointless
job: the command that will be eventually using that piece of data can
expect an infinite variety of charsets (or even binary data) and the
shell has no way to find out. Blindly manipulating user input can only
lead to corrupted data.

That's how the Windows XP command line console *displays* the bytes you
pass. It depends on the font as well, and it's even configurable per
session (see help for chcp).

It's not clear to me what you want to do exactly. If you want to pass
$arguments to $path_to_command, just use whatever charset
$path_to_command expects. If you want to configure your shell to display
strings in a specific charset:

1. In Windows, set an appropriate font and codepage for the command prompt.

2. In bash, use export to set an appropriate LC_* variable.

3. In other shells, well, I don't know :)
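From inside PHP, the counterpart of bash's `export` is putenv(): it sets the variable in PHP's own process environment, which commands started later through shell_exec() inherit. A minimal sketch, assuming a POSIX system with `printenv`; the locale name `fr_FR.UTF-8` is only an example and has to be installed for programs to actually use it.

```php
<?php
// Set the locale for every command this PHP process starts afterwards,
// much like `export LC_ALL=...` in bash. putenv() returns false on failure.
if (!putenv('LC_ALL=fr_FR.UTF-8')) {
    throw new RuntimeException('putenv() failed');
}

// The child shell inherits the variable (assumes a POSIX `printenv`).
$seen = trim((string) shell_exec('printenv LC_ALL'));
echo $seen, "\n";
```

Unlike prefixing a single command with the variable, this changes the environment for every later call made from the same process.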

-- - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming:
-- My site of satin-finish humor:

Re: shell_exec does not pass UTF-8 as is

Álvaro G. Vicario wrote:

Well, I had to add "LANG=fr_FR.UTF-8" in front of the call to the
executable, because the shell_exec call apparently starts with an empty
environment. This meant the called application was using "C" as its
locale, which is ASCII-only.
At least now I know where it comes from.
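The fix described above can be sketched as follows. On Unix-like systems shell_exec() runs the command through /bin/sh, where `VAR=value cmd` sets VAR for that one command only. `$path_to_command` and `$arguments` are the placeholders from the original question; the wrapper name `shell_exec_utf8` and the use of escapeshellarg() on the path and locale are assumptions of this sketch, and `printenv` stands in for the real command.

```php
<?php
// Hypothetical wrapper: run one command with an explicit UTF-8 locale.
// The sh syntax `LANG=value cmd` sets LANG only in cmd's environment.
// $locale must be installed for the called program to really emit UTF-8.
function shell_exec_utf8(string $path_to_command, string $arguments = '',
                         string $locale = 'fr_FR.UTF-8'): ?string
{
    $cmd = 'LANG=' . escapeshellarg($locale) . ' '
         . escapeshellarg($path_to_command) . ' ' . $arguments;
    $result = shell_exec($cmd);
    // shell_exec() may return null or false instead of a string.
    return is_string($result) ? $result : null;
}

// The child sees LANG even if PHP's own environment lacks it.
echo shell_exec_utf8('printenv', 'LANG');
```

`$arguments` is passed through unquoted, as in the original call, so it must already be safely escaped by the caller.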