iconv and charset conversion troubles


I am trying to aggregate content from a website into a database, and I
am getting severe encoding troubles with the em dash character (—,
U+2014) as well as the bullet point (•, U+2022), and probably other
special characters too.

The remote website declares its charset as ISO-8859-1, and when viewing
it as such in the browser, I can see the — and • characters
just fine. When looking at the aggregated content (fetched over HTTP via
fsockopen()) on my own website, which declares UTF-8, the characters of
course do not display correctly.

Leaving aside the database for later, first I wanted to convert the
string such that it would display properly on my UTF-8 website. I
assumed this would be done with

$data = iconv('ISO-8859-1', 'UTF-8', $data);

However, the converted content will not display properly either, so it's
clear I need some more advice.

To avoid ambiguity or encoding troubles, I am showing all the characters
in base 64 encoding.

The character that the remote website sends is "lw==" in base 64.

Converted with the above iconv() command, it becomes "wpc=".
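
For reference, the raw bytes behind these two base 64 strings can be
dumped with a short sketch like the one below. The variable names are
only illustrative and not part of my actual script.

```php
<?php
// Base 64 values quoted in this post.
$remote    = base64_decode('lw==');  // what the remote website sends
$converted = base64_decode('wpc=');  // result of iconv('ISO-8859-1', 'UTF-8', ...)

echo bin2hex($remote), "\n";     // 97
echo bin2hex($converted), "\n";  // c297

// Repeating the conversion on the single byte gives the same result:
var_dump(iconv('ISO-8859-1', 'UTF-8', $remote) === $converted); // bool(true)
```

So the remote byte is 0x97, and iconv() turns it into the UTF-8 sequence
C2 97, which is code point U+0097 (a C1 control character in ISO-8859-1)
rather than U+2014.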

When I copy-paste the rendered character into a PHP script and encode
that, it becomes "4oCU". Not sure which encoding that is.
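
A small sketch to inspect that third value, and to test one guess: that
the remote site is really sending Windows-1252 even though it declares
ISO-8859-1. That is purely an assumption on my part, and the 'CP1252'
charset name may not be available in every iconv implementation. The
"\u{2014}" escape needs PHP 7 or later.

```php
<?php
// Value obtained by copy-pasting the rendered character.
$pasted = base64_decode('4oCU');
echo bin2hex($pasted), "\n";           // e28094

// Compare against the UTF-8 encoding of U+2014 (em dash).
var_dump($pasted === "\u{2014}");      // bool(true)

// Assumption: treat the remote byte as Windows-1252 instead of ISO-8859-1.
$remote = base64_decode('lw==');
$guess  = iconv('CP1252', 'UTF-8', $remote);
var_dump($guess === $pasted);
```

If the last comparison holds, the pasted value is simply the UTF-8 form
of the em dash, and the source charset passed to iconv() would be the
thing to change.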

How should I approach this problem? Thanks

Christoph Burschka
