Treating text copied from MS Word

I've built a MySQL database for a client and a web interface to be able to
add/edit/delete records in it.  When he's adding stuff to the database he's
copying text from MS Word.  I've tried various substitutions that I've found
around the internet, but nothing works for the "long dash" that Word
insists on converting normal hyphens into.

This morning I did a bin2hex to see exactly what was being sent from $_POST:

A - long dash -.

41 20 >>>e2 80 93<<< 20 6c 6f 6e 67 20 64 61 73 68 20 2d 2e 20 20

The offending character is the one I've highlighted.  As far as I can tell,
it should be matched by this entry:

"\xe2\x80\x93", // long dash

but it isn't, which makes me think there's something wrong with the code
I've copied.  How do I get it to find the hex string?  I've also tried
"\xe2\x80\x93" and "\xe2x80x93", but to no avail.

It's driving me scatty!!!

Any help much appreciated.

$search = array( chr(145),
'â€œ', // left side double smart quote
'â€', // right side double smart quote
'â€˜', // left side single smart quote
'â€™', // right side single smart quote
'â€¦', // ellipsis
'â€”', // em dash
'â€“', // en dash
"\xe2\x80\xa6", // ellipsis
"\xe2\x80\x93", // long dash
"\xe2\x80\x94", // long dash
"\xe2\x80\x9c", // double quote opening
"\xe2\x80\x9d", // double quote closing
"\xe2\x80\xa2" // dot used for bullet points
$replace = array( "'",
ECHO '<p>'.BIN2HEX( $_POST['short_desc'] ).'</p>';
$short_desc = STR_REPLACE($search, $replace, $_POST['short_desc']);
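
For comparison, here is a minimal sketch of the same idea with each search string kept next to its replacement in one map. It assumes the form data arrives as UTF-8, which is what the hex dump above suggests. The ASCII replacements and the names $wordToAscii and word_to_ascii() are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical map from UTF-8 byte sequences to plain ASCII stand-ins.
// The replacement values are illustrative choices, not taken from the post.
$wordToAscii = array(
    "\xe2\x80\x98" => "'",   // left single quote
    "\xe2\x80\x99" => "'",   // right single quote
    "\xe2\x80\x9c" => '"',   // left double quote
    "\xe2\x80\x9d" => '"',   // right double quote
    "\xe2\x80\x93" => '-',   // en dash
    "\xe2\x80\x94" => '-',   // em dash
    "\xe2\x80\xa6" => '...', // ellipsis
    "\xe2\x80\xa2" => '*',   // bullet point
);

// Hypothetical helper: strtr() swaps every key found in $text for its value.
function word_to_ascii(string $text, array $map): string
{
    return strtr($text, $map);
}

// The sample string from the hex dump: "A", en dash, "long dash -."
echo word_to_ascii("A \xe2\x80\x93 long dash -.", $wordToAscii), "\n";
// prints: A - long dash -.
```

Keeping keys and values in one array avoids the two lists drifting out of step, which is easy to do with separate $search and $replace arrays.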


Re: Treating text copied from MS Word

Not really a PHP question - configure your webserver to use a 7 bit


Re: Treating text copied from MS Word

On Wed, 09 Jul 2008 12:03:57 +0100, +mrcakey wrote:

You want to use one backslash here, not two. But rather than specifying
the search-and-replace yourself, it's probably easier to use
htmlentities(). You need to know what encoding your data has been sent in
(from your post, it looks like you're receiving UTF-8) and pass that as
the third argument, like so:

$short_desc = htmlentities($_POST['short_desc'], ENT_COMPAT, 'UTF-8');
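
A minimal sketch of both points, assuming the input really is UTF-8 as the hex dump in the question suggests. The variable names ($input, $good, $bad) are illustrative only.

```php
<?php
// Sample input: "A", en dash (UTF-8 bytes e2 80 93), "long dash -."
$input = "A \xe2\x80\x93 long dash -.";

// Double quotes with ONE backslash: PHP converts each \xNN into a single byte.
$good = "\xe2\x80\x93";
// Two backslashes: the escape is disabled, leaving 12 literal characters.
$bad = "\\xe2\\x80\\x93";

echo strlen($good), "\n"; // prints: 3
echo strlen($bad), "\n";  // prints: 12

// Only the three-byte version matches the dash in the input.
echo str_replace($good, '-', $input), "\n"; // prints: A - long dash -.
echo (str_replace($bad, '-', $input) === $input ? 'unchanged' : 'changed'), "\n";
// prints: unchanged

// htmlentities() with the right charset converts the dash to a named entity.
echo htmlentities($input, ENT_COMPAT, 'UTF-8'), "\n";
// prints: A &ndash; long dash -.
```

Note that the two approaches store different things: str_replace() leaves a plain hyphen in the database, while htmlentities() stores the entity and so ties the stored text to HTML output.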
