Cleaning MS Word input - last resort!!
Posted on February 21, 2006, 8:02 pm
I have a problem with a form, and I have tried various permutations of
htmlentities() and html_entity_decode() to resolve it, but without success.
Here is the workflow.
1: User pastes MS Word formatted text into form field.
2: Server uses mail() to send input text to mail client.
3: Recipient pastes text into html file.
The problem is that MS Word text contains peculiar characters for things
like bullets, which come out as tabs in the mail, and then as different
but still spurious characters in the HTML version.
Does anyone know of a function (or functions) that can clean up MS Word
input into something that can be simply pasted as plain text without
spurious characters?
Re: Cleaning MS Word input - last resort!!
From a comment on the PHP documentation for the utf8_decode() function,
by peter dot mescalchin at geemail dot com:
Adding to the comment below, I have a few more MS Word characters that
need replacing. I found this was required when "fixing" some phpMyAdmin
export scripts from a live server where MS Word characters were all
through the content, before importing them back into my local MySQL
database.
The code I wrote for this process also does a strpos for any extra
"\xe2\x80" strings - which are the tell-tale sign of any funny
characters I want removed.
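A minimal sketch of that check, assuming the input is UTF-8. The function name has_word_punctuation() is hypothetical and is not part of the quoted comment. The prefix "\xe2\x80" begins the UTF-8 encoding of U+2000 through U+203F, so it also matches general punctuation that Word did not produce.

```php
<?php
// Hypothetical helper (not from the quoted comment): reports whether a
// string contains the two-byte prefix "\xe2\x80", which starts the UTF-8
// encoding of every code point from U+2000 to U+203F.
function has_word_punctuation(string $text): bool
{
    // strpos() returns false when the needle is absent and 0 when it is
    // at the very start, so the comparison has to be strict.
    return strpos($text, "\xe2\x80") !== false;
}

var_dump(has_word_punctuation("plain text"));                     // bool(false)
var_dump(has_word_punctuation("\xe2\x80\x9cquoted\xe2\x80\x9d")); // bool(true)
```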
Here are my updated arrays:
$badchr = array(
"\xe2\x80\xa6", // ellipsis
"\xe2\x80\x93", // en dash
"\xe2\x80\x94", // em dash
"\xe2\x80\x98", // single quote opening
"\xe2\x80\x99", // single quote closing
"\xe2\x80\x9c", // double quote opening
"\xe2\x80\x9d", // double quote closing
"\xe2\x80\xa2" // dot used for bullet points
);
$goodchr = array(
Julien CROUZET - DSI Theoconcept
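The replacement list is cut off in the post above, so here is one way the two arrays might be put to work with str_replace(). This is a sketch only: the function name clean_word_text() and every entry in $goodchr are assumptions chosen as plain-ASCII stand-ins, not the original author's values, and the input is assumed to be UTF-8.

```php
<?php
// Sketch only. clean_word_text() is a hypothetical name, and the ASCII
// stand-ins in $goodchr are assumptions: the original list was cut off.
function clean_word_text(string $text): string
{
    $badchr = array(
        "\xe2\x80\xa6", // horizontal ellipsis (U+2026)
        "\xe2\x80\x93", // en dash (U+2013)
        "\xe2\x80\x94", // em dash (U+2014)
        "\xe2\x80\x98", // left single quote (U+2018)
        "\xe2\x80\x99", // right single quote (U+2019)
        "\xe2\x80\x9c", // left double quote (U+201C)
        "\xe2\x80\x9d", // right double quote (U+201D)
        "\xe2\x80\xa2", // bullet (U+2022)
    );
    // Assumed replacements, in the same order as $badchr.
    $goodchr = array('...', '-', '--', "'", "'", '"', '"', '*');

    // str_replace() pairs the two arrays by position.
    return str_replace($badchr, $goodchr, $text);
}

echo clean_word_text("\xe2\x80\xa2 \xe2\x80\x9cHello\xe2\x80\x9d \xe2\x80\x93 done\xe2\x80\xa6"), "\n";
// prints: * "Hello" - done...
```

In the workflow from the question, the cleaned string would be what gets passed to mail(), so the recipient pastes plain ASCII punctuation into the HTML file.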