Cleaning MS Word input - last resort!!


Dear all,

I have a problem with a form. I have tried various combinations of
htmlentities() and html_entity_decode() to resolve it, but without success.
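Since htmlentities() came up, one thing worth checking is its charset argument. The sketch below assumes the form input arrives as UTF-8, and the sample string is made up. With 'UTF-8' passed explicitly, each multi-byte Word character is read as a single character and turned into a named HTML entity, which should survive being pasted into an HTML file. Before PHP 5.4 the default charset was ISO-8859-1, in which case each byte of such a character is treated as a separate Latin-1 character instead.

```php
<?php
// Hypothetical sample of pasted Word text, assumed to be UTF-8:
// bullet, curly apostrophe, curly double quotes, en dash and ellipsis.
$input = "\xe2\x80\xa2 It\xe2\x80\x99s \xe2\x80\x9cdone\xe2\x80\x9d \xe2\x80\x93 nearly\xe2\x80\xa6";

// Naming the charset tells htmlentities() how to group the bytes into
// characters, so each Word character becomes one named entity.
$html = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $html, PHP_EOL;
// prints: &bull; It&rsquo;s &ldquo;done&rdquo; &ndash; nearly&hellip;
```

html_entity_decode() with the same flags and charset turns the entities back into the original UTF-8 text.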

Here is the workflow.

1: User pastes MS Word-formatted text into a form field.
2: Server uses mail() to send the input text to the recipient's mail client.
3: Recipient pastes the text into an HTML file.

The problem is that MS Word text contains peculiar characters for things like
bullets. These come out as tabs in the mail, which then turn into different,
but still spurious, characters in the HTML version.
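To find out which peculiar characters are actually arriving, a small diagnostic can list them. This is a rough sketch, assuming the input is UTF-8; the function name list_non_ascii() and the sample text are hypothetical. It collects each distinct non-ASCII character and shows its UTF-8 bytes in hex, which can then be looked up in a Unicode table or added to a replacement list.

```php
<?php
// Hypothetical helper: returns each distinct non-ASCII character in $text,
// mapped to its UTF-8 bytes in hex. Returns an empty array if $text is not
// valid UTF-8, because preg_match_all() then fails.
function list_non_ascii(string $text): array
{
    $found = [];
    // With the /u modifier PCRE works on UTF-8 characters rather than bytes,
    // so every match is one complete (possibly multi-byte) character.
    if (preg_match_all('/[^\x00-\x7F]/u', $text, $matches)) {
        foreach ($matches[0] as $char) {
            $found[$char] = bin2hex($char);
        }
    }
    return $found;
}

// Hypothetical sample: bullet, tab, en dash and curly double quotes.
$sample = "\xe2\x80\xa2\tFirst point \xe2\x80\x93 \xe2\x80\x9cquoted\xe2\x80\x9d";

foreach (list_non_ascii($sample) as $hex) {
    echo $hex, PHP_EOL; // e280a2, e28093, e2809c, e2809d (one per line)
}
```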

Does anyone know of a function (or functions) that can clean up MS Word input
into something that can simply be pasted as plain text without spurious
characters?


Re: Cleaning MS Word input - last resort!!

turnitup wrote:

From a user comment in the PHP documentation for the utf8_decode() function:

peter dot mescalchin at geemail dot com
27-Dec-2005 06:43

Adding to the comments below, I have a few more MS Word characters that need
replacing. I found this was required when "fixing" some phpMyAdmin export
scripts from a live server, where MS Word characters were all through the
content, before importing them back into my local MySQL database.

The code I wrote for this process also does a strpos() for any remaining
"\xe2\x80" strings, which are the tell-tale sign of any other funny
characters I want removed.
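A minimal sketch of that check, assuming UTF-8 text; the helper name has_leftover_e280() is hypothetical. The sequences in the list below all start with the bytes E2 80, because every character from U+2000 to U+203F (dashes, curly quotes, bullet, ellipsis and similar punctuation) is encoded that way in UTF-8.

```php
<?php
// Hypothetical helper: true if any UTF-8 character in the range
// U+2000..U+203F (all of which start with bytes E2 80) is still present.
function has_leftover_e280(string $text): bool
{
    // strpos() returns 0 for a match at the very start, so compare strictly
    // with false rather than testing truthiness.
    return strpos($text, "\xe2\x80") !== false;
}

var_dump(has_leftover_e280("It\xe2\x80\x99s"));   // bool(true)
var_dump(has_leftover_e280("\xe2\x80\xb0 off"));  // bool(true), U+2030 per mille sign
var_dump(has_leftover_e280("It's plain ASCII")); // bool(false)
```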

Here are my updated arrays:

$badchr = array(
    "\xe2\x80\xa6",        // ellipsis
    "\xe2\x80\x93",        // en dash
    "\xe2\x80\x94",        // em dash
    "\xe2\x80\x98",        // single quote opening
    "\xe2\x80\x99",        // single quote closing
    "\xe2\x80\x9c",        // double quote opening
    "\xe2\x80\x9d",        // double quote closing
    "\xe2\x80\xa2"         // dot used for bullet points
);

$goodchr = array(
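The $goodchr array is cut off in the comment as quoted here, so the sketch below uses its own ASCII substitutes; these are an assumption for illustration, not the original values. Paired with the $badchr sequences, both arrays go to str_replace(), which replaces each search string with the replacement at the same index. The function name clean_word_text() is hypothetical.

```php
<?php
// The eight UTF-8 sequences listed in $badchr above.
$badchr = [
    "\xe2\x80\xa6", // U+2026 horizontal ellipsis
    "\xe2\x80\x93", // U+2013 en dash
    "\xe2\x80\x94", // U+2014 em dash
    "\xe2\x80\x98", // U+2018 left single quotation mark
    "\xe2\x80\x99", // U+2019 right single quotation mark
    "\xe2\x80\x9c", // U+201C left double quotation mark
    "\xe2\x80\x9d", // U+201D right double quotation mark
    "\xe2\x80\xa2", // U+2022 bullet
];

// Hypothetical ASCII substitutes, one per entry above and in the same order.
// These are an assumption, not the $goodchr values from the original comment.
$asciiSubstitutes = ['...', '-', '-', "'", "'", '"', '"', '*'];

// Hypothetical helper: str_replace() with two arrays swaps every occurrence
// of $search[i] for $replace[i], working through the pairs in order.
function clean_word_text(string $text, array $search, array $replace): string
{
    return str_replace($search, $replace, $text);
}

$cleaned = clean_word_text(
    "\xe2\x80\xa2 \xe2\x80\x9cIt\xe2\x80\x99s done\xe2\x80\x9d \xe2\x80\x93 ok\xe2\x80\xa6",
    $badchr,
    $asciiSubstitutes
);
echo $cleaned, PHP_EOL; // prints: * "It's done" - ok...
```

Because every substitute is plain ASCII, running the "\xe2\x80" check afterwards only flags characters that are not in the list.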

Julien CROUZET - DSI Theoconcept
julien.crouzet@/enlever ca\

Re: Cleaning MS Word input - last resort!!

turnitup wrote:

tidy perhaps?
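If that refers to PHP's tidy extension, a call might look like the sketch below, assuming the extension is installed. Tidy repairs HTML markup rather than plain text, so it is most relevant at step 3 or for Word documents saved as web pages. The wrapper name tidy_word_html() is made up; word-2000 and show-body-only are HTML Tidy configuration options.

```php
<?php
// Hypothetical wrapper around the tidy extension. If the extension is not
// loaded, the input is returned unchanged.
function tidy_word_html(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    $config = [
        'word-2000'      => true, // strip the extra markup Word adds when saving as a web page
        'show-body-only' => true, // return just the body content, not a full document
    ];
    $clean = tidy_repair_string($html, $config, 'utf8');
    // tidy_repair_string() returns false on failure; fall back to the input.
    return $clean === false ? $html : $clean;
}

echo tidy_word_html('<p class="MsoNormal">Hello</p>'), PHP_EOL;
```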

Justin Koivisto, ZCE
