Posted by +mrcakey on July 9, 2008, 7:03 am
I've built a MySQL database for a client and a web interface to be able to
add/edit/delete records in it. When he's adding stuff to the database he's
copying text from MS Word. I've tried various substitutions that I've found
hanging around the internet, but nothing's working for the "long dash" that
it insists on converting normal hyphens to.
This morning I did a bin2hex to see exactly what was being sent from $_POST:
A - long dash -.
41 20 >>>e2 80 93<<< 20 6c 6f 6e 67 20 64 61 73 68 20 2d 2e 20 20
The offending character is the one I've highlighted. As far as I can tell,
it should be getting found by this -
"\\xe2\\x80\\x93", // long dash
but it isn't, which makes me think there's something wrong with the code
I've copied. How should the hex string be written? I've also tried
"\xe2\x80\x93" and "\xe2x80x93", but to no avail.
It's driving me scatty!
Any help much appreciated.
$search = array( chr(145),
chr(146),
chr(147),
chr(148),
chr(151),
chr(196),
'â€œ', // left side double smart quote
'â€', // right side double smart quote
'â€˜', // left side single smart quote
'â€™', // right side single smart quote
'â€¦', // ellipsis
'â€”', // em dash
'â€“', // en dash
"\\xe2\\x80\\xa6", // ellipsis
"\\xe2\\x80\\x93", // long dash
"\\xe2\\x80\\x94", // long dash
"\\xe2\\x80\\x9c", // double quote opening
"\\xe2\\x80\\x9d", // double quote closing
"\\xe2\\x80\\xa2" // dot used for bullet points
);
$replace = array( "'",
"'",
'"',
'"',
'-',
'-',
'"',
'"',
"'",
"'",
"…",
"-",
"-",
'…',
'-',
'-',
'"',
'"',
'*'
);
echo '<p>'.bin2hex( $_POST['short_desc'] ).'</p>';
$short_desc = str_replace($search, $replace, $_POST['short_desc']);
+mrcakey
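One thing that may also matter, however the escape is written: str_replace() works through the search array in order, each entry operating on the result of the previous one. chr(147) is the single byte 0x93, which is also the last byte of the UTF-8 en dash (e2 80 93), so an early single-byte entry can rewrite part of the sequence before the three-byte entry is ever tried. The sketch below is only an illustration under the assumption that the input really is UTF-8, as the hex dump suggests; the variable names are made up for the example.

```php
<?php
// "A <en dash> long dash -." with the dash written as its UTF-8 bytes e2 80 93.
$input = "A \xe2\x80\x93 long dash -.";

// Same ordering as the posted arrays: the single-byte entry comes first.
// chr(147) is byte 0x93, so it rewrites the last byte of the dash.
$single_first = str_replace(
    array(chr(147), "\xe2\x80\x93"),
    array('"', '-'),
    $input
);

// Multi-byte UTF-8 sequences first, single bytes afterwards.
$multi_first = str_replace(
    array("\xe2\x80\x93", chr(147)),
    array('-', '"'),
    $input
);

echo bin2hex($single_first), "\n"; // 4120e28022206c6f6e672064617368202d2e
echo $multi_first, "\n";           // A - long dash -.
```

In the first result the dash has become e2 80 22, which no later entry matches. Listing the multi-byte sequences first (or dropping the chr() entries when the input is known to be UTF-8) avoids that.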
Posted by I V on July 10, 2008, 7:25 pm
On Wed, 09 Jul 2008 12:03:57 +0100, +mrcakey wrote:
> The offending character is the one I've highlighted. As far as I can
> tell, it should be getting found by this -
>
> "\\xe2\\x80\\x93", // long dash
You want to use one backslash here, not two. But rather than specifying
the search-and-replace yourself, it's probably easier to use
htmlentities. You need to know what encoding your data has been sent in
(from your post, it looks like you're receiving UTF-8) and pass that in:
$short_desc = htmlentities($_POST['short_desc'], ENT_COMPAT, 'UTF-8');
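A small sketch of both points, for anyone wanting to check them locally. It assumes UTF-8 input; $posted is a stand-in for $_POST['short_desc'] and the other variable names are only for illustration.

```php
<?php
// Three ways the dash might be written in PHP source.
$one_backslash = "\xe2\x80\x93";     // double quotes, one backslash: 3 raw bytes
$two_backslash = "\\xe2\\x80\\x93";  // two backslashes: 12 literal characters
$single_quoted = '\xe2\x80\x93';     // single quotes: \x is not interpreted

echo strlen($one_backslash), "\n"; // 3
echo strlen($two_backslash), "\n"; // 12
echo strlen($single_quoted), "\n"; // 12

// Stand-in for the posted form field, containing a UTF-8 en dash.
$posted = "A \xe2\x80\x93 long dash -.";

// htmlentities() converts the dash to a named entity when told the
// input encoding; the plain hyphen and full stop are left alone.
echo htmlentities($posted, ENT_COMPAT, 'UTF-8'), "\n"; // A &ndash; long dash -.
```

Only the first form produces the three bytes seen in the hex dump, so it is the only one that can match them. Note that htmlentities gives HTML entities rather than plain ASCII substitutes, which suits output to a web page but changes what is stored if applied before the database insert.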