problem with mb_detect_encoding

I am trying to determine whether data entered in a $_POST variable in a form
contains only ASCII characters (0 - 127).  To do this I am using
mb_detect_encoding().  I am running into problems with non-English
characters, however - for example, I translated the word 'test' into
Russian and got 'испытание'.  If I feed this into the function as:

$_POST['var'] = 'испытание'; // from form
echo mb_detect_encoding($_POST['var']);

it returns ASCII.  After thinking about it and running some tests I
figured out that it is doing this because PHP is feeding
mb_detect_encoding() the string after it has been converted to its HTML
entity representation, i.e. instead of 'испытание' mb_detect_encoding() is
receiving the entity-encoded form of the word.
Obviously all of these characters are ASCII, and as far as I can tell
this is what's happening.

Is there a way that I can tell if data entered is ASCII or not BEFORE it  
is converted?  With the example above, I would want this test to fail  
(not return ASCII).  Thanks in advance.

Re: problem with mb_detect_encoding

In reply to Marcus:

The form data is converted into HTML entities on the client side before
PHP receives it (browsers typically do this when the page's character set
cannot represent the characters). Convert the entities back into a string
using html_entity_decode().
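As a rough sketch of that decoding step: the entity string below is what I'd assume the script receives for 'испытание' (the decimal code points of the nine letters), and `$raw` / `$decoded` are just illustrative names. Passing the charset explicitly is a precaution, since the default target charset of html_entity_decode() differs between PHP versions.

```php
<?php
// Assumed input: the browser sent numeric character references
// instead of the Cyrillic letters themselves.
$raw = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Decode the references into real characters, encoded as UTF-8.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Nine Cyrillic letters take two bytes each in UTF-8.
var_dump(strlen($decoded)); // int(18)
```

After this step the string contains bytes above 127, so a non-ASCII test has something to detect.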

Even then mb_detect_encoding() might not work; the user notes in the
PHP manual aren't encouraging, anyway. Someone there gave a regular
expression for detecting UTF-8 that can be adapted.

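I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
html_entity_decode($_POST['var']) ) will work. It returns 1 if the decoded
input contains anything other than tab, line feed, carriage return or
printable ASCII, and 0 otherwise.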
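Here is a minimal sketch of that check wrapped in a function. The name `is_plain_ascii()` is made up for illustration, and I've widened the character class to the full 0 - 127 range the original question asked about, which is a variation on the pattern above rather than the same test.

```php
<?php
// Hypothetical helper (not from the PHP manual): returns true only if,
// after decoding HTML entities, every byte is in the range 0-127.
function is_plain_ascii($input)
{
    // Turn references such as &#1080; back into real characters (UTF-8).
    $decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

    // No /u modifier, so the pattern is matched byte by byte;
    // any byte above 0x7F means the string is not pure ASCII.
    return preg_match('/[^\x00-\x7F]/', $decoded) === 0;
}

var_dump(is_plain_ascii('test'));           // bool(true)
var_dump(is_plain_ascii('&#1080;&#1089;')); // bool(false)
```

One caveat: a user who literally types something like `&#1080;` into the form would also be reported as non-ASCII, because the script can't tell that apart from what the browser generates.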

Tim Hunt
