white spaces in uploaded html file


Hi all,
On one of my sites I want to give users the option to upload an
HTML file, from which I want to extract everything inside the <body> tags.
The upload works fine:

  <form id="uploadform" action="index.php" method="post"
        enctype="multipart/form-data">
    <input type="file" name="Datei" size="30"/>
    <input type="submit"/>
  </form>

Then I want to parse the uploaded file with:

if (isset($_FILES['Datei']) && $_FILES['Datei']['error'] === UPLOAD_ERR_OK) {
    $buffer = file_get_contents($_FILES['Datei']['tmp_name']);
    echo "body: " . $buffer . "\n";
}

I get a weird result:
body: ÿþ< h t m l > < h e a d > < t i t l e < / t i t l e > .....
So there seems to be a space between every two characters.

As a result, I cannot find the <body> tag. Neither of these works:
echo "sub:  ".strpos($buffer, "< b o d y")."\n";
echo "sub:  ".strpos($buffer, "<body")."\n";
Both print nothing after "sub:".
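To see what the "spaces" actually are, it helps to dump the raw bytes. The sketch below is a minimal illustration, assuming the upload is UTF-16LE with a byte order mark (as the reply below suggests); the hard-coded `$buffer` is a hypothetical stand-in for the uploaded file's contents. In UTF-16LE every ASCII character is followed by a NUL byte (0x00), which browsers tend to render as a space, so a plain `strpos()` on "<body" cannot match.

```php
<?php
// Hypothetical sample: the first bytes of an HTML file saved as UTF-16LE
// with a BOM. In the real script this would come from
// file_get_contents($_FILES['Datei']['tmp_name']).
$buffer = "\xFF\xFE" . "<\0b\0o\0d\0y\0>\0";

// Hex dump: "fffe" is the BOM, and every ASCII byte is followed by "00".
echo bin2hex($buffer), "\n"; // prints fffe3c0062006f00640079003e00

// The interleaved NUL bytes are why a plain search finds nothing...
var_dump(strpos($buffer, "<body"));           // bool(false)
// ...while a search for the NUL-interleaved sequence succeeds.
var_dump(strpos($buffer, "<\0b\0o\0d\0y\0")); // int(2)
```

Searching for NUL-interleaved strings only works around the symptom; converting the whole buffer to a single encoding first is the more robust approach.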

Can anybody explain this to me? How can I parse the file to extract
everything within the <body> tags (ideally without the spaces)?

Thanks a lot,

Re: white spaces in uploaded html file

On Sun, 23 Jul 2006 01:16:49 +0200, Matthias Langbein


 They're not spaces; the file is UTF-16 encoded (little-endian, with a leading byte order mark, which is the "ÿþ" at the start).
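Given that diagnosis, one way to normalise the upload is to check the first bytes for a BOM and convert to UTF-8. This is a minimal sketch, assuming the mbstring extension is available and that input without a BOM is already UTF-8 or plain ASCII; `to_utf8()` is a hypothetical helper name, and UTF-32 input is not handled.

```php
<?php
/**
 * Hypothetical helper: convert a byte string to UTF-8, using a leading
 * byte order mark (BOM) to detect UTF-16 or UTF-8 input.
 * Assumes the mbstring extension is loaded. Input without a BOM is
 * returned unchanged (assumed to be UTF-8 or ASCII already).
 */
function to_utf8(string $bytes): string
{
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        // UTF-16 little-endian BOM (shows up as "ÿþ" when read as Latin-1).
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        // UTF-16 big-endian BOM.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        // UTF-8 BOM: the text is already UTF-8, so just drop the marker.
        return substr($bytes, 3);
    }
    return $bytes;
}

// Usage with hypothetical sample input (UTF-16LE with BOM):
$buffer = "\xFF\xFE" . "<\0h\0t\0m\0l\0>\0";
echo to_utf8($buffer), "\n"; // prints <html>
```

Where iconv is installed instead of mbstring, `iconv('UTF-16LE', 'UTF-8', $bytes)` should serve the same purpose.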

 What encoding is the original page in? What editor was the uploaded file saved
with? You may also want to look at the accept-charset attribute of the <form> element.
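For the original question (getting at everything inside <body>), a regular expression is the shortest route once the text is UTF-8. This is a sketch, not a full HTML parser: `extract_body()` is a hypothetical helper, the usage part assumes the mbstring extension and only handles the UTF-16LE case seen in this thread, and the pattern can be fooled by a literal "</body>" inside a comment or script. For messy real-world markup, PHP's DOM extension (`DOMDocument::loadHTML()`) is the sturdier choice.

```php
<?php
/**
 * Hypothetical helper: return everything between <body ...> and the last
 * </body>, or null if no body element is found (or the input is not
 * valid UTF-8, because of the "u" modifier).
 */
function extract_body(string $html): ?string
{
    // \b stops "<bodyx" from matching; [^>]* skips attributes such as onload.
    // "i" ignores tag case, "s" lets "." match newlines, "u" means UTF-8.
    if (preg_match('#<body\b[^>]*>(.*)</body\s*>#isu', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Usage in the upload handler, assuming the form field "Datei" from above.
if (isset($_FILES['Datei']) && $_FILES['Datei']['error'] === UPLOAD_ERR_OK) {
    $raw = file_get_contents($_FILES['Datei']['tmp_name']);
    // Assumption: only the UTF-16LE (BOM FF FE) case is converted here.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    $body = extract_body($raw);
    echo $body ?? 'No <body> element found.', "\n";
}
```

The greedy `(.*)` runs to the last `</body>` in the document, which keeps nested or repeated closing tags inside the result rather than cutting it short.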

Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
