Posted by jzhang on March 29, 2007, 8:47 pm
> Hi
>
> I had a script that parsed the decoded_content of an HTML page to
> find its forms. However, a recent update to the page broke the
> script: @forms = HTML::Form->parse($response->decoded_content,
> $response->base); no longer finds any forms in the page.
> After much research I found that decoded_content was empty, while
> HTML::Form->parse($response->content, $response->base); still
> works.
>
> The issue may have been caused by the addition of a meta tag to the
> HTML page:
>
> <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
>
> So far I have been unable to prove that, because the page is generated
> via compiled JavaScript and is painful to change.
>
> Any idea whether this meta tag would cause a problem with
> decoded_content, and whether there is a workaround?
>
> Tim
I've run into the same kind of error when processing Chinese web pages.
It seems decoded_content() failed to recognize the charset of your web
page.
You can try $response->decoded_content('default_charset'=>'utf8');
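A minimal sketch of that workaround, with a fallback to the raw bytes like the one Tim used. The response is built by hand (the form action `/search` is made up) so the sketch runs without network access; in real code it would come from LWP::UserAgent. It assumes a reasonably recent HTTP::Message, where decoded_content() returns undef and sets $@ when it cannot decode, and where `default_charset` is consulted when the Content-Type header carries no charset.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response standing in for $ua->get(...); the header
# deliberately carries no charset, only the <meta> tag does.
my $html = '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html;charset=utf-8">'
         . '</head><body><form action="/search"><input name="q"></form>'
         . '</body></html>';
my $response = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'], $html);

# Assume UTF-8 when the Content-Type header names no charset.
my $text = $response->decoded_content(default_charset => 'utf8');

# decoded_content() returns undef (with $@ set) when decoding fails;
# in that case fall back to the undecoded bytes.
unless (defined $text) {
    warn "decoded_content failed: $@";
    $text = $response->content;
}

print length($text), " characters decoded\n";
```

$text can then be passed to HTML::Form->parse($text, $response->base) as in the original script.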
Or you can hack the decoded_content function in the HTTP::Message module
to make its charset detection more sophisticated.
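A sketch of the same idea done from outside the module, without patching HTTP::Message: look for a charset in the page's <meta> tag and pass it explicitly through decoded_content()'s `charset` option. The helper `sniff_meta_charset` is hypothetical and its regex is a rough heuristic, not a real HTML parser; the GB2312 page is made up to mimic a Chinese site.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical helper: a crude regex that pulls the charset out of a
# <meta ... charset=...> tag in the raw bytes. Not a full HTML parser.
sub sniff_meta_charset {
    my ($bytes) = @_;
    return $1 if $bytes =~ /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i;
    return;
}

# Made-up Chinese page: the title is two characters encoded as GB2312,
# and the Content-Type header again omits the charset.
my $html = '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html;charset=gb2312">'
         . '<title>' . encode('gb2312', "\x{4e2d}\x{6587}") . '</title>'
         . '</head><body></body></html>';
my $response = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'], $html);

# Use the sniffed charset if there is one, otherwise assume UTF-8.
my $charset = sniff_meta_charset($response->content) // 'utf8';

# raise_error => 1 makes decoding failures die instead of returning undef.
my $text = $response->decoded_content(charset => $charset, raise_error => 1);
print "decoded with $charset\n";
```

Passing `charset` explicitly overrides whatever the module would have guessed, so a wrong sniff produces wrong text rather than an empty result. That is why the sketch keeps the regex narrow and dies loudly on decoding errors.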
Zhang Jun