
WWW::Mechanize doesn't always follow_link(text

Subject Author Date
WWW::Mechanize doesn't always follow_link(text M.O.B. i L. 04-20-2008
Posted by John Bokma on April 28, 2008, 12:30 am

> John Bokma wrote:
>>
>>> John Bokma wrote:
>>
>> [..]
>>
>>>> HTML::TreeBuilder, or a module it's using, returns &nbsp; as a
>>>> single character, it might be that you have to
>>>> use the code instead.
>>>>
>>>> Comment on
>>>> (&nbsp;, stored as char 225)
>>>>
>>>> So you might want to try: "Edit\xe1Librarians".
>>>>
>>>> Wild guess.
>>>>
>>> Thanks! But it should be \xa0.
>>
>> Yeah, but HTML::TreeBuilder returns it as 225 :-D.
>
> He's after a '&nbsp;',

Yes, I am aware of that. And somehow HTML::TreeBuilder or a module it uses
returns &nbsp; as \xe1.

--
John

http://johnbokma.com/perl/

Posted by szr on April 28, 2008, 1:28 am
John Bokma wrote:
>
>> John Bokma wrote:
>>>
>>>> John Bokma wrote:
>>>
>>> [..]
>>>
>>>>> HTML::TreeBuilder, or a module it's using, returns &nbsp; as a
>>>>> single character, it might be that you have to
>>>>> use the code instead.
>>>>>
>>>>> Comment on
>>>>> (&nbsp;, stored as char 225)
>>>>>
>>>>> So you might want to try: "Edit\xe1Librarians".
>>>>>
>>>>> Wild guess.
>>>>>
>>>> Thanks! But it should be \xa0.
>>>
>>> Yeah, but HTML::TreeBuilder returns it as 225 :-D.
>>
>> He's after a '&nbsp;',
>
> Yes, I am aware of that. And somehow HTML::TreeBuilder or a module it
> uses returns &nbsp; as \xe1.

Yes. The question is whether this is a bug in HTML::TreeBuilder or whether
there is a logical reason for it. DEC 225 doesn't seem to be a space of any
kind in any ASCII list I've checked, but I don't doubt I've missed one
somewhere :-)

--
szr
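
One way to check which character a parser actually handed back is to print the code point of every non-ASCII character in the link text. The following is a minimal sketch using only core Perl; the helper name describe_non_ascii and the sample string are assumptions made for illustration and do not come from the thread's code.

```perl
use strict;
use warnings;

# Hypothetical helper: describe every character above the ASCII range
# in a string, giving its code point in hex and in decimal.
sub describe_non_ascii {
    my ($text) = @_;
    return map  { sprintf 'U+%04X (decimal %d)', ord($_), ord($_) }
           grep { ord($_) > 127 }
           split //, $text;
}

# Sample link text containing a no-break space (assumed for the demo).
my $link_text = "Edit\xa0Librarians";
print "$_\n" for describe_non_ascii($link_text);
```

For the sample string this reports U+00A0 (decimal 160), the no-break space. If the same helper reported decimal 225 for text taken from a parsed page, the character stored there would be U+00E1 rather than a space.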



Posted by M.O.B. i L. on April 24, 2008, 1:17 pm
M.O.B. i L. wrote:
> Thanks! But it should be \xa0. First I tried matching with regular
> expressions and that worked using . (dot) for the unknown character. I
> then found this page about &nbsp;
> <http://www.w3.org/International/questions/qa-escapes> where it says:
> "An example of an ambiguous character is 00A0: NO-BREAK SPACE. This type
> of space prevents line breaking, but it looks just like any other space
> when used as a character. Using &nbsp; (or &#xA0;) makes it quite clear
> where such spaces appear in the text.".
>
> So this works:
> $agent->follow_link(text => "Edit\xa0Librarians", n => 1);

I'll add that I have developed these command lines to convert back and forth:
sed -i '/&nbsp;/s/&nbsp;/\xa0/g;/\xa0/s/'\''/"/g' MKBTest.pl
sed -i '/\xa0/s/\xa0/\&nbsp;/g;/&nbsp;/s/"/'\''/g' MKBTest.pl
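
As an alternative to editing the script back and forth, the regular-expression route mentioned above can be made explicit: build a pattern that accepts either an ordinary space or a no-break space between the words of the label. This is only a sketch; the helper name label_regex is an assumption for illustration, and the follow_link call in the trailing comment relies on the text_regex option of WWW::Mechanize, which is not exercised here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a label written with ordinary spaces into a
# regex that matches it whether each space is " " or U+00A0 (no-break space).
sub label_regex {
    my ($label) = @_;
    # Quote each word literally, then join the words with a class that
    # accepts either kind of space.
    my $pattern = join '[ \xa0]', map { quotemeta($_) } split / /, $label, -1;
    return qr/\A$pattern\z/;
}

my $re = label_regex('Edit Librarians');
print "matches no-break space form\n" if "Edit\xa0Librarians" =~ $re;
print "matches plain space form\n"    if "Edit Librarians"    =~ $re;

# With WWW::Mechanize this could be passed as (untested here):
#   $agent->follow_link(text_regex => $re, n => 1);
```

The pattern is anchored at both ends so that it selects only a link whose whole text is the label, which mirrors the exact-match behaviour of the text option.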
