WWW::Mechanize doesn't always follow_link(text

Newsgroup: comp.lang.perl.misc
Subject: WWW::Mechanize doesn't always follow_link(text
Author: M.O.B. i L.
Date: 04-20-2008

Posted by RedGrittyBrick on April 26, 2008, 2:15 pm
szr wrote:
>
> He's after a '&nbsp;', which is a non-breaking space, which is ASCII
> 0xA0 hex or 160 dec. '&nbsp;' can even be re-written as '&#160;'.
>

s/ASCII/Unicode/

--
RGB
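
As a rough illustration of the correction above, here is a minimal Python sketch (Python rather than Perl, purely for illustration). It checks that both HTML spellings decode to the single character U+00A0 and that this character lies outside the ASCII range. The `normalize_spaces` helper and the sample link text are hypothetical; they only hint at why an exact text match against an ordinary space can miss a link whose text contains a non-breaking space.

```python
import html

# Both HTML spellings decode to the same single character, U+00A0.
named = html.unescape("&nbsp;")
numeric = html.unescape("&#160;")

print(named == numeric)      # the two spellings are the same character
print(hex(ord(named)))       # its code point in hex
print(named.isascii())       # ASCII ends at 0x7F, so this is not ASCII


def normalize_spaces(text: str) -> str:
    """Replace non-breaking spaces with ordinary spaces (hypothetical helper)."""
    return text.replace("\u00a0", " ")


# Made-up link text containing a non-breaking space instead of a plain one.
link_text = "Next\u00a0page"
print(link_text == "Next page")                    # exact match fails
print(normalize_spaces(link_text) == "Next page")  # match after normalizing
```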

Posted by szr on April 26, 2008, 2:59 pm
RedGrittyBrick wrote:
> szr wrote:
>>
>> He's after a '&nbsp;', which is a non-breaking space, which is ASCII
>> 0xA0 hex or 160 dec. '&nbsp;' can even be re-written as '&#160;'.
>>
>
> s/ASCII/Unicode/

No, it's ASCII. Extended Ascii to be precise.

My ascii chart (an old printed out list I have) lists DEC 225 as
"Lowercase 'a' with acute accent" and DEC 160 as being reserved or a
blank (which is used as a non breaking space.)

These links show the same:
http://www.ascii-code.com/
http://www.idevelopment.info/data/Programming/ascii_table/PROGRAMMING_ascii_table.shtml


--
szr



Posted by Martijn Lievaart on April 26, 2008, 4:00 pm
On Sat, 26 Apr 2008 11:59:10 -0700, szr wrote:

> RedGrittyBrick wrote:
>> szr wrote:
>>>
>>> He's after a '&nbsp;', which is a non-breaking space, which is ASCII
>>> 0xA0 hex or 160 dec. '&nbsp;' can even be re-written as '&#160;'.
>>>
>>>
>> s/ASCII/Unicode/
>
> No, it's ASCII. Extended Ascii to be precise.

Extended ASCII is a general name for several incompatible extensions to
ASCII. They are NOT ASCII.

But the above IS Unicode, which is in itself also an extension of ASCII,
BTW.

>
> My ascii chart (an old printed out list I have) lists DEC 225 as
> "Lowercase 'a' with acute accent" and DEC 160 as being reserved or a
> blank (which is used as a non breaking space.)
>
> These links show the same:
> http://www.ascii-code.com/

The "extended" ASCII shown here is the Windows extension, which in itself
is an extension of ISO-Latin-1 which is an extension of ASCII. The site
notes this, and is in itself correct. And it does not support your idea
of extended ASCII.

> http://www.idevelopment.info/data/Programming/ascii_table/PROGRAMMING_ascii_table.shtml

This site is plain wrong. Don't believe everything on tha Intuhnet.

M4
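
To make the "several incompatible extensions" point concrete, here is a small Python sketch that decodes the two byte values from the chart discussion (160 and 225) under four 8-bit character sets. The codec names are Python's own; `decode_byte` is a hypothetical helper. Under Latin-1 and Windows-1252, 160 is the non-breaking space and 225 is "a with acute accent", as in the printed chart, while IBM code page 437 and Mac Roman assign different characters to the same values.

```python
def decode_byte(value: int, codec: str) -> str:
    """Decode a single byte value under the given 8-bit codec."""
    return bytes([value]).decode(codec)


# IBM PC code page, ISO-Latin-1, the Windows set, and the classic Mac set.
CODECS = ["cp437", "latin-1", "cp1252", "mac_roman"]

for value in (160, 225):
    for codec in CODECS:
        char = decode_byte(value, codec)
        # Print the Unicode code point each character set assigns to the byte.
        print(f"byte {value} in {codec:9}: U+{ord(char):04X}")
```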

Posted by szr on April 26, 2008, 4:43 pm
Martijn Lievaart wrote:
> On Sat, 26 Apr 2008 11:59:10 -0700, szr wrote:
>
>> RedGrittyBrick wrote:
>>> szr wrote:
>>>>
>>>> He's after a '&nbsp;', which is a non-breaking space, which is
>>>> ASCII 0xA0 hex or 160 dec. '&nbsp;' can even be re-written as
>>>> '&#160;'.
>>>>
>>>>
>>> s/ASCII/Unicode/
>>
>> No, it's ASCII. Extended Ascii to be precise.
>
> Extended ASCII is a general name for several incompatible extensions
> to ASCII. They are NOT ASCII.
>
> But the above IS Unicode, which is in itself also an extension of
> ASCII, BTW.


The old printed-out list I have doesn't make this distinction, but you
are right that Unicode is -an- extension.

>> My ascii chart (an old printed out list I have) lists DEC 225 as
>> "Lowercase 'a' with acute accent" and DEC 160 as being reserved or a
>> blank (which is used as a non breaking space.)
>>
>> These links show the same:
>> http://www.ascii-code.com/
>
> The "extended" ASCII shown here is the Windows extension, which in
> itself is an extension of ISO-Latin-1 which is an extension of ASCII.
> The site notes this, and is in itself correct. And it does not
> support your idea of extended ASCII.

I got the same output on my Linux system, in an xterm launched from KDE,
as I did in SecureCRT on Windows, which matches the output used in
Windows.

This extended ASCII set I'm referring to is what HTML (such as &nbsp; aka
&#160;) is based on, or perhaps more precisely based on ISO-Latin-1.

>> http://www.idevelopment.info/data/Programming/ascii_table/PROGRAMMING_ascii_table.shtml
>
> This site is plain wrong.

In what way? It's the same list in my O'Reilly HTML Pocket Reference, as
is the previous link.

> Don't believe everything on tha Intuhnet.

I don't, but it matches up with what things like HTML go by (again,
ISO-Latin-1 unless otherwise specified in the HEAD, META tags in the
case of HTML.)

--
szr



Posted by Martijn Lievaart on April 26, 2008, 5:26 pm
On Sat, 26 Apr 2008 13:43:16 -0700, szr wrote:

>>> http://www.idevelopment.info/data/Programming/ascii_table/PROGRAMMING_ascii_table.shtml
>>
>> This site is plain wrong.
>
> In what way? It's the same list in my O'Reilly HTML Pocket Reference, as
> is the previous link.

Welcome to the wonderful world of character sets. Or how to lose your
sanity in a day. Read
http://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29
as a good introduction.

It is wrong because it says that the table is "extended ASCII". There is
no such thing. There's ISO-Latin-1, 2, 3, etc., the Windows character
set, the Macintosh character set, the IBM extended ASCII set, etc. And
those are actually used today (except possibly the Mac set, did they
switch?); there are many, many more that are not frequently used today.

In fact, that table seems to show the Windows character set
(Windows-1252), a character set which is actually used very little:
Windows NT and derivatives use UTF-16 by preference, and the Internet
uses mainly ISO-Latin-1 or UTF-8, although ISO-8859-15 (Latin-9) is used
too (it contains the Euro sign, which ISO-Latin-1 does not).
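
The Euro sign remark can be checked with a short Python sketch. It uses Python's codec names (`latin-1`, `iso8859-15`, `cp1252`); `try_encode` is a hypothetical helper that returns `None` when a character set has no slot for the character.

```python
from typing import Optional

EURO = "\u20ac"  # the Euro sign


def try_encode(char: str, codec: str) -> Optional[bytes]:
    """Return the encoded byte(s), or None if the codec cannot represent char."""
    try:
        return char.encode(codec)
    except UnicodeEncodeError:
        return None


for codec in ("latin-1", "iso8859-15", "cp1252"):
    print(f"{codec:11} -> {try_encode(EURO, codec)}")

# The byte that ISO-8859-15 uses for the Euro sign means something else in
# ISO-Latin-1 (the generic currency sign, U+00A4).
print(repr(b"\xa4".decode("latin-1")), repr(b"\xa4".decode("iso8859-15")))
```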

My workstation uses ISO-8859-15. In Windows I can enter characters by
holding down alt and typing their IBM Extended ASCII code on the numeric
keypad. So even saying ISO-Latin-1 is by default "the extended character
set" doesn't hold water, although it probably is the most widely used
character set besides UTF-16 and UTF-8.

Extended ASCII is a concept: a character set that uses the ASCII codes
for the first 128 characters (codes 0 to 127). There are many extended
ASCII sets. Calling one THE extended ASCII set is just plain wrong. And
calling the Windows character set THE extended ASCII set is just ludicrous.
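
A minimal Python sketch of that idea: several 8-bit sets agree with ASCII on codes 0 to 127 and differ only above that. The list of codecs is an arbitrary selection, and `agrees_with_ascii` and `upper_half` are hypothetical helpers. Undefined bytes in a set (Windows-1252 has a few) are replaced rather than raising an error.

```python
LOWER_HALF = bytes(range(128))        # codes 0..127, the ASCII range
UPPER_HALF = bytes(range(128, 256))   # codes 128..255, where the sets diverge

EXTENDED_SETS = ["cp437", "latin-1", "iso8859-15", "cp1252", "mac_roman"]


def agrees_with_ascii(codec: str) -> bool:
    """True if the codec decodes codes 0..127 exactly as ASCII does."""
    return LOWER_HALF.decode(codec) == LOWER_HALF.decode("ascii")


def upper_half(codec: str) -> str:
    """Decode codes 128..255, substituting U+FFFD for undefined bytes."""
    return UPPER_HALF.decode(codec, errors="replace")


for codec in EXTENDED_SETS:
    # Count how many upper-half codes differ from ISO-Latin-1.
    differing = sum(
        a != b for a, b in zip(upper_half(codec), upper_half("latin-1"))
    )
    print(f"{codec:11} ASCII-compatible: {agrees_with_ascii(codec)}, "
          f"upper-half codes differing from latin-1: {differing}")
```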

That is why the world is switching to Unicode. One character set to rule
them all. But even with Unicode, which one? :-)

M4
-- I believe in standards. Everyone should have one. --
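
The closing "which one?" can be illustrated with the character this thread started from. The sketch below encodes U+00A0 with a few of Python's codecs, chosen here only as examples. The code point is the same in every case, but the bytes on the wire are not, so a reader still has to know which encoding a page uses.

```python
NBSP = "\u00a0"  # the non-breaking space discussed in this thread

# Big-endian variants are used so that no byte-order mark is prepended.
ENCODINGS = ["latin-1", "utf-8", "utf-16-be", "utf-32-be"]

encoded = {name: NBSP.encode(name) for name in ENCODINGS}

for name, data in encoded.items():
    # Show the raw bytes as hex, plus a round-trip check.
    print(f"{name:10} {data.hex():8} round-trips: {data.decode(name) == NBSP}")
```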
