Click here to get back home

better way to parse html

 HomeNewsGroups | Search | About
 comp.lang.perl.misc    Post an article   get this group's latest topics as an RSS feed add this group's latest topics to your My MSN content add this group's latest topics to your My Yahoo content
Subject Author Date
better way to parse html Fred 09-10-2004
Posted by Fred on September 10, 2004, 9:49 pm
Please log in for more thread options
>>
>> [snip]
>>
>> > if I print "$1\n",
>> > the file prints just fine. But, if I do something like print "$1 after
>> > \n", the whole output is messed up. If I print "before $1\n", nothing
>> > prints at all. If I print "before $1 after\n", only after prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There
> is a rogue carriage return (0xd) in the string

> Is there something I can do to deal with this
> situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Posted by Gunnar Hjalmarsson on September 11, 2004, 5:37 am
Please log in for more thread options
>>
>> [snip]
>>
>> > if I print "$1\n",
>> > the file prints just fine. But, if I do something like print "$1 after
>> > \n", the whole output is messed up. If I print "before $1\n", nothing
>> > prints at all. If I print "before $1 after\n", only after prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There
> is a rogue carriage return (0xd) in the string

> Is there something I can do to deal with this
> situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Posted by Michael Slass on September 11, 2004, 8:19 am
Please log in for more thread options
>>
>> [snip]
>>
>> > if I print "$1\n",
>> > the file prints just fine. But, if I do something like print "$1 after
>> > \n", the whole output is messed up. If I print "before $1\n", nothing
>> > prints at all. If I print "before $1 after\n", only after prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There
> is a rogue carriage return (0xd) in the string

> Is there something I can do to deal with this
> situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Similar ThreadsPosted
parse email coming from outlook in html November 17, 2005, 4:44 pm
Way to Parse HTML fields to Excel Spreadsheet January 14, 2008, 4:59 pm
Use of uninitialized value in subroutine entry at C:/Perl/site/lib/HTML/Parse.pmline 127. November 27, 2004, 6:24 pm
Hash references & parsing HTML with HTML::Parser February 16, 2005, 8:10 pm
Best way to remove body/html tag from HTML::Element tree September 6, 2006, 6:38 pm
Return HTML between tags with HTML::TokeParser ? February 22, 2005, 5:44 pm
Parsing HTML - using HTML::TreeBuilder October 5, 2006, 2:05 pm
How to parse .... May 4, 2005, 11:48 am
How to parse .... May 5, 2005, 2:36 am
csv parse bug... July 21, 2005, 4:46 am

Our other projects:

Art Dolls, Fairies and Mermaids - Sunnyfaces.net

Roy's Linux, Programming and Search Engines messages

1-Script XML SitemapXML Sitemap