Need a module for grabbing text from a web page

comp.lang.perl.misc
Posted by emelio garcia on February 18, 2005, 11:42 pm
>>
>> [snip]
>>
>> > if I print "$1\n", the file prints just fine. But, if I do something
>> > like print "$1 after\n", the whole output is messed up. If I print
>> > "before $1\n", nothing prints at all. If I print "before $1 after\n",
>> > only "after" prints.
>>
>> not really sure, but could be a rogue "\r" in $1,


> There is a rogue carriage return (0xd) in the string.

> Is there something I can do to deal with this situation?


Repair the corrupted file:

perl -p -i -e 'tr/\r//d' bad_file
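The one-liner fixes the file on disk. If the file has to be read as-is (for example, a CRLF file copied over from Windows), the same tr/\r//d deletion can be applied to each line inside the script, before any regex captures from it. A minimal sketch under that assumption; the strip_cr helper, the sample string, and the /^(\w+)/ pattern are hypothetical, for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: delete every carriage return (0x0D),
# the same operation the one-liner performs with tr/\r//d.
sub strip_cr {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\r//d;
    return $clean;
}

# A sample line as read from a file saved with CRLF line endings.
my $line = "foo\r\n";
$line = strip_cr($line);
chomp $line;                      # removes the "\n", leaving "foo"

# Example capture; substitute the pattern your script actually uses.
if ($line =~ /^(\w+)/) {
    print "before $1 after\n";     # prints "before foo after"
}
```

In a real script, the strip_cr and chomp steps would go at the top of the loop that reads the file, so no "\r" can end up in $1.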


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
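The symptoms in the quoted post are what a terminal does with a carriage return: "\r" sends the cursor back to the start of the line, so whatever is printed after it overwrites what was already shown. A minimal sketch that reproduces this with a CRLF-terminated line and then prints an escaped copy so the hidden character becomes visible; the sample data and the show_hidden helper are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: replace invisible control characters
# with visible escapes so they show up when printed.
sub show_hidden {
    my ($text) = @_;
    my %escape = ("\r" => '\r', "\n" => '\n', "\t" => '\t');
    (my $shown = $text) =~ s/([\r\n\t])/$escape{$1}/g;
    return $shown;
}

# A capture taken from a CRLF line with /(.*)/ keeps the "\r",
# because "." matches "\r" (without /s it only excludes "\n").
my $record = "foo\r\n";
my ($captured) = $record =~ /(.*)/;

# On a terminal, " after" is drawn from the start of the line again,
# overwriting part of "before foo".
print "before $captured after\n";

# The escaped form shows what the string really contains.
print show_hidden("before $captured after\n"), "\n";   # before foo\r after\n
```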

Posted by YYusenet on February 18, 2005, 5:49 pm

Posted by A. Sinan Unur on February 19, 2005, 12:06 am

Posted by Gregory Toomey on February 19, 2005, 2:08 pm

Posted by Joe Smith on February 21, 2005, 12:29 am

Similar Threads
Trouble with parsing text file and grabbing values needed (July 21, 2006, 1:18 pm)
extracting text content from web page (September 28, 2005, 1:06 pm)
Need a mod that will navigate web links and dl the text on a page (February 18, 2006, 8:19 pm)
Retrieving only the text portion of a web page (May 8, 2007, 10:18 pm)
Need Perl module to get tag of a web page (March 17, 2008, 4:32 pm)
Using Perl to align text in an HTML page (December 6, 2004, 11:45 am)
Is there a module that grabs a remote page and prints thumbnail image? (May 26, 2006, 12:13 am)
Grabbing a PDF file from the web...how? (November 8, 2004, 3:40 pm)
justify text and PDF::API2 module (February 17, 2005, 2:26 am)
Module for text analysis and comparison? (March 23, 2008, 2:21 am)