Help with Mechanize

Posted by Bill on January 15, 2007, 3:14 pm

Hello,



I could use some help with WWW::Mechanize. Andy Lester recommended that I
post to the libwww mailing list. I am trying to do what should be a simple
scrape of the US Patent and Trademark Office website for the bibliographic
info that it posts for all patents. Unfortunately, I keep getting redirected
to a page that says:



"We are unable to display the requested information. Please note that all
requests must be made using this form."



Do you think I am out of luck, or are there some things I can try? The form
used to request the patent info contains the following JavaScript:



<script language="JavaScript" type="text/javascript">
<!--
document.forms["mfInputForm"].elements["patentNum"].focus()
// -->
</script>



Basically, I am wondering how the website could know that I am using
Mechanize rather than Internet Explorer to fill in the fields and click
"submit."



Here is my Perl code. Thanks.



#!/usr/local/bin/perl -w
use strict;
use WWW::Mechanize;
use Crypt::SSLeay;    # SSL support for https requests

print "Content-type: text/html\n\n";

my $url               = "https://ramps.uspto.gov/eram/";
my $maintenancepatent = "5771669";
my $maintenanceapp    = "08672157";
my $outfile           = "out.htm";

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->proxy( ['https'], '' );

# Load the start page and follow the link to the maintenance fee form
$mech->get($url);
$mech->follow_link( text => "Pay or Look up Patent Maintenance Fees", n => 1 );

# Fill in the patent and application numbers
$mech->form_name('mfInputForm');
$mech->field( patentNum      => $maintenancepatent );
$mech->field( applicationNum => $maintenanceapp );

# Send the start page as the Referer, then click the form's second button
$mech->add_header( Referer => $url );
$mech->click_button( number => 2 );

# Save the returned page, stopping if the file cannot be written
open( my $out, '>', $outfile ) or die "Cannot open $outfile: $!";
print $out $mech->content();
close($out) or die "Cannot close $outfile: $!";

print "done";
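
To compare what the script sends with what a browser sends, a diagnostic along these lines might help. This is only a sketch: it assumes the form is still named mfInputForm and that the link text is unchanged, and the 'Windows IE 6' agent alias is just one of the aliases WWW::Mechanize ships with. It does not submit the form, and it is not a known fix for the error page.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $url = "https://ramps.uspto.gov/eram/";

# Cookies are kept by default; the alias only changes the User-Agent header
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias('Windows IE 6');

$mech->get($url);
$mech->follow_link( text => "Pay or Look up Patent Maintenance Fees", n => 1 );

# List every input of the form, including hidden fields a browser would send
my $form = $mech->form_name('mfInputForm')
    or die "Form mfInputForm not found on ", $mech->uri, "\n";
for my $input ( $form->inputs ) {
    printf "%-10s %-25s %s\n",
        $input->type,
        defined $input->name  ? $input->name  : '(unnamed)',
        defined $input->value ? $input->value : '';
}

# Show the cookies received so far and the headers of the last request sent
print "\nCookies:\n",         $mech->cookie_jar->as_string;
print "\nRequest headers:\n", $mech->response->request->headers->as_string;
```

If the listing shows hidden fields or session cookies, those are things a browser sends automatically, so they would be worth checking against what the script submits.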


