Posted by damians on February 20, 2008, 12:01 pm
Mark wrote:
> aage.gribs...@gmail.com wrote:
> > I wish to capture data from a Web page, e.g.
> > "http://www.eppraisal.com/PropertyInfo.aspx?a=1215%20Jefferson%20Ave&z=46201"
>
> > I am using the LWP modules.
> > The page responds in three steps and I have succeeded in capturing
> > only the first.
>
> > The page first paints up nicely with "Loading" text in the area of
> > interest.
> > After a delay the "Loading" text is replaced with "Calculating".
> > Shortly thereafter, sometimes apparently instantaneously, the data of
> > interest appears.
>
> > I have tried LWP::UserAgent and LWP::Parallel::UserAgent and capture
> > only the initial response.
> > TimeOut parameters do not change the behavior.
> > The callback subroutine indicates the HTML comes in several chunks.
>
> > How can the other responses be captured?
> > The documentation mentions LWP::Parallel::UserAgent::Entry objects
> > and follow up requests.
> > Will this be of help?
> > I have found no documentation of this feature.
> > Is there any additional documentation or examples?
>
> It's using JavaScript - which neither LWP nor WWW::Mechanize will
> execute - to move between pages. You could try using
> Win32::IE::Mechanize or Selenium, but both of these rely on controlling
> a running browser.
>
> Mark
There is an API to some of our data. What data elements are you
looking to pull?
Send me an email, or write to info (at) eppraisal.com. Scraping the front-end
is time-consuming and prone to errors (when we push out updates).
Damian (from eppraisal.com)