"reverse templating" or "auto-meta-regex" module for automated screen-scrape learning?

I was discussing screen scraping with some acquaintances recently, and
they claimed they'd seen a website which allowed users to select
certain regions of a given page via a nice UI... and there was an app
behind this that would then learn from these selections to extract
data from corresponding regions of similar pages. "Reverse templating"
and "auto-meta-regex" were the terms we came up with, but there's
probably a better description. They also claimed there were Perl
modules that did this same thing, but I haven't been able to locate
them on CPAN. Does anyone know what these might be?


Re: "reverse templating" or "auto-meta-regex" module for automated screen-scrape learning?

On 2007-09-18 22:09:09 -0400, Weston

Apple created the Web Clip Widget for Mac OS X 10.5, which is what
popped into my head when I read this.  Basically, it allows you to
select a region of a page that corresponds to a table, div or whatever
and make a widget out of it.  They showed this off a long time ago and
have yet to release it, but someone created a knockoff right away:

Dash Clipping
http://www.fondantfancies.com/blog/3001239/

On the Perl side of things, when it comes to scraping I would say
HTML::TreeBuilder is your best friend.  It allows you to parse down to
the table, div or whatever and play with what is inside of it.
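
To make that concrete, here is a minimal sketch of drilling down to one
table with HTML::TreeBuilder and reading its cells.  The sample page and
the "prices" id are made up for illustration; a real script would fetch
the page first and use whatever tag and attributes the target page has.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page, invented for this example.
my $html = <<'HTML';
<html><body>
  <div id="nav">Home | About</div>
  <table id="prices">
    <tr><td>Widget</td><td>4.50</td></tr>
    <tr><td>Gadget</td><td>7.25</td></tr>
  </table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context, look_down returns the first element that
# matches all of the given criteria.
my $table = $tree->look_down(_tag => 'table', id => 'prices')
    or die "no prices table found";

# Collect the text of every cell, one array ref per row.
my @rows;
for my $tr ($table->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_text } $tr->look_down(_tag => 'td');
    push @rows, \@cells;
}

print join("\t", @$_), "\n" for @rows;

# Trees contain circular references, so free them explicitly.
$tree->delete;
```

The same look_down call works for a div or any other element; only the
criteria change.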

But it sounds like you are looking for more.  Maybe a UI that allows
you to select the table, div or whatever and then generates Perl code
that uses HTML::TreeBuilder to get you to where you selected in the UI.
Now that sounds fun.  Something tells me someone could work towards
that using CamelBones to access the WebKit innards.
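
Leaving the UI aside, the "learn from a selection, reuse on similar
pages" part could be roughed out with the address method that
HTML::TreeBuilder elements inherit from HTML::Element.  This is only a
sketch under assumptions: the two pages are invented, learn_address and
extract_at are hypothetical helper names, and a look_down call stands in
for the user's click.  Positional addresses are brittle, so this only
works when the similar page has the same structure.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Two hypothetical pages that share one layout.
my $sample  = '<html><body><div>Menu</div>'
            . '<div><h1>Title A</h1><p>Price: 10</p></div></body></html>';
my $similar = '<html><body><div>Menu</div>'
            . '<div><h1>Title B</h1><p>Price: 25</p></div></body></html>';

# "Learn" step: find the selected element (look_down stands in for a
# UI selection) and record its position in the tree as an address.
sub learn_address {
    my ($html, %criteria) = @_;
    my $tree   = HTML::TreeBuilder->new_from_content($html);
    my $picked = $tree->look_down(%criteria) or die "selection not found";
    my $addr   = $picked->address;
    $tree->delete;
    return $addr;
}

# "Apply" step: resolve the recorded address on another page and
# return that element's text, or undef if nothing lives there.
sub extract_at {
    my ($html, $addr) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $node = $tree->address($addr);
    my $text = ref $node ? $node->as_text : undef;
    $tree->delete;
    return $text;
}

my $addr = learn_address($sample, _tag => 'p');
print "learned address: $addr\n";
print extract_at($similar, $addr), "\n";
```

A sturdier version would probably record tag names and attributes along
the path rather than bare positions, so that small layout changes do not
break the extraction.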

Sorry to those that are put off by the talk of Mac stuff... it is what
I know and where I play.

David Steinbrunner
