"reverse templating" or "auto-meta-regex" module for automated screen-scrape learning?

Posted by Weston on September 18, 2007, 10:09 pm
I was discussing screen scraping with some acquaintances recently, and
they claimed they'd seen a website which allowed users to select
certain regions of a given page via a nice UI... and there was an app
behind this that would then learn from these selections to extract
data from corresponding regions of similar pages. "Reverse templating"
and "auto-meta-regex" were the terms we came up with, but there's
probably a better description. They also claimed there were perl
modules that did this same thing, but I haven't been able to locate
them on CPAN -- does anyone know what these might be?

Thanks!


Posted by David Steinbrunner on September 19, 2007, 10:45 pm
On 2007-09-18 22:09:09 -0400, Weston wrote:

> I was discussing screen scraping with some acquaintances recently, and
> they claimed they'd seen a website which allowed users to select
> certain regions of a given page via a nice UI... and there was an app
> behind this that would then learn from these selections to extract
> data from corresponding regions of similar pages. "Reverse templating"
> and "auto-meta-regex" were the terms we came up with, but there's
> probably a better description. They also claimed there were perl
> modules that did this same thing, but I haven't been able to locate
> them on CPAN -- does anyone know what these might be?

Apple created the Web Clip Widget for Mac OS X 10.5, which is what
popped into my head when I read this. Basically it allows you to
select a region of a page which corresponds to a table, div or whatever
and make a widget out of it. They showed this off a long time ago and
have yet to release it, but someone created a knockoff right away:

Dash Clipping
http://www.fondantfancies.com/blog/3001239/

On the Perl side of things, when it comes to scraping I would say
HTML::TreeBuilder is your best friend. It allows you to parse down to
the table, div or whatever and play with what is inside of it.
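
For anyone who hasn't used it, here is a minimal sketch of that kind of
drill-down. The HTML fragment, the "prices" id and the variable names are
made up for illustration; only the HTML::TreeBuilder and HTML::Element
calls (new_from_content, look_down, as_text, delete) are real.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page fragment; a real scraper would fetch the page first.
my $html = <<'HTML';
<html><body>
  <div id="nav">Home | About</div>
  <table id="prices">
    <tr><td>Widget</td><td>4.50</td></tr>
    <tr><td>Gadget</td><td>12.00</td></tr>
  </table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context, look_down returns the first element matching all criteria.
my $table = $tree->look_down(_tag => 'table', id => 'prices')
    or die "no table with id 'prices' found";

# In list context, look_down returns every match below the starting element.
my @rows;
for my $tr ($table->look_down(_tag => 'tr')) {
    push @rows, [ map { $_->as_text } $tr->look_down(_tag => 'td') ];
}

print join(' => ', @$_), "\n" for @rows;

$tree->delete;  # release the tree explicitly once the data has been copied out
```

The criteria passed to look_down can also be regexes or code refs, which
helps when the target has no convenient id.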

But it sounds like you are looking for more. Maybe a UI that allows
you to select the table, div or whatever and then generates Perl code
that uses HTML::TreeBuilder to get you to where you selected in the UI.
Now that sounds fun. Something tells me someone could work towards
that using CamelBones to access the WebKit innards.
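
As a rough sketch of the "learn from a selection, replay on similar pages"
part (not the UI), one option is to record the tree address of the picked
element and look up the same address in another page built from the same
template. The two pages and the simulated selection below are assumptions
for illustration; address() is a real HTML::Element method. This is not
how any particular existing tool works, and it breaks as soon as the page
structure shifts.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Two hypothetical pages generated from the same template.
my $page_a = '<html><body><div id="nav">Home</div>'
           . '<div class="item"><h1>Widget</h1><span>4.50</span></div></body></html>';
my $page_b = '<html><body><div id="nav">Home</div>'
           . '<div class="item"><h1>Gadget</h1><span>12.00</span></div></body></html>';

# "Learn" step: stand in for the user's click by locating the <span>,
# then record its position as a dotted path of child indexes from the root.
my $tree_a = HTML::TreeBuilder->new_from_content($page_a);
my $picked = $tree_a->look_down(_tag => 'span')
    or die "simulated selection not found";
my $address = $picked->address;

# "Replay" step: fetch whatever sits at the same address in a similar page.
my $tree_b = HTML::TreeBuilder->new_from_content($page_b);
my $match  = $tree_b->address($address);

# The address may resolve to nothing or to a plain text node, so check first.
my $value = (defined $match && ref $match) ? $match->as_text : undef;

print "learned address: $address\n";
print "value on second page: ", (defined $value ? $value : '(no match)'), "\n";

$_->delete for $tree_a, $tree_b;
```

A sturdier version would probably store tag names, ids and classes along
the path instead of bare indexes, so an extra banner div doesn't throw
everything off.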

Sorry to those that are put off by the talk of Mac stuff... it is what
I know and where I play.

--
David Steinbrunner

