Scan web pages and compose summary



I am looking for a way to read an HTML file and create
a short summary (like the ones shown in Google results, for example),
which would ideally be the first few lines of welcome text or similar.

Does anyone have an idea how to do this? (I searched a lot,
but all I found was how to extract meta tags.)


Re: Scan web pages and compose summary

Well, the tricky part is deciding which text to grab
and show from the file, which is why the meta description tag exists
for exactly this purpose. I believe Google grabs the text surrounding the search
term and displays that if there's no meta description tag to use, so
if you're actually searching for a term you could do something similar:
find the term in the page text and show the words around it.
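A minimal Python sketch of that idea, assuming the page HTML is already in a string. It uses the standard library's `html.parser` to collect the meta description and the visible text (skipping `<script>`, `<style>` and `<title>`), returns the description if there is one, and otherwise returns the characters around the first occurrence of the search term. The names `summarize` and `width` are hypothetical, and this is only an approximation of the behaviour described above, not how Google actually builds its snippets.

```python
from html.parser import HTMLParser


class _PageText(HTMLParser):
    """Collects the meta description and the visible text of a page."""

    SKIP = ("script", "style", "title")  # text inside these is not summary material

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.description = None
        self.chunks = []
        self._skip = 0  # nesting depth inside SKIP elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def summarize(page_html, term=None, width=80):
    """Prefer the meta description, else show text around `term` (hypothetical helper)."""
    parser = _PageText()
    parser.feed(page_html)
    parser.close()
    if parser.description:
        return parser.description.strip()
    # Join text chunks and collapse runs of whitespace into single spaces.
    text = " ".join(" ".join(parser.chunks).split())
    if term:
        pos = text.lower().find(term.lower())
        if pos != -1:
            start = max(0, pos - width)
            end = min(len(text), pos + len(term) + width)
            prefix = "..." if start > 0 else ""
            suffix = "..." if end < len(text) else ""
            return prefix + text[start:end].strip() + suffix
    # No description and no match: fall back to the leading text.
    return text[: 2 * width]
```

The window is cut by character count, so it can start or end mid-word; trimming to word boundaries would be an easy refinement.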



Re: Scan web pages and compose summary


solk wrote:

I can recommend Snoopy ( /). It can
retrieve an entire web page, follow links and so on. The result is
the HTML source, the same output you see when you do a View Source in your web
browser. From there you can use substr() to jump to certain sections of the
source and strip the HTML tags (e.g. jump to just after the <body> tag,
remove all HTML tags and save the resulting text).
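The same steps can be sketched in Python. This is a stand-in, not Snoopy and not its API: fetching uses the standard library's `urllib.request.urlopen`, and the hypothetical `body_summary` helper mirrors the substr() idea by slicing from just after the `<body ...>` tag, dropping `<script>`/`<style>` blocks, stripping the remaining tags with a regular expression and keeping roughly the first `max_chars` characters.

```python
import html
import re
from urllib.request import urlopen


def fetch(url):
    """Download a page as text (plays the role of Snoopy's fetch step)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def body_summary(page_html, max_chars=200):
    """Hypothetical helper: first `max_chars` characters of visible body text."""
    lower = page_html.lower()
    start = lower.find("<body")
    if start != -1:
        # Jump to just after the closing '>' of the <body ...> tag.
        page_html = page_html[lower.find(">", start) + 1:]
    # Remove script/style blocks entirely, then replace every tag with a space.
    page_html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", page_html)
    text = re.sub(r"<[^>]+>", " ", page_html)
    # Decode entities such as &amp; and collapse whitespace.
    text = " ".join(html.unescape(text).split())
    if len(text) > max_chars:
        # Cut at the last space inside the limit so no word is split.
        text = text[:max_chars].rsplit(" ", 1)[0] + "..."
    return text
```

Usage would be something like `print(body_summary(fetch("http://example.com/")))`. Regex tag stripping is a quick hack in the same spirit as PHP's strip_tags(); on malformed markup (for example a stray `>` inside an attribute) a real HTML parser is more reliable.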

- Jensen
