web page capture / compression


I'm working on an app where a page is created dynamically via a CMS,
and I need to save the page similar to the way you can save a
'complete web page' via a browser.  I need to save the graphics,
links, embedded applets, flash, etc.

Is there any way that I can use native PHP code to 'capture' the web
page and maintain the functionality of the page?

Re: web page capture / compression


On 09/27/2004 06:00 PM, Steve High wrote:

You may want to try this HTTP client class to retrieve the current page
HTML. If you need to log in, the class can also handle any cookies and
redirection. Then you need to parse the HTML and figure out which other
elements you need to retrieve and save.



Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator

Re: web page capture / compression

Steve High wrote:


AFAIK, not unless you count using system() or exec() ;) In that case,
you could call wget to capture the page and save the files.
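
A rough sketch of the exec() route, assuming wget is installed and on the
PATH. The function names build_wget_command() and capture_page() and the
example URL and directory are made up for illustration; the wget options
shown (--page-requisites, --convert-links, --directory-prefix) fetch the
page's images and other requisites and rewrite links to the local copies.

```php
<?php
// Sketch only: shell out to wget to save one page plus its assets.
// build_wget_command() and capture_page() are hypothetical helper names.

function build_wget_command(string $url, string $targetDir): string
{
    // escapeshellarg() keeps the URL and directory from being
    // interpreted by the shell.
    return 'wget --page-requisites --convert-links --no-verbose'
        . ' --directory-prefix=' . escapeshellarg($targetDir)
        . ' ' . escapeshellarg($url);
}

function capture_page(string $url, string $targetDir): bool
{
    exec(build_wget_command($url, $targetDir), $output, $status);
    return $status === 0; // wget exits with 0 on success
}

// Example call (needs wget and network access):
// capture_page('http://example.com/cms/page.php', '/tmp/capture');
```

Whether wget picks up applet jars or Flash movies depends on how the page
references them, so check the saved directory before relying on it.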

Justin Koivisto - spam@koivi.com

Re: web page capture / compression

schigh@comcast.net (Steve High) wrote in message

Hmmm.  I figured out how to do it, so I will post here in case anyone
else has the same problem.
I used cURL to grab the URL as a file stream, then parsed the
<object>, <img>, and <applet> tags to look for the source.  I then
created a dynamic 'images' directory and an 'objects' directory where
the images, SWFs, and applet jars are stored.
Then I replaced the source in the file stream to point to my new
directories.  Finally, I bundled the files and folders with a PHP
compression utility to make one zip-readable file...
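
For anyone who wants a starting point, here is a minimal sketch of those
steps. It is not the original source: the function names, the tag and
attribute table, and the naive URL resolution are all assumptions, and it
uses DOMDocument and ZipArchive, which may not be the parser or
compression utility used above.

```php
<?php
// Sketch only. All function names here are hypothetical.

// Fetch a URL with cURL and return the body as a string.
function fetch(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $body;
}

// Naive resolution: ignores ports, "..", and protocol-relative URLs.
function resolve_url(string $base, string $ref): string
{
    if (parse_url($ref, PHP_URL_SCHEME) !== null) {
        return $ref; // already absolute
    }
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host'];
    if ($ref[0] === '/') {
        return $root . $ref;
    }
    $path = $p['path'] ?? '/';
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $ref;
}

// Point asset attributes at local directories.
// Returns [rewritten HTML, map of local path => remote URL].
function localize_assets(string $html, string $baseUrl): array
{
    // tag => [attribute, local directory]; this table is an assumption.
    $targets = [
        'img'    => ['src', 'images'],
        'object' => ['data', 'objects'],
        'applet' => ['archive', 'objects'],
    ];
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from sloppy markup
    $assets = [];
    foreach ($targets as $tag => [$attr, $dir]) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $src = $node->getAttribute($attr);
            if ($src === '') {
                continue;
            }
            // Files with the same name overwrite each other here.
            $name = basename((string) parse_url($src, PHP_URL_PATH));
            $local = $dir . '/' . $name;
            $assets[$local] = resolve_url($baseUrl, $src);
            $node->setAttribute($attr, $local);
        }
    }
    return [$doc->saveHTML(), $assets];
}

// Download the page and its assets into a single zip file.
function save_bundle(string $url, string $zipPath): void
{
    [$html, $assets] = localize_assets(fetch($url), $url);
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException('Cannot create ' . $zipPath);
    }
    $zip->addFromString('index.html', $html);
    foreach ($assets as $local => $remote) {
        $zip->addFromString($local, fetch($remote));
    }
    $zip->close();
}

// Example call (needs network access and the curl, dom, and zip extensions):
// save_bundle('http://example.com/cms/page.php', '/tmp/page.zip');
```

Flash embedded through <param name="movie"> or applets that use only a
code/codebase pair would need extra entries in the table.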

It's a lot of jerry-rigging, but it works pretty well.
Anyone interested in the source should email me.
