I want to be able to ask users for a URL, open that page, change some of the  
contents and then display that page as if they had typed the URL into a  
browser. I have toyed with some of the PHP functions for opening URLs, but
what I am not clear on is how much work my script will have to do (do I need
to fully emulate a browser, for example?).

The net effect I am after is very similar to the page translation feature
that Google offers. Does anyone have any examples of this kind of technique?
Any ideas how much work is involved? (My 'translation' is pretty trivial, so
really it is mostly a question of how much work it is to display the remote
page.)

Thanks in advance!



Re: Open and process remote page

Following on from William Hudson's message. . .
Personally I'd approach this by writing a proxy server in Java and dealing
with HTTP requests/responses at the header level, rather than trying to be
an arm's-length browser/server.

With your PHP approach: If you want to 'serve' a remote page
then you need to fetch it, parse it recursively looking for URLs inside
framesets, CSS and JavaScript, and perhaps hack them and at least fetch
them.  A 'web page' is not necessarily a single entity.  Suppose
index.htm is just a frameset.  You could 'translate' this as much as you
like but the 'real' content would be missed.
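
In PHP terms, one lighter-weight option (a swapped-in shortcut, not the
recursive fetching described above) is to inject a <base> tag so that
relative images, CSS and scripts still resolve against the original site.
A rough sketch; inject_base() is a made-up name, and anything loaded this
way, framed pages included, comes straight from the original server and so
bypasses your translation:

```php
<?php
// Hypothetical helper: make relative URLs in the fetched HTML resolve
// against the page's original address by adding a <base> element.
function inject_base(string $html, string $pageUrl): string
{
    $base = '<base href="' . htmlspecialchars($pageUrl, ENT_QUOTES) . '">';
    $count = 0;
    // Insert right after the first <head ...> tag (does not match <header>).
    $out = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base) {
            return $m[0] . $base;
        },
        $html,
        1,
        $count
    );
    // No <head> found (or the regex failed): fall back to prepending.
    if ($out === null || $count === 0) {
        return $base . $html;
    }
    return $out;
}
```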

With the proxy server you look for HTTP responses with MIME types of
interest and translate the data as appropriate then pass on.
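
The same MIME-type test can be sketched in PHP if you stay with the script
approach. This assumes you have the raw response header lines in an array
(PHP's HTTP wrapper collects them); is_translatable() and the list of types
are illustrative choices, not anything standard:

```php
<?php
// Hypothetical helper: decide from raw header lines whether the body is
// something we want to translate (HTML) or should pass through untouched.
function is_translatable(array $headerLines): bool
{
    $type = '';
    foreach ($headerLines as $line) {
        // Header names are case-insensitive; the last Content-Type wins,
        // which matters when a redirect chain produced several responses.
        if (stripos($line, 'Content-Type:') === 0) {
            // Drop parameters such as "; charset=utf-8".
            $value = substr($line, strlen('Content-Type:'));
            $type = strtolower(trim(explode(';', $value)[0]));
        }
    }
    return in_array($type, ['text/html', 'application/xhtml+xml'], true);
}
```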

Java Examples In A Nutshell (Pub. O'Reilly) shows how straightforward it  
is.  You'd need to bootstrap a proxy session from your normal PHP pages  
by telling it who's calling and what they want to see.

Re: Open and process remote page

William Hudson wrote:

Hi William,  

Some thoughts:

- Opening a remote URL is very easy in PHP, as you probably found out.
(Just fopen a URL; in most cases PHP will wrap the whole complex
request into a handle that can be treated as a read-only file.)

Beware, however, of the paranoid web designer.
Many people have this twisted idea that they want to offer content to the
world, but try to make it difficult for you to read the source.
Often JavaScript is used to make things more difficult.
(Beats me why, but they come in masses.)

If you only want normal plain HTML pages, I think you can just fopen,
replace the stuff you want, and deliver that (in a frame, e.g., or whatever
you like).
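
A minimal sketch of that fetch/replace/deliver idea, assuming
allow_url_fopen is enabled. The helper names and the example replacement
are made up for illustration, and the scheme check is my own addition so a
user cannot point the script at a local file:

```php
<?php
// Hypothetical helper: fetch a remote page through PHP's URL wrapper.
// Returns null if the URL is not http/https or the request fails.
function fetch_page(string $url): ?string
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null; // refuse local paths, ftp://, php://, etc.
    }
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Hypothetical helper: apply a map of search => replacement strings.
// strtr() with an array tries longer keys first and never re-scans
// text it has already replaced.
function translate_page(string $html, array $replacements): string
{
    return strtr($html, $replacements);
}

// Usage (not executed here; the URL is a placeholder):
// $html = fetch_page('http://www.example.com/');
// if ($html !== null) {
//     echo translate_page($html, ['colour' => 'color']);
// }
```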

Also be aware of redirects by the server (page moved).
I have seen a few situations where PHP doesn't handle that very well.
Or maybe it was the web server sending something strange. I do not remember
for sure; I only remember that PHP and redirects with the fopen wrapper
around a URL had some issues.
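
If redirects do bite, the HTTP wrapper has context options for them. A
sketch, assuming the stock http:// wrapper; the function names and the
limits (5 redirects, 10 seconds) are arbitrary picks of mine:

```php
<?php
// Build a stream context that follows a bounded number of redirects.
// follow_location, max_redirects and timeout are HTTP context options.
function redirect_context(int $maxRedirects = 5)
{
    return stream_context_create([
        'http' => [
            'follow_location' => 1,
            'max_redirects'   => $maxRedirects,
            'timeout'         => 10,
        ],
    ]);
}

// Given the raw header lines of a request (all responses in a redirect
// chain, in order), return the status code of the last response.
function final_status(array $headerLines): ?int
{
    $status = null;
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Usage (not executed here; the URL is a placeholder):
// $html = @file_get_contents('http://www.example.com/', false, redirect_context());
// $code = final_status($http_response_header ?? []);
```

Checking the final status at least tells you whether you ended up on a real
page or gave up halfway through a redirect chain.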

Besides the above possible traps, I do not expect you will find a lot of
trouble. I once wrote something similar, albeit simpler than what you
are doing, and it was all very straightforward.

You could also get yourself in trouble (with regexps or substring searching,
etc.) when trying to replace some pieces in the HTML when the HTML is not
coded as it 'should' be: think about missing end tags and the like.
Browsers are very forgiving, but the programmers of the browsers had
headaches before their program was forgiving enough. :-)

But maybe you can get away with just replacing stuff you understand, and
leave the remaining HTML as it was. Then the browser can display it the way
it was meant (probably).
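
One way to sketch "only touch what you understand": split the page into
tags and text, and replace only in the text between tags. This is a rough
approach that assumes tags never contain a bare '>' inside an attribute;
replace_outside_tags() is a name I made up:

```php
<?php
// Hypothetical helper: apply replacements to text between tags only,
// leaving tags, attributes, and <script>/<style> bodies untouched.
function replace_outside_tags(string $html, array $replacements): string
{
    // The capture group keeps the tags in the result, so chunks alternate:
    // even indexes are text, odd indexes are tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $html; // regex failure: hand the page back unchanged
    }
    $skip = false; // true while inside <script> or <style>
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            if (preg_match('#^<(script|style)\b#i', $part)) {
                $skip = true;
            } elseif (preg_match('#^</(script|style)\b#i', $part)) {
                $skip = false;
            }
            continue;
        }
        if (!$skip && $part !== '') {
            $parts[$i] = strtr($part, $replacements);
        }
    }
    return implode('', $parts);
}
```

Malformed markup passes through unchanged, so the browser still gets to do
its forgiving thing with it.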

just my 2 cents.

Good luck.

Erwin Moller

Re: Open and process remote page

Erwin Moller wrote:
HTML Tidy is your friend here. It has saved me from many a nasty
FrontPage-generated HTML page :)
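
From PHP you can get at it through the tidy extension, if it is installed.
A small sketch; repair_html() is a hypothetical wrapper, the config options
are just the ones I would start with, and it falls back to the untouched
input when the extension is missing:

```php
<?php
// Hypothetical wrapper: clean up sloppy HTML with the tidy extension
// before doing any search/replace on it.
function repair_html(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not loaded: leave the page as-is
    }
    $config = [
        'show-body-only' => true, // return just the repaired fragment
        'wrap'           => 0,    // do not re-wrap long lines
    ];
    $fixed = tidy_repair_string($html, $config, 'utf8');
    return is_string($fixed) ? $fixed : $html;
}

// Usage:
// echo repair_html('<p>unclosed <b>bold');
```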


Re: Open and process remote page

Kimmo Laine wrote:
Unless next_page.php generates PHP, the script with this include will
only get HTML.

    // Example: the remote script echoes PHP source, not HTML
    if (isset($_GET['foo'])) {
      echo '<?php echo $_GET[\'foo\']; ?>';
    } else {
      echo '<?php echo \'Not available\'; ?>';
    }

Re: Open and process remote page

Krustov wrote:

What's all this $rocky junk?

    $contents = str_replace("BBC","Bungholes",$contents);
    $contents = str_replace("bbc","Bungholes",$contents);
    $contents = str_replace("the","its only a word",$contents);

