Text to copy out of HTML pages

There is a web page with links to other pages containing text that I
want to copy into Word. There are quite a lot of links. Is there a
way to get the original text without clicking on each of those
links and copying the text manually?

Thanks for your help.


Re: Text to copy out of HTML pages

In reply to M. Lesaar:

Since you use Word, I assume that you work under Windows, where you will
lack flexibility. You can install Cygwin (www.cygwin.com) to get Linux-like
command-line tools, which will enable you to do the following.

If the page is located at ADDRESS, run the following command:

wget -r -l2 -t1 -N -np -erobots=off ADDRESS

This assumes internal links, but can be modified as necessary (see 'man

You should then have a directory (or several directories) with all the text
(hopefully not hypertext, which complicates things). You can then concatenate
the files into one using 'cat' (see 'man cat').
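
If the downloaded files do turn out to be hypertext, the tags have to be stripped before the text is joined. The following is only a minimal sketch in Python, not part of the original suggestion. It assumes the pages were saved with .html or .htm names and are readable as UTF-8, and the names TextExtractor, html_to_text and collect_text are made up for this illustration.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of one HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def collect_text(root_dir):
    """Walk root_dir and join the text of every .html/.htm file found.

    Files within each directory are visited in name order.
    """
    chunks = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                # Assumption: UTF-8; undecodable bytes are replaced.
                with open(path, encoding="utf-8", errors="replace") as f:
                    chunks.append(html_to_text(f.read()))
    return "\n\n".join(chunks)
```

Writing the result of collect_text to a plain .txt file should give something that Word can open directly. Note that wget does not always add an .html extension to the files it saves, so the extension filter may need adjusting.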

I am afraid that I see no simpler alternatives. If you don't perform this
task often, then it is not worth the investment.


Roy S. Schestowitz

Re: Text to copy out of HTML pages

In reply to M. Lesaar:

You can use VBScript to do this fairly straightforwardly on your Windows system,
by fleshing out the outline below into an HTML2Word.vbs file:

     ' newIEtoForeground is not built in; its code is at the link below
     Set ie = newIEtoForeground("HTML to Word")
     SourcePage = "your web page address"
     ' Assumed missing step: load the starting page before waiting on it
     ie.Navigate SourcePage
     Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
     ' Now get the links for this page and stuff them into an array or some such
     ' For each such link:
         ie.Navigate thatLink
         Do Until ie.ReadyState = 4 : WScript.Sleep 10 : Loop
         myText = ie.Document.Body.innerText
         ' Save the text to Word here
     ' End of For

You can find the code for newIEtoForeground at
http://groups-beta.google.com/group/microsoft.public.scripting.vbscript/browse_frm/thread/b5a4788bb2dacc09/

Of course, you still need to collect the links from the first page,
and then you have to save myText to Word
(the microsoft.public.scripting.vbscript group can help you with questions),
but this should give you a framework.
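
For comparison, the link-collecting step that the outline leaves open could look roughly like this in Python. This is a sketch under assumptions rather than a translation of the VBScript: it does not drive Internet Explorer, the names LinkCollector, extract_links and fetch are invented for illustration, and fetch assumes the site serves UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any #fragment part.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only web links, and keep each one once, in page order.
        if url.startswith(("http://", "https://")) and url not in self.links:
            self.links.append(url)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in html, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def fetch(url):
    """Download a page and decode it as UTF-8 (an assumption about the site)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

With these, extract_links(fetch(start), start) gives the list of pages to visit, and each one can then be fetched in a loop and its text saved, which corresponds to the For loop in the outline above.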

Csaba Gabor from Vienna
