
Get all Link from a Website

Newsgroup: comp.infosystems.www.authoring.html
Posted by saqib ali on November 18, 2004, 2:51 pm
Hello All,

I manage a rather large website that has several hundred content
managers. These content managers can create links at will. I want
to take a look at a list of all the links that are on our website.

Is there a utility that can generate a list (text based) of all the
URLs that are mentioned on our website?

Thanks.
Saqib Ali
http://validate.sf.net <--- DocBook XML / XHTML Validator



Posted by Geoff Muldoon on November 18, 2004, 11:42 pm
rumionfire@gmail.com says...

> I manage a rather large website that has several hundred content
> managers. These content managers can create links at will. I want
> to take a look at a list of all the links that are on our website.
>
> Is there a utility that can generate a list (text based) of all the
> URLs that are mentioned on our website?

What platform?

If on Linux I'd recommend:
http://htcheck.sourceforge.net/

Geoff M


Posted by saqib ali on November 18, 2004, 4:04 pm
Windows would be preferable.

I don't want an elaborate link checker. I just want a simple console-based
app that I can run on a nightly basis and that generates a text
file with all the links on my website. I need to pass that text file on
to a C++ program that I wrote.

Thanks.
Saqib Ali



Posted by Alan J. Flavell on November 19, 2004, 12:11 am
On Thu, 18 Nov 2004, saqib ali wrote:

> Windows would be preferable.

Xenu link checker produces something along the lines that you're
describing.

> I don't want an elaborate link checker. I just want a simple console-based
> app that I can run on a nightly basis and that generates a text
> file with all the links on my website.

It's not quite what you want, but it might be worth looking at
nevertheless.

Have you considered lynx (available in a win32 version), which has
various site-exploring options that can be invoked as a batch job?
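
For illustration only, the kind of site exploration such a batch job
performs can be sketched in Python (my choice of language, not something
lynx or Xenu provides). The names crawl, fetch_http and max_pages are
made up for this sketch, and it assumes the pages are reachable over
HTTP and that only <a href> links matter:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class _Hrefs(HTMLParser):
    """Collect the raw href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [v for k, v in attrs if k == "href" and v]


def fetch_http(url):
    """Default fetcher (hypothetical helper): download and decode leniently."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_http, max_pages=1000):
    """Breadth-first walk of pages on start_url's host.

    Returns a sorted list of every URL that appears in a link,
    including links that point off-site (those are not followed).
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    all_links = set()
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        try:
            html = fetch(page)
        except Exception:
            continue  # unreachable page: skip it, this is not a link checker
        parser = _Hrefs()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            url = urldefrag(urljoin(page, href)).url
            all_links.add(url)
            parts = urlparse(url)
            if parts.scheme in ("http", "https") and parts.netloc == host:
                queue.append(url)
    return sorted(all_links)
```

Writing "\n".join(crawl(start_url)) to a file would give a one-URL-per-line
text list. The fetch parameter is there so the walk can be tried against
canned pages before pointing it at a real server.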


Posted by SimonFx on November 19, 2004, 6:38 am
A simple Perl script could do this easily.

Even grep, if you have the latest grep plus the extra pain-in-the-butt
DLLs you need to download for Windows.

Something like:
grep -ior "href=[^>]*" c:\internet\www\*.html > links.txt

or possibly:
grep -ior href="[^"]* c:\internet\www\*.html > links.txt

Hmmm, if you want to create a historic log, create a batch file with:

for /f "tokens=1-4 delims=/.- " %%A in ('date /t') do SET FN=%%D%%C%%B
grep -ior "href=[^>]*" c:\internet\www\*.html > %FN%.log
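
If getting grep and its DLLs onto the machine is a nuisance, roughly the
same thing can be sketched in Python. This is only an approximation of
the batch file above, not a drop-in replacement: find_hrefs and
write_dated_log are made-up names, the files are assumed to be readable
as UTF-8, and the log name is assumed to be YYYYMMDD (what the batch
version produces depends on the machine's date format):

```python
import re
from datetime import date
from pathlib import Path

# Same idea as the grep pattern: "href=" followed by everything up to ">".
# The "\n" keeps each match on one line, as line-based grep would.
HREF_RE = re.compile(r"href=[^>\n]*", re.IGNORECASE)


def find_hrefs(root):
    """Yield (file, match) for every href=... in .html files under root."""
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in HREF_RE.findall(text):
            yield path, match


def write_dated_log(root, out_dir="."):
    """Write "file:match" lines to a date-stamped .log file and return its path."""
    out_path = Path(out_dir) / (date.today().strftime("%Y%m%d") + ".log")
    with out_path.open("w", encoding="utf-8") as out:
        for path, match in find_hrefs(root):
            out.write(f"{path}:{match}\n")
    return out_path
```

Calling write_dated_log on the web root from a nightly scheduled task
would leave one log per day. Like the grep version, it matches text
blindly, so it also picks up hrefs inside comments and scripts.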

Grep + DLLs (libintl, libiconv, pcre) available from
http://sourceforge.net/project/showfiles.php?group_id=23617

Sourceforge grep is a bit buggy, but it's the only one I know of that has
the "-o" option (show just the text that matches the regexp, not the whole line).

Perl code would be cleaner and more reliable. Pay a Perl programmer $40 to
write it or buy an old Perl book from the bargain bin at your local
bookshop and write it yourself.
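
To give an idea of what "cleaner, more reliable" might look like, here is
a minimal sketch that uses a real HTML parser instead of a regexp. It is
in Python rather than Perl (my substitution), and LinkCollector and
extract_links are made-up names. It assumes links of interest live in the
href attribute of <a>, <area> and <link> tags:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values, resolved against an optional base URL."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag in ("a", "area", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url=""):
    """Return the sorted, de-duplicated links found in one HTML document."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return sorted(set(parser.links))
```

Unlike the grep pattern, this ignores links inside HTML comments, copes
with quoted and unquoted attribute values, and can turn relative links
into absolute ones when given the page's URL.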

