
How to find unreferenced web pages, images and files in a web page directory tree?

comp.infosystems.www.authoring.html
Posted by Patricia Mindanao on December 6, 2007, 6:15 am
I have a directory tree on my hard disk that mirrors all the web pages and
linked files on my web hosting server.

All web pages and files are statically linked, so dynamically composed links
(e.g. built with JavaScript) do not matter here.

Now I want to find out which of all these (many) files are unreferenced orphans
when starting from the main page index.html (or index.shtml).

In other words, a file such as aaa.log is an orphan if it cannot be reached by a chain like

index.html -> subpage8.html -> details2345.html -> aaa.log

Is there a tool which helps me find all these unreferenced web pages
and files?
Of course without doing a manual code review :-)

Keep in mind that the static link URLs can be absolute
(http://www.mywebpages.com/content/subpage8.html)
or relative (content/subpage8.html).

Pat

Posted by William Hughes on December 7, 2007, 8:45 am
On 06 Dec 2007 11:15:12 GMT, in comp.infosystems.www.authoring.html
patmin@hotmail.com (Patricia Mindanao) wrote:

>Is there a tool which helps me find all these unreferenced web pages
>and files?
>Of course without doing a manual code review :-)

Xenu - http://home.snafu.de/tilman/xenulink.html

Also checks external (off-site) links.
--
William Hughes, San Antonio, Texas: cvproj@grandecom.net
The Carrier Project: http://home.grandecom.net/~cvproj/carrier.htm
Support Project Valour-IT: http://soldiersangels.org/valour/index.html

Posted by David E. Ross on December 7, 2007, 7:07 pm
On 12/6/2007 3:15 AM, Patricia Mindanao wrote:
> I have a directory tree on my hard disk that mirrors all the web pages and
> linked files on my web hosting server.
>
> All web pages and files are statically linked, so dynamically composed links
> (e.g. built with JavaScript) do not matter here.
>
> Now I want to find out which of all these (many) files are unreferenced orphans
> when starting from the main page index.html (or index.shtml).
>
> In other words, a file such as aaa.log is an orphan if it cannot be reached by a chain like
>
> index.html -> subpage8.html -> details2345.html -> aaa.log
>
> Is there a tool which helps me find all these unreferenced web pages
> and files?
> Of course without doing a manual code review :-)
>
> Keep in mind that the static link URLs can be absolute
> (http://www.mywebpages.com/content/subpage8.html)
> or relative (content/subpage8.html).
>
> Pat

Using your example, use a search tool (e.g., Search on Windows, grep on
UNIX) to search all *.html files in the directory tree, first
for the string href="aaa.log" and then for the string
href="http://www.mywebpages.com/content/aaa.log". I do this often, but
not often enough to have written a search script.
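
For anyone who would rather script it, the search described above might be sketched in Python roughly as follows. This is only a minimal sketch: the function name files_mentioning and its parameters are made up for illustration, and it does a plain substring match, so links written differently (single quotes, extra spaces, ../aaa.log from another directory) would be missed.

```python
import os


def files_mentioning(root, needles, suffixes=(".html", ".shtml")):
    """Return the sorted paths under root whose text contains any needle.

    Hypothetical helper: 'needles' are literal strings such as
    'href="aaa.log"'; only files ending in one of 'suffixes' are read.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Decode leniently so one oddly encoded page does not stop the scan.
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            if any(needle in text for needle in needles):
                hits.append(path)
    return sorted(hits)


# Example call, using the two strings from the post:
# files_mentioning(".", ['href="aaa.log"',
#                        'href="http://www.mywebpages.com/content/aaa.log"'])
```

An empty result suggests that no page uses either of those two exact spellings; it does not prove that the file is an orphan.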

--
David E. Ross
<http://www.rossde.com/>

Natural foods can be harmful: Look at all the
people who die of natural causes.

Posted by Klaus Johannes Rusch on December 16, 2007, 7:22 pm
Patricia Mindanao wrote:
> Is there a tool which helps me find all these unreferenced web pages
> and files?
> Of course without doing a manual code review :-)

linklint <URL:http://www.linklint.org/> can find orphans and
supports checking both local files and sites over HTTP.
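
To illustrate what "finding orphans" amounts to for a local mirror, here is a rough Python sketch of the idea from the original question: follow every static href/src link starting at index.html, then report the files that were never reached. It is not how linklint works internally. The names LinkCollector and find_orphans are invented for this example, and it assumes the mirror root corresponds to site_url and that a link to a directory means that directory's index.html.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_orphans(root, site_url, start="index.html"):
    """Return files under root that no chain of links from start reaches."""
    if not site_url.endswith("/"):
        site_url += "/"
    site = urlsplit(site_url)

    # Every file in the mirror, as a path relative to root with "/" separators.
    all_files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            all_files.add(rel.replace(os.sep, "/"))

    reached = set()
    queue = [start]
    while queue:
        rel = queue.pop()
        if rel in reached or rel not in all_files:
            continue
        reached.add(rel)
        if not rel.lower().endswith((".html", ".htm", ".shtml")):
            continue  # only HTML pages are scanned for further links
        parser = LinkCollector()
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as fh:
            parser.feed(fh.read())
        page_url = urljoin(site_url, rel)
        for link in parser.links:
            # urljoin resolves relative links and leaves absolute ones alone.
            target = urlsplit(urljoin(page_url, link))
            if target.scheme not in ("http", "https"):
                continue  # mailto:, javascript: and similar
            if target.hostname != site.hostname:
                continue  # off-site link
            path = unquote(target.path)
            if not path.startswith(site.path):
                continue  # outside the mirrored part of the site
            rel_target = path[len(site.path):]
            if rel_target == "" or rel_target.endswith("/"):
                rel_target += "index.html"  # assumed directory index name
            queue.append(rel_target)

    return sorted(all_files - reached)


# Example call:
# find_orphans("/path/to/mirror", "http://www.mywebpages.com/")
```

Because both absolute and relative URLs are turned into full URLs before being mapped back to local paths, the two link styles mentioned in the question are handled the same way. Links inside CSS or server-side includes are not followed, so files referenced only from there would be reported as orphans.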

--
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
