Posted by David E. Ross on December 7, 2007, 7:07 pm
On 12/6/2007 3:15 AM, Patricia Mindanao wrote:
> I have a directory tree on my hard disc which represents all the web pages and
> linked stuff on my mirrored web hoster server.
>
> All web pages and files are statically linked. So dynamically composed links,
> e.g. with javascript, do not matter here.
>
> Now I want to find out which of all these (many) files are unreferenced orphans
> starting from the main page index.html (or index.shtml)
>
> In other words if e.g. a file aaa.log can not be referenced by a chain like
>
> index.html -> subpage8.html -> details2345.html -> aaa.log
>
> Is there a tool which helps me to investigate all these un-referenced webpages
> and files?
> Of course without doing a manual code review :-)
>
> Keep in mind that the static link URLs can be absolute
> (http://www.mywebpages.com/content/subpage8.html)
> or relative (content/subpage8.html)
>
> Pat
Using your example, use a search tool (e.g., Search on Windows, grep on
UNIX) to search all *.html files in the directory tree, first for the
string href="aaa.log" and then for the string
href="http://www.mywebpages.com/content/aaa.log". If neither search
turns up a match, the file is a likely orphan. I do this often, but
not often enough to have written a search script.
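
For anyone who does want a script, here is a rough Python sketch. Note
that it goes a step beyond the per-file search described above: it
follows href and src attributes starting from index.html and reports
every file in the tree that is never reached. The function names, the
regular expression and the list of page extensions are my own
assumptions, and BASE_URL is simply taken from the example in the
question. It only sees plain quoted href="..." and src="..." values, so
treat the result as a list of candidates, not as proof.

```python
import os
import re
from urllib.parse import unquote, urldefrag, urlparse

# Assumption: the public base URL of the mirrored site (from the question).
BASE_URL = "http://www.mywebpages.com/"
# Matches quoted href="..." and src="..." attribute values only.
LINK_RE = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)
# Assumption: only files with these extensions are scanned for further links.
PAGE_EXTS = (".html", ".htm", ".shtml")


def resolve(link, page_dir, root):
    """Map a link found in a page to a local path, or None if not local."""
    link = urldefrag(link)[0]          # drop any #fragment
    if not link:
        return None                    # pure in-page anchor such as "#top"
    if link.startswith(BASE_URL):      # absolute URL on our own site
        rel, base = link[len(BASE_URL):], root
    elif urlparse(link).scheme or link.startswith("//"):
        return None                    # external site, mailto:, etc.
    elif link.startswith("/"):         # site-root-relative link
        rel, base = link.lstrip("/"), root
    else:                              # link relative to the current page
        rel, base = link, page_dir
    rel = unquote(rel.split("?", 1)[0])    # drop query string, decode %xx
    path = os.path.normpath(os.path.join(base, rel))
    if os.path.isdir(path):            # a directory link means its index page
        path = os.path.join(path, "index.html")
    return path


def find_orphans(root, start="index.html"):
    """Return files under root that are not reachable from the start page."""
    root = os.path.abspath(root)
    all_files = {os.path.join(d, name)
                 for d, _, names in os.walk(root) for name in names}
    seen = set()
    todo = [os.path.join(root, start)]
    while todo:
        path = todo.pop()
        if path in seen or path not in all_files:
            continue                   # already visited, or a dead link
        seen.add(path)
        if not path.lower().endswith(PAGE_EXTS):
            continue                   # reachable, but not a page to scan
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        for link in LINK_RE.findall(text):
            target = resolve(link, os.path.dirname(path), root)
            if target:
                todo.append(target)
    return sorted(os.path.relpath(p, root) for p in all_files - seen)
```

Calling find_orphans("/path/to/mirror") returns the orphan candidates as
paths relative to the mirror directory; pass start="index.shtml" if that
is the main page. Files referenced only from CSS (url(...)) or from
scripts will show up as orphans, so check the list before deleting
anything.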
--
David E. Ross
<http://www.rossde.com/>
Natural foods can be harmful: Look at all the
people who die of natural causes.