Suggestions -- Web-Based Document Archive

This is a request for suggestions on software, strategy -- anything relevant
to a massive project I am overseeing on a volunteer basis.

I am preparing an archive of the personal papers of a leader of the
California environmental movement, who died earlier this year after more
than half a century as an activist. She left about 250,000 paper documents,
which are being pared down to about 10% of that number by professional
archivists and researchers. We want to create a web-based resource that can
be accessed by scholars and students anywhere.

I've overseen the creation of many sites, but mostly "brochureware" sites of
a dozen or so pages, static or CMS-driven. This isn't great training for a
project that may have 25,000 to 50,000 scanned documents, presumably in PDF
format, each indexed by name, key words, etc.

Are there open source programs suitable for this kind of application, and
that can handle a database of this size? Any recommendations would be
appreciated. Ditto for commercial software, if that would be a better
solution, and for links to information about creating web-based research
archives, approaches we should take when planning the project and the site
architecture, etc.

Would it make sense to include a "wiki" function, so scholars can comment on
and update the material? (I assume this would have to be moderated, to
control spam.)

I don't expect the site to get a lot of traffic, because of its specialized
content and appeal. Is a dedicated server the only option? (Budget is an

Thanks in advance for any ideas, suggestions, cautionary advice...


Re: Suggestions -- Web-Based Document Archive

In reply to Alex:

PLEASE consider the proper techniques for doing this right. Talk to
librarians, not web developers.  There's a huge body of previous work
on this and librarians and museum curatorial staff have published a
great deal of advice and experience on just this sort of web project.
There are all sorts of issues you wouldn't realise at first around
metadata models, digitisation and consistency control of manually
entered metadata property values.
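
To make the "consistency control" point concrete, here is a minimal sketch in Python of checking hand-typed metadata against a controlled vocabulary before a record is accepted. Everything in it is an assumption for illustration: the field names (loosely modelled on Dublin Core elements such as title, creator, date and subject), the `SUBJECTS` vocabulary, and the `Record` class are hypothetical and not taken from any particular archive package.

```python
import re
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: preferred term -> accepted spellings.
SUBJECTS = {
    "Coastal protection": {"coastal protection", "coast protection", "coastal-protection"},
    "Land use": {"land use", "landuse", "land-use"},
}

# Reverse lookup built once: accepted spelling -> preferred term.
_LOOKUP = {v: preferred for preferred, variants in SUBJECTS.items() for v in variants}

# Checks the shape only (YYYY, YYYY-MM or YYYY-MM-DD), not calendar validity.
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def normalise_subject(raw: str) -> str:
    """Map a hand-typed subject onto its preferred term, or reject it."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    if key not in _LOOKUP:
        raise ValueError(f"unknown subject term: {raw!r}")
    return _LOOKUP[key]


@dataclass
class Record:
    title: str
    creator: str
    date: str
    subjects: list = field(default_factory=list)

    def __post_init__(self):
        self.title = self.title.strip()
        if not self.title:
            raise ValueError("title is required")
        if not ISO_DATE.match(self.date):
            raise ValueError(f"date must look like YYYY[-MM[-DD]]: {self.date!r}")
        # De-duplicate after normalising so variant spellings collapse together.
        self.subjects = sorted({normalise_subject(s) for s in self.subjects})
```

With this in place, a record typed as `Record(" Letter ", "Unknown", "1972-03", ["Land-Use", "land use"])` ends up with a single subject, "Land use", while a date like "March 1972" is rejected at entry time rather than discovered later when searches miss it. A real project would take its vocabulary and field list from the archivists, not from the code.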

You should be able to find a suitable open source project to host all
this. It's a couple of years old now, but you might find the MIT DSpace
project worth looking at.

Re: Suggestions -- Web-Based Document Archive

Thanks for the advice. We have an archivist working with us on some of the
issues you raised. I think we still need help on the website end of things.
I took a quick look for information about DSpace, but access to DSpace
itself is blocked; only press releases, etc., are available.

