Hello,

 

I am posting this on behalf of my colleague Ed Pinsent.

He has some interesting things to say and great advice!

 

All the best,

 

Patricia Sleeman

************************************************************

*      I am wondering if anyone has done this before and how you were
able to archive it.

 

Yes. Here at the University of London Computer Centre I have been
archiving UK Higher Education project websites since 2005. We use a
crawler tool called Heritrix (http://crawler.archive.org/) to perform
the capture. It is quite a sophisticated copying tool that runs
recursively over an entire website, makes copies of all the pages,
stylesheets, images and so on, and manages the internal links so that
the copied website retains its internal structure.
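
To make the idea concrete, here is a toy Python sketch of the
recursive copy-and-follow-links approach that a crawler like Heritrix
implements at far greater scale and robustness. The start URL, output
folder and depth limit are made up for illustration, and this is not
how Heritrix itself is configured or run.

    # Toy recursive site copier, for illustration only. Real crawlers
    # such as Heritrix add politeness delays, scoping rules, robots.txt
    # handling, deduplication, link rewriting and WARC output.
    import os
    import urllib.parse
    import urllib.request
    from html.parser import HTMLParser

    START = "http://www.example.org/"   # hypothetical site to copy
    OUTDIR = "site_copy"                # hypothetical output folder

    class LinkParser(HTMLParser):
        """Collects href/src values from a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if (tag == "a" and name == "href") or \
                   (tag in ("img", "script") and name == "src") or \
                   (tag == "link" and name == "href"):
                    self.links.append(value)

    def local_path(url):
        # Map a URL onto a file path under OUTDIR.
        parsed = urllib.parse.urlparse(url)
        path = parsed.path.lstrip("/")
        if not path or path.endswith("/"):
            path += "index.html"
        return os.path.join(OUTDIR, parsed.netloc, path)

    def crawl(url, seen, depth=2):
        if depth == 0 or url in seen:
            return
        seen.add(url)
        try:
            response = urllib.request.urlopen(url, timeout=10)
            data = response.read()
        except Exception:
            return                      # skip pages that fail to load
        target = local_path(url)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)
        # Only HTML pages are parsed for further links to follow.
        if "html" in response.headers.get("Content-Type", ""):
            parser = LinkParser()
            parser.feed(data.decode("utf-8", errors="replace"))
            for link in parser.links:
                absolute = urllib.parse.urljoin(url, link)
                if absolute.startswith(START):  # stay within the site
                    crawl(absolute, seen, depth - 1)

    crawl(START, set())

Again, this is only to show the shape of the task; in practice you
would use one of the ready-made tools rather than writing your own.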

 

We manage Heritrix through an interface called Web Curator Tool
(http://webcurator.sourceforge.net/), which, as the name suggests,
allows us to manage the gathers in a curatorial way, adding metadata
and descriptions to each one.

 

Those tools, however, are mainly used by large institutions (e.g.
national libraries) that want to gather many website titles and build
collections. For your purposes it might be simpler to use a tool
called HTTrack (http://www.httrack.com/), a free utility for copying
websites; it is fairly well supported with online guidance.
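
For what it's worth, a basic HTTrack copy is usually a single command
of the form "httrack <site URL> -O <output folder>". The Python
fragment below simply wraps that command; the URL and folder name are
hypothetical, and you should check the options against HTTrack's own
documentation before relying on them.

    # Minimal wrapper around the HTTrack command-line tool, which is
    # assumed to be installed and on the PATH. URL and output folder
    # are hypothetical; see http://www.httrack.com/ for real options.
    import subprocess

    old_site = "http://www.example.org/"   # website to be archived
    mirror_dir = "old_site_archive"        # where the copy is written

    # "httrack <url> -O <dir>" mirrors the site into <dir>; HTTrack's
    # manual documents options for limiting depth, external links, etc.
    subprocess.run(["httrack", old_site, "-O", mirror_dir], check=True)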

 

*      We are currently looking to launch a new website but would like
to archive the old one.

 

This may depend on what content (and how much of it) you intend to
capture and, more importantly, what you expect to do with the content
in the future. Depending on the target website and how it is built, it
may not be possible to capture all of the content or to render all of
the site's functionality. This is particularly true if the website was
built with a content management system, if some of its content comes
from a server-side database, or if it uses scripts for navigation, for
example.
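
As a rough illustration of why this happens: a copying tool can only
follow links that appear in the HTML the server actually sends back.
In the hypothetical page below, a crawler would capture the "About us"
link but miss the news story entirely, because that link is inserted
by a script when the page is viewed in a browser (for instance from a
database-backed feed), and a crawler never runs the script.

    # Why script-generated navigation defeats a static crawl.
    # The page below is entirely hypothetical.
    from html.parser import HTMLParser

    page = """
    <html><body>
      <a href="/about.html">About us</a>
      <div id="news"></div>
      <script>
        // This link only exists after the script runs in a browser,
        // so it never appears in the HTML that a crawler downloads.
        document.getElementById('news').innerHTML =
          '<a href="/news/launch.html">Launch story</a>';
      </script>
    </body></html>
    """

    class Links(HTMLParser):
        """Collects the href of every anchor tag in the raw HTML."""
        def __init__(self):
            super().__init__()
            self.found = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.found.extend(v for k, v in attrs if k == "href")

    parser = Links()
    parser.feed(page)
    print(parser.found)   # ['/about.html'] -- the news link is missed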

 

If it's any help, there are numerous international initiatives for
gathering (and preserving) website content, including the Library of
Congress Minerva project and the Internet Archive
(http://www.archive.org/). The former takes a selective approach,
while the Internet Archive aims to capture everything on the web.

 

May I also recommend our Handbook on web preservation, available for
free from http://jiscpowr.jiscinvolve.org/wp/handbook/. It is aimed at
the UK higher and further education sector, but it contains a lot of
useful and practical advice on approaches you can take.

 

Please also have a look at our blog, http://dablog.ulcc.ac.uk/, for
discussions of issues to do with digital preservation.

 

 

	Ed Pinsent
	Digital Archivist / Project Manager
	Digital Archives and Repositories
	University of London Computer Centre
	Senate House
	South Block
	Malet Street
	London WC1E 7HU

	http://www.ulcc.ac.uk/digital-preservation.html
	http://www.ulcc.ac.uk
	Tel: +44 (0)20 7863 1345

	The University of London is an exempt charity in England and
Wales and a charity registered in Scotland (reg. no. SC041194)

	 

	 

