Hello, I am posting this on behalf of my colleague Ed Pinsent. He has some interesting things to say and great advice! All the best, Patricia Sleeman

************************************************************

* I am wondering if anyone has done this before and how you were able to archive it.

Yes. Here at the University of London Computer Centre I have been archiving UK Higher Education project websites since 2005. We use a crawler called Heritrix (http://crawler.archive.org/) to perform the capture. This is quite a sophisticated copying tool that runs recursively over an entire website, makes copies of all the pages, stylesheets, images, and so on, and manages the internal links so that the copied website retains its internal structure.

We manage Heritrix through an interface called Web Curator Tool (http://webcurator.sourceforge.net/), which, as the name suggests, allows us to manage the gathers in a curatorial way, adding metadata and descriptions to them.

Those tools, however, are mainly used by large institutions (e.g. national libraries) that want to gather several website titles and build collections. For your purposes it might be simpler to use HTTrack (http://www.httrack.com/), a free utility for copying websites; it is fairly well supported with online guidance. (There is a small example of running HTTrack at the end of this message.)

* We are currently looking to launch a new website but would like to archive the old one.

This may depend on what content (and how much of it) you intend to capture, and more importantly what you expect to do with that content in the future. Depending on the target website and how it is built, it may not be possible to capture all the content or render all the functionality of the site. This is particularly true if the website was built with a content management system, serves some of its content from a server-side database, or uses scripts for navigation, for example.

If it is any help, there are numerous international initiatives for gathering (and preserving) website content, including the Library of Congress Minerva project and the Internet Archive (http://www.archive.org/). The former takes a selective approach, while the Internet Archive is trying to capture everything on the web.

May I also recommend our handbook on web preservation, available for free from http://jiscpowr.jiscinvolve.org/wp/handbook/. It is targeted at the UK higher and further education (HFE) sector, but it contains a lot of useful and practical advice on approaches you can take. Please also have a look at our blog for discussions of digital preservation issues: http://dablog.ulcc.ac.uk/.

Ed Pinsent
Digital Archivist / Project Manager
Digital Archives and Repositories
University of London Computer Centre
Senate House South Block
Malet Street
London WC1E 7HU
http://www.ulcc.ac.uk/digital-preservation.html
http://www.ulcc.ac.uk/
Tel: +44 (0)20 7863 1345
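P.S. If you try HTTrack, the sketch below shows one way to drive it from a small Python script rather than the graphical interface. It is a minimal example only: the site address and output folder are placeholders for your own values, and it assumes the httrack command-line client is already installed on your machine.

    import subprocess

    # Mirror the old website into a local folder. "-O" sets the output
    # directory, and the "+" filter keeps the crawl within the site's
    # own domain so the copy does not wander off onto external links.
    subprocess.run(
        [
            "httrack",
            "http://www.example-museum.org/",  # placeholder: the old site's address
            "-O", "/archives/old-website",     # placeholder: where the copy is written
            "+*.example-museum.org/*",         # filter: stay within this domain
            "-v",                              # show verbose progress
        ],
        check=True,  # raise an error if the crawl fails
    )

The result is a browsable copy of the site on disk, with internal links rewritten so the pages can be opened locally, much as Heritrix does for larger collections.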