needsupdate

Backing Up and Restoring a DSpace Instance

This is a strawman guide to backing up your DSpace instance. Comments, additions, errata etc. invited.

TODO: Good restore info

When it comes to backing up DSpace there are a few broad backup strategies you can take:

File system backup

Back up DSpace and the applications and data files as they are, directly from the file system. The advantage to this is that, if you have a compatible hardware/OS stack, it is very straightforward to restore a DSpace backup to a functioning instance. The disadvantage is that if you need to restore to a different platform (or maybe even a similar platform with a newer OS version) your backup will not be easy to restore as many of the restored apps won't work.

At an absolute minimum, you should back up:

Really you could carry own down the tool and O/S stack; Java, Tomcat it's a judgement call as to how far down you go.

Storage layer backup

Just back up the data; i.e. the RDBMS tables and [dspace]/assetstore (in the default config). You assume that you will be able to recreate a software/hardware environment that will run DSpace, and then you will just need to restore the data.

The advantage of this approach is that you only have to back up the data; as long as you can reconstruct the correct DSpace and PostgreSQL versions, it doesn't matter if the underlying hardware and OS are different. i.e. you're not so reliant on having an exact replica of the hardware and OS around to restore the backup.

However, you DO need the right version of DSpace (including any customisations you may have made) since the underlying DB schema can change between DSpace versions (usually when the second number has changed, e.g. 1.1.1 -> 1.2).

You also need a compatible version of PostgreSQL to restore the backup. There are two approaches here. One is a 'file system level backup', where you simply backup the relevant PostgreSQL data files from the system (e.g. /usr/local/pgsql/data), as described here. Alternatively (and preferably) you can do an 'SQL dump' of the database, in which you get PostgreSQL to dump out the contents of the DSpace database as an SQL file, which you can feed back into another PostgreSQL instance to recreate the database. This is described here.

In short, what you'd do here is:

Some notes in http://dspace.org/technology/system-docs/storage.html about backing up and restoring your Postgres database and the bitstream store.

Use import/export tools

In DSpace 1.7.0 and above, there is now an AIP (Archival Information Package) Backup and Restore process available. For more information see: AIP Backup and Restore

Back up the data using the batch item exporter. In other words, you'd use the batch item exporter to export all of the contents of your DSpace to location X, and back X up.

From the point of view of being able to restore a functioning DSpace system to how it was before a failure, this isn't a complete solution, since the item exporter does not export all the information. Specifically, the following information is not present:

There are a few other issues right now:

The experimental METS exporter tool does export richer information about each item, but there's no corresponding importer yet! The CWSpace project at MIT is looking at providing this in the next few months though.

Although there are issues, this strategy does have benefits in terms of long-term preservation. First and foremost, you don't need a functioning DSpace instace to be able to access your backed up data. If you have another application or converter that can understand the export format, you can put the backed up data into a different system if you wish. The exported files are also independent of any particular version of PostgreSQL or other tools, and hopefully also independent of DSpace version; the batch import/export format should remain backwards-compatible.

Notes

For the short term, strategy 1 or 2 (or a mix) is probably the best way to go; if you have the backup capacity, following strategy 3 in parallel has the potential for long-term benefits.

As requested, a short description of the procedure we follow.

Other methods

Please check out: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Disaster_Recovery for a method on Ubuntu servers.