...
This Add-On also integrates DSpace with DuraCloud for users who wish to easily back up their content into DuraCloud directly from their DSpace administrative interface.
Info: On May 3, 2013, two new versions of the Replication Task Suite (RTS) were released:
Note: For a quick overview of the various tasks offered in the Replication Task Suite, along with some real-life scenarios / examples of where each Replication task may come in handy, you may wish to skip directly to the Problem Statement and Usage Examples section at the bottom of this page.
...
Replication Task Suite Version | Supported DSpace Version(s) | Supported Interfaces | Notes |
---|---|---|---|
1.1 | DSpace version 1.8.x | XMLUI and/or commandline | Use of DSpace 1.8.1 or 1.8.2 is highly recommended. DSpace 1.8.0 has a known bug where running a Replication Task always returns a NullPointerException (see DS-1077). |
3.0 | DSpace version 3.x | XMLUI and/or commandline | The "3.0" stable version of the Replication Task Suite is nearly identical to the "1.1" stable version. It only adds minor bug fixes to make the Replication Task Suite compatible with the new DSpace 3.x API. |
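The compatibility table above boils down to a simple lookup. The following Python sketch encodes it; the function name `pick_rts_version` is hypothetical and is not part of the Replication Task Suite. It returns the matching RTS version together with any warning taken from the Notes column, such as the DS-1077 bug in DSpace 1.8.0.

```python
from typing import List, Tuple


def pick_rts_version(dspace_version: str) -> Tuple[str, List[str]]:
    """Map a DSpace version such as "1.8.2" to a Replication Task Suite version,
    following the compatibility table. Returns (rts_version, warnings)."""
    parts = [int(p) for p in dspace_version.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    patch = parts[2] if len(parts) > 2 else None
    warnings: List[str] = []
    if (major, minor) == (1, 8):
        if patch == 0:
            # Known bug noted in the table (DS-1077)
            warnings.append(
                "DSpace 1.8.0: replication tasks always return a NullPointerException "
                "(DS-1077); upgrade to 1.8.1 or 1.8.2"
            )
        return "1.1", warnings
    if major == 3:
        return "3.0", warnings
    raise ValueError(f"The table lists no Replication Task Suite version for DSpace {dspace_version}")
```

For example, `pick_rts_version("1.8.0")` selects RTS 1.1 but also returns the DS-1077 warning, while any other DSpace version not listed in the table raises an error rather than guessing.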
...
- In your DSpace source directory ([dspace-src]), you will modify two Maven pom.xml files:
    - [dspace-src]/dspace/pom.xml: this POM controls the dependencies of command-line scripts. Modifying it lets you run dspace-replicate from the command line.
    - [dspace-src]/dspace/modules/xmlui/pom.xml: this POM controls the dependencies of the XMLUI. Modifying it lets you run dspace-replicate from the XMLUI.
  For each of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag):

  ```xml
  <dependencies>
      <!-- ... your existing dependencies ... -->

      <!-- Adding this dependency will install the Replication Task Suite Add-On -->
      <dependency>
          <groupId>org.dspace</groupId>
          <artifactId>dspace-replicate</artifactId>
          <version>1.1</version>
      </dependency>
  </dependencies>
  ```
  Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

  ```
  mvn clean package
  ```
- Follow the instructions in the Configuration section below to enable and configure the Replication Task Suite Add-On.
    - You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see next step).
- You will need to update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

  ```
  ant update
  ```
  Note: Alternatively, if you don't want to do a full DSpace update, you can update just your existing binaries and webapps by running the following two commands:
    - ant update_code (updates the existing [dspace]/lib/ directory)
    - ant update_webapps (updates the existing [dspace]/webapps/ directory)
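Before rebuilding, you may want to confirm that both pom.xml files actually contain the new dependency. The sketch below is a hypothetical helper (not shipped with DSpace or the Replication Task Suite) that uses Python's standard `xml.etree.ElementTree` to look for `org.dspace:dspace-replicate` in a POM's top-level `<dependencies>` section. The default expected version "1.1" is an assumption matching the DSpace 1.8.x row of the compatibility table.

```python
import xml.etree.ElementTree as ET


def has_replicate_dependency(pom_xml: str, version: str = "1.1") -> bool:
    """Return True if the POM's top-level <dependencies> section declares
    org.dspace:dspace-replicate at the given version."""
    root = ET.fromstring(pom_xml)
    # Maven POMs usually declare xmlns="http://maven.apache.org/POM/4.0.0";
    # reuse whatever namespace the root <project> element carries (if any).
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    for dep in root.findall(f"./{ns}dependencies/{ns}dependency"):
        coords = (
            (dep.findtext(f"{ns}groupId") or "").strip(),
            (dep.findtext(f"{ns}artifactId") or "").strip(),
            (dep.findtext(f"{ns}version") or "").strip(),
        )
        if coords == ("org.dspace", "dspace-replicate", version):
            return True
    return False
```

Run it once per POM, for example `has_replicate_dependency(open(path, encoding="utf-8").read())` where `path` is each of the two pom.xml files listed above. Pass `version="3.0"` if you are installing against DSpace 3.x.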
...
Enable Local Storage Plugin: Ensure the Replication suite is set up to use the 'LocalObjectStore' plugin:

```
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.LocalObjectStore
```
Configure Local Storage Folder: Configure the location on your local filesystem where you want all AIPs to be stored. This defaults to the [dspace]/repstore folder. However, we recommend pointing this at (at minimum) a separate hard drive from your existing DSpace installation directory, so that a single hard drive failure cannot destroy both your DSpace installation and its backups.

```
# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
```
Optionally Configure Subfolder Settings: Optionally, you can configure the subfolder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.

```
# The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks
# are executed (e.g. "Transmit AIP", "Restore from AIP")
group.aip.name = aip-store

# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest
# based tasks are executed (org.dspace.ctask.replicate.checkm.*).
group.manifest.name = manifest-store

# The storage group / folder where deletion records are kept when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
# a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
# BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
# This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
# or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
group.delete.name = deletions
```
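To check the "separate hard drive" recommendation for store.dir, you can compare the device IDs the operating system reports for the storage folder and for your DSpace installation directory. This is a minimal Python sketch with a hypothetical function name; note that differing device IDs only show that the two paths are on different filesystems, so two partitions of the same physical disk would still pass.

```python
import os


def on_separate_device(store_dir: str, dspace_dir: str) -> bool:
    """Return True if store_dir and dspace_dir are on different filesystems.

    Both paths must already exist. A differing st_dev only shows that the paths
    are on different filesystems, not that they are on different physical disks.
    """
    return os.stat(store_dir).st_dev != os.stat(dspace_dir).st_dev
```

With the default setting (store.dir = ${dspace.dir}/repstore), the storage folder sits inside the installation directory, so this check would normally return False unless something is mounted at that location.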
Info: Using Subfolders. Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths. For example:

```
group.aip.name = dspace-backup/aip-store
group.manifest.name = dspace-backup/manifest-store
group.delete.name = dspace-backup/deletions
```

With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be temporarily stored under the subfolder "deletions/".
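To make the subfolder layout concrete, the following Python sketch shows where an AIP would end up under store.dir for a given group setting. The file-naming pattern (object type, "@", the handle with "/" replaced by "-", then ".zip") is an assumption modelled on the example AIP name "ITEM@123456789-2.zip" given in the DuraCloud section of this page; both function names are hypothetical.

```python
from pathlib import PurePosixPath


def aip_filename(obj_type: str, handle: str) -> str:
    # ASSUMED naming pattern: TYPE@<handle with "/" replaced by "-">.zip,
    # modelled on the example "ITEM@123456789-2.zip" on this page.
    return f"{obj_type.upper()}@{handle.replace('/', '-')}.zip"


def local_aip_path(store_dir: str, group_name: str, obj_type: str, handle: str) -> str:
    # group_name may itself contain subfolders, e.g. "dspace-backup/aip-store";
    # it is simply joined underneath store_dir.
    return str(PurePosixPath(store_dir, group_name, aip_filename(obj_type, handle)))
```

For example, with store.dir resolved to "/dspace/repstore" (an illustrative path) and group.aip.name = dspace-backup/aip-store, the Item with handle 123456789/2 would map to "/dspace/repstore/dspace-backup/aip-store/ITEM@123456789-2.zip".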
Configuring Mountable Storage
...
Enable Mountable Storage Plugin: Ensure the Replication suite is set up to use the 'MountableObjectStore' plugin:

```
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.MountableObjectStore
```
Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.

```
# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
```
Optionally Configure Subfolder Settings: Optionally, you can configure the subfolder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.

```
# The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks
# are executed (e.g. "Transmit AIP", "Restore from AIP")
group.aip.name = aip-store

# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest
# based tasks are executed (org.dspace.ctask.replicate.checkm.*).
group.manifest.name = manifest-store

# The storage group / folder where deletion records are kept when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
# a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
# BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
# This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
# or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
group.delete.name = deletions
```
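Because the MountableObjectStore expects store.dir to be on an already-mounted share, a common failure mode is that the share is not mounted and AIPs silently land on the local root filesystem instead. The following Python sketch is a heuristic pre-flight check (the function name is hypothetical, not part of the Replication Task Suite): it confirms the folder exists and is writable, then walks up to the nearest mount point and flags the case where that mount point is the filesystem root.

```python
import os
from typing import List


def check_mounted_store(store_dir: str) -> List[str]:
    """Return a list of likely problems with a MountableObjectStore folder.

    An empty list means no problems were detected. Heuristic sketch only.
    """
    problems: List[str] = []
    if not os.path.isdir(store_dir):
        problems.append(f"{store_dir} does not exist or is not a directory")
        return problems
    if not os.access(store_dir, os.W_OK):
        problems.append(f"{store_dir} is not writable by the current user")
    # Walk up to the nearest mount point. If that is the filesystem root,
    # the share is probably NOT mounted and AIPs would land on the root filesystem.
    path = os.path.realpath(store_dir)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    if path == os.path.dirname(path):  # true only for the root, e.g. "/"
        problems.append(f"{store_dir} appears to be on the root filesystem, not a mounted share")
    return problems
```

The heuristic cannot tell whether the nearest mount point is the share you intended; if store.dir lives under some other mounted partition, an unmounted share would go undetected.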
Info: Using Subfolders. Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths. For example:

```
group.aip.name = dspace-backup/aip-store
group.manifest.name = dspace-backup/manifest-store
group.delete.name = dspace-backup/deletions
```

With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be temporarily stored under the subfolder "deletions/".
Configuring DuraCloud Storage
...
Enable DuraCloud Storage Plugin: Ensure the Replication suite is set up to use the 'DuraCloudObjectStore' plugin:

```
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.DuraCloudObjectStore
```
Configure the DuraCloud Primary Space: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space to store your AIPs in, and configure it as your group.aip.name (by default this is a DuraCloud Space with the ID "aip-store").

```
# The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks
# are executed (e.g. "Transmit AIP", "Restore from AIP")
group.aip.name = aip-store
```
Optionally, Configure Additional DuraCloud Spaces: If you have chosen to use Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.

```
# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest
# based tasks are executed (org.dspace.ctask.replicate.checkm.*).
group.manifest.name = manifest-store

# The storage group / folder where deletion records are kept when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
# a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
# BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
# This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
# or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
group.delete.name = deletions
```
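If you want to double-check which store plugin and group names your configuration will select (for example, before creating the matching DuraCloud Spaces), the following Python sketch parses `key = value` lines written in the same style as the snippets on this page: "#" comments, a trailing backslash that continues a value onto the next line, and `${dspace.dir}` substitution. It is a simplified, hypothetical reader for illustration, not DSpace's own configuration loader.

```python
from typing import Dict

STORE_KEY = "plugin.single.org.dspace.ctask.replicate.ObjectStore"


def read_cfg(text: str, dspace_dir: str) -> Dict[str, str]:
    """Parse simple 'key = value' config text into a dict (simplified sketch)."""
    props: Dict[str, str] = {}
    pending = ""  # accumulates a value split across backslash-continued lines
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash and keep reading
            continue
        key, sep, value = (pending + line).partition("=")
        pending = ""
        if sep:
            props[key.strip()] = value.strip().replace("${dspace.dir}", dspace_dir)
    return props


def selected_store(props: Dict[str, str]) -> str:
    """Short class name of the configured ObjectStore, e.g. 'DuraCloudObjectStore'."""
    return props.get(STORE_KEY, "").rsplit(".", 1)[-1]
```

Feeding it the DuraCloud plugin snippet above would report "DuraCloudObjectStore", and the group.*.name values it returns are the names you need DuraCloud Spaces (or file prefixes) for.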
Info: Using File Prefixes instead of separate DuraCloud Spaces. If you'd rather keep all your DSpace files in a single DuraCloud Space, you can tweak your "group.aip.name", "group.manifest.name" and "group.delete.name" settings to specify a file prefix to use. For example:

```
group.aip.name = dspace-backup/aip-store
group.manifest.name = dspace-backup/manifest-store
group.delete.name = dspace-backup/deletions
```

With the above settings in place, all your DSpace content will be stored in the "dspace-backup" Space within DuraCloud. AIPs will all be stored with a file prefix of "aip-store/" (e.g. "aip-store/ITEM@123456789-2.zip"). Manifests will all be stored with a file prefix of "manifest-store/". And any object deletion records will be temporarily stored with a file prefix of "deletions/". This allows you to keep all your content in a single DuraCloud Space while avoiding name conflicts between AIPs, manifests and deletion records.
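The Space-versus-prefix behaviour described above can be sketched as splitting each group setting at its first "/": the part before it is the DuraCloud Space ID and the rest becomes a file prefix. This Python sketch is an assumption based on the description on this page, not code taken from the Replication Task Suite, and both function names are hypothetical. It also lists the distinct Spaces you would need to create.

```python
from typing import Iterable, List, Tuple


def duracloud_location(group_name: str, filename: str) -> Tuple[str, str]:
    """Split a group setting into (Space ID, content ID) for one stored file.

    "dspace-backup/aip-store" -> Space "dspace-backup", prefix "aip-store/";
    "aip-store" (no slash)    -> Space "aip-store", no prefix.
    """
    space_id, _, prefix = group_name.partition("/")
    content_id = f"{prefix}/{filename}" if prefix else filename
    return space_id, content_id


def spaces_to_create(group_names: Iterable[str]) -> List[str]:
    """Distinct DuraCloud Space IDs needed for the given group settings."""
    return sorted({name.partition("/")[0] for name in group_names})
```

With the prefixed settings from the example, only the single "dspace-backup" Space is needed; with the default unprefixed settings, you need one Space per group ("aip-store", "deletions" and "manifest-store").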
Automation Options (Recommended)
Performing a backup of DSpace is one thing, but ensuring that the backup is always "synchronized" with your changing DSpace content is another.
...