...

This Add-On also integrates DSpace with DuraCloud for users that wish to easily back up their content into DuraCloud directly from their DSpace administrative interface.

Info
2013-May-3: New Release of Replication Task Suite

On May 3, 2013, two new versions of the Replication Task Suite (RTS) were released:

  • RTS, version 3.0 - compatible with DSpace 3.x releases (this release is nearly identical to the below 1.1 release, except for minor DSpace 3.x compatibility fixes)
  • RTS, version 1.1 - bug-fix release, compatible with DSpace 1.8.x releases.  This fixes several small bugs (namely with the event consumer utilized during Automatic Replication).  The exact fixes between 1.0 and 1.1 are detailed in Pull Request #7.
    • To upgrade to RTS 1.1 from 1.0, simply change your pom.xml (see Installation on DSpace 1.8.x) to reference 'dspace-replicate' version 1.1.  Then rebuild DSpace & re-run 'ant update'.   Your existing RTS configuration files will still work with RTS 1.1.
Note
Usage Examples

For a quick overview of the various tasks offered in the Replication Task Suite, along with some real-life scenarios / examples of where each Replication task may come in handy, you may wish to skip directly to the Problem Statement and Usage Examples section at the bottom of this page.

...

Replication Task Suite Version | Supported DSpace Version(s) | Supported Interfaces | Notes
------------------------------ | --------------------------- | -------------------- | -----
1.1 | DSpace version 1.8.x | XMLUI and/or commandline | Highly recommended to use either DSpace 1.8.1 or 1.8.2. DSpace 1.8.0 has a known bug where running a Replication Task will always return a NullPointerException - see DS-1077
3.0 | DSpace version 3.x | XMLUI and/or commandline | The "3.0" stable version of the Replication Task Suite is nearly identical to the "1.1" stable version. It just includes minor bug fixes to ensure the Replication Task Suite is compatible with the new DSpace 3.x API.

...

  1. In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls dependencies of CommandLine scripts. Modifying it will let you run dspace-replicate from commandline)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls dependencies of XMLUI. Modifying it will let you run dspace-replicate from XMLUI)

  2. For each of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).

    Code Block
    <dependencies>
        ...

        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>1.1</version>
        </dependency>
    </dependencies>
  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package
    
  4. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory.  That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. Update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update
    
    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapps/ directory)
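
Before rebuilding, it can help to confirm that both pom.xml files really declare the dependency. The following is a minimal Python sketch, not part of the Replication Task Suite. The function name find_dependency_version is hypothetical, and the sketch assumes the POM is well-formed XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_dependency_version(pom_text, group_id, artifact_id):
    """Return the <version> of the matching <dependency>.

    Returns None if the dependency is absent or has no explicit <version>.
    """
    root = ET.fromstring(pom_text)
    for elem in root.iter():
        if local_name(elem.tag) != "dependency":
            continue
        # Collect the direct children (groupId, artifactId, version, ...).
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("groupId") == group_id and fields.get("artifactId") == artifact_id:
            return fields.get("version")
    return None
```

Calling find_dependency_version on the text of each of the two pom.xml files with "org.dspace" and "dspace-replicate" should return the version you added. A result of None suggests the <dependency> section is missing from that file.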

...

  1. Enable Local Storage Plugin: Ensure the Replication suite is set up to use the 'LocalObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.LocalObjectStore
    
  2. Configure Local Storage Folder: Configure the location where you want all AIPs to be stored on your local filesystem. This defaults to the [dspace]/repstore folder. However, we recommend changing this to a location that is, at minimum, on a separate hard drive from your existing DSpace installation directory. That way, a single hard drive failure cannot destroy both your content and its backup.

    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: You can configure the sub-folder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    group.delete.name = deletions
    
    Info
    Using Subfolders

    Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths. For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be stored under the subfolder "deletions/".
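
To see where a given combination of store.dir and group names would place files on disk, the layout described above can be sketched in Python. This is an illustration only and not how the Replication Task Suite itself reads its configuration. The function names are hypothetical, the parser handles only simple "key = value" lines (no backslash continuations), and the DSpace install path is passed in by hand.

```python
from pathlib import PurePosixPath

GROUP_KEYS = ("group.aip.name", "group.manifest.name", "group.delete.name")


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_store_paths(props, dspace_dir):
    """Map each group setting to the folder it would occupy under store.dir."""
    store_dir = props["store.dir"].replace("${dspace.dir}", dspace_dir)
    # Mirrors the warning in the configuration comments above.
    if props["group.delete.name"] == props["group.aip.name"]:
        raise ValueError("group.delete.name must differ from group.aip.name")
    return {key: str(PurePosixPath(store_dir) / props[key]) for key in GROUP_KEYS}
```

With the subfolder example above and a DSpace install directory of /dspace, this sketch maps group.aip.name to /dspace/repstore/dspace-backup/aip-store.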

Configuring Mountable Storage

...

  1. Enable Mountable Storage Plugin: Ensure the Replication suite is set up to use the 'MountableObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.MountableObjectStore
    
  2. Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.

    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: You can configure the sub-folder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    group.delete.name = deletions
    
    Info
    Using Subfolders

    Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths. For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be stored under the subfolder "deletions/".
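
Because the store folder is expected to be mounted already, one way to sanity-check it before running any tasks is to look up which mount point actually holds it. Below is a minimal Python sketch using only the standard library. The function names are hypothetical, and this check is not something the Replication Task Suite performs for you.

```python
import os


def find_mount_point(path):
    """Walk up from 'path' until a mount point is reached, and return it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root; stop defensively
            break
        path = parent
    return path


def on_separate_mounts(store_dir, dspace_dir):
    """True if the two folders live under different mount points."""
    return find_mount_point(store_dir) != find_mount_point(dspace_dir)
```

If on_separate_mounts returns False for your store.dir and your [dspace] directory, the mounted folder is probably not attached and AIPs would be written to the same filesystem as DSpace. Note that different mount points do not by themselves guarantee different physical drives.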

Configuring DuraCloud Storage

...

  1. Enable DuraCloud Storage Plugin: Ensure the Replication suite is set up to use the 'DuraCloudObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.DuraCloudObjectStore
    
  2. Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aip-store").

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
  3. Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.

    Code Block
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    group.delete.name = deletions
    
    Info
    Using File Prefixes instead of separate DuraCloud Spaces

    If you'd rather keep all your DSpace files in a single DuraCloud Space, you can tweak your "group.aip.name", "group.manifest.name" and "group.delete.name" settings to specify a file-prefix to use. For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" Space within DuraCloud. AIPs will all be stored with a file-prefix of "aip-store/" (e.g. "aip-store/ITEM@123456789-2.zip"). Manifests will all be stored with a file-prefix of "manifest-store/". And any object deletion records will be stored with a file-prefix of "deletions/". This allows you to keep all your content in a single DuraCloud Space while avoiding name conflicts between AIPs, Manifests and deletion records.
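
The file-prefix convention described above can be illustrated with a short Python sketch. It assumes, based only on the example above, that the part of the group name before the first "/" is the DuraCloud Space and the remainder becomes the file prefix. The function names are hypothetical and this is not the DuraCloudObjectStore implementation.

```python
def split_group_name(group_name):
    """Split a group setting into (space_id, file_prefix).

    'dspace-backup/aip-store' -> ('dspace-backup', 'aip-store/')
    'aip-store'               -> ('aip-store', '')
    """
    space_id, _, prefix = group_name.partition("/")
    return space_id, (prefix + "/" if prefix else "")


def stored_location(group_name, file_name):
    """Return (space_id, prefixed file name) for a file stored in the given group."""
    space_id, prefix = split_group_name(group_name)
    return space_id, prefix + file_name
```

Under this assumption, an AIP named "ITEM@123456789-2.zip" stored with group.aip.name = dspace-backup/aip-store lands in the "dspace-backup" Space as "aip-store/ITEM@123456789-2.zip", matching the example above.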

Automation Options (Recommended)

Performing a backup of DSpace is one thing, but ensuring that the backup always stays "synchronized" with your changing DSpace content is another.

...