Info: Latest Releases of Replication Task Suite

Based on the version of DSpace you are running, here are the compatible latest releases of the Replication Task Suite:

  • RTS, version 7.6 - compatible only with DSpace 7.6.0.x releases
    • Follow the directions below to install the Replication Task Suite (see Installation on DSpace 7.x).
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • Version 7.6 release notes
  • RTS, version 6.1 - bug-fix release, compatible only with DSpace 6.x releases
    • Upgrading: To upgrade to RTS 6.1 from a previous version, simply change your pom.xml (see Installation on DSpace 6.x) to reference 'dspace-replicate' version 6.1.  Then rebuild DSpace & re-run 'ant update'. You should verify your configurations are still compatible with DSpace 6.x, as the DSpace Configuration System received an overhaul in DSpace 6.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • 6.0 Bug Fixes: refactored to work with DSpace 6.x (UUIDs instead of integers)
    • Version 6.1 release notes
  • RTS, version 5.0 - compatible only with DSpace 5.x releases
    • Upgrading: To upgrade to RTS 5.0 from a previous version, simply change your pom.xml (see Installation on DSpace 5.x) to reference 'dspace-replicate' version 5.0.  Then rebuild DSpace & re-run 'ant update'.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • Version 5.0 release notes
  • RTS, version 3.5 - bug-fix release, compatible with all DSpace 3.x and 4.x releases
    • Upgrading: To upgrade to RTS 3.5 from a previous version, simply change your pom.xml (see Installation on DSpace 3.x or 4.x) to reference 'dspace-replicate' version 3.5.  Then rebuild DSpace & re-run 'ant update'. Your existing RTS 3.x configuration files will still work with RTS 3.5.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • 3.2 Bug Fixes: Ensures RTS compiles for Java 7 (which is required for RTS 3.x).
    • 3.1 Bug Fixes: Fixes several small bugs (namely with the event consumer utilized during Automatic Replication).  Also ensures RTS runs on the DSpace 3.x platform.
    • Version 3.5 release notes
  • RTS, version 1.3 - bug-fix release, compatible with all DSpace 1.8.x releases.
    • Upgrading: To upgrade to RTS version 1.3 from a previous release, simply change your pom.xml (see Installation on DSpace 1.8.x) to reference 'dspace-replicate' version 1.3. Then rebuild DSpace & re-run 'ant update'. Your existing RTS 1.x configuration files will still work with RTS 1.3.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • 1.3 Bug Fixes: This fixes a DuraCloud v2.4.0 connection error with version 1.2.
    • 1.2 Bug Fixes: This fixes a Java 6 incompatibility bug in version 1.1.  Previously version 1.1 required Java 7 when using DuraCloud.
    • 1.1 Bug Fixes: Fixes for several small bugs in 1.0 (namely with the event consumer utilized during Automatic Replication).

...

Info: More Information / Screencasts

More information on the Replication Task Suite is available from the following webinars / screencasts:

...


...

The Replication Task Suite currently supports the following versions of DSpace software:

| Replication Task Suite Version | Supported DSpace Version(s) | Supported Java Version | Supported Interfaces | Notes |
| 7.6 | DSpace version 7.6.x | Java 11 or above | DSpace 7.6.x UI or command line | The 7.6 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that Replication Task Suite works with DSpace 7.6.x. |
| 6.1 | DSpace version 6.x | Java 8 or above | XMLUI and/or commandline | The 6.1 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that Replication Task Suite works with DSpace 6.x. |
| 5.0 | DSpace version 5.x | Java 8 or above | XMLUI and/or commandline | The 5.0 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that Replication Task Suite works with DSpace 5.x. |
| 3.5 | DSpace version 3.x or 4.x | Java 8 or above | XMLUI and/or commandline | The 3.5 stable version of the Replication Task Suite is nearly identical to the 1.x stable version. It just includes minor bug fixes to ensure the Replication Task Suite is compatible with the newer DSpace APIs. |
| 1.3 | DSpace version 1.8.x | Java 6 or above | XMLUI and/or commandline | Highly recommended to use DSpace 1.8.1 or above. DSpace 1.8.0 has a known bug where running a Replication Task will always return a NullPointerException - see DS-1077. |
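
The compatibility table above can also be read as a simple lookup. The following Python sketch is illustrative only: the function name rts_version_for is hypothetical (it is not part of DSpace or the Replication Task Suite), and the mapping merely restates the rows of the table.

```python
from typing import Optional


def rts_version_for(dspace_version: str) -> Optional[str]:
    """Return the RTS release listed for a DSpace version, or None if none is listed.

    Hypothetical helper: the mapping below only restates the compatibility table.
    """
    parts = dspace_version.split(".")
    if not parts[0].isdigit():
        return None
    major = int(parts[0])
    # The minor version is only needed to tell 7.6.x and 1.8.x apart.
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None

    if major == 7 and minor == 6:
        return "7.6"
    if major == 6:
        return "6.1"
    if major == 5:
        return "5.0"
    if major in (3, 4):
        return "3.5"
    if major == 1 and minor == 8:
        return "1.3"  # DSpace 1.8.1 or above is recommended (see DS-1077)
    return None


if __name__ == "__main__":
    for version in ("7.6.1", "6.3", "4.2", "1.8.2", "2.0"):
        print(version, "->", rts_version_for(version))
```

Versions that do not appear in the table (for example DSpace 2.0 or 7.5) return None rather than a guess.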

Installation instructions for each version are included below:

...

  • From the Command Line
  • From the Admin UI (in DSpace 7.x, or the XMLUI in DSpace through 6.x)
  • From Item Approval Workflow
  • From custom Java code

For more information see the Curation System details on Task Invocation.

Installation on DSpace 7.x

Installation in the DSpace 7.x server (backend)
  1. In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
    • [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to commandline and ALL DSpace interfaces)

  2. For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag). NOTE: the exclusions are required to work around differences in DSpace and DuraCloud dependency versions.

    Code Block
    <dependencies>
        ...

        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>7.6</version>
            <!-- These exclusions work around differences in DSpace and DuraCloud dependency versions -->
            <exclusions>
                <exclusion>
                    <groupId>com.amazonaws</groupId>
                    <artifactId>aws-java-sdk-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.amazonaws</groupId>
                    <artifactId>aws-java-sdk-sqs</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-compress</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.hibernate.javax.persistence</groupId>
                    <artifactId>hibernate-jpa-2.1-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpmime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.springframework.security</groupId>
                    <artifactId>spring-security-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

    </dependencies>

  3. Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package

  4. Update the default dspace.cfg to include the Replication Task Suite config files. This ensures these configs are loaded as part of your DSpace configuration. This also allows you to override the configurations in your own local.cfg file. Including the duracloud.cfg file is only required if you are planning to replicate/backup your content to DuraCloud.

    Code Block
    include = ${module_dir}/replicate.cfg
    include = ${module_dir}/replicate-mets.cfg
    include = ${module_dir}/replicate-bagit.cfg
    include = ${module_dir}/duracloud.cfg
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory.  That way they will be auto-installed/copied whenever you run "ant update" (see the final step).
  5. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
  6. Update your existing DSpace installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update

    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)


Upgrades

Upgrading the Replication Task Suite to a new version essentially involves a reinstallation of the add-on.

Follow the latest installation instructions, based on the version of DSpace you are running:

Once you have reinstalled the Replication Task Suite, you should compare your existing configurations with the latest Replication Task Suite configurations.  In most cases, your existing configurations should function perfectly, but you should review the differences just in case.

Configuration

Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

Enabling Replication Task Suite

In order to enable the Replication Task Suite, you need to create / edit several configuration files.

  1. A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
    1. Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
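    2. Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-3_x/config/modules
    3. Configs for RTS version 5.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-5_x/config/modules
    4. Configs for RTS version 6.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-6_x/config/modules
    5. Configs for RTS version 7.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules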
  2. Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
    1. replicate.cfg - This file contains the base settings for the Replication Task Suite
    2. replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
    3. duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
  3. Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS based or BagIt based) you wish to use. Please see the AIP Format Options section below for the details of what should be added to your curate.cfg file
    1. A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above.  This sample file is preconfigured to use METS-based AIPs.
  4. Recommended (but not required):  Edit your [dspace]/config/modules/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).

Overview of Configuration Options

Before getting started, you may wish to determine the answers to the following questions:

  1. AIP Format Options: Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt AIP Format?
  2. Storage Options: Does your institution plan to use the Replication Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
  3. Automation Options (Recommended): Do you want to automatically sync your AIP backup store with what is in DSpace? (this is highly recommended, but not required)
  4. Additional Options: Do you plan to use Checkm manifests for checksum auditing?

Info: Overview of Task Suite usage

For a higher level introduction to the Replication Task Suite, please see the Problem Statement and Usage Examples section below. It may provide you with a better idea of how you'd like to configure this task suite based on your institutional needs.

AIP Format Options

One of the first questions to ask yourself is the format you wish to utilize for your AIPs.

There are two options:

  1. DSpace AIP Format (METS-based) (default) - This is the same AIP format utilized by the DSpace AIP Backup and Restore feature, so it is 100% compatible with that DSpace feature. In fact when using this format, the Replication Task Suite just "wraps" calls to the AIP Backup and Restore feature itself.
  2. BagIt AIP Format (beta) - This is a new AIP format provided by the Replication Task Suite. It generates AIPs in the BagIt File Packaging Format. Institutions which already are familiar with BagIt or use it elsewhere may find this format preferable.  (Please note that this AIP format does not yet support all DSpace objects. See the below table for more information.)

These two AIP formats are not identical.  The below table seeks to describe some of the differences.

|  | DSpace AIP Format (METS-based AIPs) | BagIt AIP Format |
| Supported Backup/Restore Types |  |  |
| Can Backup & Restore all DSpace Content easily | Yes | Yes |
| Can Backup & Restore a Single Community/Collection/Item easily | Yes | Yes |
| Backups can be used to move one or more Community/Collection/Items to another DSpace system easily. | Yes (Using the Replication Task Suite or using the command line AIP Backup and Restore tools) | Yes (though the Replication Task Suite add-on must be installed on both systems) |
| Supported DSpace Object Types |  |  |
| ... | ... | ... |
| ... | No (AIPs are only generated for objects which are completed and considered "in archive") | No (AIPs are only generated for objects which are completed and considered "in archive") |

For more information on the tasks available based on your AIP format choice, please see the Problem Statement and Usage Examples section below. This section also provides good examples of how to use each of the tasks available to you in the Replication Task Suite.

Configuring usage of DSpace default AIP Format (METS-based)

This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging.  This is the default & recommended setting.

...

Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).

Code Block
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
    org.dspace.ctask.replicate.ReadOdometer = readodometer, \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.TransmitSingleAIP = transmitsingleaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
    org.dspace.ctask.replicate.FetchAIP = fetchaip, \
    org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
    org.dspace.ctask.replicate.RemoveAIP = removeaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip

Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).

Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s), \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s), \
    restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
    restoresinglefromaip = Restore Single Object from AIP, \
    replacesinglewithaip = Replace Single Object with AIP
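
A missing comma or backslash in these multi-line values is easy to overlook. As a minimal sketch, the following Python function flags lines of a pasted property value that do not follow the ", \" convention described above. The helper name find_bad_continuations is hypothetical and not part of DSpace; it assumes the value is pasted as plain text with no comment lines in between.

```python
from typing import List


def find_bad_continuations(text: str) -> List[int]:
    """Return 1-based numbers of lines that break the ', \\' continuation rule.

    Every non-final line must end with a backslash that is preceded by a
    comma (or by '=' on the opening line); the final line must end with neither.
    """
    # Keep original line numbers, but ignore blank lines.
    numbered = [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
    bad = []
    for number, line in numbered[:-1]:
        if not line.endswith("\\"):
            bad.append(number)
            continue
        head = line[:-1].rstrip()  # text before the trailing backslash
        if not head.endswith((",", "=")):
            bad.append(number)
    if numbered:
        number, last = numbered[-1]
        if last.endswith(("\\", ",")):
            bad.append(number)
    return bad


if __name__ == "__main__":
    sample = "\n".join([
        "ui.tasknames = \\",
        "    estaipsize = Estimate Storage Space for AIP(s), \\",
        "    readodometer = Read Odometer \\",  # comma is missing here
        "    transmitaip = Transmit AIP(s) to Storage",
    ])
    print(find_bad_continuations(sample))
```

The check is purely textual; it does not validate that the task names themselves are defined.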
Installation in the DSpace 7.x UI

  1. In the DSpace 7.x UI, you will need to specify labels for the RTS tasks, so that descriptive names are displayed in the Curation Task list in the UI. You can either add these directly to [dspace-angular]/src/assets/i18n/en.json5 or include them in the en.json5 file in your theme directory and execute the merge-i18n script. If your DSpace site supports languages other than English, you'll need to add these (and appropriate translations) to each language file available to users.

    Code Block
      "curation-task.task.estaipsize.label": "Estimate Storage Space for AIP(s)",
      "curation-task.task.readodometer.label": "Read Odometer",
      "curation-task.task.transmitaip.label": "Transmit AIP(s) to Storage",
      "curation-task.task.transmitsingleaip.label": "Transmit Single Object AIP to Storage",
      "curation-task.task.verifyaip.label": "Verify AIP(s) exist in Storage",
      "curation-task.task.fetchaip.label": "Fetch AIP(s) from Storage",
      "curation-task.task.auditaip.label": "Audit against AIP(s)",
      "curation-task.task.removeaip.label": "Remove AIP(s) from Storage",
      "curation-task.task.restorefromaip.label": "Restore Missing Object(s) from AIP(s)",
      "curation-task.task.replacewithaip.label": "Replace Existing Object(s) with AIP(s)",
      "curation-task.task.restorekeepexisting.label": "Restore Missing Object(s) but Keep Existing Objects",
      "curation-task.task.restoresinglefromaip.label": "Restore Single Object from AIP",
      "curation-task.task.replacesinglewithaip.label": "Replace Single Object with AIP",

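The label keys above follow a regular pattern, "curation-task.task.<task name>.label". If you maintain several language files, a small script can generate the entries from a single mapping. This is only a sketch: the function name label_entries is hypothetical, and the script prints lines for you to paste into en.json5 rather than editing any file itself.

```python
import json
from typing import Dict, List


def label_entries(tasknames: Dict[str, str]) -> List[str]:
    """Build en.json5-style lines of the form used in the code block above."""
    entries = []
    for task, label in tasknames.items():
        key = f"curation-task.task.{task}.label"
        # json.dumps quotes and escapes both strings for us.
        entries.append(f"{json.dumps(key)}: {json.dumps(label)},")
    return entries


if __name__ == "__main__":
    # Two of the task names and labels listed in this section.
    tasks = {
        "estaipsize": "Estimate Storage Space for AIP(s)",
        "readodometer": "Read Odometer",
    }
    for entry in label_entries(tasks):
        print("  " + entry)
```

For other languages, the same mapping can be reused with translated label values.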

Installation on DSpace 6.x

  1. In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
    • [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to commandline and ALL DSpace interfaces)

  2. For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag). NOTE: the exclusions are required to work around DS-3536.

    Code Block
    <dependencies>
        ...
    
        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>6.1</version>
              <!-- These exclusions are currently necessary to resolve dependency mismatches with some dependencies pulled into RTS 6.0 to work with DuraCloud, see DS-3536 for details -->
              <exclusions>
                     <exclusion>
                            <groupId>org.apache.commons</groupId>
                            <artifactId>commons-lang3</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>com.amazonaws</groupId>
                            <artifactId>aws-java-sdk-core</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>org.apache.httpcomponents</groupId>
                            <artifactId>httpmime</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>org.springframework</groupId>
                            <artifactId>spring-expression</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>org.springframework.security</groupId>
                            <artifactId>spring-security-core</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>org.codehaus.jackson</groupId>
                            <artifactId>jackson-mapper-asl</artifactId>
                     </exclusion>
                     <exclusion>
                            <groupId>org.codehaus.jackson</groupId>
                            <artifactId>jackson-core-asl</artifactId>
                     </exclusion>
              </exclusions>
        </dependency>
    
    </dependencies> 


  3. Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package
    


  4. Update the default dspace.cfg to include the Replication Task Suite config files. This ensures these configs are loaded as part of your DSpace configuration. This also allows you to override the configurations in your own local.cfg file. Including the duracloud.cfg file is only required if you are planning to replicate/backup your content to DuraCloud.

    Code Block
    include = ${module_dir}/replicate.cfg
    include = ${module_dir}/replicate-mets.cfg
    include = ${module_dir}/replicate-bagit.cfg
    include = ${module_dir}/duracloud.cfg
    1. You should ensure these configurations exist in your [dspace-src]/dspace/config/modules directory.  That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
  6. You will need to update your existing DSpace 6.x installation, by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update
    


    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)
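
Before running "ant update", it can be useful to confirm that the include lines from step 4 are actually present in your dspace.cfg. The following Python sketch does this on the text of the file. The function name missing_includes is hypothetical, and treating duracloud.cfg as optional reflects only the note in step 4 that it is needed solely for DuraCloud replication.

```python
from typing import List

# Config files named in step 4; duracloud.cfg is only needed for DuraCloud.
REQUIRED = ["replicate.cfg", "replicate-mets.cfg", "replicate-bagit.cfg"]
OPTIONAL_DURACLOUD = "duracloud.cfg"


def missing_includes(cfg_text: str, use_duracloud: bool = False) -> List[str]:
    """Return the RTS config files that are not included in the given dspace.cfg text."""
    wanted = REQUIRED + ([OPTIONAL_DURACLOUD] if use_duracloud else [])
    present = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # commented-out includes do not count
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include":
            present.add(value.strip())
    return [name for name in wanted if "${module_dir}/" + name not in present]


if __name__ == "__main__":
    sample = "\n".join([
        "include = ${module_dir}/replicate.cfg",
        "# include = ${module_dir}/replicate-mets.cfg",
        "include = ${module_dir}/replicate-bagit.cfg",
    ])
    print(missing_includes(sample, use_duracloud=True))
```

To check a real installation, read the file yourself (for example with open(path).read()) and pass the text in; the path to dspace.cfg depends on your setup.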


Installation on DSpace 5.x

  • Follow the instructions for deployment on DSpace 6.x above, substituting version 5.0 of the dspace-replicate dependency.

Installation on DSpace 3.x or 4.x

  1. In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
    • [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to commandline and ALL DSpace interfaces)

  2. For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).

    Code Block
    <dependencies>
        ...
    
    	<!-- Adding this dependency will install the Replication Task Suite Addon -->
    	<dependency>
       		<groupId>org.dspace</groupId>
       		<artifactId>dspace-replicate</artifactId>
       		<version>3.4</version>
    	</dependency>
    </dependencies> 


  3. Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package
    


  4. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory.  That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. You will need to update your existing DSpace 3.x or 4.x installation, by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update
    


    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)


Installation on DSpace 1.8.x

Warning: Known Curation System bug in 1.8.0

DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug was resolved in DSpace 1.8.1.
Because of the above bug, we recommend running the Replication Task Suite on DSpace 1.8.1 or above.


  1. In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls dependencies of CommandLine scripts. Modifying it will let you run dspace-replicate from commandline)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls dependencies of XMLUI. Modifying it will let you run dspace-replicate from XMLUI)

  2. For each of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).

    Code Block
    <dependencies>
        ...
    
    	<!-- Adding this dependency will install the Replication Task Suite Addon -->
    	<dependency>
       		<groupId>org.dspace</groupId>
       		<artifactId>dspace-replicate</artifactId>
       		<version>1.3</version>
    	</dependency>
    </dependencies> 


  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package
    


  4. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory.  That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. You will need to update your existing DSpace 1.8.x installation, by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory

    Code Block
    ant update
    


    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapps/ directory)


Upgrades

Upgrading the Replication Task Suite to a new version essentially involves a reinstallation of the add-on.

Follow the latest installation instructions, based on the version of DSpace you are running:

Once you have reinstalled the Replication Task Suite, you should compare your existing configurations with the latest Replication Task Suite configurations.  In most cases, your existing configurations should function perfectly, but you should review the differences just in case.

Configuration

Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

Enabling Replication Task Suite

In order to enable the Replication Task Suite, you need to create / edit several configuration files.

  1. A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
    1. Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
    2. Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-3_x/config/modules
    3. Configs for RTS version 5.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-5_x/config/modules
    4. Configs for RTS version 6.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-6_x/config/modules
    5. Configs for RTS version 7.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules
  2. Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
    1. replicate.cfg - This file contains the base settings for the Replication Task Suite
    2. replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
    3. replicate-bagit.cfg - This file provides additional configuration for BagIt AIPs (see below for more details)
    4. duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
  3. Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS-based or BagIt-based) you wish to use. Please see the AIP Format Options section below for the details of what should be added to your curate.cfg file.
    1. A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above.  This sample file is preconfigured to use METS-based AIPs.
  4. Recommended (but not required):  Edit your [dspace]/config/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).

Overview of Configuration Options

Before getting started, you may wish to determine the answers to the following questions:

  1. AIP Format Options: Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt AIP Format?
  2. Storage Options: Does your institution plan to use the Replication Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
  3. Automation Options (Recommended): Do you want to automatically sync your AIP backup store with what is in DSpace? (this is highly recommended, but not required)
  4. Additional Options: Do you plan to use Checkm manifests for checksum auditing?

Info: Overview of Task Suite usage

For a higher level introduction to the Replication Task Suite, please see the Problem Statement and Usage Examples section below. It may provide you with a better idea of how you'd like to configure this task suite based on your institutional needs.

AIP Format Options

One of the first questions to ask yourself is the format you wish to utilize for your AIPs.

There are two options:

  1. DSpace AIP Format (METS-based) (default) - This is the same AIP format utilized by the DSpace AIP Backup and Restore feature, so it is 100% compatible with that DSpace feature. In fact, when using this format, the Replication Task Suite just "wraps" calls to the AIP Backup and Restore feature itself.
  2. BagIt AIP Format (beta) - This is a new AIP format provided by the Replication Task Suite. It generates AIPs in the BagIt File Packaging Format. Institutions which already are familiar with BagIt or use it elsewhere may find this format preferable.  (Please note that this AIP format does not yet support all DSpace objects. See the below table for more information.)

These two AIP formats are not identical.  The below table seeks to describe some of the differences.


| | DSpace AIP Format (METS-based AIPs) | BagIt AIP Format |
|---|---|---|
| Supported Backup/Restore Types | | |
| Can Backup & Restore all DSpace Content easily | Yes | Yes |
| Can Backup & Restore a Single Community/Collection/Item easily | Yes | Yes |
| Backups can be used to move one or more Community/Collection/Items to another DSpace system easily. | Yes (Using the Replication Task Suite or using the command line AIP Backup and Restore tools) | Yes (though the Replication Task Suite add-on must be installed on both systems) |
| Can Backup & Restore Item Versions (added in DSpace 3.x) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) |
| Supported DSpace Object Types | | |
| Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.) | Yes | Yes |
| Supports backup/restore of all People/Groups/Permissions | Yes | Yes |
| Supports backup/restore of all Collection-specific Item Templates | Yes | No (Not yet supported) |
| Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) |
| Supports backup/restore of all Withdrawn (but not deleted) Items | Yes | Yes |
| Supports backup/restore of Item Mappings between Collections | Yes | Yes |
| Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow) | No (AIPs are only generated for objects which are completed and considered "in archive") | No (AIPs are only generated for objects which are completed and considered "in archive") |
| Supports backup/restore of Items using custom Metadata Schemas & Fields | Yes | Yes |
| Supports backup/restore of all local DSpace Configurations and Customizations | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) |


For more information on the tasks available based on your AIP format choice, please see the Problem Statement and Usage Examples section below. This section also provides good examples of how to use each of the tasks available to you in the Replication Task Suite.

Configuring usage of DSpace default AIP Format (METS-based)

This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging.  This is the default & recommended setting.

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample curate.cfg file provided in https://github.com/DSpace/dspace-replicate/tree/master/config/modules which is pre-configured to use METS-based AIPs).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.TransmitSingleAIP = transmitsingleaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
      


    • (Only for RTS versions prior to 7.0) Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s), \
          restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
          restoresinglefromaip = Restore Single Object from AIP, \
          replacesinglewithaip = Replace Single Object with AIP
      


    • (Only for RTS versions prior to 7.0) Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.

      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
      


  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:

    Code Block
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = mets
    
    # Format of package compression. Permitted values: 'zip' or 'tgz'
    # for 'mets' packages, only 'zip' is supported
    packer.archfmt = zip
    
    # Whether or not to name packages with a DSpace type prefix.
    # When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
    # When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
    # Defaults to 'true'. For 'mets' packages, this must be 'true'.
    packer.typeprefix = true
    


  3. Optionally tweak the AIP Restore/Replace settings: Optionally, you can decide to tweak the way AIPs are restored or replaced (using AIP Backup and Restore options). These settings normally should not need to be tweaked, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
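
As a rough illustration of the packer.typeprefix naming rule shown in step 2, the sketch below builds a package file name from an object type, a handle, and an archive format. The function name aip_package_name is hypothetical, and replacing the "/" in a handle with "-" is an assumption inferred from the example name ITEM@123456789-1.zip. It is not the suite's own code.

```python
def aip_package_name(obj_type: str, handle: str, archfmt: str = "zip",
                     typeprefix: bool = True) -> str:
    """Build an AIP package file name following the documented pattern."""
    # Assumption: the "/" in a handle becomes "-" (inferred from the example name).
    safe_handle = handle.replace("/", "-")
    # With the type prefix: [type]@[handle].[format]; without it: [handle].[format]
    stem = f"{obj_type}@{safe_handle}" if typeprefix else safe_handle
    return f"{stem}.{archfmt}"


print(aip_package_name("ITEM", "123456789/1"))                    # ITEM@123456789-1.zip
print(aip_package_name("ITEM", "123456789/1", typeprefix=False))  # 123456789-1.zip
```

Since 'mets' packages require packer.typeprefix = true and only support 'zip', the first call reflects the only naming form available for METS-based AIPs.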

Configuring usage of DSpace BagIt AIP Format

This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. The Replication Suite uses the BagIt Profiles specification in order to provide additional guarantees about the BagIt AIPs which are exported and ingested. The supported profiles are aptrust and beyondtherepository.

If no BagIt Profile is specified, the beyondtherepository profile will be used by default. For more information on the BagIt packaging format, see https://wiki.ucop.edu/display/Curation/BagIt (the BagIt Profiles implementation used is DuraSpace's bagit-support).

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in https://github.com/DSpace/dspace-replicate/tree/master/config/modules which provides example settings, though they are all commented out by default).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
      


    • (Only for RTS versions prior to 7.0) Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit/Compare against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s)
      


    • (Only for RTS versions prior to 7.0) Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.

      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
      


  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:

    Code Block
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = bagit
    


  3. BagIt Configuration: Finally, in [dspace]/config/modules/replicate-bagit.cfg, you will need to configure settings for the BagIt tasks:

    • Configure the BagIt Profile: Set the BagIt Profile which will be used

      Code Block
      # The Bag Profile setting allows you to select a BagProfile which the RTS
      # will create and read bags for. The RTS will check the conformance of a
      # bag to a profile as part of both the packaging and restoration processes.
      #    
      # See: https://github.com/duraspace/bagit-support/ for more information
      #                          
      # Available Options: aptrust, beyondtherepository
      # Default: beyondtherepository
                                    
      replicate-bagit.profile = beyondtherepository


    • Configure the Bag Metadata: Under the replicate-bagit.tag settings, set appropriate values for additional bag metadata to be packaged with your DSpace AIPs. Each configuration property in this section follows the format replicate-bagit.tag.tag-filename.metadata-key: metadata-value. See section 2.2.2 of the BagIt specification for more information on bag metadata.
      Note: The required fields for the bag metadata files differ depending on the BagIt Profile specified, so it is important to know which profile you're working with.

      Code Block
      #### BagIt Bag Metadata Settings ####
                   
      # These settings allow you to customize the bag-info.txt which
      # is written by the BagIt packaging tools. By default no fields
      # are used which will produce Bags which do not conform to any
      # BagProfiles.
      
      replicate-bagit.tag.bag-info.source-organization = dspace
      replicate-bagit.tag.bag-info.organization-address = localhost
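
To make the replicate-bagit.tag.tag-filename.metadata-key naming format more concrete, the following sketch splits one such property line into its tag file, metadata key, and value. The helper parse_bag_tag is hypothetical and only mirrors the format described above; it assumes the tag file name itself contains no dot (as with bag-info) and is not how the suite reads its configuration.

```python
import re

PREFIX = "replicate-bagit.tag."


def parse_bag_tag(line: str):
    """Split one property line into (tag_filename, metadata_key, metadata_value)."""
    # Accept either "=" or ":" between key and value, splitting on the first one only.
    key, value = re.split(r"\s*[=:]\s*", line.strip(), maxsplit=1)
    if not key.startswith(PREFIX):
        raise ValueError(f"not a bag metadata property: {key}")
    # Assumption: the tag file name has no dot, so the first dot ends it.
    tag_file, _, metadata_key = key[len(PREFIX):].partition(".")
    if not tag_file or not metadata_key:
        raise ValueError(f"expected {PREFIX}<tag-filename>.<metadata-key>: {key}")
    return tag_file, metadata_key, value


print(parse_bag_tag("replicate-bagit.tag.bag-info.source-organization = dspace"))
```

Read this way, the two sample properties above both target the bag-info tag file, one setting source-organization and the other organization-address.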

...

Storage Options

Where your AIPs will be stored is the next decision to make. There are three options currently available:

...

  • METS-based AIP Replicate Consumer: This consumer will listen for changes to any DSpace Communities, Collections, Items, Groups, or EPeople.  It should be utilized if you have chosen to use METS-based AIPs. See AIP Format Options above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage METS AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.METSReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item|Group|EPerson+All
    
     


    • In human terms, this configuration essentially means: listen for all changes to Communities, Collections, Items, Groups and EPeople. If a change is detected, run the "METSReplicateConsumer" (which adds that object to the queue).
  • BagIt-based AIP Consumer: This consumer will ONLY listen for changes to DSpace Communities, Collections and Items, as those are the only types of objects which are stored in BagIt-based AIPs. See AIP Format Options above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage BagIt AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.BagItReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
    
     


    • In human terms, this configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When one of the specified changes is detected, run the "BagItReplicateConsumer" (which adds that object to the queue).
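
The two event.consumer.replicate.filters values above can be read as "object types + event types". The sketch below is a simplified reading of that syntax for a single filter, treating "All" as matching every event type, as the METS description suggests. The function names are hypothetical and this is not DSpace's event-dispatch code.

```python
def parse_consumer_filter(filter_value: str):
    """Split 'TypeA|TypeB+EventA|EventB' into (object_types, event_types)."""
    object_part, _, event_part = filter_value.partition("+")
    return object_part.split("|"), event_part.split("|")


def matches(filter_value: str, object_type: str, event_type: str) -> bool:
    """Return True if a change of this kind would be passed to the consumer."""
    object_types, event_types = parse_consumer_filter(filter_value)
    # Assumption: "All" stands for every event type.
    return object_type in object_types and (
        "All" in event_types or event_type in event_types
    )


METS_FILTER = "Community|Collection|Item|Group|EPerson+All"
BAGIT_FILTER = "Community|Collection|Item+Install|Modify|Modify_Metadata|Delete"

print(matches(METS_FILTER, "Group", "Modify"))   # True
print(matches(BAGIT_FILTER, "Group", "Modify"))  # False
```

Removing 'Community' or 'Collection' from the BagIt filter, as suggested above, simply shortens the first list so that changes to those object types no longer match.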

...

Note: You may still wish to perform an occasional full site audit/backup

Even if you are processing the "sync queue" on a daily or weekly basis, you may still want to perform a full site-wide audit and/or backup on a less frequent basis.  For example, if you are processing the sync queue on a daily basis, you might want to perform a weekly or monthly site audit/backup.  Although this full site audit/backup is not required, it helps to ensure that all of your AIPs are simultaneously up-to-date at a given point in time.  It's worth noting that only AIPs that have changed (i.e. have a different checksum) will be transferred to your backup location. So, if all AIPs are already up-to-date in your backup location, no AIPs will be transferred at all.

More information on performing such an "audit" or full-site backup (including cron job examples) can be found in the section on Scheduled Site Auditing / Replication.
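
The "only changed AIPs are transferred" behavior can be pictured as a checksum comparison between the freshly generated AIP and the copy already in storage. The sketch below is a minimal illustration under that reading. The needs_transfer helper is hypothetical, and the use of MD5 is an assumption made for the example, since the text above only says "a different checksum".

```python
import hashlib
from typing import Optional


def needs_transfer(local_aip: bytes, stored_checksum: Optional[str]) -> bool:
    """Decide whether a locally generated AIP should be sent to the backup store."""
    # Assumption: MD5 is used here only for illustration.
    local_checksum = hashlib.md5(local_aip).hexdigest()
    # Transfer when nothing is stored yet, or when the stored copy differs.
    return stored_checksum is None or stored_checksum != local_checksum


aip = b"example AIP bytes"
stored = hashlib.md5(aip).hexdigest()
print(needs_transfer(aip, None))                # True: not yet in storage
print(needs_transfer(aip, stored))              # False: already up to date
print(needs_transfer(b"edited AIP", stored))    # True: content has changed
```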

 


Enhancing the Performance of the Queue Processing (optional)

...

We can suppose our data curator has identified a collection of items in her DSpace repository consisting of high-value, born-digital, and unique/irreplaceable (not held elsewhere) content (called the 'Amazing Images' collection). She prudently wishes to insure against catastrophic local loss of this content by keeping a copy or replica of this collection elsewhere (e.g. either on a backup drive, or even in the cloud via a service like DuraCloud). She'd prefer to replicate all her DSpace content, but realizes that storage costs over long periods have made her administration wary, so she decides to begin with this collection.

First Steps - Estimation

Replication Task Used:

Estimate Storage Space for AIP(s)

Task ID: estaipsize

In order to budget for replication storage, she needs to know the 'size' of the collection. When she asks her sysadmin, he replies that it is easy to give her figures for the whole DSpace asset store, but since collections aren't stored separately, she would have to add up each item's bitstreams in the collection, a rather tedious process. Thus the first task: a reporting tool which operates on natural DSpace objects, rather than storage volumes. The "Estimate Storage Space for AIP(s)" (estaipsize) task will give her this ability.

...

We should warn that the estimates from this task are rather crude, in that they do not measure the actual size of all AIPs. Rather they just total up the bitstream (file) sizes (and do not include metadata files). However, even this crude estimate should provide a decent idea of overall storage needs.
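
To show why the estimate is crude, here is a small sketch of the calculation as described: it totals bitstream sizes only and ignores metadata. The Item class and estimate_storage function are hypothetical stand-ins, not the task's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    handle: str
    bitstream_sizes: List[int] = field(default_factory=list)  # sizes in bytes
    metadata_size: int = 0  # present in a real AIP, but not counted below


def estimate_storage(items: List[Item]) -> int:
    """Total the bitstream (file) sizes for a set of items, in bytes."""
    # Metadata files are deliberately left out, mirroring the task's described behavior.
    return sum(sum(item.bitstream_sizes) for item in items)


collection = [
    Item("123456789/1", [1000, 2500], metadata_size=300),
    Item("123456789/2", [4000], metadata_size=250),
]
print(estimate_storage(collection))  # 7500
```

In this example the actual AIPs would occupy somewhat more than 7500 bytes once metadata and packaging overhead are included, which is the gap the warning above refers to.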

Replicating

Replication Task Used:

Transmit AIP(s) to Storage

Task ID: transmitaip

Having secured approval to replicate the 'Amazing Images' collection, our curator obviously needs a task to generate the AIP representations of each item in the collection, and transmit these archive files to the replication storage site (which may be service-backed, local, in the cloud, etc., as will be explored below).  This task is the "Transmit AIP(s) to Storage" (transmitaip) task.

...

Our data curator may elect to perform this task in the DSpace Admin UI, or, if the collection is rather large, she may instead 'queue' the task for later execution by using the queueing facility available in the curation system. We should note that the 'transmitaip' task, like all other replication tasks, operates on whatever DSpace object(s) it is given. Thus, if the object is a collection, the task creates (and transmits, of course) an AIP for the collection object itself (metadata and logo), as well as AIPs for each item in the collection. If the task is given an identifier for a single Item, then only one AIP will be created and transmitted.
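
The scope rule described here (an object plus everything beneath it) can be sketched as a simple tree walk. The DSpaceObject class and handles below are hypothetical examples, and the function only lists which objects would each get an AIP; it does not package or transmit anything.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DSpaceObject:
    kind: str    # e.g. "COLLECTION" or "ITEM"
    handle: str
    children: List["DSpaceObject"] = field(default_factory=list)


def objects_to_package(obj: DSpaceObject) -> List[str]:
    """List the handles that would each get their own AIP, starting from obj."""
    handles = [obj.handle]            # one AIP for the object itself
    for child in obj.children:        # plus one for every object beneath it
        handles.extend(objects_to_package(child))
    return handles


item_a = DSpaceObject("ITEM", "123456789/11")
item_b = DSpaceObject("ITEM", "123456789/12")
amazing_images = DSpaceObject("COLLECTION", "123456789/10", [item_a, item_b])

print(objects_to_package(amazing_images))  # collection AIP plus two item AIPs
print(objects_to_package(item_a))          # a single item AIP
```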

Verifying Replication

Replication Task Used:

Verify AIP(s) exist in Storage

Task ID: verifyaip

While the 'transmitaip' task will report on whether or not it was successful in generating and transmitting AIP(s) to the replication service, our data curator wants the ability (within DSpace) to check whenever she likes that the AIP(s) which were transmitted are still there. A simple task "Verify AIP(s) exist in Storage" (verifyaip) can perform this function.

Ensuring Replica Integrity and Accuracy over time

Replication Task Used:

Audit against AIP(s)

Task ID: auditaip

The 'Amazing Images' collection is comparatively static, meaning that few new items are likely to be added, and most of the metadata in each item is not routinely changed. However, over longer periods of time, cataloging errors are discovered and corrected, perhaps formats become obsolete and new bitstreams are added. If the curator is fastidious about each change, and performs the 'transmitaip' task on each item that has changed, then in general the set of AIP replicas will always be 'in sync' with the repository. However, it is useful to have the means to ensure that the replicas agree with the repository without having to create and transmit entirely new ones. Thus the task: "Audit against AIP(s)" (auditaip), which can also be thought of as a simple, quick auditing task. When performed on an Item, the task does the following:

...

A set of replication tasks perform these functions, as described below.

Restoring Object(s)

Replication Tasks Used:

Restore Missing Object(s) from AIP(s)

Task ID: restorefromaip

 

Restore Missing Object(s) but Keep Existing Objects (*METS-AIP Only)

Task ID: restorekeepexisting

 

Restore Single Object from AIP (*METS-AIP Only)

Task ID: restoresinglefromaip

If the curator should ever need to restore a deleted object, a variety of restoration tasks are available. The base task is the "Restore Missing Object(s) from AIP(s)" (restorefromaip) task.

...

  • Restore Single Object from AIP (restoresinglefromaip)
    • This task acts the same as the default "restorefromaip" task, but it does NOT restore any child objects. So, if it is run on a collection, just the collection itself will be restored (items in that collection will not be restored).
  • Restore Missing Object(s) but Keep Existing Objects (restorekeepexisting)
    • This task acts similarly to the default "restorefromaip" task, but it attempts to skip over any objects which already exist in the repository. In other words, no error is thrown if an object already exists; instead, that entire object (and all its child objects) is skipped over during processing and left unchanged. This mode is identical to the "Keep Existing" mode of the DSpace AIP Backup and Restore tool.
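The differences between the three restore tasks can be summarized in a toy model. This is a sketch only, not the Replication Task Suite implementation: the `ARCHIVE` mapping, the `restore` function, and the handles are hypothetical, and the assumption that the other two modes raise an error when an object already exists is inferred from the description of "restorekeepexisting" above.

```python
from typing import Dict, List, Set

# Hypothetical replica store: parent handle -> handles of its child AIPs.
ARCHIVE: Dict[str, List[str]] = {
    "col": ["item1", "item2"],
    "item1": [],
    "item2": [],
}


def restore(handle: str, existing: Set[str], mode: str) -> List[str]:
    """Return the handles that would be restored under the given task ID.

    mode is "restorefromaip", "restoresinglefromaip", or "restorekeepexisting".
    """
    if handle in existing:
        if mode == "restorekeepexisting":
            return []  # skip this object and everything beneath it
        # Assumed behaviour of the other modes (inferred, not confirmed).
        raise ValueError(f"{handle} already exists")
    restored = [handle]
    if mode != "restoresinglefromaip":  # the single-object task ignores children
        for child in ARCHIVE[handle]:
            restored.extend(restore(child, existing, mode))
    return restored


print(restore("col", set(), "restorefromaip"))
print(restore("col", set(), "restoresinglefromaip"))
print(restore("col", {"item1"}, "restorekeepexisting"))
```

In this model the base task restores the collection and both items, the single-object task restores only the collection, and the keep-existing task restores everything except the item that is already present.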

Replacing Object(s)

Replication Tasks Used:

Replace Existing Object(s) with AIP(s)

Task ID: replacewithaip

 


Replace Single Object with AIP (*METS-AIP Only)

Task ID: replacesinglewithaip

If the curator should ever find a need to replace a corrupted object or revert an existing object back to the version in remote storage, a variety of replacement tasks are available.  The base task is the "Replace Existing Object(s) with AIP(s)" (replacewithaip) task.

...

  • Replace Single Object with AIP (replacesinglewithaip)
    • This task acts the same as the default "replacewithaip" task, but it does NOT replace any child objects. So, if it is run on a collection, just the collection metadata will be replaced (items existing in that collection will not be replaced).

Cleanup

Replication Task Used:

Remove AIP(s) from Storage

Task ID: removeaip

Ordinarily, a replication arrangement is long-standing: the preservation function cannot be fulfilled unless the replicas (here, the AIPs) are always kept and available. However, some collections (or items within them) may be removed for a variety of reasons: legal challenge, de-accession, etc. When the repository no longer wants to hold the object locally, the replica AIP ceases to have value. The task 'Remove AIP(s) from Storage' (removeaip) will permanently delete the AIP held in the replica store for the given identifier. As with other replication tasks, if the identifier points to a collection or community, the AIPs of all its members will also be permanently deleted.

Keeping Score

Replication Task Used:

Read Odometer

Task ID: readodometer

Many storage providers have cost structures that are more complex than simple functions of the total stored bytes. Cloud providers in particular have costs associated with the use of the network to upload and download the stored objects. An object that occupies 2 megabytes might cost more over time than a 1 gigabyte object, if the former is downloaded 1000 times for every time the latter is. The replication system provides a very rudimentary task to help manage and track these factors: 'Read Odometer' (readodometer). This task simply displays the readings from the replication system that record cumulative use. The statistics are:

...
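The size-versus-download comparison made for the 'Read Odometer' task can be worked through with a short calculation. The prices below are hypothetical values chosen only for illustration, and `monthly_cost` is a name invented for this sketch; real providers charge differently, and with other prices the comparison could come out the other way.

```python
# Hypothetical prices, chosen only for illustration; real providers differ.
STORAGE_PRICE_PER_GB_MONTH = 0.02  # assumed storage charge, in dollars
DOWNLOAD_PRICE_PER_GB = 0.10       # assumed network (download) charge, in dollars


def monthly_cost(size_gb: float, downloads: int) -> float:
    """Storage charge plus download charge for one object over one month."""
    storage = size_gb * STORAGE_PRICE_PER_GB_MONTH
    transfer = size_gb * downloads * DOWNLOAD_PRICE_PER_GB
    return storage + transfer


small = monthly_cost(size_gb=0.002, downloads=1000)  # 2 MB, downloaded 1000 times
large = monthly_cost(size_gb=1.0, downloads=1)       # 1 GB, downloaded once

print(f"2 MB object: ${small:.4f}")
print(f"1 GB object: ${large:.4f}")
```

Under these assumed prices the 2 MB object transfers 2 GB in total and ends up costing more than the 1 GB object, even though it occupies far less storage.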