This Add-On also integrates DSpace with DuraCloud for users who wish to easily back up their content into DuraCloud directly from their DSpace administrative interface.

Info
title2013-May-3: New Release of Replication Task Suite

On May 3, 2013, two new versions of the Replication Task Suite (RTS) were released:

  • RTS, version 3.1 - compatible with DSpace 3.x releases (this release is nearly identical to the 1.1 release below, except for minor DSpace 3.x compatibility fixes)
    • To upgrade to RTS 3.1 from 3.0-SNAPSHOT, simply change your pom.xml (see Installation on DSpace 3.x) to reference 'dspace-replicate' version 3.1.  Then rebuild DSpace & re-run 'ant update'. Your existing RTS configuration files will still work with RTS 3.1.
  • RTS, version 1.1 - bug-fix release, compatible with DSpace 1.8.x releases.  This fixes several small bugs (namely with the event consumer utilized during Automatic Replication).  The exact fixes between 1.0 and 1.1 are detailed in Pull Request #7 and Pull Request #8.
    • To upgrade to RTS version 1.1 from 1.0, simply change your pom.xml (see Installation on DSpace 1.8.x) to reference 'dspace-replicate' version 1.1. Then rebuild DSpace & re-run 'ant update'. Your existing RTS configuration files will still work with RTS 1.1.
Note
titleUsage Examples

For a quick overview of the various tasks offered in the Replication Task Suite, along with some real-life scenarios / examples of where each Replication task may come in handy, you may wish to skip directly to the Problem Statement and Usage Examples section at the bottom of this page.

Info
titleMore Information / Screencasts

More information on the Replication Task Suite is available from the following webinars / screencasts:

 

Installation

Supported Java Versions

  • Replication Task Suite with DuraCloud backend: requires Java 7 be installed on your DSpace server, as the DuraCloud Java API requires Java 7.
  • Replication Task Suite (standalone): may be used with either Java 6 or Java 7 on your DSpace server.

Supported DSpace Versions

The Replication Task Suite currently supports the following versions of DSpace software:

Replication Task Suite Version | Supported DSpace Version(s) | Supported Interfaces | Notes
3.1 | DSpace version 3.x | XMLUI and/or commandline | The 3.1 stable version of the Replication Task Suite is nearly identical to the 1.1 stable version. It just includes minor bug fixes to ensure the Replication Task Suite is compatible with the new DSpace 3.x API.
1.1 | DSpace version 1.8.x | XMLUI and/or commandline | Highly recommended to use either DSpace 1.8.1 or 1.8.2. DSpace 1.8.0 has a known bug where running a Replication Task will always return a NullPointerException - see DS-1077

Installation instructions for each version are included below:

User Interface Compatibility Notes

As the Replication Task Suite is just a suite of Curation System tasks, it may be called (like any Curation Task) from the following locations:

  • From the Command Line
  • From the Admin UI (XMLUI Only)
  • From Item Approval Workflow
  • From custom Java code

For more information see the Curation System details on Task Invocation.

Installation on DSpace 3.x

  1. In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
    • [dspace-src]/dspace/modules/additions/pom.xml (This POM ensures that the "dspace-replicate" dependency is made available to the commandline and ALL DSpace interfaces)

  2. In this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).

    Code Block
    <dependencies>
        ...

        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>3.1</version>
        </dependency>
    </dependencies>
  3. Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package

  4. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. Update your existing DSpace 3.x installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update

    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)


Installation on DSpace 1.8.x

Warning
titleKnown Curation System bug in 1.8.0

DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug was resolved in DSpace 1.8.1.
Because of the above bug, we recommend running the Replication Task Suite on DSpace 1.8.1 or above.

 

  1. In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls dependencies of CommandLine scripts. Modifying it will let you run dspace-replicate from commandline)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls dependencies of XMLUI. Modifying it will let you run dspace-replicate from XMLUI)

  2. For each of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).

    Code Block
    <dependencies>
        ...

        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>1.1</version>
        </dependency>
    </dependencies>
  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package

  4. Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
    1. You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. Update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update

    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)

Upgrades

Upgrading the Replication Task Suite to a new version essentially involves a reinstallation of the add-on.

Follow the latest installation instructions, based on the version of DSpace you are running:

Once you have reinstalled the Replication Task Suite, you should compare your existing configurations with the latest Replication Task Suite configurations.  In most cases, your existing configurations should function perfectly, but you should review the differences just in case.

Configuration

Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

Enabling Replication Task Suite

...

  1. A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
    1. Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
    2. Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules
  2. Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
    1. replicate.cfg - This file contains the base settings for the Replication Task Suite
    2. replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
    3. duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
  3. Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS based or BagIt based) you wish to use. Please see the AIP Format Options section below for the details of what should be added to your curate.cfg file
    1. A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above.  This sample file is preconfigured to use METS-based AIPs.

  4. Recommended (but not required): Edit your [dspace]/config/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).

Overview of Configuration Options

Before getting started, you may wish to determine the answers to the following questions:

  1. AIP Format Options: Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt AIP Format?
  2. Storage Options: Does your institution plan to use the Replication Task Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
  3. Automation Options (Recommended): Do you want to automatically sync your AIP backup store with what is in DSpace? (this is highly recommended, but not required)
  4. Additional Options: Do you plan to use Checkm manifests for checksum auditing?

...

  1. Enable Local Storage Plugin: Ensure the Replication Task Suite is set up to use the 'LocalObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.LocalObjectStore
    
  2. Configure Local Storage Folder: Configure the location where you want all AIPs to be stored on your local filesystem. This defaults to the [dspace]/repstore folder. However, we recommend changing this to a location on a separate hard drive from your existing DSpace installation directory, so that a single hard drive failure cannot destroy both your DSpace content and its backups.

    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    group.delete.name = deletions
    
    Info
    titleUsing Subfolders

    Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths.  For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be temporarily stored under the subfolder "deletions/".
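
As a rough illustration of how these settings combine, the following Python sketch joins store.dir with each group name to show where files would land on disk. The function name `resolve_group_dirs` and the example store.dir value are hypothetical and only serve the illustration; the Replication Task Suite resolves these paths itself.

```python
from pathlib import PurePosixPath


def resolve_group_dirs(store_dir, groups):
    """Return the folder each storage group maps to underneath store_dir."""
    base = PurePosixPath(store_dir)
    return {setting: str(base / rel_path) for setting, rel_path in groups.items()}


# "/mnt/backup/repstore" is a made-up store.dir value used only for this example
layout = resolve_group_dirs(
    "/mnt/backup/repstore",
    {
        "group.aip.name": "dspace-backup/aip-store",
        "group.manifest.name": "dspace-backup/manifest-store",
        "group.delete.name": "dspace-backup/deletions",
    },
)

for setting, folder in layout.items():
    print(setting, "->", folder)
```

Printing the layout before editing replicate.cfg is a cheap way to confirm that the three groups end up in three distinct folders.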

Configuring Mountable Storage

...

  1. Enable Mountable Storage Plugin: Ensure the Replication Task Suite is set up to use the 'MountableObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.MountableObjectStore
    
  2. Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.

    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    group.delete.name = deletions
    
    Info
    titleUsing Subfolders

    Your "group.aip.name", "group.manifest.name" and "group.delete.name" settings also support subfolder paths.  For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" folder (under store.dir). AIPs will all be stored under the subfolder "aip-store/". Manifests will all be stored under the subfolder "manifest-store/". And any object deletion records will be temporarily stored under the subfolder "deletions/".
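
Because the mounted folder must exist before any task runs, a quick pre-flight check can save a failed backup. The Python sketch below is one possible check, not part of the Replication Task Suite: it reports whether a candidate store.dir exists and whether it sits on a different device than the DSpace installation. The function name `storage_status` and its return labels are hypothetical, and it assumes that a mounted volume reports a different device ID than the DSpace directory.

```python
import os
import tempfile


def storage_status(store_dir, dspace_dir):
    """Classify store_dir as 'missing', 'same-device' or 'separate-device'."""
    if not os.path.isdir(store_dir):
        return "missing"
    # st_dev identifies the device (filesystem) that holds each directory
    if os.stat(store_dir).st_dev != os.stat(dspace_dir).st_dev:
        return "separate-device"
    return "same-device"


# Demonstration against a throwaway directory instead of a real DSpace install
with tempfile.TemporaryDirectory() as demo_dir:
    same = storage_status(demo_dir, demo_dir)
    missing = storage_status(os.path.join(demo_dir, "repstore"), demo_dir)

print(same, missing)
```

A result of "same-device" for a folder that is supposed to be mounted usually means the mount is not active.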

Configuring DuraCloud Storage

...

  1. Enable DuraCloud Storage Plugin: Ensure the Replication Task Suite is set up to use the 'DuraCloudObjectStore' plugin

    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.DuraCloudObjectStore
    
  2. Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aip-store").

    Code Block
    # The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # are executed (e.g. "Transmit AIP", "Restore from AIP")
    group.aip.name = aip-store
    
  3. Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable the Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.

    Code Block
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest 
    # based tasks are executed (org.dspace.ctask.replicate.checkm.*).
    group.manifest.name = manifest-store
    
    # The storage group / folder where deletion records are kept when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
    # a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
    # BagIt format. It details basic info about the deleted object (along with any deleted child/member objects).
    # This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
    # or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
    group.delete.name = deletions
    
    Info
    titleUsing File Prefixes instead of separate DuraCloud Spaces

    If you'd rather keep all your DSpace files in a single DuraCloud Space, you can tweak your "group.aip.name", "group.manifest.name" and "group.delete.name" settings to specify a file-prefix to use.  For example:

    group.aip.name = dspace-backup/aip-store

    group.manifest.name = dspace-backup/manifest-store

    group.delete.name = dspace-backup/deletions

    With the above settings in place, all your DSpace content will be stored in the "dspace-backup" Space within DuraCloud. AIPs will all be stored with a file-prefix of "aip-store/" (e.g. "aip-store/ITEM@123456789-2.zip"). Manifests will all be stored with a file-prefix of "manifest-store/". And any object deletion records will be temporarily stored with a file-prefix of "deletions/". This allows you to keep all your content in a single DuraCloud Space while avoiding name conflicts between AIPs, Manifests and deletion records.
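
To make the Space/prefix split concrete, here is a small Python sketch of the naming described above. It assumes the part before the first "/" is the DuraCloud Space ID and the remainder is the file-prefix; the helper names `split_group_name` and `content_location` are hypothetical and do not belong to the DuraCloud or DSpace APIs.

```python
def split_group_name(group_name):
    """Split 'space/prefix' into (space_id, file_prefix); the prefix is '' when absent."""
    space_id, _, prefix = group_name.partition("/")
    return space_id, (prefix + "/" if prefix else "")


def content_location(group_name, file_name):
    """Return (space_id, stored_name) for a file written to the given group."""
    space_id, prefix = split_group_name(group_name)
    return space_id, prefix + file_name


# With a file-prefix: everything shares the "dspace-backup" Space
print(content_location("dspace-backup/aip-store", "ITEM@123456789-2.zip"))
# -> ('dspace-backup', 'aip-store/ITEM@123456789-2.zip')

# Without a file-prefix: the group name itself is the Space
print(content_location("aip-store", "ITEM@123456789-2.zip"))
# -> ('aip-store', 'ITEM@123456789-2.zip')
```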

Automation Options (Recommended)

Performing a backup of DSpace is one thing, but ensuring that backup is always "synchronized" with your changing DSpace content is another.

...

  1. Automatically Sync Changes (via Queue) : Any changes that happen in DSpace (new objects, changed objects, deleted objects) are automatically added to a "queue". This queue can then be processed on a schedule (via cron).
  2. Scheduled Site Auditing/Replication : You may also wish to perform a full site audit or backup on a scheduled basis.

...

In order to enable/activate synchronization, you will need to add a new consumer to the list of DSpace consumers (in dspace.cfg).  It is recommended to add this new configuration to the end of the list of existing "event.consumer." options in your dspace.cfg file.

  • METS-based AIP Replicate Consumer: This consumer will listen for changes to any DSpace Communities, Collections, Items, Groups, or EPeople.  It should be utilized if you have chosen to use METS-based AIPs. See AIP Format Options above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage METS AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.METSReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item|Group|EPerson+All
    

     

    • In human terms, this configuration essentially means: listen for all changes to Communities, Collections, Items, Groups and EPeople. If a change is detected, run the "METSReplicateConsumer" (which adds that object to the queue).
  • BagIt-based AIP Consumer: This consumer will ONLY listen for changes to DSpace Communities, Collections and Items, as those are the only types of objects which are stored in BagIt-based AIPs. See AIP Format Options above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage BagIt AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.BagItReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
    

     

    • In human terms, this configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When one of the specified changes is detected, run the "BagItReplicateConsumer" (which adds that object to the queue).

You will need to restart DSpace for this new Consumer to be recognized.
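
The filter values shown above pair a list of object types with a list of event types. The Python sketch below shows one way to read such a value, assuming a single "ObjectTypes+EventTypes" pair in which "All" stands for every event type. The function names are hypothetical; DSpace performs this matching internally.

```python
def parse_filter(filter_value):
    """Split 'Types+Events' into (set of object types, set of event types)."""
    object_part, separator, event_part = filter_value.partition("+")
    if not separator:
        raise ValueError("expected a value of the form 'ObjectTypes+EventTypes'")
    return set(object_part.split("|")), set(event_part.split("|"))


def matches(filter_value, object_type, event_type):
    """True when a change of event_type on object_type passes the filter."""
    object_types, event_types = parse_filter(filter_value)
    return object_type in object_types and (
        "All" in event_types or event_type in event_types
    )


BAGIT_FILTER = "Community|Collection|Item+Install|Modify|Modify_Metadata|Delete"
METS_FILTER = "Community|Collection|Item|Group|EPerson+All"

print(matches(BAGIT_FILTER, "Item", "Modify"))     # True
print(matches(BAGIT_FILTER, "EPerson", "Modify"))  # False
print(matches(METS_FILTER, "EPerson", "Modify"))   # True
```

This makes the difference between the two consumers visible: only the METS filter lets Group and EPerson changes through.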

How the Sync Consumer works

When the activated ReplicateConsumer detects a change on an object (Community, Collection or Item) in DSpace, it will do the following:

  • Object is created/added in DSpace: If the event is an addition of a new DSpace object (for items this only occurs once the item exits approval workflow), then a request for an AIP transmission is queued.
  • Object is changed/modified in DSpace: The same occurs whenever an object has changed (so-called modify events). The modified object is queued for AIP transmission.
  • Object is deleted from DSpace: When an object is deleted, a 'record' of the deletion is transmitted to the replication service. The deletion record is stored in your configured "group.delete.name" (in replicate.cfg) and named DELETION-RECORD@[handle].zip. The deletion record is a BagIt package which simply lists all the objects that were deleted: for an item, just the handle of the item; for a collection, the handles of all the items that were in it. This way, if the deletion was a mistake, the deletion record can be used to recover all the contents. This represents the default behavior of the consumer. However, you may configure it in [dspace]/config/modules/replicate.cfg.
    • It is worth noting that when you delete an object in DSpace, the Sync Consumer will NOT delete that object's AIP from storage. All it does is write a "DELETION-RECORD@[handle]" file to your configured storage location. This ensures that you can review those deletion records at a later time, and decide whether to permanently delete the AIP(s) from storage, or alternatively restore the deleted object(s) in DSpace from their AIP(s).
    • How to restore deleted objects: Simply run one of the "Restore from AIP" commands available in the Replication Task Suite, passing it the handle of the deleted object. That command will locate the associated "DELETION-RECORD" file and appropriately restore any objects listed in that deletion record. Once the restoration is complete, the associated "DELETION-RECORD" file will be removed.
    • How to permanently delete objects' AIPs from storage: If an object deletion was found to be valid, you may wish to permanently remove the deleted object's AIP from remote storage (to save storage space). Simply run the "Remove AIP" command, passing it the handle of the deleted object. That command will permanently delete the object's AIP along with any associated "DELETION-RECORD" file from your storage location. (WARNING: once an AIP is deleted, you will be unable to restore that object to DSpace in the future.)

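
The default behavior described above can be summarized in a few lines of Python. This is a simplified model for illustration only, not the consumer's actual implementation: the class `SyncSketch`, its method and the event labels "add", "modify" and "delete" are all hypothetical, and the handles are made-up examples.

```python
class SyncSketch:
    """Toy model: additions and changes are queued, deletions produce a record."""

    def __init__(self):
        self.queue = []             # handles waiting for AIP transmission
        self.deletion_records = {}  # record name -> handles listed in that record

    def on_event(self, event_type, handle, member_handles=()):
        if event_type in ("add", "modify"):
            # New and changed objects are both queued for AIP transmission
            self.queue.append(handle)
        elif event_type == "delete":
            # Nothing is removed from storage; only a record of the deletion is kept
            record_name = f"DELETION-RECORD@{handle}"
            self.deletion_records[record_name] = [handle, *member_handles]
        else:
            raise ValueError(f"unknown event type: {event_type}")


sync = SyncSketch()
sync.on_event("add", "123456789/10")
sync.on_event("modify", "123456789/10")
sync.on_event("delete", "123456789/2", member_handles=["123456789/3", "123456789/4"])

print(sync.queue)
print(sync.deletion_records)
```

The point of the model is the asymmetry: a deletion never touches the stored AIPs, it only leaves behind enough information to restore or remove them later.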
Configuring the Sync Consumer

...

Info
titleMore Information on where Odometer statistics are kept

The odometer statistics are stored in a small text file located at: [base.dir]/odometer, where [base.dir] is the value of the base.dir setting in your [dspace]/config/modules/replicate.cfg configuration file. Should you ever need to reset your odometer, you can do so by moving or removing this existing odometer file.
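
Moving the file aside rather than deleting it keeps the old statistics available for later reference. A minimal Python sketch of such a reset follows; the function name `reset_odometer` and the timestamped ".bak" naming are assumptions made for this example, and the demonstration runs against a temporary directory rather than a real base.dir.

```python
import shutil
import tempfile
import time
from pathlib import Path


def reset_odometer(base_dir):
    """Move [base.dir]/odometer aside; return the backup path, or None if absent."""
    odometer = Path(base_dir) / "odometer"
    if not odometer.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = odometer.with_name(f"odometer.{stamp}.bak")  # hypothetical naming scheme
    shutil.move(str(odometer), str(backup))
    return backup


# Demonstration against a throwaway directory standing in for base.dir
with tempfile.TemporaryDirectory() as demo_base:
    (Path(demo_base) / "odometer").write_text("demo contents\n")
    moved_to = reset_odometer(demo_base)
    odometer_gone = not (Path(demo_base) / "odometer").exists()
    backup_kept = moved_to is not None and moved_to.exists()
    second_call = reset_odometer(demo_base)

print(odometer_gone, backup_kept, second_call)
```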

Automation (Recommended)

While the coordinated use of the tasks described above can provide the basis for a solid replication strategy and practice, there are several processes that could necessitate a fair amount of curatorial work. For example, in the discussion on ensuring integrity of AIPs over time, we remarked that vigilance was required by the curator to transmit new AIPs whenever Items change. It is possible to leverage existing facilities in DSpace to substantially reduce this effort through automation.

...

More information about setting up automation is available in the Automation Options configuration section above.

...