Versions Compared

Comment: Minor clarifications on auditing / full backups

...

These two AIP formats are not identical.  The table below describes some of the differences.

 

| Feature | DSpace AIP Format (METS-based AIPs) | BagIt AIP Format |
|---|---|---|
| Supported Backup/Restore Types | | |
| Can Backup & Restore all DSpace Content easily | Yes | Yes |
| Can Backup & Restore a Single Community/Collection/Item easily | Yes | Yes |
| Backups can be used to move one or more Community/Collection/Items to another DSpace system easily. | Yes (Using the Replication Task Suite or using the command line AIP Backup and Restore tools) | Yes (though the Replication Task Suite add-on must be installed on both systems) |
| Can Backup & Restore Item Versions (added in DSpace 3.x) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) |
| Supported DSpace Object Types | | |
| Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.) | Yes | Yes |
| Supports backup/restore of all People/Groups/Permissions | Yes | No (Not yet supported) |
| Supports backup/restore of all Collection-specific Item Templates | Yes | No (Not yet supported) |
| Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) |
| Supports backup/restore of all Withdrawn (but not deleted) Items | Yes | Yes |
| Supports backup/restore of Item Mappings between Collections | Yes | Yes |
| Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow) | No (AIPs are only generated for objects which are completed and considered "in archive") | No (AIPs are only generated for objects which are completed and considered "in archive") |
| Supports backup/restore of Items using custom Metadata Schemas & Fields | Yes | Yes |
| Supports backup/restore of all local DSpace Configurations and Customizations | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) |

...

Code Block
0 1 * * 6 $HOME/dspace/bin/dspace curate -q replication -r - > $HOME/dspace/log/replication-queue.log 2>&1
  • The "-r -" part of this command ensures that the results of the sync are reported back to you on the command line, rather than being logged to the main dspace.log file. 
  • Then the "> replication-queue.log 2>&1" takes those reported results (success & errors) and writes them to a "replication-queue.log" file.
  • Although this example is a weekly sync, some institutions may wish to sync more frequently (daily) or less frequently (monthly) based on local policies, especially depending on whether AIPs are the sole form of backup or act as more of a "secondary" safety backup.

You can also process this queue manually from the command line by running: [dspace]/bin/dspace curate -q replication

During the processing of the queue, the existing queue file is "locked", but any new changes are logged to a separate queue file.  In other words, changes that happen in the DSpace system during the processing of the queue will still be captured in a separate queue file (which will be processed the next time you execute the above script).
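The queue-processing command can also be wrapped in a small script, which is sometimes easier to maintain than a long crontab line. The following is only a minimal sketch: the names DSPACE_BIN, LOG_FILE and process_sync_queue are assumptions made for illustration, and the paths simply reuse those from the cron example. Unlike the cron entry, this sketch appends to the log (">>") and writes a timestamp line before each run, so that earlier results are not overwritten.

```shell
#!/bin/sh
# Hypothetical wrapper around the "curate -q replication" command.
# DSPACE_BIN and LOG_FILE are assumed names; adjust them to your installation.
DSPACE_BIN="${DSPACE_BIN:-$HOME/dspace/bin/dspace}"
LOG_FILE="${LOG_FILE:-$HOME/dspace/log/replication-queue.log}"

process_sync_queue() {
    # Mark the start of this run in the log.
    echo "=== sync queue run started: $(date) ===" >> "$LOG_FILE"
    # "-r -" reports results on standard output; both output and errors
    # are appended to the log file.
    "$DSPACE_BIN" curate -q replication -r - >> "$LOG_FILE" 2>&1
}
```

Calling process_sync_queue (for example, as the last line of a script run from cron) processes whatever is in the queue at that time.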

Note: You still may wish to perform an occasional full site audit/backup

Even if you are processing the "sync queue" on a daily or weekly basis, you still may want to perform a full site-wide audit and/or backup on a less frequent basis.  For example, if you are processing the sync queue on a daily basis, you might want to perform a weekly or monthly site audit/backup.  Although this full site audit/backup is not required, it helps to ensure that all of your AIPs are simultaneously up-to-date at a given point in time.  It's worth noting that only AIPs that have changed (i.e. have a different checksum) will be transferred to your backup location. So, if all AIPs are already up-to-date in your backup location, no AIPs will be transferred at all.

More information on performing such an "audit" or full-site backup (including cron job examples) can be found in the section on Scheduled Site Auditing/Replication.
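To illustrate the idea of transferring only AIPs whose checksum has changed, here is a small conceptual sketch. It is not the Replication Task Suite's actual implementation: the helper names file_checksum and needs_transfer are hypothetical, and it uses the POSIX cksum utility purely for illustration, without assuming which checksum algorithm the suite uses.

```shell
#!/bin/sh
# Illustrative only: NOT how the Replication Task Suite is implemented.
# file_checksum and needs_transfer are hypothetical helper names.

# Print a checksum for a file (POSIX cksum: CRC and byte count, no file name).
file_checksum() {
    cksum < "$1"
}

# Succeed (exit status 0) when the replica copy is missing or its checksum
# differs from the local AIP; fail when both copies match.
needs_transfer() {
    local_aip="$1"
    replica_aip="$2"
    [ -f "$replica_aip" ] || return 0
    [ "$(file_checksum "$local_aip")" != "$(file_checksum "$replica_aip")" ]
}
```

With these helpers, a copy step could be guarded with needs_transfer, so that a file whose replica is already identical is skipped.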


Enhancing the Performance of the Queue Processing (optional)

...

Scheduled Site Auditing/Replication

Whether you decide to automatically synchronize your replica (backup) store or not, you may also wish to schedule some occasional auditing or even a full "refresh" of your backup.  Note that you need not perform both tasks; it's really up to you. Refreshing your backup (transmitaip) also performs its own internal "audit" by ensuring that only AIPs which have changed will be transmitted to your backup location (thus potentially cutting down on I/O costs for hosted backup locations).

  • Auditing for site differences: Running an audit will check if there are differences between DSpace Content and AIP backup content. A full site AIP audit can be run from the command line, or scheduled via a cron job (or similar). For example, the following command will run a site-wide audit for a DSpace site with a handle.prefix of "10673" (and writes the results of the audit to a "siteaudit.log" file).

    Code Block
    [dspace]/bin/dspace curate -t auditaip -i 10673/0 -r - > siteaudit.log 2>&1
    • This command runs the "auditaip" task on your entire site. The identifier "[handle-prefix]/0" refers to the entire DSpace Site; in this case we used the example handle-prefix of "10673".
    • The "-r -" part of this command ensures that the results of the audit are reported back to you on the command line, rather than being logged to the main dspace.log file.
    • Then the "> siteaudit.log 2>&1" takes those reported results (including any errors) and writes them to a "siteaudit.log" file.
  • A refresh of your full backup: Whether or not you decide to use the sync queue for automated backups (as described above), you may want to ensure that you are re-running your entire site backup on a scheduled basis.  For example, the following command will regenerate & transmit AIPs for every object in a DSpace site with a handle.prefix of "10673" (and write the results of the backup to a "sitebackup.log" file).  It's nearly identical to the "auditaip" command above, except that you are executing "transmitaip" and writing to a different log file. Please note: only AIPs which have changed (i.e. have a different checksum) will be transferred to your backup location. So if you are also running automated syncing, this command may just act as a way to double-check that all your AIPs are up-to-date in your backup location.

    Code Block
    [dspace]/bin/dspace curate -t transmitaip -i 10673/0 -r - > sitebackup.log 2>&1
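Because the audit and refresh commands differ only in the task name and log file, they can be driven from a single helper. The following is a minimal sketch under stated assumptions: DSPACE_BIN and run_site_task are hypothetical names introduced here, the default path reuses the one from the earlier cron example, and the handle prefix "10673" is the same example value used above.

```shell
#!/bin/sh
# Hypothetical helper; DSPACE_BIN and run_site_task are assumed names.
DSPACE_BIN="${DSPACE_BIN:-$HOME/dspace/bin/dspace}"

# Run a site-wide curation task and write its report to a log file.
#   $1 = task name (auditaip or transmitaip)
#   $2 = handle prefix (for example 10673)
#   $3 = log file to write
run_site_task() {
    task="$1"
    prefix="$2"
    log="$3"
    case "$task" in
        auditaip|transmitaip) ;;
        *) echo "unsupported task: $task" >&2; return 2 ;;
    esac
    # "[handle-prefix]/0" identifies the entire DSpace Site.
    "$DSPACE_BIN" curate -t "$task" -i "$prefix/0" -r - > "$log" 2>&1
}
```

For example, "run_site_task auditaip 10673 siteaudit.log" corresponds to the audit command above, and "run_site_task transmitaip 10673 sitebackup.log" to the full backup refresh.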

Additional Options

Configuring usage of Checkm manifest validation

...