Comment: Migrated to Confluence 5.3


Background & Overview

Warning
AIP Backup & Restore functionality only works with the Latest Version of Items

If you are using the new XMLUI-only Item Level Versioning functionality (disabled by default), you must be aware that this "Item Level Versioning" feature is not yet compatible with AIP Backup & Restore. Using them together may result in accidental data loss.  Currently the AIPs that DSpace generates only store the latest version of an Item.  Therefore, past versions of Items will always be lost when you perform a restore / replace using AIP tools.

Note

Additional background information is available in the Open Repositories 2010 Presentation entitled Improving DSpace Backups, Restores & Migrations

...

 

Traditional Backup & Restore (Database and Files)

AIP Backup & Restore

Supported Backup/Restore Types

 

 

Can Backup & Restore all DSpace Content easily

Yes (Requires two backups/restores – one for Database and one for Files)

Yes (Though, will not backup/restore items which are not officially "in archive")

Can Backup & Restore a Single Community/Collection/Item easily

No (It is possible, but requires a strong understanding of DSpace database structure & folder organization in order to only backup & restore metadata/files belonging to that single object)

Yes

Backups can be used to move one or more Community/Collection/Items to another DSpace system easily.

No (Again, it is possible, but requires a strong understanding of DSpace database structure & folder organization in order to only move metadata/files belonging to that object)

Yes

Can Backup & Restore Item Versions

Yes (Requires two backups/restores – one for Database and one for Files)

No (Currently Item Level Versioning is not fully compatible with AIP Backup & Restore. AIP Backup & Restore can only backup/restore the latest version of an Item)

Supported Object Types During Backup & Restore

 

 

Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.)

Yes

Yes

Supports backup/restore of all People/Groups/Permissions

Yes

Yes

Supports backup/restore of all Collection-specific Item Templates

Yes

Yes

Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE)

Yes

No (This is a known issue. All previously harvested Items will be restored, but the OAI-PMH/OAI-ORE harvesting settings will be lost during the restore process.)

Supports backup/restore of all Withdrawn (but not deleted) Items

Yes

Yes

Supports backup/restore of Item Mappings between Collections

Yes

Yes (During restore, the AIP Ingester may throw a false "Could not find a parent DSpaceObject" error (see Common Issues or Error Messages) if it tries to restore an Item Mapping to a Collection that it hasn't yet restored. This error can be safely bypassed using the 'skipIfParentMissing' flag; see Additional Packager Options for more details.)

Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow)

Yes

No (AIPs are only generated for objects which are completed and considered "in archive")

Supports backup/restore of Items using custom Metadata Schemas & Fields

Yes

Yes (Custom Metadata Fields will be automatically recreated. Custom Metadata Schemas must be manually created first, in order for DSpace to be able to recreate custom fields belonging to that schema. See Common Issues or Error Messages for more details.)

Supports backup/restore of all local DSpace Configurations and Customizations

Yes (if you back up your entire DSpace directory as part of backing up your files)

Not by default (unless you also back up parts of your DSpace directory. Note that you would not need to back up the '[dspace]/assetstore' folder again, as those files are already included in AIPs)

Based on your local institution's needs, you will want to choose the backup & restore process which is most appropriate to you. You may also find it beneficial to use both types of backups on different time schedules, in order to minimize the likelihood of losing your DSpace installation settings or its contents. For example, you may choose to perform a Traditional Backup once per week (to back up your local system configurations and customizations) and an AIP Backup on a daily basis. Alternatively, you may choose to perform daily Traditional Backups and only use the AIP Backup as a "permanent archives" option (perhaps performed on a weekly or monthly basis).

Note
Don't Forget to Backup your Configurations and Customizations

If you choose to use the AIP Backup and Restore option, do not forget to also back up your local DSpace configurations and customizations. Depending on how you manage your own local DSpace, these configurations and customizations are likely in one or more of the following locations:

  • [dspace] - The DSpace installation directory (Please note, if you also use the AIP Backup & Restore option, you do not need to back up your [dspace]/assetstore directory, as those files already exist in your AIPs).
  • [dspace-source] - The DSpace source directory

    How does this work help DSpace interact with DuraCloud?

    ...

    Warning
    Missing Groups or EPeople cannot be created when submitting an individual Community or Collection AIP

    Please note, if you are using AIPs to move an entire Community or Collection from one DSpace to another, there is a known issue (see DS-1105) that the new DSpace instance will be unable to (re-)create any DSpace Groups or EPeople which are referenced by a Community or Collection AIP. The reason is that the Community or Collection AIP itself doesn't contain enough information to create those Groups or EPeople (rather that info is stored in the SITE AIP, for usage during Full Site Restores).

    However, there are two possible ways to get around this known issue:

    • EITHER, you can manually recreate all referenced Groups/EPeople in the new DSpace that you are submitting the Community or Collection AIP into.
      • Note that if you are using Groups named with DSpace Database IDs (e.g. COMMUNITY_1_ADMIN, COLLECTION_2_SUBMIT), you may first need to rename those groups to no longer include Database IDs (e.g. MY_SUBMITTERS). The reason is that Database IDs will likely change when you move a Community or Collection to a new DSpace installation.
    • OR, you can temporarily disable the import of Group/EPeople information when submitting the Community or Collection AIP to the new DSpace. This would mean that after you submit the AIP to the new DSpace, you'd have to manually go in and add any special permissions (as needed). To disable the import of Group/EPeople information, add these settings to your dspace.cfg file, and re-run the submission of the AIP with these settings in place:

      Code Block
      mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
      mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
      • Don't forget to remove these settings after you import your Community or Collection AIP. Leaving them in place will mean that every time you import an AIP, all of its Group/EPeople/Permissions would be ignored.
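Because these two settings are easy to leave behind, a small check can help. The following is a minimal sketch, not part of DSpace: the class `LeftoverNilCheck` and its method are hypothetical names, and it assumes the relevant lines of dspace.cfg can be read with standard Java `Properties` syntax. It lists any ingest crosswalk setting that is still set to NIL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not a DSpace class): finds ingest crosswalks still disabled via NIL.
public class LeftoverNilCheck {
    // Prefix shared by the two settings shown above.
    private static final String PREFIX = "mets.dspaceAIP.ingest.crosswalk.";

    /** Returns the sorted names of ingest crosswalk settings whose value is NIL. */
    static List<String> findNilCrosswalks(String cfgText) {
        Properties props = new Properties();
        try {
            // Assumption: the configuration text follows java.util.Properties syntax.
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> found = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX) && "NIL".equals(props.getProperty(key).trim())) {
                found.add(key);
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
            "mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL",
            "mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL",
            "mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM");
        for (String key : findNilCrosswalks(cfg)) {
            System.out.println("Remove after import: " + key);
        }
    }
}
```

In practice the text would be read from your own dspace.cfg rather than from an inline string; the inline sample only keeps the sketch self-contained.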

    ...

    Note
    Highly Recommended to Update Database Sequences after a Large Restore

    In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on your DSpace database after a large-scale restore. This database script can be run while the system is online (i.e. no need to stop Tomcat or PostgreSQL). The script can be found in the following locations for PostgreSQL and Oracle, respectively:

    • [dspace]/etc/postgres/update-sequences.sql
    • [dspace]/etc/oracle/update-sequences.sql

    Info
    More Information on using Default Restore Mode with Community/Collection AIPs
    • Using the Default Restore Mode without the -a option will only restore the metadata for that specific Community or Collection. No child objects will be restored.
    • Using the Default Restore Mode with the -a option will only successfully restore a Community or Collection if neither that object nor any of its child objects (Sub-Communities, Collections or Items) already exist. In other words, if any objects belonging to that Community or Collection already exist in DSpace, the Default Restore Mode will report an error that those object(s) could not be recreated. If you encounter this situation, you will need to perform the restore using either the Restore, Keep Existing Mode or the Force Replace Mode (depending on whether you want to keep or replace those existing child objects).

    ...

    1. Install a completely "fresh" version of DSpace by following the Installation instructions in the DSpace Manual
      • At this point, you should have a completely empty, but fully-functional DSpace installation. You will need to create an initial Administrator user in order to perform this restore (as a full-restore can only be performed by a DSpace Administrator).
    2. Once DSpace is installed, run the following command to restore all its contents from AIPs

      Code Block
       [dspace]/bin/dspace packager -r -a -f -t AIP -e <eperson> -i <site-handle-prefix>/0 /full/path/to/your/site-aip.zip
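
If the restore is driven from a script or tool, the command above can be assembled as an argument list. This is a minimal sketch that only builds and prints the command; the class `SiteRestoreCommand` is a hypothetical name, and the installation directory, e-mail address, handle prefix and file path in `main` are placeholder values, not defaults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a DSpace class): assembles the full-site restore command shown above.
public class SiteRestoreCommand {
    static List<String> build(String dspaceDir, String eperson, String handlePrefix, String aipPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(dspaceDir + "/bin/dspace");
        cmd.add("packager");
        cmd.add("-r");                 // restore
        cmd.add("-a");                 // include all child objects
        cmd.add("-f");                 // Force Replace Mode (together with -r)
        cmd.add("-t");
        cmd.add("AIP");                // package type
        cmd.add("-e");
        cmd.add(eperson);              // the Administrator performing the restore
        cmd.add("-i");
        cmd.add(handlePrefix + "/0");  // site handle, written as <site-handle-prefix>/0 above
        cmd.add(aipPath);              // full path to the site AIP
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        List<String> cmd = build("/dspace", "admin@example.edu", "123456789",
                                 "/full/path/to/your/site-aip.zip");
        System.out.println(String.join(" ", cmd));
    }
}
```

The list could be handed to `java.lang.ProcessBuilder` to run the restore; the sketch deliberately stops at printing so that nothing is executed by accident.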
      

    ...

    Note
    Highly Recommended to Update Database Sequences after a Large Restore

    In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences.sql" script on your DSpace database after a large-scale restore. This database script can be run while the system is online (i.e. no need to stop Tomcat or PostgreSQL). The script can be found in the following locations for PostgreSQL and Oracle, respectively:

    • [dspace]/etc/postgres/update-sequences.sql
    • [dspace]/etc/oracle/update-sequences.sql

    Additional Packager Options

    ...

    As a basic example:

    Code Block
    
    PackageParameters params = new PackageParameters();
    params.addProperty("createMetadataFields", "false");
    params.addProperty("ignoreParent", "true");
    

    ...

    The following configurations allow you to specify what metadata is stored within each METS-based AIP. In 'dspace.cfg', the general format for each of these settings is:

    ...

    • aip.disseminate.<setting> = <mdType>:<DSpace-crosswalk-name> [, ...]
      • <setting> is the setting name (see below for the full list of valid settings)
      • <mdType> is optional. It allows you to specify the value of the @MDTYPE or @OTHERMDTYPE attribute in the corresponding METS element.
      • <DSpace-crosswalk-name> is required. It specifies the name of the DSpace Crosswalk which should be used to generate this metadata.
      • Zero or more <label-for-METS>:<DSpace-crosswalk-name> may be specified for each setting
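
To make the value format concrete, here is a minimal sketch of how such a comma-separated value could be split into its parts. It is not DSpace's own parser: the class `DisseminateSetting` is a hypothetical name, the labels and crosswalk names in `main` are invented placeholders, and an omitted `<mdType>` is simply represented as `null`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a DSpace class): splits "<mdType>:<crosswalk> [, ...]" values.
public class DisseminateSetting {
    /** One entry of the list: an optional METS label and a required crosswalk name. */
    static final class Entry {
        final String mdType;     // null when no "<mdType>:" prefix was given
        final String crosswalk;  // name of the crosswalk that generates the metadata

        Entry(String mdType, String crosswalk) {
            this.mdType = mdType;
            this.crosswalk = crosswalk;
        }
    }

    static List<Entry> parse(String value) {
        List<Entry> entries = new ArrayList<>();
        for (String part : value.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue; // an empty value yields zero entries
            }
            int colon = item.indexOf(':');
            if (colon < 0) {
                entries.add(new Entry(null, item));
            } else {
                entries.add(new Entry(item.substring(0, colon).trim(),
                                      item.substring(colon + 1).trim()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Placeholder label and crosswalk names, for illustration only.
        for (Entry e : parse("MyLabel:MY_CROSSWALK, OTHER_CROSSWALK")) {
            System.out.println("mdType=" + e.mdType + " crosswalk=" + e.crosswalk);
        }
    }
}
```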

    ...

    By default, the settings in dspace.cfg are:

    Code Block
    
    mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM
    mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM
    mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM
    

    ...

    • mets.xsd.<abbreviation> = <namespace> <local-file-name>
      • <abbreviation> is a unique abbreviation (of your choice) for this schema
      • <namespace> is the Schema namespace
      • <local-file-name> is the full name of the cached schema file (which should reside in your [dspace]/config/schemas/ directory; by default this directory does not exist, so you will need to create it)

    ...

    The default settings are all commented out, but they provide a full listing of all schemas currently used during validation of AIPs. In order to utilize them, uncomment the settings, download the appropriate schema file, and save it to your [dspace]/config/schemas/ directory (by default this directory does not exist, so you will need to create it) using the specified file name:

    Code Block
    
    #mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd
    #mets.xsd.xlink = http://www.w3.org/1999/xlink xlink.xsd
    #mets.xsd.mods = http://www.loc.gov/mods/v3 mods.xsd
    #mets.xsd.xml = http://www.w3.org/XML/1998/namespace xml.xsd
    #mets.xsd.dc = http://purl.org/dc/elements/1.1/ dc.xsd
    #mets.xsd.dcterms = http://purl.org/dc/terms/ dcterms.xsd
    #mets.xsd.premis = http://www.loc.gov/standards/premis PREMIS.xsd
    #mets.xsd.premisObject = http://www.loc.gov/standards/premis PREMIS-Object.xsd
    #mets.xsd.premisEvent = http://www.loc.gov/standards/premis PREMIS-Event.xsd
    #mets.xsd.premisAgent = http://www.loc.gov/standards/premis PREMIS-Agent.xsd
    #mets.xsd.premisRights = http://www.loc.gov/standards/premis PREMIS-Rights.xsd
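
After uncommenting some of these settings, it can be useful to confirm which schema files are expected in the cache directory. The following is a minimal sketch under stated assumptions: the class `SchemaCacheCheck` is a hypothetical name (not part of DSpace), the settings are assumed to be readable with standard Java `Properties` syntax, and `/dspace` in `main` is a placeholder installation path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper (not a DSpace class): lists the schema files that active mets.xsd.* settings expect.
public class SchemaCacheCheck {
    private static final String PREFIX = "mets.xsd.";

    /** Maps each active <abbreviation> to its <local-file-name>; commented-out lines are ignored. */
    static Map<String, String> parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> files = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            // The value has the form "<namespace> <local-file-name>".
            String[] parts = props.getProperty(key).trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected '<namespace> <local-file-name>' for " + key);
            }
            files.put(key.substring(PREFIX.length()), parts[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
            "mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd",
            "#mets.xsd.xlink = http://www.w3.org/1999/xlink xlink.xsd");
        Path schemaDir = Paths.get("/dspace", "config", "schemas"); // placeholder path
        for (Map.Entry<String, String> e : parse(cfg).entrySet()) {
            Path file = schemaDir.resolve(e.getValue());
            System.out.println(e.getKey() + " -> " + file + (Files.exists(file) ? "" : " (missing)"));
        }
    }
}
```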
    

    ...

    Issue / Error Message

    How to Fix this Problem

    Ingest/Restore Error: "Group Administrator already exists"

    If you receive this problem, you are likely attempting to Restore an Entire Site, but are not running the command in Force Replace Mode (-r -f). Please see the section on Restoring an Entire Site for more details on the flags you should be using.

    Ingest/Restore Error: "Unknown Metadata Schema encountered (mycustomschema)"

    If you receive this problem, one or more of your Items is using a custom metadata schema which DSpace is currently not aware of (in the example, the schema is named "mycustomschema"). Because DSpace AIPs do not contain enough details to recreate the missing Metadata Schema, you must create it manually via the DSpace Admin UI. Please note that you only need to create the Schema. You do not need to manually create all the fields belonging to that schema, as DSpace will do that for you as it restores each AIP. Once the schema is created in DSpace, re-run your restore command. DSpace will automatically re-create all fields belonging to that custom metadata schema as it restores each Item that uses that schema.

    Ingest Error: "Could not find a parent DSpaceObject referenced as 'xxx/xxx'"

    When you encounter this error message it means that an object could not be ingested/restored as it belongs to a parent object which doesn't currently exist in your DSpace instance. During a full restore process, this error can be skipped over and treated as a warning by specifying the 'skipIfParentMissing=true' option (see Additional Packager Options). If you have a large number of Items which are mapped to multiple Collections, the AIP Ingester will sometimes attempt to restore an item mapping before the Collection itself has been restored (thus throwing this error). Luckily, this is not anything to be concerned about. As soon as the Collection is restored, the Item Mapping which caused the error will also be automatically restored. So, if you encounter this error during a full restore, it is safe to bypass this error message using the 'skipIfParentMissing=true' option. All your Item Mappings should still be restored correctly.

    Submit Error: PSQLException: ERROR: duplicate key value violates unique constraint "handle_handle_key"

    This error means that while submitting one or more AIPs, DSpace encountered a Handle conflict. This is a general error that may occur in DSpace if your Handle sequence has somehow become out-of-date. However, it's easy to fix. Just run the [dspace]/etc/postgres/update-sequences.sql script (or if you are using Oracle, run: [dspace]/etc/oracle/update-sequences.sql).