This page and all code described on it are now OBSOLETE. They have been replaced by the AIP Backup and Restore feature, which will first be released in DSpace 1.7.0.

PLEDGE AIP Prototype

This page describes a prototype AIP implementation planned as part of the PLEDGE project. Since the PLEDGE project needs AIPs only in order to replicate them under the direction of a policy engine, it was not necessary to create an AIP-based asset store.

About This Implementation

The source changes and additions that implement AIPs have the following other benefits:

Goals of AIP Prototype

Makeup and Definition of AIPs

Issues and Questions

AIP Details: METS Usage

Rob Wolfe's Comments on METS Usage

Crosswalks

DIM Descriptive Elements for Collection objects

Metadata Field | getMetadata() key
dc.description | introductory_text
dc.description.abstract | short_description
dc.description.tableofcontents | side_bar_text
dc.identifier.uri | getHandle()
dc.provenance | provenance_description
dc.rights | copyright_text
dc.rights_license | copyright_text
dc.title | name

DIM Descriptive Elements for Community objects

Metadata Field | getMetadata() key
dc.description | introductory_text
dc.description.abstract | short_description
dc.description.tableofcontents | side_bar_text
dc.identifier.uri | getHandle()
dc.rights | copyright_text
dc.title | name

AIP Technical Metadata for Item

Metadata Field | method and comments
dc.contributor | getSubmitter().getEmail()
dc.identifier.uri | getHandle()
dc.relation.isPartOf | getOwningCollection().getHandle() as URN
dc.relation.isReferencedBy | getCollections(); Handle URN of each non-owner
dc.rights.accessRights | isWithdrawn(); "WITHDRAWN" if true

AIP Technical Metadata for Bitstream

Metadata Field | method and comments
dc.title | getName()
dc.title.alternative | getSource()
dc.description | getDescription()
dc.format | getUserFormatDescription()
dc.format.medium | getFormat().getShortDescription()
dc.format.mimetype | getFormat().getMIMEType()
dc.format.supportlevel | getFormat().getSupportLevel()
dc.format.internal | getFormat().isInternal()

AIP Technical Metadata for Collection

Metadata Field | method and comments
dc.identifier.uri | getHandle()
dc.relation.isPartOf | getCommunities()[0]
dc.relation.isReferencedBy | getCommunities()[1]

AIP Technical Metadata for Community

Metadata Field | method and comments
dc.identifier.uri | getHandle()
dc.relation.isPartOf | getParentCommunity()

Example AIP manifests

These are examples of internal AIPs for some representative DSpace objects:

Creating Internal AIPs for Later Restoration

Start with a DSpace archive that has the AIP Prototype patched into
its code base. Prepare internal AIPs for the first time with the command:

  dsrun org.dspace.administer.AIPManager -u -a -v -e _admin-user_

Be sure that the command completed successfully; check standard output
and the DSpace log for errors.

If it runs too long, you may wish to use the "-c" option to limit the number of AIPs it processes and repeat the process at off-hours for several days. Since the -u option updates internal AIPs, it will not re-create existing AIPs unless the underlying objects have changed.
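As a minimal sketch of the check described above, the following shell function runs a command, saves its output to a file, and reports a problem if the command exits with a non-zero status or if its output mentions ERROR. It assumes that AIPManager signals failure through its exit status and that error lines contain the word ERROR; neither is confirmed by this page. The function name run_checked and the log file path are hypothetical.

```shell
# Run a command, keep its output in a log file, and flag failures.
# Usage: run_checked LOGFILE COMMAND [ARGS...]
run_checked() {
    logfile=$1
    shift
    # Assumption: the command exits non-zero when it fails.
    if ! "$@" >"$logfile" 2>&1; then
        echo "FAILED (non-zero exit): $*" >&2
        return 1
    fi
    # Assumption: problems are reported on lines containing "ERROR".
    if grep -q "ERROR" "$logfile"; then
        echo "Completed, but $logfile mentions ERROR" >&2
        return 2
    fi
    echo "OK: $*"
}
```

For example, the initial AIP creation could be run this way:

  run_checked /tmp/aip-create.log /dspace/bin/dsrun org.dspace.administer.AIPManager -u -a -v -e _admin-user_

The DSpace log still has to be checked separately.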

Maintaining Internal AIPs

You should periodically run the command

 dsrun org.dspace.administer.AIPManager -u -a -e _admin-user_

to update internal AIPs for objects that have been changed or added.
Once a day should be enough.

Note that it will always re-create AIPs for Collection and Community objects, since the DSpace object model does not have a last-modified date for them and there is no way to tell if the AIP is out of date or not. Since there are relatively few collections and communities in an archive (compared to Items) this is not seen as a serious problem.
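One way to schedule the daily update is cron. The helper below is only a sketch that prints a suitable crontab line; the function name aip_cron_line, the choice of hour, and the example address are assumptions, while the dsrun path and AIPManager options are the ones shown above.

```shell
# Print a crontab line that runs the internal AIP update once a day.
# Usage: aip_cron_line HOUR ADMIN_USER
aip_cron_line() {
    hour=$1
    admin=$2
    # Crontab fields: minute hour day-of-month month day-of-week command
    printf '0 %s * * * /dspace/bin/dsrun org.dspace.administer.AIPManager -u -a -e %s\n' \
        "$hour" "$admin"
}
```

Calling aip_cron_line 2 admin@example.org prints a line that runs the update at 02:00 every day; it can then be added to the crontab of the account that runs DSpace.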

Procedure to Restore an Archive from AIPs

The following steps have been tested for a very small archive
and successfully restored the RDBMS tables from internal AIPs in the
asset store. Note that this is a coarse overview and does not
consider error-handling.

Restoration

  1. Run /dspace/bin/cleanup to clear out unused bitstreams from the asset store.
  2. Shut down your servlet container, if necessary.
  3. Remove the search indices: rm /dspace/search/*
  4. If your archive is configured to use History, save the old History by renaming its directory, and create a new, empty History directory
    e.g. mv history history.old ; mkdir history
  5. Start with an empty database. Either:
    1. Back up the current state of the RDBMS, and destroy it with
      e.g. drop database dspace;
    2. Simply change your DSpace configuration to point to a different database instance, if you have room for another database.
  6. Create a new, empty database:
    createdb -U dspace -E UNICODE dspace
  7. Run the scripts in your install directory to initialize the DB:
    ant setup_database load_registries
  8. Back in the DSpace run directory, create an admin user:
    /dspace/bin/create-administrator
  9. Initialize the search and browse indices:
    /dspace/bin/index-all
  10. In your DSpace configuration, ensure that the AIP restoration application will run with History turned off:
    1. Set up a separate dispatcher for the AIPManager application:
      aipManager.dispatcher = restore
    2. Ensure that the restore Dispatcher does NOT call the History consumer, although it should call the search and browse consumers synchronously:
      event.dispatcher.restore.class = org.dspace.event.BasicDispatcher
      event.dispatcher.restore.consumers = search:sync, browse:sync
  11. Rebuild the Bitstream table:
    /dspace/bin/dsrun org.dspace.administer.RebuildBitstreamTable -r
  12. Rebuild the InternalAIP table:
    /dspace/bin/dsrun org.dspace.administer.AIPManager -c -a -f -v -e _admin-user_
  13. Restore archive from the internal AIPs:
    /dspace/bin/dsrun org.dspace.administer.AIPManager -r -a -v -e _admin-user_

At each stage, carefully monitor the output and the DSpace log for indications of errors. You can retry the restore of an internal AIP, or even the whole set of them, if necessary; it automatically skips any objects that already exist.
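Steps 11 to 13 can be chained in a small script so that a failure stops the sequence before the next step runs. This is a sketch only: it assumes each command returns a non-zero exit status on failure, which this page does not confirm, and the function name and the DSRUN override variable are hypothetical.

```shell
# Location of dsrun; override for a dry run, e.g. DSRUN=echo.
DSRUN=${DSRUN:-/dspace/bin/dsrun}

# Run steps 11-13 of the restore procedure, stopping at the first failure.
# Usage: restore_from_internal_aips ADMIN_USER
restore_from_internal_aips() {
    admin=$1
    # Step 11: rebuild the Bitstream table.
    "$DSRUN" org.dspace.administer.RebuildBitstreamTable -r || return 1
    # Step 12: rebuild the InternalAIP table.
    "$DSRUN" org.dspace.administer.AIPManager -c -a -f -v -e "$admin" || return 1
    # Step 13: restore the archive from the internal AIPs.
    "$DSRUN" org.dspace.administer.AIPManager -r -a -v -e "$admin" || return 1
}
```

Setting DSRUN=echo before calling the function prints the three commands instead of running them, which is a cheap way to review them first.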

Creating and Ingesting External AIPs

Since external AIPs are really just another kind of package, you can manage them with the same package manipulation tools you use with, e.g., METS-based SIPs. The only difference is that you may need to
apply some packager parameters to the AIP ingester since its default
behavior assumes it is restoring an object to the exact same place in the archive, i.e. its former parent and Handle.

You can use external AIPs to migrate objects between archives or even as a backup strategy (similar to the use of internal AIPs).

Creating AIPs

To create an AIP in a file, use this command template:

 /dspace/bin/dsrun org.dspace.app.packager.Packager -d -t AIP -e _eperson_ -i _handle_ _file-path_

for example:

 /dspace/bin/dsrun org.dspace.app.packager.Packager -d -t AIP -e florey@mit.edu -i 1721.1/4567 aip4567.zip

The command needs to run under the identity of an EPerson with permission to read the specified object.
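To export several objects, the command template can be wrapped in a loop. The sketch below reads one Handle per line from standard input and writes one AIP file per Handle. The function name, the DSRUN override variable, and the file naming scheme are hypothetical; only the Packager options come from the template above.

```shell
# Location of dsrun; override for a dry run, e.g. DSRUN=echo.
DSRUN=${DSRUN:-/dspace/bin/dsrun}

# Export one external AIP per Handle read from standard input.
# Usage: export_aips EPERSON OUTDIR < handles.txt
export_aips() {
    eperson=$1
    outdir=$2
    while IFS= read -r handle; do
        [ -n "$handle" ] || continue    # skip blank lines
        # Hypothetical naming scheme: 1721.1/4567 -> aip-1721.1_4567.zip
        file="$outdir/aip-$(printf '%s' "$handle" | tr '/' '_').zip"
        # </dev/null keeps the command from consuming the Handle list.
        "$DSRUN" org.dspace.app.packager.Packager -d -t AIP \
            -e "$eperson" -i "$handle" "$file" </dev/null || return 1
    done
}
```

For example:

  export_aips florey@mit.edu /tmp/aips < handles.txt

The loop stops at the first export that fails, on the assumption that the Packager exits with a non-zero status in that case.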

To create an internal AIP, just add the package parameter internal=true to the command.
The resulting "package" will be a METS manifest document, e.g.

 /dspace/bin/dsrun org.dspace.app.packager.Packager -d -t AIP -e florey@mit.edu -i 1721.1/4567 -o internal=true mets.xml

Ingesting External AIPs

To ingest an AIP and create a new object under a parent of your choice, add the ignoreParent and ignoreHandle package parameters to the command:

 /dspace/bin/dsrun org.dspace.app.packager.Packager -s -t AIP -e _eperson_ -p _parent-handle_ -o ignoreParent=true -o ignoreHandle=true _file-path_

If you leave out these package-parameter
options, the AIP package ingester will
attempt to install the AIP under the parent handle it had before,
and give it back its original Handle. After all, the point of
AIPs was to reproduce the exact object that was exported. When you are effectively using the AIP as a SIP, however, you may not want it back under the same parent or handle, so there is a way to override these features.

Restoring Objects from External AIPs

If your goal is to restore the original state of an object from
its external AIP, you can do this as well, by leaving out the
"ignore" parameters and adding the -r option to the Packager application. "-r" tells the Packager to ingest in "replace"
mode, applying the parent and Handle from the package. Note
that you still have to specify a parent with the "-p" option but
it is not used.

Here is an example of ingesting an AIP with the "-r" option:

 /dspace/bin/dsrun org.dspace.app.packager.Packager -s -t AIP -e _eperson_ -p 123456789/0  -r  _file-path_
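The two ingest modes differ only in their options, which the following sketch makes explicit. The function name, the mode names "new" and "restore", and the DSRUN override variable are hypothetical; the Packager options are the ones shown in the commands above.

```shell
# Location of dsrun; override for a dry run, e.g. DSRUN=echo.
DSRUN=${DSRUN:-/dspace/bin/dsrun}

# Usage: ingest_aip MODE EPERSON PARENT_HANDLE FILE
#   MODE=new      create a new object under PARENT_HANDLE
#   MODE=restore  restore the original parent and Handle from the package
ingest_aip() {
    mode=$1 eperson=$2 parent=$3 file=$4
    case $mode in
        new)
            "$DSRUN" org.dspace.app.packager.Packager -s -t AIP -e "$eperson" \
                -p "$parent" -o ignoreParent=true -o ignoreHandle=true "$file" ;;
        restore)
            # -p still has to be given, but its value is not used in this mode.
            "$DSRUN" org.dspace.app.packager.Packager -s -t AIP -e "$eperson" \
                -p "$parent" -r "$file" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2 ;;
    esac
}
```

With DSRUN=echo the function prints the command it would run, so the difference between the two modes can be inspected without touching the archive.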

If you are restoring an entire archive, or a hierarchy of objects,
from external AIPs, then you'll have to ingest the "ancestors" first:
for example, ingest the top-level Communities, then the sub-Communities and Collections under them, and so on, and finally
the Items when all the Collections are ready. You'll have to
examine each package to determine its parent handle, and the
handle of the object it creates, to determine the order.
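Once the Handle and parent Handle of each package are known, working out the ingest order is a topological sort, which the standard tsort utility can do. The sketch below assumes you have already collected that information into lines of the form "object-handle parent-handle", with "-" as the parent of a top-level Community; this input format and the function name are hypothetical, and extracting the Handles from the packages is not shown.

```shell
# Read "object-handle parent-handle" lines from standard input and print
# the Handles in an order where every parent precedes its children.
# A parent of "-" marks a top-level Community (hypothetical input format).
ingest_order() {
    while read -r handle parent; do
        [ -n "$handle" ] || continue
        if [ "$parent" = "-" ]; then
            echo "$handle $handle"    # tsort: item with no ordering constraint
        else
            echo "$parent $handle"    # tsort: parent must come before child
        fi
    done | tsort
}
```

Each Handle in the output can then be ingested in turn. A parent that is named but not itself listed as an object (for example, one that already exists in the archive) also appears in the output and should be skipped.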

Internal AIPs

Although it is possible to create internal AIPs and even ingest them
with the Packager, this is not recommended (unless you are just
satisfying curiosity or testing the system). The AIPManager application was created specifically to maintain internal AIPs within
the asset store so there is no need to export them.

Downloads and Installation

IMPORTANT: The patches from EventSystemPrototype must be applied before attempting to install the AIP Prototype.

First, download the new files and diffs:

  1. source diffs part 1
  2. source diffs part 2
  3. source diffs part 3
  4. new source files

Then apply the changes to your DSpace installation directory:

NOTE: The interface of org.dspace.content.packager.PackageIngester has been changed slightly. This will break any existing package ingesters, although the ones in the DSpace core have been fixed. Look at the changes to e.g. org.dspace.content.packager.PDFPackager for an example of how to update your code. The changes are quite minimal.

  1. Unpack the new source Zip file in your install directory with unzip.
  2. For each of the "diff" files, in order, go to your install directory and apply the diff with the command:
    patch -p 0 -l < _diff-file_
  3. Build and install the code: ant install_code build_wars
  4. Ensure the configuration changes in config/dspace.cfg get propagated to your run-time config file.
  5. Ensure the new files in config/crosswalks are installed in your run-time directory.
  6. Apply the database change by running the SQL code in the file:
    etc/database_schema_14-15.sql
  7. Be sure to install the new WAR file(s) in your servlet container.
  8. Test by updating internal AIPs as shown above
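Step 2 can be scripted so that the diffs are applied in order and the process stops as soon as one fails to apply. This is a sketch under the assumption that it is run from the install directory; the function name and the PATCH override variable are hypothetical, and the diff file names are whatever you saved the downloads as.

```shell
# Patch program to use; override for testing, e.g. PATCH=true.
PATCH=${PATCH:-patch}

# Apply each diff in the order given; stop at the first one that fails.
# Usage: apply_diffs DIFF_FILE...
apply_diffs() {
    for diff in "$@"; do
        echo "Applying $diff"
        "$PATCH" -p 0 -l <"$diff" || { echo "patch failed: $diff" >&2; return 1; }
    done
}
```

Stopping at the first failure matters here because later diffs may depend on earlier ones having applied cleanly.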

Configuration for AIPs

The following configuration keys apply to the AIP packager and management infrastructure.
They may also require certain crosswalk plugins to be configured,
but that is a separate issue that is addressed in the sample DSpace
configuration supplied with the system source.

Known Deficiencies

See Also