Standards-based AIP Asset Store

In the current architecture, all metadata is held in a relational database, and all content bitstreams are held in the file system on the server. This makes certain preservation-related activities complex, including backups, auditing, and replication/distribution. The proposed new asset store stores metadata and content together as standards-based Archival Information Packages (AIPs, in OAIS terminology). It does not replace the current relational database in DSpace. The AIPs are the authoritative version of the information in the system; the relational database and search indices are still used for performant access, but they are considered caches of the information in the AIPs.

There may be many different implementations of the asset store available. A simple file system based implementation could be the default. Other implementations based on Grid-style or other storage systems will be possible. The asset store is itself a DSpace module with a defined interface, and as such the implementation can be changed. However, the rest of the system needs to know what behaviour to expect from the asset store, and needs to have a conceptual model of what's in it.
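As a rough illustration of what a "defined interface" with interchangeable implementations could look like, here is a minimal sketch. All names in it (`AssetStore`, `put`, `get`, `list_ids`, `copy_all`) are hypothetical and are not part of any existing DSpace API; the in-memory class merely stands in for some other storage backend.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class AssetStore(ABC):
    """Hypothetical contract the rest of the system would code against."""

    @abstractmethod
    def put(self, aip_id: str, files: Dict[str, bytes]) -> None:
        """Store the files that make up one AIP."""

    @abstractmethod
    def get(self, aip_id: str) -> Dict[str, bytes]:
        """Return the files of one AIP, keyed by file name."""

    @abstractmethod
    def list_ids(self) -> List[str]:
        """Return the identifiers of all stored AIPs, sorted."""


class FileSystemAssetStore(AssetStore):
    """Simple implementation: one directory per AIP under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, aip_id, files):
        aip_dir = self.root / aip_id
        aip_dir.mkdir(exist_ok=True)
        for name, data in files.items():
            (aip_dir / name).write_bytes(data)

    def get(self, aip_id):
        aip_dir = self.root / aip_id
        return {p.name: p.read_bytes() for p in aip_dir.iterdir() if p.is_file()}

    def list_ids(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_dir())


class InMemoryAssetStore(AssetStore):
    """Stand-in for any other backend (assumption: behaviour matches the contract)."""

    def __init__(self) -> None:
        self._aips: Dict[str, Dict[str, bytes]] = {}

    def put(self, aip_id, files):
        self._aips[aip_id] = dict(files)

    def get(self, aip_id):
        return dict(self._aips[aip_id])

    def list_ids(self):
        return sorted(self._aips)


def copy_all(source: AssetStore, target: AssetStore) -> int:
    """Mirror every AIP from one store into another, whatever their backends."""
    ids = source.list_ids()
    for aip_id in ids:
        target.put(aip_id, source.get(aip_id))
    return len(ids)
```

Because `copy_all` relies only on the abstract interface, it works unchanged between any pair of implementations, which is the property the paragraph above asks of the asset store module.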

The main reason for moving to this asset store model is to ease various preservation tasks, such as the backups, auditing, and replication/distribution mentioned above.

AIP Format

Initially, only items will be AIPs. Later on, AIPs for communities, collections and possibly bitstream formats can be added.

An AIP consists of:

For example, an item AIP stored in a simple filesystem-based asset store might look like this:

aip-identifier/
  mets.xml           core metadata serialisation in METS
  bitstream1
  bitstream2
  checksum           plain text file, containing checksum of METS document

(It may appear differently in other storage mechanisms, e.g. Grid-style ones.)
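The filesystem layout above could be produced and audited roughly as follows. This is only a sketch: the function names are hypothetical, and the use of SHA-256 is an assumption, since the proposal does not say which checksum algorithm the `checksum` file uses.

```python
import hashlib
import tempfile
from pathlib import Path


def write_aip(root, aip_id, mets_xml, bitstreams):
    """Write one AIP directory: mets.xml, the bitstreams, and a checksum file."""
    aip_dir = Path(root) / aip_id
    aip_dir.mkdir(parents=True)
    (aip_dir / "mets.xml").write_bytes(mets_xml)
    for name, data in bitstreams.items():
        (aip_dir / name).write_bytes(data)
    # Assumption: the checksum file holds a hex SHA-256 digest of mets.xml.
    digest = hashlib.sha256(mets_xml).hexdigest()
    (aip_dir / "checksum").write_text(digest + "\n", encoding="ascii")
    return aip_dir


def mets_is_intact(aip_dir):
    """Audit step: recompute the METS digest and compare it with the stored one."""
    aip_dir = Path(aip_dir)
    expected = (aip_dir / "checksum").read_text(encoding="ascii").strip()
    actual = hashlib.sha256((aip_dir / "mets.xml").read_bytes()).hexdigest()
    return actual == expected


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    aip = write_aip(store, "aip-identifier", b"<mets/>",
                    {"bitstream1": b"first", "bitstream2": b"second"})
    print(sorted(p.name for p in aip.iterdir()))
    print(mets_is_intact(aip))
```

An audit of the whole store then amounts to running `mets_is_intact` over every AIP directory, with no need to consult the relational database.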

The intention is that communities and collections are also AIPs, which contain references to the items within them. Communities and collections are not just containers: they have their own metadata and are archival objects in their own right. From a disaster recovery point of view, the archive then survives even if the top three layers of the architecture are lost, because the asset store contains everything needed to reconstruct it. It also means that the content of a DSpace instance can be mirrored or replicated simply by copying the AIPs.

In other words, the intention is that the AIPs are independent of DSpace: you do not need a DSpace instance to be able to make use of them. The asset store is the archive; the rest of DSpace is a tool for managing that asset store and for depositing and retrieving objects in it. AIPs are intended, in a sense, to transcend the DSpace application, and indeed an organisation.
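To make the "database as cache" idea concrete, the sketch below rebuilds a small relational table purely by walking a filesystem asset store. It is illustrative only: the table schema, the function name `rebuild_cache` and the toy METS document (a single Dublin Core title, not a schema-valid METS file) are assumptions, not the real DSpace schema or AIP profile.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Toy METS document used for illustration; a real AIP would carry far more.
TOY_METS = (
    '<mets xmlns="http://www.loc.gov/METS/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dmdSec ID="dmd1"><mdWrap MDTYPE="DC"><xmlData>'
    "<dc:title>{title}</dc:title>"
    "</xmlData></mdWrap></dmdSec></mets>"
)

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def rebuild_cache(asset_store_root, db):
    """Recreate the (hypothetical) item table from the AIPs alone."""
    db.execute("DROP TABLE IF EXISTS item")
    db.execute("CREATE TABLE item (aip_id TEXT PRIMARY KEY, title TEXT)")
    count = 0
    for aip_dir in sorted(Path(asset_store_root).iterdir()):
        mets = aip_dir / "mets.xml"
        if not mets.is_file():
            continue  # not an AIP directory
        root = ET.parse(str(mets)).getroot()
        title = root.findtext(".//" + DC_TITLE, default="")
        db.execute("INSERT INTO item VALUES (?, ?)", (aip_dir.name, title))
        count += 1
    db.commit()
    return count


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp())
    for aip_id, title in [("aip-1", "First item"), ("aip-2", "Second item")]:
        (store / aip_id).mkdir()
        (store / aip_id / "mets.xml").write_text(
            TOY_METS.format(title=title), encoding="utf-8")
    db = sqlite3.connect(":memory:")
    print(rebuild_cache(store, db), "items restored")
```

The same routine would serve a mirror site: copy the AIPs, run the rebuild, and the local database and indices can be regenerated without access to the originating instance.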

One thing that is not in the AIPs in the asset store is information about e-people. That information is specific to a particular instance of DSpace. Say there is an AIP at MIT, and Cornell University is going to mirror it. It doesn't make sense to have e-people records referenced by that AIP, since they only make sense in an MIT context. What it does make sense to include is some sort of declarative policy expression for access control. This can be interpreted by different DSpace instances (or other systems) to decide which e-people have which exact permissions on an AIP. However, this can be developed later.
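One possible shape for such a declarative policy is sketched below. Everything here is an assumption made for illustration: the proposal does not define a policy format, and the role names, the implicit `public` role and the function `permissions_for` are hypothetical. The point is only that the policy travels with the AIP and names no e-person, while each instance supplies its own mapping from local e-people to roles.

```python
# Policy as it might be stored with the AIP: actions mapped to abstract roles.
# It refers to no e-person record of any particular DSpace instance.
AIP_POLICY = {
    "read": ["public"],
    "write": ["curator"],
}

# Instance-specific mappings, kept outside the AIP (hypothetical data).
MIT_ROLES = {"alice@mit.edu": {"curator"}}
CORNELL_ROLES = {"bob@cornell.edu": {"curator"}}


def permissions_for(policy, local_roles, eperson):
    """Interpret the AIP's policy using one instance's local role mapping."""
    # Assumption: every e-person, known or not, holds the "public" role.
    roles = set(local_roles.get(eperson, set())) | {"public"}
    return {action for action, allowed in policy.items() if roles & set(allowed)}


if __name__ == "__main__":
    # The same policy yields different results at different instances.
    print(sorted(permissions_for(AIP_POLICY, MIT_ROLES, "alice@mit.edu")))
    print(sorted(permissions_for(AIP_POLICY, CORNELL_ROLES, "alice@mit.edu")))
```

Under this sketch, a curator at MIT gets no special rights at Cornell unless Cornell's own mapping grants them, which matches the requirement that e-people records stay local to an instance.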

Metadata serialisation format

The AIP metadata serialisations should all be in one basic format. A single format removes many barriers to interoperability and to scaling up via the asset store sharing mechanisms under discussion. Having DSpace support arbitrary AIP formats appears to create too many problems.

METS and DIDL are two potential options. For DSpace requirements, both are adequate. METS is proposed since it is the format best understood by the DSpace community.

Object Model

Asset Store API

From an asset store API point of view, the salient points are that an AIP consists of:

Prototypes

Various prototype APIs have been created by members of the DSpace community to try out different approaches:

Requirements

Permissions relating to individual e-people records

An open issue is whether an AIP should contain provenance information. For the provenance of the bitstreams in the AIP, that seems to work; it is part of the metadata. If the provenance of the metadata itself or of the AIP as a whole is also important, the AIP then contains its own provenance metadata, which feels like a potential security/robustness hole.

Issues

Use Cases

See AssetStoreUseCases