Proposed URI Mapping for DSpace Object Model

This page proposes a mapping of objects in the DSpace data model
(aka Object Model) onto Uniform Resource Identifiers (URIs).
The URI scheme was developed specifically for the History system prototype,
but it may also find uses in the AIP prototype implementation
and policy expression languages, and in any application that needs a stable,
persistent URI naming an object in the DSpace object model.

Objectives

The specific goals of this proposal are:

  • Conform to existing, applicable standards.
  • URIs are meaningful and human-readable.
  • Every URI has a one-to-one correspondence with its Object: there is only one valid URI for any given object.
  • The URI is resolvable to an object within its realm of uniqueness:
    • URIs of persistent objects such as Items and Collections are unique and resolvable globally.
    • URIs of archive-dependent objects (such as a Bitstream's asset-store location) are only resolvable within the archive.
  • Follow the RDF convention of a common URI prefix with identifying elements in the URI "fragment", so RDF viewers display it correctly in condensed form.

Design Choices

We propose to base DSpace Object Model URIs on the
Info URI Scheme.
If this proposal is adopted, we will request to
register the "dspace" namespace in the info: scheme.

Why Not URLs?

Why not use URLs in e.g. the http: or ftp: scheme?
Recall that it is a goal for the URIs to correspond 1:1 with objects, and
objects may be duplicated (replicated or custody-transferred) at
other archives, so they would then have multiple URLs.
Also, any identifier based on the domain name of a network host
is not going to be persistent.

Besides, the URI does not have to be globally resolvable. It only has to be
resolvable in the context where the object is available, e.g. within
a DSpace archive that contains the Item.

URI Specification

The general format for a DSpace URI starts with
the scheme info: and the namespace dspace, followed by
a path element delimiter "/" (slash). The rest of
the URI depends on the object to be described. We have established rules
for two classes of objects:

1. First-Class DSpace Objects with Persistent Identifiers

Any "first-class" object with a persistent identifier – i.e. a Handle – can
be mapped to a URI based on that Handle, following the pattern:

info:dspace/handle#handle:subfragment

For example, every DSpace Item, Collection, Community, and Site has a globally
unique Handle. An Item with the Handle 1721.1/4325 would have this URI:

info:dspace/handle#1721.1/4325

The subfragment notation is used for the "persistent" identifiers of
Bitstreams. A Bitstream in the preceding example's Item with the
Sequence ID 3 would be identified by this URI:

info:dspace/handle#1721.1/4325:3

NOTE: The word "handle" in the URI path declares that the unique
identifier following it is a CNRI Handle. If DSpace eventually
implements other persistent identifier (PID) schemes, each would be mapped to
a class of DSpace URI with the name of that PID type in place of "handle".
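
As an illustration of the pattern above, here is a minimal sketch that builds Handle-based URIs with the standard java.net.URI class. The class and method names (HandleUriSketch, forHandle, forBitstream) are hypothetical and are not part of the DSpace patches; the sketch only assumes that the Handle goes in the fragment and the Sequence ID follows it after a colon, as described in this section.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of DSpace: builds URIs of the form
// info:dspace/handle#handle:subfragment
final class HandleUriSketch {
    // Scheme-specific part shared by all Handle-based DSpace URIs.
    private static final String HANDLE_PREFIX = "dspace/handle";

    /** URI for a first-class object (Item, Collection, ...) with the given Handle. */
    static URI forHandle(String handle) {
        return build(handle);
    }

    /** URI for a Bitstream, using its Sequence ID as the subfragment. */
    static URI forBitstream(String itemHandle, int sequenceId) {
        return build(itemHandle + ":" + sequenceId);
    }

    private static URI build(String fragment) {
        try {
            // Arguments are: scheme, scheme-specific part, fragment.
            return new URI("info", HANDLE_PREFIX, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for: " + fragment, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forHandle("1721.1/4325"));       // info:dspace/handle#1721.1/4325
        System.out.println(forBitstream("1721.1/4325", 3)); // info:dspace/handle#1721.1/4325:3
    }
}
```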

2. Representing Internal Objects

Some applications need to refer to objects within the archive with a
URI, e.g. because it is needed within an XML or RDF representation and a URL is
inappropriate. The first such case is in the internal AIP METS document,
which needs to identify a file in the Asset Store, bypassing the object model.

The solution is to add a different unique keyword to the DSpace URI prefix
and let the application dictate the rest of the URI. In this case,
we give it the keyword asset, and the format (for file-based
local asset stores, at least) is a path element naming the bit-store type
and an identifying path in the fragment. The general format for a local file is:

info:dspace/asset/storageType#assetStore:assetIdentifier

and for a registered asset it's:

info:dspace/registered/storageType#assetStore:assetIdentifier

In each of the above formats:

  • storageType is either file or srb, depending on whether local file storage or the SRB is used.
  • assetStore is the asset-store prefix under which this file is stored; there can be several configured. In the file case it would be a local file: URI.
  • assetIdentifier is the identifier, unique only within that asset store, of the file.

Here is an example: The URI

info:dspace/asset/file#file%3A%2Fvar%2Flocal%2Fdspace%2Fassetstore%2F:47662570435556328444977060694430104239

is broken down into:

info:dspace/asset -- actual asset in the bitstore (as opposed to "registered").
/file# -- storage type is local file, not SRB.
file%3A%2Fvar%2Flocal%2Fdspace%2Fassetstore%2F: -- in asset store rooted at /var/local/dspace/assetstore
47662570435556328444977060694430104239 -- actual file name (under prefix directories)

Note how only one URI can be derived from the asset file, and likewise the
URI corresponds directly to exactly one file. Given the asset store prefix
and filename it is completely straightforward to match that to a Bitstream
object.
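
The asset URI format can be sketched in code as follows. This is only an illustration under stated assumptions: the class AssetUriSketch and its methods are hypothetical, the asset-store prefix is percent-encoded with java.net.URLEncoder (which yields the same escapes as the example for this input, but applies form-encoding rules in general), and the asset identifier is assumed to contain no colon so the fragment can be split at its last colon.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of DSpace: builds and parses URIs of the form
// info:dspace/asset/storageType#assetStore:assetIdentifier
final class AssetUriSketch {
    private static final String ASSET_PREFIX = "dspace/asset/";

    static URI build(String storageType, String assetStore, String assetId) {
        // The asset-store prefix is itself a URI, so it is escaped before
        // being embedded in the fragment.
        return URI.create("info:" + ASSET_PREFIX + storageType
                + "#" + encode(assetStore) + ":" + assetId);
    }

    /** Returns { storageType, decoded assetStore, assetIdentifier }. */
    static String[] parse(URI uri) {
        String ssp = uri.getRawSchemeSpecificPart();
        String fragment = uri.getRawFragment();
        if (!"info".equals(uri.getScheme()) || ssp == null
                || !ssp.startsWith(ASSET_PREFIX) || fragment == null) {
            throw new IllegalArgumentException("Not a DSpace asset URI: " + uri);
        }
        // Colons inside the asset-store prefix are escaped as %3A, so the last
        // literal colon separates the prefix from the identifier.
        int colon = fragment.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing asset identifier: " + uri);
        }
        return new String[] {
            ssp.substring(ASSET_PREFIX.length()),
            decode(fragment.substring(0, colon)),
            fragment.substring(colon + 1)
        };
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        URI uri = build("file", "file:/var/local/dspace/assetstore/",
                "47662570435556328444977060694430104239");
        System.out.println(uri);
        for (String part : parse(uri)) {
            System.out.println(part);
        }
    }
}
```

Parsing works on the raw (still escaped) fragment on purpose: decoding first would reintroduce the colons of the file: prefix and make the split ambiguous.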

API

The following methods are implemented in patches to DSpace version 1.4.1.
Note that there is only a method to obtain the Handle-based persistent URI
of any archival object; there is no corresponding method to resolve it,
because none of the code needed it. Implementing one would be quite
straightforward, however.

// in RDFRepository
/**
 * Returns the persistent, globally-unique URI of the given object,
 * if possible. If there is no basis for a persistent URI (i.e. if
 * it has no Handle), returns null.
 *
 * @param context - the DSpace context
 * @param dso - any DSpace object.
 * @return new URI or null if one cannot be created.
 */
public static URI makeDSpaceObjectURI(Context context, DSpaceObject dso)

// in Bitstream
/**
 * Returns a URI of the storage occupied by this bitstream in the
 * asset store. It can be resolved by the dereferenceAbsoluteURI()
 * method. Note that the "absolute" URI does not depend on the DSpace
 * object model or RDBMS storage, it only depends on the asset store layer.
 *
 * @return external-based URI to bitstream.
 */
public URI getAbsoluteURI()

// in Bitstream
/**
 * Returns the Bitstream object containing the file in the asset
 * store indicated by the URI, or null if there is none.
 * See getAbsoluteURI().
 *
 * @param context - the context.
 * @param uri a bitstream absolute URI created by getAbsoluteURI()
 * @return a Bitstream object or null.
 */
public static Bitstream dereferenceAbsoluteURI(Context context, URI uri)
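
Since the API above has no method to resolve a Handle-based URI, the following sketch shows what the first step of such a resolver might look like: validating the URI and extracting the Handle and optional Sequence ID. It is not part of the patches. The class HandleUriParserSketch is hypothetical, and it assumes the subfragment, when present, is a decimal Sequence ID following the last colon. The extracted Handle would then be passed to whatever Handle lookup the archive provides.

```java
import java.net.URI;

// Hypothetical helper, not part of DSpace: takes apart URIs of the form
// info:dspace/handle#handle:subfragment
final class HandleUriParserSketch {
    /** The Handle, plus the Bitstream Sequence ID if the URI names a Bitstream. */
    static final class Parsed {
        final String handle;
        final Integer sequenceId; // null when the URI has no subfragment

        Parsed(String handle, Integer sequenceId) {
            this.handle = handle;
            this.sequenceId = sequenceId;
        }
    }

    static Parsed parse(URI uri) {
        String fragment = uri.getFragment();
        if (!"info".equals(uri.getScheme())
                || !"dspace/handle".equals(uri.getSchemeSpecificPart())
                || fragment == null) {
            throw new IllegalArgumentException("Not a Handle-based DSpace URI: " + uri);
        }
        int colon = fragment.lastIndexOf(':');
        if (colon < 0) {
            return new Parsed(fragment, null); // Item, Collection, Community or Site
        }
        // Assumption: everything after the last colon is a decimal Sequence ID.
        // Integer.parseInt throws NumberFormatException (an
        // IllegalArgumentException) if it is not.
        int sequenceId = Integer.parseInt(fragment.substring(colon + 1));
        return new Parsed(fragment.substring(0, colon), sequenceId);
    }

    public static void main(String[] args) {
        Parsed p = parse(URI.create("info:dspace/handle#1721.1/4325:3"));
        System.out.println(p.handle + " / sequence " + p.sequenceId);
    }
}
```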

Comments?
