Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The following explains how the application layer is built and used.

Table of Contents
minLevel2
outlinetrue
stylenone

Web User Interface

The DSpace Web UI is the largest and most-used component in the application layer. Built on Java Servlet and JavaServer Page technology, it allows end-users to access DSpace over the Web via their Web browsers. As of Dspace 1.3.2 the UI meets both XHTML 1.0 standards and Web Accessibility Initiative (WAI) level-2 standard.

...

The submission UI has an optional feature that came about as a result of MIT Libraries policy. If the block.theses parameter in dspace.cfg is true, an extra checkbox is included in the first page of the submission UI. This asks the user if the submission is a thesis. If the user checks this box, the submission is halted (deleted) and an error message displayed, explaining that DSpace should not be used to submit theses. This feature can be turned off and on, and the message displayed (/dspace/jsp/submit/no-theses.jsp can be localized as necessary.

OAI-PMH Data Provider

The DSpace platform supports the Open Archives Initiative Protocol for Metadata Harvesting (This information now has it's own subpage: OAI-PMH ) version 2.0 as a data provider. This is accomplished using the OAICat framework from OCLC.

The DSpace build process builds a Web application archive, [dspace-source]/build/oai.war), in much the same way as the Web UI build process described above. The only differences are that the JSPs are not included, and [dspace-source]/etc/oai-web.xml is used as the deployment descriptor. This 'webapp' is deployed to receive and respond to OAI-PMH requests via HTTP. Note that typically it should not be deployed on SSL (https: protocol). In a typical configuration, this is deployed at oai, for example:

Code Block

http://dspace.myu.edu/oai/request?verb=Identify

The 'base URL' of this DSpace deployment would be:

Code Block

http://dspace.myu.edu/oai/request

It is this URL that should be registered with www.openarchives.org. Note that you can easily change the 'request' portion of the URL by editing [dspace-source]/etc/oai-web.xml and rebuilding and deploying oai.war.

DSpace provides implementations of the OAICat interfaces AbstractCatalog, RecordFactory and Crosswalk that interface with the DSpace content management API and harvesting API (in the search subsystem).

Only the basic oai_dc unqualified Dublin Core metadata set export is enabled by default; this is particularly easy since all items have qualified Dublin Core metadata. When this metadata is harvested, the qualifiers are simply stripped; for example, description.abstract is exposed as unqualified description. The description.provenance field is hidden, as this contains private information about the submitter and workflow reviewers of the item, including their e-mail addresses. Additionally, to keep in line with OAI community practices, values of contributor.author are exposed as creator values.

Other metadata formats are supported as well, using other Crosswalk implementations; consult the oaicat.properties file described below. To enable a format, simply uncomment the lines beginning with Crosswalks.*. Multiple formats are allowed, and the current list includes, in addition to unqualified DC: MPEG DIDL, METS, MODS. There is also an incomplete, experimental qualified DC.

Note that the current simple DC implementation (org.dspace.app.oai.OAIDCCrosswalk) does not currently strip out any invalid XML characters that may be lying around in the data. If your database contains a DC value with, for example, some ASCII control codes (form feed etc.) this may cause OAI harvesters problems. This should rarely occur, however. XML entities (such as >) are encoded (e.g. to >)

In addition to the implementations of the OAICat interfaces, there is one main configuration file relevant to OAI-PMH support:

  • oaicat.properties: This file resides in [dspace]/config. You probably won't need to edit this, as it is pre-configured to meet most needs. You might want to change the Identify.earliestDatestamp field to more accurately reflect the oldest datestamp in your local DSpace system. (Note that this is the value of the last_modified column in the Item database table.)

Sets

OAI-PMH allows repositories to expose an hierarchy of sets in which records may be placed. A record can be in zero or more sets.

DSpace exposes collections as sets. The organization of communities is likely to change over time, and is therefore a less stable basis for selective harvesting.

Each collection has a corresponding OAI set, discoverable by harvesters via the ListSets verb. The setSpec is the Handle of the collection, with the ':' and '/' converted to underscores so that the Handle is a legal setSpec, for example:

Code Block

hdl_1721.1_1234

Naturally enough, the collection name is also the name of the corresponding set.

Unique Identifier

Every item in OAI-PMH data repository must have an unique identifier, which must conform to the URI syntax. As of DSpace 1.2, Handles are not used; this is because in OAI-PMH, the OAI identifier identifies the metadata record associated with the resource. The resource is the DSpace item, whose resource identifier is the Handle. In practical terms, using the Handle for the OAI identifier may cause problems in the future if DSpace instances share items with the same Handles; the OAI metadata record identifiers should be different as the different DSpace instances would need to be harvested separately and may have different metadata for the item.

The OAI identifiers that DSpace uses are of the form:

Code Block
oai:host name:handle

For example:

Code Block
oai:dspace.myu.edu:123456789/345

If you wish to use a different scheme, this can easily be changed by editing the value of OAI_ID_PREFIX at the top of the org.dspace.app.oai.DSpaceOAICatalog class. (You do not need to change the code if the above scheme works for you; the code picks up the host name and Handles automatically from the DSpace configuration.)

Access control

OAI provides no authentication/authorisation details, although these could be implemented using standard HTTP methods. It is assumed that all access will be anonymous for the time being.

A question is, "is all metadata public?" Presently the answer to this is yes; all metadata is exposed via OAI-PMH, even if the item has restricted access policies. The reasoning behind this is that people who do actually have permission to read a restricted item should still be able to use OAI-based services to discover the content.

If in the future, this 'expose all metadata' approach proves unsatisfactory for any reason, it should be possible to expose only publicly readable metadata. The authorisation system has separate permissions for READing and item and READing the content (bitstreams) within it. This means the system can differentiate between an item with public metadata and hidden content, and an item with hidden metadata as well as hidden content. In this case the OAI data repository should only expose items those with anonymous READ access, so it can hide the existence of records to the outside world completely. In this scenario, one should be wary of protected items that are made public after a time. When this happens, the items are "new" from the OAI-PMH perspective.

Modification Date (OAI Date Stamp)

OAI-PMH harvesters need to know when a record has been created, changed or deleted. DSpace keeps track of a 'last modified' date for each item in the system, and this date is used for the OAI-PMH date stamp. This means that any changes to the metadata (e.g. admins correcting a field, or a withdrawal) will be exposed to harvesters.

'About' Information

As part of each record given out to a harvester, there is an optional, repeatable "about" section which can be filled out in any (XML-schema conformant) way. Common uses are for provenance and rights information, and there are schemas in use by OAI communities for this. Presently DSpace does not provide any of this information.

Deletions

DSpace keeps track of deletions (withdrawals). These are exposed via OAI, which has a specific mechansim for dealing with this. Since DSpace keeps a permanent record of withdrawn items, in the OAI-PMH sense DSpace supports deletions 'persistently'. This is as opposed to 'transient' deletion support, which would mean that deleted records are forgotten after a time.

Once an item has been withdrawn, OAI-PMH harvests of the date range in which the withdrawal occurred will find the 'deleted' record header. Harvests of a date range prior to the withdrawal will not find the record, despite the fact that the record did exist at that time.

As an example of this, consider an item that was created on 2002-05-02 and withdrawn on 2002-10-06. A request to harvest the month 2002-10 will yield the 'record deleted' header. However, a harvest of the month 2002-05 will not yield the original record.

Note that presently, the deletion of 'expunged' items is not exposed through OAI.

Flow Control (Resumption Tokens)

An OAI data provider can prevent any performance impact caused by harvesting by forcing a harvester to receive data in time-separated chunks. If the data provider receives a request for a lot of data, it can send part of the data with a resumption token. The harvester can then return later with the resumption token and continue.

DSpace supports resumption tokens for 'ListRecords' OAI-PMH requests. ListIdentifiers and ListSets requests do not produce a particularly high load on the system, so resumption tokens are not used for those requests.

Each OAI-PMH ListRecords request will return at most 100 records. This limit is set at the top of org.dspace.app.oai.DSpaceOAICatalog.java (MAX_RECORDS). A potential issue here is that if a harvest yields an exact multiple of MAX_RECORDS, the last operation will result in a harvest with no records in it. It is unclear from the OAI-PMH specification if this is acceptable.

When a resumption token is issued, the optional completeListSize and cursor attributes are not included. OAICat sets the expirationDate of the resumption token to one hour after it was issued, though in fact since DSpace resumption tokens contain all the information required to continue a request they do not actually expire.

Resumption tokens contain all the state information required to continue a request. The format is:

Code Block

from/until/setSpec/offset

from and until are the ISO 8601 dates passed in as part of the original request, and setSpec is also taken from the original request. offset is the number of records that have already been sent to the harvester. For example:

Code Block

2003-01-01//hdl_1721_1_1234/300

This means the harvest is 'from'
2003-01-01, has no 'until' date, is for collection hdl:1721.1/1234, and 300 records have already been sent to the harvester. (Actually, if the original OAI-PMH request doesn't specify a 'from' or 'until, OAICat fills them out automatically to '0000-00-00T00:00:00Z' and '9999-12-31T23:59:59Z' respectively. This means DSpace resumption tokens will always have from and until dates in them.Data Provider 2.0 (Internals)

DSpace Command Launcher

Introduced in Release 1.6, the DSpace Command Launcher brings together the various command and scripts into a standard-practice for running CLI runtime programs.

...

  • <command> begins the stanza for a command
  • <name>_name of command_</name> the name of the command that you would use.
  • <description>_the description of the command_</description>
  • <step> </step> User arguments are parsed and tested.
  • <class>_<the java class that is being used to run the CLI program>_</class>
    Prior to release 1.5 if one wanted to regenerate the browse index, one would have to issue the following commands manually:

    Code Block
    [dspace]/bin/dsrun org.dspace.browse.IndexBrowse -f -r
    [dspace]/bin/dsrun org.dspace.browse.ItemCounter
    [dspace]/bin/dsrun org.dspace.search.DSIndexer

    In release 1.5 a script was written and in release 1.6 the command [dspace]/bin/dspace index-init replaces the script. The stanza from launcher.xmlshow us how one can build more commands if needed:

    Code Block
    <command>
            <name>index-update</name>
            <description>Update the search and browse indexes</description>
            <step passuserargs="false">
                <class>org.dspace.browse.IndexBrowse</class>
                <argument>-i</argument>
            </step>
            <step passuserargs="false">
                <class>org.dspace.browse.ItemCounter</class>
            </step>
            <step passuserargs="false">
                <class>org.dspace.search.DSIndexer</class>
            </step>
    </command>

    .