Code can be found in our DSpace repository on GitHub, in the DOI branch: https://github.com/tuub/DSpace/tree/DOI.

DOI support

We (at Technische Universität Berlin) want to use DOIs for Items within DSpace. We are considering DOIs for Communities and Collections as well, but will concentrate on Items first. DOI is a well-known persistent identifier. With the external identifier support that @mire introduced in DSpace 3.0 together with the item versioning feature, it should be possible to add support for minting, registering and deleting DOIs using DSpace.

Registration agencies, DataCite and EZID

To register a DOI, one has to enter into a contract with a DOI registration agency. Several such agencies exist, and their policies differ. Some offer DOI registration specifically or exclusively for academic institutions, others only for publishing companies. Most registration agencies charge fees for registering DOIs, and each has its own rules describing what kinds of items a DOI can be registered for. When implementing DOI support for DSpace, we have to be mindful of the fact that every registration agency has its own API (see below).

DataCite is an organization that aims to support access to, acceptance of and archiving of research data. One of the services offered by DataCite members is DOI registration. Several DataCite members act as DOI registration agencies. Some members tell their customers to use the DataCite API directly, others offer their own APIs. So registering a DOI with a DataCite member does not automatically mean using DataCite's API directly.

We will register our DOIs using the service of TIB Hannover, a German member of DataCite. We will use the DataCite API directly. EZID is a DOI registration agency in the USA that is also part of DataCite. EZID offers its own API, so EZID customers won't benefit directly from our development.

DOIIdentifierProvider, DOIConnector and DataCiteConnector

Knowing this situation, we developed a DOIIdentifierProvider that should perform everything necessary to support DOIs on the DSpace side. For example, after minting and registering a DOI it saves the DOI as a metadata value of an Item. To keep the DOIIdentifierProvider extensible, we put a DOIConnector between it and the registration agency API. The DOIConnector has to implement seven methods and should be quite easy to implement for the API of any DOI registration agency. The seven methods are:

  • a method to check if a DOI is already reserved,
  • a method to check if a DOI is reserved for a given DSO,
  • a method to check if a DOI is already registered,
  • a method to check if a DOI is registered for a given DSO,
  • a method to reserve a DOI for a given DSO,
  • a method to register a DOI for a given DSO,
  • a method to delete a DOI for a given DSO.

We have already developed a DataCiteConnector that implements these methods for everyone who uses the DataCite API directly. As mentioned above, EZID has its own API, but it should be quite simple to implement a DOIConnector providing these seven methods on top of the EZID API.
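The seven methods listed above could be expressed as a Java interface roughly like the following. This is a minimal sketch, not the interface from the DOI branch: the names `DoiConnectorSketch`, `Dso` and `InMemoryDoiConnector`, all method signatures, and the rule that a DOI must be reserved before it can be registered are assumptions made for illustration. The in-memory implementation only serves to exercise the contract without contacting a registration agency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a DSpace object (DSO); not the real DSpace class.
final class Dso {
    final String id;
    Dso(String id) { this.id = id; }
}

// Sketch of the seven operations; names and signatures are assumptions.
interface DoiConnectorSketch {
    boolean isDoiReserved(String doi);
    boolean isDoiReserved(Dso dso, String doi);
    boolean isDoiRegistered(String doi);
    boolean isDoiRegistered(Dso dso, String doi);
    boolean reserveDoi(Dso dso, String doi);
    boolean registerDoi(Dso dso, String doi);
    boolean deleteDoi(Dso dso, String doi);
}

// In-memory implementation that keeps state in two maps (DOI -> object id).
final class InMemoryDoiConnector implements DoiConnectorSketch {
    private final Map<String, String> reserved = new HashMap<>();
    private final Map<String, String> registered = new HashMap<>();

    public boolean isDoiReserved(String doi) { return reserved.containsKey(doi); }

    public boolean isDoiReserved(Dso dso, String doi) { return dso.id.equals(reserved.get(doi)); }

    public boolean isDoiRegistered(String doi) { return registered.containsKey(doi); }

    public boolean isDoiRegistered(Dso dso, String doi) { return dso.id.equals(registered.get(doi)); }

    public boolean reserveDoi(Dso dso, String doi) {
        String owner = reserved.get(doi);
        if (owner != null) {
            // Already reserved: succeed only if it belongs to the same object.
            return owner.equals(dso.id);
        }
        reserved.put(doi, dso.id);
        return true;
    }

    public boolean registerDoi(Dso dso, String doi) {
        // Assumption: a DOI must be reserved for the same object first.
        if (!isDoiReserved(dso, doi)) {
            return false;
        }
        registered.put(doi, dso.id);
        return true;
    }

    public boolean deleteDoi(Dso dso, String doi) {
        // Only the object that holds the reservation may delete the DOI.
        if (!isDoiReserved(dso, doi)) {
            return false;
        }
        reserved.remove(doi);
        registered.remove(doi);
        return true;
    }
}
```

A connector for another agency would implement the same interface and translate each call into requests against that agency's API, so the provider on the DSpace side would not need to change.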

Metadata

DataCite requires metadata for the objects that the DOIs address. The DataCite Schema (http://schema.datacite.org) defines an XML structure to describe the metadata of an object. We developed a DIM2DataCite crosswalk that takes the metadata of a DSpace Item and transforms it into an XML document using DataCite Schema 2.2. As far as I know, EZID does not use this XML, so another crosswalk is probably needed. It should be discussed (see below or in the JIRA ticket) how we want to deal with metadata updates, as the API for external identifiers does not yet define a mechanism to update the metadata for an external identifier.
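To give an impression of the target format, the following sketch builds a small DataCite-style XML document from a handful of values. It is not the DIM2DataCite crosswalk from the DOI branch. The class and method names are hypothetical, and the namespace URI and element names reflect my understanding of the DataCite kernel 2.2 schema, so they should be checked against the schema itself before being relied on.

```java
import java.util.List;

// Hypothetical helper that illustrates the shape of a DataCite record.
final class DataCiteXmlSketch {

    // Escapes characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toDataCiteXml(String doi, List<String> creators, String title,
                                String publisher, int publicationYear) {
        StringBuilder xml = new StringBuilder();
        // Assumed namespace for schema version 2.2; verify against the schema.
        xml.append("<resource xmlns=\"http://datacite.org/schema/kernel-2.2\">\n");
        xml.append("  <identifier identifierType=\"DOI\">")
           .append(escape(doi)).append("</identifier>\n");
        xml.append("  <creators>\n");
        for (String creator : creators) {
            xml.append("    <creator><creatorName>")
               .append(escape(creator))
               .append("</creatorName></creator>\n");
        }
        xml.append("  </creators>\n");
        xml.append("  <titles>\n");
        xml.append("    <title>").append(escape(title)).append("</title>\n");
        xml.append("  </titles>\n");
        xml.append("  <publisher>").append(escape(publisher)).append("</publisher>\n");
        xml.append("  <publicationYear>").append(publicationYear)
           .append("</publicationYear>\n");
        xml.append("</resource>\n");
        return xml.toString();
    }
}
```

In a real crosswalk these values would come from the Item's DIM metadata rather than from method arguments.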

How to Test

To test our code, you will need a login for the DataCite Test API. The test system can be found here: https://test.datacite.org. This URL is currently used for the API in dspace/config/spring/api/identifier-service.xml. In this file you have to remove the comments around the bean for the DOIIdentifierProvider and around the bean for the DataCiteConnector. In dspace.cfg you'll have to configure the properties identifier.doi.user, identifier.doi.password, identifier.doi.prefix and identifier.doi.namespaceseparator. You can use the "TIB.DSPACE" user, the "duraspace" password, the 10.0128 prefix and, as namespace separator, a string that you don't expect anyone else to be using. DOIs that get registered with these settings will be deleted regularly. They must also address an item using example.org as domain (or any subdomain below it). So you have to configure dspace.url to include example.org as domain (sorry, that is a rule of our DOI registration agency)!
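Before running a test, it can help to confirm that the settings described above are all present. The following is a minimal sketch of such a check, assuming the properties are available as plain key/value text. The class `DoiTestConfigCheck`, the example namespace separator and the example dspace.url value are hypothetical; only the property names and the test account values are taken from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sanity check for the DOI test settings in dspace.cfg.
final class DoiTestConfigCheck {

    static final String[] REQUIRED = {
        "identifier.doi.user",
        "identifier.doi.password",
        "identifier.doi.prefix",
        "identifier.doi.namespaceseparator"
    };

    // Example settings; the separator and URL are made-up illustrative values.
    static final String EXAMPLE =
          "identifier.doi.user = TIB.DSPACE\n"
        + "identifier.doi.password = duraspace\n"
        + "identifier.doi.prefix = 10.0128\n"
        + "identifier.doi.namespaceseparator = my-unique-test-\n"
        + "dspace.url = http://dspace.example.org/xmlui\n";

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Returns a list of human-readable problems; empty means the check passed.
    static List<String> problems(Properties cfg) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = cfg.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                out.add("missing " + key);
            }
        }
        String host = null;
        try {
            host = URI.create(cfg.getProperty("dspace.url", "").trim()).getHost();
        } catch (IllegalArgumentException e) {
            // Unparseable URL: host stays null and is reported below.
        }
        boolean allowed = host != null
            && (host.equals("example.org") || host.endsWith(".example.org"));
        if (!allowed) {
            out.add("dspace.url must use example.org or a subdomain of it");
        }
        return out;
    }
}
```

Calling `DoiTestConfigCheck.problems(DoiTestConfigCheck.parse(DoiTestConfigCheck.EXAMPLE))` returns an empty list, while a dspace.url on any other domain produces one reported problem.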

Status

The DSpace wiki recommends getting in touch with the developer community early. A first version of a DOIIdentifierProvider is now complete. An interface for a DOIConnector is defined. We were able to reserve, register and delete DOIs at the test API of DataCite. All the code can be found in our DSpace repository on GitHub, in the DOI branch: https://github.com/tuub/DSpace/tree/DOI.

What's still to be done?

Of course, documentation for the DSpace manual will be necessary if this contribution gets accepted. The JavaDoc documentation could also be enhanced. I should write something here (but have not yet) about several design decisions I made while implementing the DOIIdentifierProvider.

A lot of testing and possibly some debugging is still needed. It would be great if someone could implement a DOIConnector for EZID. We have not written any test classes yet (knowing that this is not good and should be changed). We currently break some tests because DataCite defines five mandatory metadata fields (the DOI, a creator, a title, a publisher and the publication year) and not all TestItems contain all of these.
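The failing tests come down to a simple precondition: an object needs a value for each of the five mandatory fields before its metadata can be sent to DataCite. A minimal sketch of such a check follows. The class name and the field keys used here are hypothetical labels, not the DIM or DataCite element names used in our code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical precondition check for the five mandatory DataCite fields.
final class DataCiteMandatoryFields {

    // Illustrative keys for the five mandatory fields named above.
    static final List<String> MANDATORY = Arrays.asList(
        "doi", "creator", "title", "publisher", "publicationYear");

    // Returns the mandatory fields that have no non-blank value.
    static List<String> missing(Map<String, List<String>> metadata) {
        List<String> out = new ArrayList<>();
        for (String field : MANDATORY) {
            List<String> values = metadata.get(field);
            boolean present = false;
            if (values != null) {
                for (String value : values) {
                    if (value != null && !value.trim().isEmpty()) {
                        present = true;
                        break;
                    }
                }
            }
            if (!present) {
                out.add(field);
            }
        }
        return out;
    }
}
```

A test item that only carries a title and a creator would be reported as missing the DOI, the publisher and the publication year.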

We have not taken care of the frontends yet. Currently we can register DOIs, but we have not looked at whether they are presented in any UI.

We have only implemented a DOIIdentifierProvider that can be compared to the HandleIdentifierProvider. We have neither considered versioning nor tested our implementation with versioning switched on. We want to support versioning, but we wanted to get a first implementation done before taking care of it. Is a special IdentifierProvider needed for versioning, as exists with the VersionedHandleIdentifierProvider, or can every IdentifierProvider be used when versioning is switched on?

We currently support DSpace Items only. We would like to extend this so that a DOI can be registered for a community and a collection as well. The main problem is how to deal with the metadata that DataCite requires for an object that is addressed by a DOI. In a first informal discussion, our DOI registration agency seemed quite open to supporting us with DOIs for communities and collections. This may differ from registration agency to registration agency, though, so it should be configurable whether a DOI gets registered for Items only or for communities and collections as well.

There are several things that should be discussed within the DSpace developer community:

  • Currently the API for external identifiers does not inform an IdentifierProvider about updated metadata. Should the IdentifierProvider API be extended to handle metadata updates, or does another mechanism already exist in DSpace?
  • The IdentifierProvider API allows most of the methods to throw an IdentifierException. But in our first tests it seems that a thrown IdentifierException is not caught. As a result, if the API of the DOI registration agency is down or produces errors, it is impossible to publish any Items in DSpace.
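As input for the second discussion point, the sketch below shows one possible direction: the caller catches the exception and remembers the affected Item so that DOI registration can be retried later, instead of letting the failure block publication. This is not how DSpace currently behaves and not part of our code. All types here (`SketchIdentifierException`, `DoiRegistration`, `TolerantInstaller`) are hypothetical stand-ins so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DSpace's IdentifierException.
class SketchIdentifierException extends Exception {
    SketchIdentifierException(String message) {
        super(message);
    }
}

// Hypothetical callback representing the DOI registration step.
interface DoiRegistration {
    void register(String itemId) throws SketchIdentifierException;
}

// Publishes an item even if DOI registration fails, and records the failure.
final class TolerantInstaller {
    private final List<String> pending = new ArrayList<>();

    // Returns true if the DOI was registered, false if registration was deferred.
    boolean installItem(String itemId, DoiRegistration registration) {
        try {
            registration.register(itemId);
            return true;
        } catch (SketchIdentifierException e) {
            // The agency API failed: keep the item id for a later retry.
            pending.add(itemId);
            return false;
        }
    }

    // Returns a copy of the item ids whose DOI registration is still open.
    List<String> pendingItems() {
        return new ArrayList<>(pending);
    }
}
```

Whether deferring registration is acceptable depends on the policy of the repository, which is exactly what should be discussed.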

Thanks

I want to thank Mark H. Wood who made the first steps for a DOIIdentifier. His code helped me to understand how DSpace handles metadata and what should be done to support DOIs within DSpace. He started to implement a DOIIdentifierProvider using EZID and also wrote a test class for it. I took the liberty to use some of his code.

The other person who helped me was Fabian Fürste, a colleague here at TU Berlin. Thanks go to him as well.

I also want to thank TIB Hannover for providing a test account to the DSpace community.

Implementation details

Besides the things above, I should mention that my code adds a table to the DSpace database schema. The table is called "doi", and in it I save information about the DOIs that get minted and registered with DSpace.
