
Code can be found in our DSpace repository on GitHub, in the branch DOI: https://github.com/tuub/DSpace/tree/DOI.

DOI support

We (at Technische Universität Berlin) want to use DOIs for Items within DSpace. We are also thinking about using DOIs for Communities and Collections, but at first we will concentrate on Items. A DOI is a well-known persistent identifier, and with the external identifier support that atMire introduced in DSpace 3.0 together with the versioning support, it should be possible to add support for minting, registering and deleting DOIs using DSpace.

Registration agencies, DataCite and EZID

To register a DOI, one has to make a contract with a DOI registration agency; several agencies exist. Different DOI registration agencies have different rules. Some offer DOI registration especially or only for academic institutions, others only for publishing companies. Most registration agencies charge fees for registering DOIs, and all of them have different rules describing what kind of item a DOI can be registered for. When implementing DOI support for DataCite, we have to keep in mind that every registration agency has its own API (see below).

DataCite is an organization that aims to support access to, acceptance of and archiving of research data. One of the services offered by DataCite members is DOI registration. DataCite has several members that act as DOI registration agencies. Some members tell their customers to use the DataCite API directly, others offer their own APIs. So registering a DOI with a DataCite member does not automatically mean using the DataCite API directly.

We will register our DOIs using the service of TIB Hannover, a German member of DataCite. We will use the DataCite API directly. EZID is a DOI registration agency in the U.S. that is also part of DataCite. EZID offers its own API, so EZID customers will not profit directly from our development.

DOIIdentifierProvider, DOIConnector and DataCiteConnector

Knowing this situation, we developed a DOIIdentifierProvider that should perform everything on the DSpace side that is necessary to support DOIs. For example, after minting and registering a DOI it saves the DOI as a metadata value of an Item. To keep our DOIIdentifierProvider extensible, we put a DOIConnector between the DOIIdentifierProvider and the API of the registration agency. The DOIConnector has to support seven methods and should be quite easy to implement for the API of any DOI registration agency. The seven methods are:

  • one method to check if a DOI is already reserved,
  • one method to check if a DOI is reserved for a given DSO,
  • one method to check if a DOI is already registered,
  • one method to check if a DOI is registered for a given DSO,
  • one method to reserve a DOI for a given DSO,
  • one method to register a DOI for a given DSO,
  • one method to delete a DOI for a given DSO.

We have already developed a DataCiteConnector that implements these methods for everyone who uses the DataCite API directly. As mentioned above, EZID has its own API, but it should be quite simple to implement a DOIConnector providing these seven methods on top of the EZID API.
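To make the shape of such a connector more concrete, here is a minimal sketch of the seven methods listed above. It is not the code from the DOI branch: the names DoiConnectorSketch and InMemoryDoiConnector, the method signatures, and the use of a plain String in place of a DSO are all assumptions made for illustration. The in-memory implementation only mimics the bookkeeping; a real connector would call the API of the registration agency instead. The rule that a DOI must be reserved before it can be registered is a simplification of this sketch as well.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical connector interface; names and signatures are assumptions, not the DSpace API. */
interface DoiConnectorSketch {
    boolean isReserved(String doi);                  // is the DOI already reserved?
    boolean isReserved(String doi, String dsoId);    // is it reserved for this object?
    boolean isRegistered(String doi);                // is the DOI already registered?
    boolean isRegistered(String doi, String dsoId);  // is it registered for this object?
    void reserve(String doi, String dsoId);
    void register(String doi, String dsoId);
    void delete(String doi, String dsoId);
}

/** In-memory stand-in for illustration only; a real connector would talk to the agency's API. */
class InMemoryDoiConnector implements DoiConnectorSketch {
    // Both maps go from DOI to the id of the object that owns it.
    private final Map<String, String> reserved = new HashMap<>();
    private final Map<String, String> registered = new HashMap<>();

    public boolean isReserved(String doi) { return reserved.containsKey(doi); }
    public boolean isReserved(String doi, String dsoId) { return dsoId.equals(reserved.get(doi)); }
    public boolean isRegistered(String doi) { return registered.containsKey(doi); }
    public boolean isRegistered(String doi, String dsoId) { return dsoId.equals(registered.get(doi)); }

    public void reserve(String doi, String dsoId) {
        String owner = reserved.get(doi);
        if (owner != null && !owner.equals(dsoId)) {
            throw new IllegalStateException("DOI is reserved for another object: " + doi);
        }
        reserved.put(doi, dsoId);
    }

    public void register(String doi, String dsoId) {
        // Assumption of this sketch: registration requires a prior reservation for the same object.
        if (!isReserved(doi, dsoId)) {
            throw new IllegalStateException("DOI is not reserved for this object: " + doi);
        }
        registered.put(doi, dsoId);
    }

    public void delete(String doi, String dsoId) {
        // Only the owning object may delete; anything else is silently ignored here.
        if (isReserved(doi, dsoId)) {
            reserved.remove(doi);
            registered.remove(doi);
        }
    }
}
```

A connector for another agency would implement the same interface, so the DOIIdentifierProvider would not need to know which API is used behind it.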

Metadata

DataCite wants to receive metadata about the objects that the DOIs address. The DataCite Schema (http://schema.datacite.org) defines an XML structure to describe the metadata of an object. We developed a DIM2DataCite crosswalk that takes the metadata of a DSpace Item and transforms it into XML following DataCite Schema 2.2. As far as I know, EZID does not use this XML, so another crosswalk is probably needed. It should be discussed (see below or in the JIRA ticket) how we want to deal with metadata updates, as the API for external identifiers does not yet define a mechanism to update the metadata of an external identifier.

Status

The DSpace wiki tells us to get in touch with the developer community early. A first version of a DOIIdentifierProvider is complete. An interface for a DOIConnector is defined. We were able to reserve, register and delete DOIs using the test API of DataCite. All the code can be found in our DSpace repository on GitHub, in the branch DOI: https://github.com/tuub/DSpace/tree/DOI.

What's still to do?

Of course, documentation for the DSpace manual would be necessary if this contribution gets accepted. The javadoc documentation could also be enhanced. I should (but did not yet) write something here about several design decisions I made while implementing the DOIIdentifierProvider.

A lot of testing and possibly some debugging is still needed. It would be great if someone could implement a DOIConnector for EZID. We have not written any test classes yet (knowing that this is not good and should be changed). We currently break some tests because DataCite defines five mandatory metadata fields (the DOI, a creator, a title, a publisher and the publication year) and not all test Items contain all of these.
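The kind of check that makes those tests fail can be sketched as follows. This is only an illustration under assumptions, not the code from the DOI branch: the class MandatoryFieldCheck, the method missingFields and the key strings used for the five fields are hypothetical, and the metadata of an Item is reduced to a simple map from field name to values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of DSpace or of the DOI branch. */
class MandatoryFieldCheck {
    // The five mandatory properties named above. The key strings are labels chosen for this sketch.
    static final List<String> MANDATORY =
            List.of("identifier", "creator", "title", "publisher", "publicationYear");

    /** Returns the mandatory fields that have no non-blank value, in the order of MANDATORY. */
    static List<String> missingFields(Map<String, List<String>> metadata) {
        List<String> missing = new ArrayList<>();
        for (String field : MANDATORY) {
            List<String> values = metadata.get(field);
            boolean present = values != null
                    && values.stream().anyMatch(v -> !v.trim().isEmpty());
            if (!present) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A test item that only carries a title and a creator.
        Map<String, List<String>> item = Map.of(
                "title", List.of("Test item"),
                "creator", List.of("Doe, Jane"));
        System.out.println("Missing mandatory fields: " + missingFields(item));
    }
}
```

A test Item like the one in main would be reported as lacking the identifier, the publisher and the publication year, which is the situation some of the existing test Items are in.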

We have not taken care of the frontends yet. Currently we can register DOIs, but we have not looked at whether they are presented in any UI.

We just implemented a DOIIdentifierProvider that can be compared to the HandleIdentifierProvider. We have neither taken versioning into account nor tested our implementation with versioning switched on. We want to support versioning, but we wanted to get a first implementation done before taking care of it. Is a special IdentifierProvider needed for versioning, as exists with the VersionedHandleIdentifierProvider, or can every IdentifierProvider be used when versioning is switched on?

We currently support DSpace Items only. We would like to extend this so that a DOI can be registered for a Community and a Collection as well. The main problem is how to deal with the metadata that DataCite wants to receive for an object that is addressed by a DOI. In a first loose discussion our DOI registration agency seemed to be quite open to supporting us with DOIs for Communities and Collections, but this may differ from registration agency to registration agency, so it should be configurable whether a DOI gets registered for Items only or for Communities and Collections as well.

There are several things that should be discussed within the DSpace developer community:

  • Currently the API for external identifiers does not inform an IdentifierProvider about updated metadata. Should the DOIIdentifierProvider API be extended to take care of metadata updates, or does another mechanism already exist in DSpace?
  • The IdentifierProvider API allows most of the methods to throw an IdentifierException. But in our first tests it seems that a thrown IdentifierException is not caught. If the API of the DOI registration agency is down or produces errors, it is impossible to publish any Items in DSpace.
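As input for the discussion of the second point, here is a minimal sketch of one possible way to keep a failing registration from blocking publication: the error is caught, the DOI is remembered, and registration is retried later. This is not how DSpace or the DOI branch currently behaves. The class DeferredRegistrationSketch, the Registrar interface and both method names are hypothetical, and a real solution would have to store pending DOIs persistently rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch; not part of DSpace or of the DOI branch. */
class DeferredRegistrationSketch {
    /** Stand-in for a call to the API of the registration agency. */
    interface Registrar {
        void register(String doi) throws Exception;
    }

    private final Registrar registrar;
    // DOIs whose registration failed and should be retried later (in memory only).
    private final Queue<String> pending = new ArrayDeque<>();

    DeferredRegistrationSketch(Registrar registrar) {
        this.registrar = registrar;
    }

    /** Tries to register the DOI; on failure it is queued instead of propagating the error. */
    boolean registerOrDefer(String doi) {
        try {
            registrar.register(doi);
            return true;
        } catch (Exception e) {
            pending.add(doi);
            return false;
        }
    }

    /** Retries every queued DOI once and returns how many are still pending afterwards. */
    int retryPending() {
        int count = pending.size();
        for (int i = 0; i < count; i++) {
            registerOrDefer(pending.remove());
        }
        return pending.size();
    }
}
```

With something like this, publishing an Item would only depend on registerOrDefer returning, while retryPending could be run later, for example from a scheduled task. Whether deferring is acceptable at all is a policy question for the community.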

Thanks

I want to thank Mark H. Wood, who made the first steps towards a DOIIdentifier. His code helped me to understand how DSpace handles metadata and what should be done to support DOIs within DSpace. He started to implement a DOIIdentifierProvider using EZID and also wrote a test class for it. I allowed myself to use some of his code.

The other person who helped me was Fabian Fürste, a colleague here at TU Berlin. Thanks go to him as well.
