...

The downside of this design decision is that there is a delay in the reservation and registration of DOIs; the length of that delay depends on how often you run the cron job. The big advantage is that DSpace stays independent of the status of external services (in this case the registration agency's API). For further information on how to use the DOIOrganiser, see the online help ([dspace-install]/bin/dspace doi-organiser --help).
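The decoupling described above can be pictured as a status queue: a submission only marks a DOI as waiting, and a periodically run job (the DOIOrganiser, started from cron) is the only part that talks to the registration agency. The following is a minimal, hypothetical sketch of that idea, not our actual DOIOrganiser code: the status names, `AgencyClient` and `DoiQueue` are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical status values, not necessarily the ones stored by our code.
enum DoiStatus { TO_BE_RESERVED, RESERVED, TO_BE_REGISTERED, REGISTERED }

// Hypothetical stand-in for a connector to the registration agency's API.
interface AgencyClient {
    void reserve(String doi) throws Exception;
    void register(String doi) throws Exception;
}

class DoiQueue {
    private final Map<String, DoiStatus> statuses = new LinkedHashMap<>();

    // Called during submission: only records the request, no network access.
    void requestReservation(String doi) {
        statuses.put(doi, DoiStatus.TO_BE_RESERVED);
    }

    void requestRegistration(String doi) {
        statuses.put(doi, DoiStatus.TO_BE_REGISTERED);
    }

    DoiStatus statusOf(String doi) {
        return statuses.get(doi);
    }

    // Called by the periodic job. A failed call leaves the DOI in its waiting
    // state so the next run retries it; the failed DOIs are returned.
    List<String> process(AgencyClient client) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, DoiStatus> entry : statuses.entrySet()) {
            try {
                switch (entry.getValue()) {
                    case TO_BE_RESERVED:
                        client.reserve(entry.getKey());
                        entry.setValue(DoiStatus.RESERVED);
                        break;
                    case TO_BE_REGISTERED:
                        client.register(entry.getKey());
                        entry.setValue(DoiStatus.REGISTERED);
                        break;
                    default:
                        break; // already reserved or registered
                }
            } catch (Exception e) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

If the agency is unreachable, `process` simply reports the DOIs it could not handle and leaves them queued, so an outage delays registration instead of blocking submissions.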

How to Test

Of course, you first have to get our code and compile it on your test system. You can find the code on GitHub (https://github.com/tuub/DSpace/tree/DOI). If you use this code on an existing DSpace test installation, you have to add the DOI table to your database. Make sure you have a backup first: please dump your database! Depending on the database system you use, you can use either https://github.com/tuub/DSpace/blob/DOI/dspace/etc/postgres/database_schema_3-4.sql or https://github.com/tuub/DSpace/blob/DOI/dspace/etc/oracle/database_schema_3-4.sql.

To register DOIs you need an account with a registration agency that allows you to use the DataCite API directly (EZID provides its own API). TIB Hannover kindly provided us with a test account for DSpace development. At DataCite, every account has to define the URLs it wants to register DOIs for. With the following test account you can only register DOIs that point to example.org or any of its subdomains. So you have to configure your computer to resolve example.org to your test installation, and you have to configure your test installation as if it were running under example.org (this is done by setting dspace.hostname and dspace.baseUrl to appropriate values). The username of the test account is "TIB.DSPACE", the password is "duraspace" and the prefix is 10.0128. You can reach the web user interface of the test system at https://test.datacite.org, where you can see which DOIs got reserved and registered. This URL is also the one currently used for the API in dspace/config/spring/api/identifier-service.xml. All DOIs registered in the test system will be deleted from time to time.
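Because the test account only accepts DOIs whose target URL lies under example.org, it can be worth checking the URL an item will resolve to before trying a registration. A minimal sketch using `java.net.URI`; the class and method names are hypothetical and are not part of DSpace or the DataCite API.

```java
import java.net.URI;
import java.util.Locale;

class TestDomainCheck {
    // Returns true if the URL's host is the given domain or one of its subdomains.
    static boolean isUnderDomain(String url, String domain) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // relative URL or no host component
        }
        host = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        // Require a dot before the domain so "notexample.org" does not match.
        return host.equals(d) || host.endsWith("." + d);
    }
}
```

For example, `TestDomainCheck.isUnderDomain("https://repo.example.org/handle/1/2", "example.org")` is true, while a host like `example.org.other.com` is rejected.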

DOI support is disabled by default, so even with our code you have to configure DSpace to use it. In dspace.cfg you have to set the properties identifier.doi.user, identifier.doi.password, identifier.doi.prefix and identifier.doi.namespaceseparator. You can use the "TIB.DSPACE" user, the "duraspace" password, the 10.0128 prefix and, as namespace separator, a string that you don't expect anyone else to be using. DOIs registered with these settings will be deleted regularly. They must address an item using example.org as domain (or any subdomain of it), so you have to configure dspace.url to use example.org as its domain (sorry, that is a rule of our DOI registration agency)!
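Since DOI support stays off until these four properties are set, a quick sanity check of the configuration can save a failed test run. The sketch below reads the settings with plain `java.util.Properties`, which is an assumption: DSpace has its own configuration service, and dspace.cfg may contain constructs that `Properties` interprets differently. Only the property names come from this page; `DoiConfigCheck` is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class DoiConfigCheck {
    // Property names as listed on this page.
    static final String[] REQUIRED = {
        "identifier.doi.user",
        "identifier.doi.password",
        "identifier.doi.prefix",
        "identifier.doi.namespaceseparator"
    };

    // Returns the required DOI properties that are missing or empty.
    static List<String> missing(Reader cfg) throws IOException {
        Properties props = new Properties();
        props.load(cfg);
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }
}
```

You could call it with a `FileReader` on your dspace.cfg; an empty result means all four DOI properties have a value.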

Status

The DSpace wiki recommends getting in touch with the developer community early. A first version of a DOIIdentifierProvider is now complete, and an interface for a DOIConnector is defined. We were able to reserve, register and delete DOIs using the DataCite test API. All the code can be found in our DSpace repository on GitHub, in the DOI branch: https://github.com/tuub/DSpace/tree/DOI.

Update (08/15/13): We changed the code as described in the comments on the JIRA ticket (DS-1535). The code on GitHub is currently not working, but we'll update it soon.

What's still to be done?

Of course, documentation for the DSpace manual would be necessary if this contribution gets accepted. The JavaDoc documentation could also be improved. I should also write something here (but have not done so yet) about several design decisions I made while implementing the DOIIdentifierProvider.

A lot of testing and possibly some debugging is still needed. It would be great if someone could implement a DOIConnector for EZID. We have not written any test classes yet (we know this is not good and should be changed). We currently break some tests because DataCite defines five mandatory metadata fields (the DOI, a creator, a title, a publisher and the publication year) and not all TestItems contain all of them.
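The failing tests come down to a simple precondition: an object can only be sent to DataCite if all five mandatory fields have a value. A hypothetical check over a plain map of field names to values; the keys below are illustrative labels, not DataCite schema element names or DSpace metadata fields. In our code the actual mapping is done by the crosswalk configured for the DataCiteConnector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DataCiteMandatoryFields {
    // Illustrative labels for the five fields DataCite requires.
    static final List<String> MANDATORY =
        List.of("identifier", "creator", "title", "publisher", "publicationYear");

    // Returns the mandatory fields that have no non-blank value, in MANDATORY order.
    static List<String> missingFields(Map<String, List<String>> metadata) {
        List<String> missing = new ArrayList<>();
        for (String field : MANDATORY) {
            List<String> values = metadata.getOrDefault(field, List.of());
            boolean hasValue = values.stream()
                .anyMatch(v -> v != null && !v.trim().isEmpty());
            if (!hasValue) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

A test item carrying only a title and a creator would be reported as missing the identifier, publisher and publication year, which is the kind of gap that makes the existing tests fail.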

We have not dealt with the front ends yet. Currently we can register DOIs, but we have not checked whether they are displayed in any UI.

So far we have only implemented a DOIIdentifierProvider that is comparable to the HandleIdentifierProvider. We have neither designed nor tested our implementation for use with versioning switched on. We want to support versioning, but we wanted to get a first implementation done before taking care of it. Is a special IdentifierProvider needed for versioning, as with the VersionedHandleIdentifierProvider, or can any IdentifierProvider be used when versioning is switched on?

We currently support DSpace Items only. We would like to extend this so that DOIs can be registered for communities and collections as well. The main problem is how to deal with the metadata that DataCite requires for an object addressed by a DOI. In a first informal discussion, our DOI registration agency seemed quite open to supporting DOIs for communities and collections, but this may differ from one registration agency to another, so it should be configurable whether DOIs are registered for Items only or for communities and collections as well.
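Making this configurable could be as simple as a setting that lists the object types eligible for DOIs. The setting format, the `ObjectType` enum and `DoiTypeFilter` below are hypothetical; DSpace has its own constants for object types, which a real implementation would use instead.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical object types; DSpace defines its own type constants.
enum ObjectType { ITEM, COLLECTION, COMMUNITY }

class DoiTypeFilter {
    private final Set<ObjectType> allowed;

    // Parses a hypothetical comma-separated setting such as "item, collection".
    // An empty setting falls back to Items only, the current behaviour.
    // Unknown names make ObjectType.valueOf throw IllegalArgumentException.
    DoiTypeFilter(String setting) {
        Set<ObjectType> types = EnumSet.noneOf(ObjectType.class);
        for (String part : setting.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                types.add(ObjectType.valueOf(name.toUpperCase(Locale.ROOT)));
            }
        }
        this.allowed = types.isEmpty() ? EnumSet.of(ObjectType.ITEM) : types;
    }

    boolean mayRegister(ObjectType type) {
        return allowed.contains(type);
    }
}
```

Defaulting to Items only keeps the current behaviour for agencies that do not accept DOIs for communities and collections.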

There are several things that should be discussed within the DSpace developer community:

  • Currently the API for external identifiers does not inform an IdentifierProvider about updated metadata. Should this API be extended to handle metadata updates, or does another mechanism already exist in DSpace?
  • The IdentifierProvider API allows most methods to throw an IdentifierException, but in our first tests it seems that a thrown IdentifierException is not caught. As a result, if the API of the DOI registration agency is down or returns errors, it is impossible to publish any Items in DSpace.
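One way to address the second point is for the caller to catch the IdentifierException and defer the DOI instead of aborting the whole publication. A hypothetical sketch: `IdentifierException` here is a local stand-in for the DSpace class of the same name, and `Registrar`, `Publisher` and the retry list are invented for illustration; this is not how DSpace currently behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for DSpace's IdentifierException (a checked exception).
class IdentifierException extends Exception {
    IdentifierException(String message) {
        super(message);
    }
}

// Hypothetical: whatever talks to the registration agency.
interface Registrar {
    void register(String itemId) throws IdentifierException;
}

class Publisher {
    // Items whose DOI registration failed and should be retried later.
    final List<String> retryLater = new ArrayList<>();

    // The item would be made public regardless of the outcome below.
    // Returns true if the DOI was registered immediately, false if deferred.
    boolean publish(String itemId, Registrar registrar) {
        try {
            registrar.register(itemId);
            return true;
        } catch (IdentifierException e) {
            retryLater.add(itemId);
            return false;
        }
    }
}
```

With this approach an agency outage only delays the DOI; the item itself is still published.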

The namespace separator is prefixed to every suffix DSpace generates. If you use 10.0128 as prefix and "abc-" as namespace separator, the DOIs generated by DSpace will look like 10.0128/abc-1, 10.0128/abc-2, ..., 10.0128/abc-1024, ... The namespace separator is useful if different software or multiple DSpace installations use the same prefix. For testing, please use something you consider unique as namespace separator (e.g. your name or your login).

The second file you have to configure is dspace/config/spring/api/identifier-service.xml. In this file you have to remove the comments around the bean for the DOIIdentifier and around the bean for the DataCiteConnector. The properties of the DataCiteConnector have to be set as follows:

```xml
<!-- Properties of the DataCiteConnector bean in dspace/config/spring/api/identifier-service.xml -->
<property name="DATACITE_SCHEME" value="https"/>
<property name="DATACITE_HOST" value="test.datacite.org"/>
<property name="DATACITE_DOI_PATH" value="/mds/doi/"/>
<property name="DATACITE_METADATA_PATH" value="/mds/metadata/"/>
<property name="disseminationCrosswalkName" value="DataCite"/>
```

When you make new submissions, they should get DOIs now. You should be able to see the DOIs as part of an item's metadata. Remember to run the DOIOrganiser (see above) to reserve and register DOIs at the DOI registration agency.
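The namespace separator rule from the configuration section amounts to string concatenation: prefix, a slash, the namespace separator, then a number generated by DSpace. A minimal sketch; `DoiNaming` is hypothetical, and the assumption that the suffix is a plain incrementing number is inferred only from the example DOIs on this page.

```java
class DoiNaming {
    private final String prefix;
    private final String namespaceSeparator;

    DoiNaming(String prefix, String namespaceSeparator) {
        this.prefix = prefix;
        this.namespaceSeparator = namespaceSeparator;
    }

    // Builds a DOI such as 10.0128/abc-1 from a generated number.
    String doiFor(long number) {
        return prefix + "/" + namespaceSeparator + number;
    }

    // True if the DOI starts with this installation's prefix and namespace separator.
    boolean isOwn(String doi) {
        return doi.startsWith(prefix + "/" + namespaceSeparator);
    }
}
```

Note that `isOwn` is only as reliable as the separator is unique: a separator like "abc-" would also match DOIs of an installation using "abc-x-", which is why you should pick something nobody else is likely to use.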

Status

On 2013/09/27 we pushed a new version of our code to GitHub (https://github.com/tuub/DSpace/tree/DOI). Because we rebased the code onto DSpace master, if you downloaded our code before 2013/09/27 you will probably run into problems if you simply try to update it with git pull.

As with any code, there is always something left that could be improved, but we think our code is ready to be tested and included in DSpace. We would be glad to hear your ideas and anything you think we should change. All discussions should happen in the JIRA ticket (DS-1535). In the comments on this ticket you can see what's currently on our to-do list.

Thanks

I want to thank Mark H. Wood, who took the first steps towards a DOIIdentifier. His code helped me to understand how DSpace handles metadata and what should be done to support DOIs within DSpace. He started to implement a DOIIdentifierProvider using EZID and also wrote a test class for it. I took the liberty of using some of his code.

...

I also want to thank TIB Hannover for providing a test account to the DSpace community.

Implementation details

Besides the things above, I should mention that my code adds a table called "doi" to the DSpace database schema. In this table it stores information about the DOIs that get minted and registered with DSpace.