To publish content stored in DSpace as Linked (Open) Data, the data has to be converted into RDF. The conversion into RDF has to be configurable, as different DSpace instances may use different metadata schemata, different persistent identifiers (DOI, Handle, ...) and so on. Depending on the content to convert, the configuration and other parameters, the conversion may be time and performance intensive. The contents of repositories are read much more often than they are created, deleted or changed, as the main purpose of repositories is to safely store their contents. For these reasons, content stored within DSpace is converted and then stored in a triple store. The triple store serves as a cache and provides a SPARQL endpoint to make the converted data accessible using SPARQL. The conversion is triggered by a consumer of the DSpace event system and can also be started manually using a command line interface (both are documented below). The triple store can be deleted at any time, as all data stored in it can be restored from the contents stored elsewhere in DSpace (in the assetstore(s) and the database).

Besides the SPARQL endpoint, the data should be published as RDF serializations as well. With dspace-rdf, DSpace offers a module that loads converted data from the triple store and provides it as an RDF serialization (it currently supports RDF/XML, Turtle and N-Triples). Repositories use persistent identifiers to make content citable and to address contents. Following the Linked Data principles, DSpace uses persistent identifiers in the form of HTTP(S) URIs, converting a handle to http://hdl.handle.net/<handle> and a DOI to http://dx.doi.org/<doi>. Bringing it all together, the Linked Data support of DSpace extends all three layers: the storage layer with a triple store, the business logic with classes to convert stored contents into RDF, and the application layer with a module to publish RDF serializations. Just as you can use DSpace with Oracle or PostgreSQL, you may choose between different triple stores. The only requirements are that the triple store supports the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol, as DSpace uses them to store, update and delete converted data in the triple store, and that the triple store provides a public read-only SPARQL endpoint.
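To illustrate what the SPARQL 1.1 Graph Store HTTP Protocol requirement means in practice, the following is a minimal sketch of the kind of requests a client sends to store and delete a named graph. It is not DSpace's own code: the endpoint URL, the class and method names, and the idea of using an object's URI as the graph name are assumptions made for this example. Only the request shape (a `graph` query parameter, `PUT` to replace a graph, `DELETE` to remove it) comes from the protocol itself.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class GraphStoreSketch {
    // Hypothetical Graph Store endpoint of a local triple store; adjust to your installation.
    static final String GRAPH_STORE = "http://localhost:3030/dspace/data";

    // Indirect graph identification: the graph name is passed as the "graph" query parameter.
    static URI graphUri(String graphName) {
        return URI.create(GRAPH_STORE + "?graph="
                + URLEncoder.encode(graphName, StandardCharsets.UTF_8));
    }

    // PUT replaces the content of the named graph with the given Turtle document.
    static HttpRequest putGraph(String graphName, String turtle) {
        return HttpRequest.newBuilder(graphUri(graphName))
                .header("Content-Type", "text/turtle")
                .PUT(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    // DELETE removes the named graph from the store.
    static HttpRequest deleteGraph(String graphName) {
        return HttpRequest.newBuilder(graphUri(graphName)).DELETE().build();
    }

    public static void main(String[] args) {
        // Assumption for this sketch: the graph is named after the object's handle URI.
        String graph = "http://hdl.handle.net/123456789/1";
        String turtle = "<" + graph + "> <http://purl.org/dc/terms/title> \"Example\" .";

        // The requests are only built and printed here, not sent.
        HttpRequest put = putGraph(graph, turtle);
        HttpRequest delete = deleteGraph(graph);
        System.out.println(put.method() + " " + put.uri());
        System.out.println(delete.method() + " " + delete.uri());
    }
}
```

Sending the requests would additionally need a `java.net.http.HttpClient` and a running triple store whose write endpoint is, unlike the public SPARQL endpoint, not open to everyone.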

Warning: Store public data only in the triple store!

The triple store should contain only data that is public, as the access restrictions of DSpace do not affect the SPARQL endpoint. For this reason, DSpace converts only archived, discoverable (non-private) Items, Collections and Communities that are readable by anonymous users. Please consider this while configuring and/or extending DSpace's Linked Data support.
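The rule above can be sketched as a simple predicate for Items. This is only an illustration: the `ItemState` record and the `isConvertible` method are hypothetical names for this example and do not exist in DSpace, which checks these conditions against its own object model and authorization system.

```java
public class PublicOnlyFilterSketch {
    // Hypothetical snapshot of the three properties named in the warning above.
    record ItemState(boolean archived, boolean discoverable, boolean anonymousRead) {}

    // An item may be converted only if all three conditions hold.
    static boolean isConvertible(ItemState item) {
        return item.archived() && item.discoverable() && item.anonymousRead();
    }

    public static void main(String[] args) {
        System.out.println(isConvertible(new ItemState(true, true, true)));
        System.out.println(isConvertible(new ItemState(true, false, true)));
    }
}
```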

The package org.dspace.rdf.conversion contains the classes used to convert the repository's content to RDF. The conversion itself is done by plugins. The interface org.dspace.rdf.conversion.ConverterPlugin is really simple, so take a look at it if you can program in Java and want to extend the conversion. The only important thing is that plugins must create only RDF that can be made publicly available, as the triple store provides it through a SPARQL endpoint to which DSpace's access restrictions do not apply. The MetadataConverterPlugin is heavily configurable (see below) and is used to convert the metadata of Items. The StaticDSOConverterPlugin can be used to add static RDF triples (see below). The SimpleDSORelationsConverterPlugin creates links between items and collections, collections and communities, subcommunities and their parents, and between top-level communities and the information representing the repository itself.

As different repositories use different persistent identifiers to address their content, different algorithms for creating the URIs used within the converted data can be implemented. Currently HTTP(S) URIs of the repository (called local URIs), handles and DOIs can be used. See the configuration part of this document for further information. If you want to add another algorithm, take a look at the interface org.dspace.rdf.storage.URIGenerator.
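The idea of interchangeable URI algorithms can be sketched as follows. This is a simplified, self-contained illustration and not the actual org.dspace.rdf.storage.URIGenerator interface: the `UriStrategy` interface, its signature, the local URI prefix and the fallback from DOI to handle are all assumptions made for this example.

```java
public class UriGeneratorSketch {
    // Hypothetical, simplified strategy interface; the real URIGenerator differs.
    interface UriStrategy {
        String generate(String handle, String doi);
    }

    // Assumed prefix for local URIs; depends on where the repository is deployed.
    static final String LOCAL_PREFIX = "http://localhost:8080/rdf/resource/";

    // Local URI: an HTTP(S) URI served by the repository itself.
    static final UriStrategy LOCAL = (handle, doi) -> LOCAL_PREFIX + handle;

    // Handle URI: resolves through the global handle proxy.
    static final UriStrategy HANDLE = (handle, doi) -> "http://hdl.handle.net/" + handle;

    // DOI URI, falling back to the handle when the object has no DOI (an assumed policy).
    static final UriStrategy DOI = (handle, doi) ->
            doi != null ? "http://dx.doi.org/" + doi : HANDLE.generate(handle, doi);

    public static void main(String[] args) {
        String handle = "123456789/1";
        System.out.println(LOCAL.generate(handle, null));
        System.out.println(HANDLE.generate(handle, null));
        System.out.println(DOI.generate(handle, "10.1000/182"));
        System.out.println(DOI.generate(handle, null));
    }
}
```

Keeping each algorithm behind one small interface means the choice of identifier scheme can be made in configuration without touching the conversion plugins.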

Installation

 

Configuration

 

Maintenance

 

Extending the LOD support

Plugins converting metadata (see below) should check whether a specific metadata field needs to be protected or not (see org.dspace.app.util.MetadataExposure on how to check that).