Outdated

The information here is slightly outdated. Up-to-date documentation can be found in the official DSpace Documentation, starting with DSpace 5: Linked (Open) Data support of DSpace 5.

Repositories and the Semantic Web

...

German native speakers can find my thesis here: http://www.pnjb.de/uni/diplomarbeit/repositorien_und_das_semantic_web.pdf.

State of this document

The JIRA ticket for this contribution is DS-2061 (DuraSpace JIRA).

I created a pull request on GitHub to share the code as early as possible: https://github.com/DSpace/DSpace/pull/568.

This document is a rather small documentation of dspace-rdf, but I wanted to share this contribution as early as possible. As I developed dspace-rdf as part of my thesis, I was not allowed to speak about it before I handed my thesis in to my university. So this document should be the basis to introduce dspace-rdf and to discuss the design decisions I have made. German native speakers should take a look inside my thesis: chapters 4.2 and 5.1 explain the ideas behind the concept, while chapters 4.3 and 5.2 document the implementation. Chapter 5.3 lists the things to be done before dspace-rdf can be used in a production environment. One topic still left to be done is good English documentation.

...

dspace-rdf is an extension for DSpace that adds capabilities to convert contents stored in DSpace into RDF, to store the converted data in a Triple Store and to provide it as serializations of RDF. The Triple Store must support SPARQL 1.1 and can be used to provide the converted data over a read-only SPARQL endpoint. dspace-rdf can currently be found in my GitHub repository, but I would be glad to contribute it to a future version of DSpace.

...

The classes RDFConsumer and RDFUtil integrate dspace-rdf into DSpace. RDFConfiguration is used to centralize configuration properties used by more than one class inside dspace-rdf. The class RDFizer contains the command line interface. To see the online help you have to use the following command:

[dspace-install]/bin/dspace dsrun org.dspace.rdf.RDFizer --help

...

The installation follows the normal installation of a DSpace source release. In addition you have to provide a Triple Store. You can use any Triple Store you like, as long as it supports the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol. If you do not have one yet, you can use Apache Fuseki. Download Fuseki from its official download page and unpack the downloaded archive. The archive contains several scripts to start Fuseki. Use the start script appropriate for your OS with the options '--localhost --config=<dspace-install>/config/modules/rdf/fuseki-assembler.ttl'. Instead of changing into the directory you unpacked Fuseki to, you may set the variable FUSEKI_HOME. If you are using Linux, unpacked Fuseki to /usr/local/jena-fuseki-1.0.1 and installed DSpace to /var/dspace, this would look like this:

export FUSEKI_HOME=/usr/local/jena-fuseki-1.0.1
"$FUSEKI_HOME"/fuseki-server --localhost --config /var/dspace/config/modules/rdf/fuseki-assembler.ttl
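
Once Fuseki is running, it can be useful to confirm that the SPARQL endpoint answers before pointing DSpace at it. The following is only a minimal sketch under stated assumptions: the base URL (Fuseki's default port 3030), the dataset name "dspace" and the service name "sparql" are guesses that have to match what fuseki-assembler.ttl actually defines, and the function names are hypothetical helpers, not part of dspace-rdf or Fuseki.

```shell
#!/bin/sh
# Assumed values, not taken from dspace-rdf: adjust them to the dataset and
# service names defined in your fuseki-assembler.ttl.
FUSEKI_BASE=${FUSEKI_BASE:-http://localhost:3030}
FUSEKI_DATASET=${FUSEKI_DATASET:-dspace}

# Print the URL of the (assumed) SPARQL query endpoint.
sparql_endpoint() {
    printf '%s/%s/sparql\n' "$FUSEKI_BASE" "$FUSEKI_DATASET"
}

# Send a trivial ASK query to the endpoint. The function fails (non-zero
# exit status) if the server is unreachable or answers with an HTTP error.
check_sparql_endpoint() {
    curl --silent --fail --get \
         --header 'Accept: application/sparql-results+json' \
         --data-urlencode 'query=ASK {}' \
         "$(sparql_endpoint)"
}

# Example call (requires a running Fuseki):
# check_sparql_endpoint && echo "SPARQL endpoint is reachable"
```

The check only proves that the query service responds; it says nothing about whether the SPARQL 1.1 Graph Store HTTP Protocol endpoint that dspace-rdf also needs is configured.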

...

The RDFizer is a command line interface administrators can use to convert the complete repository contents, some content specified by its handle or to delete data from the Triple Store. If the Triple Store is reachable the RDFConsumer converts data at the moment it is changed within DSpace so that the Triple Store should stay synchronized with the repository. You can get the online help by executing the following command:

[dspace-install]/bin/dspace dsrun org.dspace.rdf.RDFizer --help

...

There are some things left to be done before dspace-rdf should be used in a production environment. The most important topic is a good default configuration, as it is configurable which metadata fields get converted, which vocabularies get used and which links are generated. While I have already added some default configuration, it should be discussed which vocabularies should be used, which links should be generated and so on. E.g. it is impossible to automatically convert most bitstreams, but they should at least be linked. I'm not sure which vocabulary should be used to link them. While EPrints made some design decisions I don't follow, they have already developed a vocabulary to describe repositories in RDF. We should take a look at the EPrints Ontology and decide whether it can be used in DSpace, whether it must be extended or whether a more generic ontology could be developed to describe repositories independently of the software that is used to realize them.

The DSORelationsConverterPlugin is not configurable yet. It was just a proof of concept on how to interlink communities, collections, items and bitstreams. This is done with the commit 526a364 (updated 2014-09-01).

As soon as DS-1990 (DuraSpace JIRA) is resolved, the interface URIGenerator should be changed to use the Identifiers array that the PR for DS-1990 suggests. Then a DOIURIGenerator should be developed so that DOIs in the form http://dx.doi.org/<doi> can be used to identify resources in RDF. The DOIIdentifierProviders (the one for DataCite and the other one for EZID) could be enhanced to activate content negotiation. I have a commit ready for this, but I'm waiting for my PR #537 to be merged and for DS-1990 to be resolved (updated 2014-09-01).
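
To make the proposed URI form concrete, here is a small sketch of the mapping from a DOI to http://dx.doi.org/<doi>. It is only an illustration: doi_to_uri is a hypothetical helper, not the planned DOIURIGenerator, and the assumption that the input may carry a leading "doi:" prefix is mine.

```shell
#!/bin/sh
# Hypothetical helper, not part of dspace-rdf: maps a DOI to the URI form
# http://dx.doi.org/<doi>.
doi_to_uri() {
    doi=$1
    # Strip an optional "doi:" prefix (assumed input convention).
    doi=${doi#doi:}
    case $doi in
        # Very rough shape check: DOIs start with "10." and contain a slash.
        10.*/*) printf 'http://dx.doi.org/%s\n' "$doi" ;;
        *)      echo "not a DOI: $1" >&2; return 1 ;;
    esac
}
```

For example, `doi_to_uri doi:10.1000/182` prints `http://dx.doi.org/10.1000/182`, while an argument that does not look like a DOI makes the function return a non-zero status.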

The XMLUI and the JSPUI should use link-tags linking to the serializations of the converted RDF as in the following example:

...

I decided to add a Triple Store to the repository so that no data needs to be converted at the moment the converted data is accessed. This decision was made with the idea in mind that the contents of repositories are read much more often than they are changed. To avoid big changes in the core of DSpace and to make it easy to use dspace-rdf in existing repositories, I decided that the Triple Store should extend the repository and not replace the relational database. The Triple Store can be considered a cache for the converted data. Besides that, it should be used to provide a read-only, worldwide accessible SPARQL endpoint containing all converted data. The Triple Store should contain only public data, as the access restrictions of DSpace do not apply to the SPARQL endpoint. For this reason dspace-rdf converts only archived, discoverable (non-private) Items, Collections and Communities that are readable for anonymous users. Plugins converting Item metadata should check whether a specific metadata field needs to be protected or not (see org.dspace.app.util.MetadataExposure on how to check that).

 

Work in progress

The following part of this document is work in progress and not finished yet. The information will become part of this documentation as soon as it is ready.

DSpace comes with Dublin Core as its default metadata schema and contains several metadata fields out of the box. While new metadata schemas and fields can be added and existing ones can be deleted, many DSpace users will use the default metadata fields and extend them if necessary. With this in mind, we should create a default configuration to convert the default metadata fields of DSpace into RDF. This configuration can easily be adjusted to the special needs of a repository or extended to export local fields. Besides the metadata, we should describe the repository itself in RDF and convert the structure of the repository (communities and collections) as well. The following part documents the default configuration of the rdf module.

Metadata Conversion

 

 

Describing the repository and its structure in RDF

To describe the repository and its structure (communities and collections) a DSpace Repository Ontology was created. The DSpace Repository Ontology contains classes like Repository, Community, Collection and Item. The class Repository is a subclass of void:Dataset. Everyone using dspace-rdf should consider adding a description of the repository as a Dataset. This can be done with the StaticDSOConverterPlugin as described above. The following is an example of the contents of the file [dspace-install]/config/modules/rdf/constant-data-site.ttl, which can be used to add static data to the description of the repository itself:

@prefix void: <http://rdfs.org/ns/void#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
 
@prefix : <#> .
 
# void expects the following URI to be a unique reference to the description of a single repository only.
# See http://www.w3.org/TR/2011/NOTE-void-20110303/#webpage for further information.
<>  foaf:homepage <http://dspace.demo.org/jspui> ;
# you may add a link to the homepage of the organization running the repository this way.
    foaf:page <http://www.dspace.org> ;
# you can use dublin core terms to describe the repository, see http://www.w3.org/TR/2011/NOTE-void-20110303/#dublin-core 
    dcterms:title "DSpace Demo Repository" ;
    dcterms:publisher :DSpaceCommiters ;
    dcterms:description "This is a demonstration instance of DSpace. This repository is not a repository that gains persistence or usable content. It is a sandbox repository demonstrating a repository software." ;
# use some more information described by void (see http://www.w3.org/TR/2011/NOTE-void-20110303/).
    void:sparqlEndpoint <http://demo.dspace.org/tripestore/dspace/sparql> ;
    void:feature <http://www.w3.org/ns/formats/N3> ;
    void:feature <http://www.w3.org/ns/formats/N-Triples> ;
    void:feature <http://www.w3.org/ns/formats/RDF_XML> ;
    void:feature <http://www.w3.org/ns/formats/Turtle> ;
    void:rootResource <> ;
    .
 
# Describe the publisher:
:DSpaceCommiters    a               foaf:Organization ;
                    foaf:homepage   <https://wiki.duraspace.org/display/DSPACE/DSpaceContributors> ;
                    .