

Introduction

Exchanging repository contents

Most sites on the Internet are oriented towards human consumption. While HTML may be a good format for creating websites, it is not a good format for exporting data in a way computers can work with. Like most repository software, DSpace supports OAI-PMH as an interface to export the stored data. While OAI-PMH is well known in the field of repositories, it is rarely known elsewhere (e.g. Google retired its support for OAI-PMH in 2008). The Semantic Web is a generic approach to publish data on the Internet together with information about its semantics. It is not limited to repositories or libraries and has a growing user group. The W3C has released standards such as RDF and SPARQL for publishing structured data on the web in a way computers can easily work with. The data stored in repositories is particularly suited for use in the Semantic Web, as the metadata is already available: it doesn't have to be generated or entered manually for publication as Linked Data. For most repositories, at least for Open Access repositories, sharing their stored content is quite important. Linked Data is a big chance for repositories to present their content in a way it can easily be accessed, interlinked and (re)used.

Terminology

We won't give a full introduction to the Semantic Web and its technologies here, as many can be found on the web. Nevertheless, we want to provide a short glossary of the terms used most often in this context, to make the following documentation more readable.

Semantic Web

The term "Semantic Web" refers to the part of the Internet containing Linked Data. Like the World Wide Web, the Semantic Web is created by links between the data.

Linked Data

Linked Open Data

Data in RDF that follows the Linked Data Principles is called Linked Data. The Linked Data Principles describe behavior expected of data publishers that shall ensure the published data is easy to find, easy to retrieve, can be linked easily and links to other data as well.

Linked Open Data is Linked Data published under an open license. Technically there is no difference between Linked Data and Linked Open Data (often abbreviated as LOD); it is only a question of the license used to publish the data.

RDF
RDF/XML
Turtle
N-Triples
N3-Notation

RDF is an acronym for Resource Description Framework, a metadata model. Don't think of RDF as a format: it is a model. Nevertheless, there are different formats to serialize data following RDF. RDF/XML, Turtle, N-Triples and N3-Notation are probably the best-known formats for serializing data in RDF. While RDF/XML uses XML, Turtle, N-Triples and N3-Notation don't, and they are easier for humans to read and write. When we use RDF in the configuration files of DSpace, we currently prefer Turtle (but the code should be able to deal with all serializations).

Triple Store

A triple store is a database that natively stores data following the RDF model. Just as you provide a relational database for DSpace, you have to provide a triple store for DSpace if you want to use the LOD support.

SPARQL

The SPARQL Protocol and RDF Query Language is a family of protocols to query triple stores. Since SPARQL version 1.1, it can also be used to manipulate triple stores: to store, delete or update data. DSpace uses the SPARQL 1.1 Graph Store HTTP Protocol and the SPARQL 1.1 Query Language to communicate with the triple store. The SPARQL 1.1 Query Language is often referred to simply as SPARQL, so expect the SPARQL 1.1 Query Language if no other specific protocol out of the SPARQL family is named explicitly.

SPARQL endpoint

A SPARQL endpoint is a SPARQL interface of a triple store. Since SPARQL 1.1, a SPARQL endpoint can be read-only, allowing only queries on the stored data, or read-writable, allowing the stored data to be modified as well. When a SPARQL endpoint is mentioned without specifying which SPARQL protocol is used, an endpoint supporting the SPARQL 1.1 Query Language is meant.
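
For illustration, here is a minimal sketch of how any HTTP client can query a read-only SPARQL 1.1 Query Language endpoint. The endpoint address is an example matching the Fuseki setup described later in this document; adjust it to your installation.

# Request ten arbitrary triples from a read-only SPARQL endpoint:
curl --data-urlencode 'query=SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10' \
     -H 'Accept: application/sparql-results+json' \
     http://localhost:3030/dspace/sparql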

Linked (Open) Data Support within DSpace

Starting with version 5.0, DSpace can provide its stored contents as Linked (Open) Data.

Architecture / Concept

To publish content stored in DSpace as Linked (Open) Data, the data has to be converted into RDF. The conversion into RDF has to be configurable, as different DSpace instances may use different metadata schemata, different persistent identifiers (DOI, Handle, ...) and so on. Depending on the content to convert, the configuration and other parameters, the conversion may be time and performance intensive. Repository content is read much more often than it is created, deleted or changed, as the main purpose of a repository is to safely store its contents. For this reason, content stored within DSpace is converted directly after it is created or updated, and the converted data is stored in a triple store. The triple store serves as a cache and provides a SPARQL endpoint to make the converted data accessible using SPARQL. The conversion is triggered automatically by the DSpace event system and can be started manually using a command line interface (both are documented below). There is no need to back up the triple store, as all data stored in it can be restored from the contents stored elsewhere in DSpace (in the assetstore(s) and the database). Besides the SPARQL endpoint, the data should be published as RDF serializations as well. With dspace-rdf, DSpace offers a module that loads converted data from the triple store and provides it as RDF serializations (it currently supports RDF/XML, Turtle and N-Triples).

Repositories use persistent identifiers to make content citable and addressable. Following the Linked Data Principles, DSpace uses persistent identifiers in the form of HTTP(S) URIs, converting a Handle to http://hdl.handle.net/<handle> and a DOI to http://dx.doi.org/<doi>. Bringing it all together, the Linked Data support of DSpace extends all three layers: the storage layer with a triple store, the business logic with classes to convert stored contents into RDF, and the application layer with a module to publish RDF serializations. Just as you can use DSpace with Oracle or PostgreSQL, you may choose between different triple stores. The only requirements are that the triple store supports the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol, as DSpace uses them to store, update, delete and load converted data in/out of the triple store, and uses the triple store to provide the data over a SPARQL endpoint.

Store public data only in the triple store!

The triple store should contain only data that is public, as DSpace's access restrictions won't affect the SPARQL endpoint. For this reason, DSpace converts only archived, discoverable (non-private) Items, Collections and Communities that are readable for anonymous users. Please keep this in mind while configuring and/or extending DSpace's Linked Data support.

The package org.dspace.rdf.conversion contains the classes used to convert the repository's content to RDF. The conversion itself is done by plugins. The interface org.dspace.rdf.conversion.ConverterPlugin is quite simple, so take a look at it if you can program in Java and want to extend the conversion. The only important thing is that plugins must only create RDF that can be made publicly available, as the triple store provides it through a SPARQL endpoint to which DSpace's access restrictions do not apply. Plugins converting metadata should check whether a specific metadata field needs to be protected or not (see org.dspace.app.util.MetadataExposure on how to check that). The MetadataConverterPlugin is heavily configurable (see below) and is used to convert the metadata of Items. The StaticDSOConverterPlugin can be used to add static RDF triples (see below). The SimpleDSORelationsConverterPlugin creates links between items and collections, collections and communities, subcommunities and their parent communities, and between top-level communities and the information representing the repository itself.

As different repositories use different persistent identifiers to address their content, different algorithms to create the URIs used within the converted data can be implemented. Currently, HTTP(S) URIs of the repository (called local URIs), Handles and DOIs can be used. See the configuration part of this document for further information. If you want to add another algorithm, take a look at the interface org.dspace.rdf.storage.URIGenerator.

Install a Triple Store

In addition to a normal DSpace installation you have to install a triple store. You can use any triple store that supports the SPARQL 1.1 Query Language and the SPARQL 1.1 Graph Store HTTP Protocol. If you do not have one yet, you can use Apache Fuseki. Download Fuseki from its official download page and unpack the downloaded archive. The archive contains several scripts to start Fuseki. Use the start script appropriate for your OS with the options '--localhost --config=<dspace-install>/config/modules/rdf/fuseki-assembler.ttl'. Instead of changing into the directory you unpacked Fuseki to, you may set the variable FUSEKI_HOME. If you're using Linux and bash, unpacked Fuseki to /usr/local/jena-fuseki-1.0.1 and installed DSpace to [dspace-install], this would look like this:

export FUSEKI_HOME=/usr/local/jena-fuseki-1.0.1 ; $FUSEKI_HOME/fuseki-server --localhost --config=[dspace-install]/config/modules/rdf/fuseki-assembler.ttl

Fuseki's archive also contains a script to start Fuseki automatically at system startup.

If you use the configuration provided with DSpace, make Fuseki connect to localhost only, by using the argument --localhost when launching it! The configuration contains a writeable SPARQL endpoint that allows changing/deleting the content of your triple store.

Use Apache mod_proxy, mod_rewrite or any other appropriate web server/proxy to make localhost:3030/dspace/sparql readable from the internet. Use the address under which it is accessible as the address of your public SPARQL endpoint (see the property public.sparql.endpoint in the configuration reference below).


The configuration provided within DSpace makes Fuseki store the files of the triple store under [dspace-install]/triplestore. Using this configuration, Fuseki provides three SPARQL endpoints: two read-only SPARQL endpoints and one that can be used to change the data of the triple store. You should not use this configuration and let Fuseki connect to the internet directly, as that would make it possible for anyone to delete, change or add information to the triple store. The option --localhost tells Fuseki to listen only on the loopback device. You can use Apache mod_proxy or any other web or proxy server to make the read-only SPARQL endpoint accessible from the internet. With the configuration described, Fuseki listens on port 3030 using HTTP. Using the address http://localhost:3030/ you can connect to the Fuseki Web UI, http://localhost:3030/dspace/data addresses a writeable SPARQL 1.1 Graph Store HTTP Protocol endpoint, and http://localhost:3030/dspace/get a read-only one. Under http://localhost:3030/dspace/sparql a read-only SPARQL 1.1 Query Language endpoint can be found. The first of these endpoints must not be accessible from the internet, while the last one should be accessible publicly.
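
To verify that Fuseki is running and that the endpoints behave as described, you can send a few test requests from the machine Fuseki runs on. This is a quick sketch; adjust host and port if you changed the configuration.

# Check whether the read-only SPARQL 1.1 Query Language endpoint holds any data:
curl --data-urlencode 'query=ASK { GRAPH ?g { ?s ?p ?o } }' http://localhost:3030/dspace/sparql

# Retrieve the default graph from the read-only Graph Store Protocol endpoint:
curl -H 'Accept: text/turtle' 'http://localhost:3030/dspace/get?default'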

Configuration Reference

There are several configuration files for DSpace's LOD support. The main configuration file can be found at [dspace-source]/dspace/config/modules/rdf.cfg; all other files are located in the directory [dspace-source]/dspace/config/modules/rdf/. You'll have to configure where to find the triple store and how to connect to it. You may configure how to generate the URIs used within the generated Linked Data and how to convert the contents stored in DSpace into RDF. We will guide you through the configuration file by file.

[dspace-source]/dspace/config/modules/rdf.cfg

Property: public.sparql.endpoint
Example Value: public.sparql.endpoint = http://${dspace.baseUrl}/sparql
Informational Note: Address of the read-only public SPARQL endpoint supporting the SPARQL 1.1 Query Language.

Property: URIGenerator
Example Value: URIGenerator = org.dspace.rdf.storage.LocalURIGenerator
Informational Note: Defines the class that generates the URIs to be used within the converted data. The LocalURIGenerator generates URIs using the ${dspace.url} property. The class org.dspace.rdf.storage.HandleURIGenerator uses handles in the form of HTTP URLs; it uses the property ${handle.canonical.prefix} to convert handles into HTTP(S) URLs. The class org.dspace.rdf.storage.DOIURIGenerator uses DOIs in the form of HTTP URLs where possible and local URIs if no DOI exists; it uses the DOI resolver "http://dx.doi.org" to convert DOIs into HTTP URLs. The class org.dspace.rdf.storage.DOIHandleGenerator does the same but uses handles as the fallback if no DOI exists. The fallbacks are necessary because DOIs are currently used for Items only, not for Communities or Collections.

Property: converter
Example Value: converter = org.dspace.rdf.conversion.RDFConverterImpl
Informational Note: This property sets the class that manages the whole conversion process. Currently there shouldn't be any need to change it.

Property: converter.plugins
Example Value:
converter.plugins = org.dspace.rdf.conversion.StaticDSOConverterPlugin, \
                    org.dspace.rdf.conversion.MetadataConverterPlugin, \
                    org.dspace.rdf.conversion.SimpleDSORelationsConverterPlugin
Informational Note: Lists all plugins to be used during the conversion of DSpace contents into RDF. If you write a new conversion plugin, add its class path to this property.

Property: converter.DSOtypes
Example Value: converter.DSOtypes = SITE, COMMUNITY, COLLECTION, ITEM
Informational Note: Defines which kinds of DSpaceObjects should be converted. Bundles and Bitstreams are converted as part of the Item they belong to. Don't add EPersons here unless you really know what you are doing: all converted data is stored in the triple store, which provides a publicly readable SPARQL endpoint, so all data converted into RDF is exposed publicly. Every DSO type you add here has to get an HTTP URI to be referenced in the generated RDF, which is another reason not to add EPersons here currently.

Property: storage
Example Value: storage = org.dspace.rdf.storage.RDFStorageImpl
Informational Note: Configures which class is used to store the converted data. This class handles the connection to the SPARQL endpoint. Currently there is only one implementation, so there is no need/possibility to change this property.

Property: storage.graphstore.endpoint
Example Value: storage.graphstore.endpoint = http://localhost:3030/dspace/data
Informational Note: Address of a writable SPARQL 1.1 Graph Store HTTP Protocol endpoint. This address is used to create, update and delete converted data in the triple store. If you use Fuseki with the configuration provided as part of DSpace 5, you can leave this as it is. If you use another triple store or configure Fuseki on your own, change this property to point to a writable SPARQL endpoint supporting the SPARQL 1.1 Graph Store HTTP Protocol.

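For illustration, the following sketch shows the kind of Graph Store Protocol requests DSpace sends to this endpoint to store and remove the named graph of a converted object. The graph URI and the Turtle file are hypothetical examples.

# Store (replace) the named graph of one converted object:
curl -X PUT -H 'Content-Type: text/turtle' --data-binary @item.ttl \
     'http://localhost:3030/dspace/data?graph=http://localhost:8080/rdf/resource/123456789/2'

# Delete that named graph again:
curl -X DELETE 'http://localhost:3030/dspace/data?graph=http://localhost:8080/rdf/resource/123456789/2'
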
Property: storage.graphstore.authentication
Example Value: storage.graphstore.authentication = no
Informational Note: Defines whether to use HTTP Basic authentication to connect to the writable SPARQL 1.1 Graph Store HTTP Protocol endpoint.

Properties: storage.graphstore.login, storage.graphstore.password
Example Values:
storage.graphstore.login = dspace
storage.graphstore.password = ecapsd
Informational Note: Credentials for HTTP Basic authentication, if it is needed to connect to the writable SPARQL 1.1 Graph Store HTTP Protocol endpoint.

Property: storage.sparql.endpoint
Example Value: storage.sparql.endpoint = http://localhost:3030/dspace/sparql
Informational Note: Besides a writable SPARQL 1.1 Graph Store HTTP Protocol endpoint, DSpace needs a SPARQL 1.1 Query Language endpoint, which can be read-only. This property allows you to set the address used to connect to such a SPARQL endpoint. If you leave this property empty, the property ${public.sparql.endpoint} will be used instead.

Properties: storage.sparql.authentication, storage.sparql.login, storage.sparql.password
Example Values:
storage.sparql.authentication = yes
storage.sparql.login = dspace
storage.sparql.password = ecapsd
Informational Note: As for the SPARQL 1.1 Graph Store HTTP Protocol, you can configure DSpace to use HTTP Basic authentication to authenticate against the (read-only) SPARQL 1.1 Query Language endpoint.

Property: contextPath
Example Value: contextPath = ${dspace.baseUrl}/rdf
Informational Note: The content negotiation needs to know where to redirect to if anyone asks for an RDF serialization of content stored within DSpace. This property sets the URL under which the dspace-rdf module can be reached on the internet (depending on how you deployed it).

Property: contentNegotiation.enable
Example Value: contentNegotiation.enable = true
Informational Note: Defines whether content negotiation should be activated. Set this to true if you use the Linked Data support.

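As a sketch of the effect (handle, hostname and paths are placeholder examples): a request for an item page that asks for an RDF serialization in its Accept header is redirected to the dspace-rdf module, which serves the requested format.

# Ask an item page for Turtle instead of HTML; -L follows the redirect
# to the dspace-rdf module configured in ${contextPath}:
curl -L -H 'Accept: text/turtle' http://localhost:8080/xmlui/handle/123456789/2

# Or address the dspace-rdf module directly:
curl -H 'Accept: text/turtle' http://localhost:8080/rdf/handle/123456789/2
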
The following properties configure the StaticDSOConverterPlugin.

Properties:
constant.data.GENERAL
constant.data.COLLECTION
constant.data.COMMUNITY
constant.data.ITEM
constant.data.SITE
Example Values:
constant.data.GENERAL = ${dspace.dir}/config/modules/rdf/constant-data-general.ttl
constant.data.COLLECTION = ${dspace.dir}/config/modules/rdf/constant-data-collection.ttl
constant.data.COMMUNITY = ${dspace.dir}/config/modules/rdf/constant-data-community.ttl
constant.data.ITEM = ${dspace.dir}/config/modules/rdf/constant-data-item.ttl
constant.data.SITE = ${dspace.dir}/config/modules/rdf/constant-data-site.ttl
Informational Note: These properties define files to read static data from. The data should be in RDF; by default, Turtle is used as the serialization. The data in the file referenced by the property ${constant.data.GENERAL} will be included in every entity that is converted to RDF. For example, it can be used to point to the address of the publicly readable SPARQL endpoint or to state the name of the institution running DSpace.

The other properties define files that will be included if a DSpace object of the specified type (collection, community, item or site) is converted. This makes it possible to add static content to every Item, every Collection, and so on.

The following properties configure the MetadataConverterPlugin.

Property: metadata.mappings
Example Value: metadata.mappings = ${dspace.dir}/config/modules/rdf/metadata-rdf-mapping.ttl
Informational Note: Defines the file that contains the mappings for the MetadataConverterPlugin. See below for the description of the configuration file [dspace-source]/dspace/config/modules/rdf/metadata-rdf-mapping.ttl.

Property: metadata.schema
Example Value: metadata.schema = file://${dspace.dir}/config/modules/rdf/metadata-rdf-schema.ttl
Informational Note: Configures the URL used to load the RDF schema of the DSpace Metadata RDF mapping vocabulary. Using a file:// URI makes it possible to convert DSpace content without an internet connection. The version of the schema has to match the code in use; DSpace 5.0 uses version 0.2.0. The schema can also be found at http://digital-repositories.org/ontologies/dspace-metadata-mapping/0.2.0; the newest version of the schema can be found at http://digital-repositories.org/ontologies/dspace-metadata-mapping/.

Property: metadata.prefixes
Example Value: metadata.prefixes = ${dspace.dir}/config/modules/rdf/metadata-prefixes.ttl
Informational Note: If you want to use prefixes in RDF serializations that support them, you can define these prefixes in the file referenced by this property.

The following properties configure the SimpleDSORelationsConverterPlugin.

Property: simplerelations.prefixes
Example Value: simplerelations.prefixes = ${dspace.dir}/config/modules/rdf/simple-relations-prefixes.ttl
Informational Note: If you want to use prefixes in RDF serializations that support them, you can define these prefixes in the file referenced by this property.

Property: simplerelations.site2community
Example Value: simplerelations.site2community = http://purl.org/dc/terms/hasPart, http://digital-repositories.org/ontologies/dspace/0.1.0#hasCommunity
Informational Note: Defines the predicates used to link from the data representing the whole repository to the top-level communities. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.community2site
Example Value: simplerelations.community2site = http://purl.org/dc/terms/isPartOf, http://digital-repositories.org/ontologies/dspace/0.1.0#isPartOfRepository
Informational Note: Defines the predicates used to link from the top-level communities to the data representing the whole repository. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.community2subcommunity
Example Value: simplerelations.community2subcommunity = http://purl.org/dc/terms/hasPart, http://digital-repositories.org/ontologies/dspace/0.1.0#hasSubcommunity
Informational Note: Defines the predicates used to link from communities to their subcommunities. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.subcommunity2community
Example Value: simplerelations.subcommunity2community = http://purl.org/dc/terms/isPartOf, http://digital-repositories.org/ontologies/dspace/0.1.0#isSubcommunityOf
Informational Note: Defines the predicates used to link from subcommunities to the communities they belong to. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.community2collection
Example Value: simplerelations.community2collection = http://purl.org/dc/terms/hasPart, http://digital-repositories.org/ontologies/dspace/0.1.0#hasCollection
Informational Note: Defines the predicates used to link from communities to their collections. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.collection2community
Example Value: simplerelations.collection2community = http://purl.org/dc/terms/isPartOf, http://digital-repositories.org/ontologies/dspace/0.1.0#isPartOfCommunity
Informational Note: Defines the predicates used to link from collections to the communities they belong to. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.collection2item
Example Value: simplerelations.collection2item = http://purl.org/dc/terms/hasPart, http://digital-repositories.org/ontologies/dspace/0.1.0#hasItem
Informational Note: Defines the predicates used to link from collections to their items. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.item2collection
Example Value: simplerelations.item2collection = http://purl.org/dc/terms/isPartOf, http://digital-repositories.org/ontologies/dspace/0.1.0#isPartOfCollection
Informational Note: Defines the predicates used to link from items to the collections they belong to. Defining multiple predicates separated by commas will result in multiple triples.

Property: simplerelations.item2bitstream
Example Value: simplerelations.item2bitstream = http://purl.org/dc/terms/hasPart, http://digital-repositories.org/ontologies/dspace/0.1.0#hasBitstream
Informational Note: Defines the predicates used to link from items to their bitstreams. Defining multiple predicates separated by commas will result in multiple triples.

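Together these predicates make the repository structure traversable with SPARQL. As a sketch, assuming the default predicates above, the following query lists all items linked to a given collection; the collection URI is a placeholder.

# List all items of one collection via its dcterms:hasPart links:
curl --data-urlencode 'query=
  PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT ?item
  WHERE { GRAPH ?g { <http://localhost:8080/rdf/resource/123456789/1> dcterms:hasPart ?item } }' \
  http://localhost:3030/dspace/sparql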


Maintenance
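
Conversion of existing contents can be triggered manually using the rdfizer command line tool. The following calls are a sketch; run [dspace-install]/bin/dspace rdfizer --help for the authoritative list of options of your version.

# Convert all items, collections and communities that are not converted yet:
[dspace-install]/bin/dspace rdfizer --convert-all

# Convert specific objects, identified by their handles:
[dspace-install]/bin/dspace rdfizer --identifiers 123456789/2 123456789/3

# Remove converted data from the triple store again:
[dspace-install]/bin/dspace rdfizer --delete http://localhost:8080/rdf/resource/123456789/2
[dspace-install]/bin/dspace rdfizer --delete-all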
