The REST API, Fedora resource identifiers, the resource index, named graphs and the Semantic Web

Note: The scope of this document goes some way beyond the London Committers' meeting agenda item - Interfaces. Comments welcomed on which are the most relevant issues to cover in the meeting.

This is an attempt to capture this thread

The presentation given at the London 2010 Committers' meeting is here

Fedora in the context of the Semantic Web and Linked Data

The basic premise of the proposals below is to support exposing Fedora resources and their relationships in a Semantic Web and Linked Data friendly way. It attempts to "unify" the identifiers and relationships used for Fedora resources with the new REST API and the resource index.

To be Semantic Web and Linked Data friendly involves

The new REST API is a move forward in supporting these requirements as we now have dereferenceable http URIs for Fedora resources.

What this proposal is not about

Current situation

Some Principles

Some basic principles that should be followed by the recommendations below:

Proposals

1 Deprecate the "LITE" APIs.

2 Define canonical dereferenceable URIs for Fedora resources

3 Restructure the resource index as named graphs

A named graph is a set of triples named by a URI.

For instance, the relationships contained in the object myns:somepid could be identified as the graph <#myns:somepid>. Similarly the relationships expressed by the datastream myns:somepid/RELS-EXT could be identified as <#myns:somepid/RELS-EXT>.
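The naming convention above can be sketched as a pair of small helpers. This is only an illustration of the <#pid> and <#pid/dsID> forms used in the example; the function names are hypothetical and nothing here reflects existing Fedora code.

```python
def object_graph(pid):
    """Graph name for the relationships contained in an object."""
    return "<#" + pid + ">"


def datastream_graph(pid, ds_id):
    """Graph name for the relationships expressed by one datastream."""
    return "<#" + pid + "/" + ds_id + ">"


if __name__ == "__main__":
    print(object_graph("myns:somepid"))
    print(datastream_graph("myns:somepid", "RELS-EXT"))
```

Because the datastream graph name extends the object graph name, the containing object can always be recovered from a datastream graph.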

Triple query languages such as SPARQL and iTQL support queries across multiple graphs. Relying on this alone to query relationships over the repository as a whole would be complex, however: the client would have to assemble the full list of named graphs to query against.

Mulgara is a quad store: relationships are effectively stored as <graph> <subject> <predicate> <object>. Currently all triples are stored in a single <#ri> graph.

Mulgara supports creating models (graphs) that are views of other models (graphs), eg

Thus, a hierarchy of named graphs could be created, for example:

All graphs should be "rooted" in the above structure; there should be no means of creating graphs other than by creating objects and datastreams.

Why do this?

When an object is created, updated or deleted, the object's relationships are propagated to the triple store. If two objects are created expressing an identical relationship, only a single triple will be created in the resource index. If one of those objects is then deleted, the triple will be deleted from the triple store even though it is still being asserted by the other object. The resource index will then no longer be an accurate reflection of the triples in the repository. Hence the current restriction on RELS-EXT and RELS-INT that subjects must be the Fedora object or datastreams of the containing object: it prevents two objects asserting the same relationship.

With named graphs, relationships created by different objects would be in different graphs. Deleting one object would remove the graph for that object - but the graph for a different object asserting the same relationship would remain - the resource index would be an accurate reflection of the triples in the repository.
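The difference can be illustrated with a toy in-memory model. This is a sketch only: the two classes, the PIDs demo:a to demo:d and the predicate are hypothetical, and the model says nothing about how Mulgara or the resource index code actually store triples.

```python
class SingleGraphIndex:
    """All triples live in one set, like a single <#ri> graph."""

    def __init__(self):
        self.triples = set()

    def add_object(self, triples):
        # Identical triples from different objects collapse into one entry.
        self.triples.update(triples)

    def delete_object(self, triples):
        # Removing an object's triples removes them for every asserter.
        self.triples.difference_update(triples)


class NamedGraphIndex:
    """One graph per object, named after the object's PID."""

    def __init__(self):
        self.graphs = {}

    def add_object(self, pid, triples):
        self.graphs["<#" + pid + ">"] = set(triples)

    def delete_object(self, pid):
        # Only the deleted object's own graph is dropped.
        self.graphs.pop("<#" + pid + ">", None)

    def contains(self, triple):
        return any(triple in graph for graph in self.graphs.values())


def demo():
    # Hypothetical relationship asserted by both demo:a and demo:b.
    shared = ("info:fedora/demo:c",
              "info:example/rel#relatedTo",
              "info:fedora/demo:d")

    single = SingleGraphIndex()
    single.add_object([shared])     # asserted by demo:a
    single.add_object([shared])     # asserted again by demo:b
    single.delete_object([shared])  # demo:a is deleted

    named = NamedGraphIndex()
    named.add_object("demo:a", [shared])
    named.add_object("demo:b", [shared])
    named.delete_object("demo:a")

    return shared in single.triples, named.contains(shared)


if __name__ == "__main__":
    print(demo())
```

In the single-graph model the relationship disappears although demo:b still asserts it; in the named-graph model it survives in the graph for demo:b.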

Therefore this would support indexing of arbitrary RDF metadata datastreams in the resource index - for instance supporting metadata schemes that are not "flat".

There are some Dublin Core examples [DEV:1] where Fedora would currently be unable to index the RDF in the resource index, including

[DEV:1] http://dublincore.org/documents/dc-rdf/#app-a

Questions and issues

4 Declarative specification of triples to create in the resource index

Triples are currently created for

The "specification" of what triples get generated is largely in imperative Java code, both in terms of the individual triples and which datastreams generate triples.

In the future we may wish to allow creation of triples from

To support a flexible and extensible approach, we could define the generation of triples using content models (system and user) and a declarative approach for specifying triples (XSLT, GRDDL[DEV:1]).

Updating of the resource index could then take place by querying the disseminations and datastreams specified by the system and user content models when an object is created, updated or deleted.

The resource index is currently updated by code in DOManager. An alternative to this could be to reimplement the update mechanism using a management decorator pattern (declared in fedora.fcfg).

[DEV:1] GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages: GRDDL Primer http://www.w3.org/TR/grddl-primer/

Questions and Issues

5 Extend the REST API to incorporate relationships

The REST API does not currently implement methods for disseminating and managing relationships.

API methods should be implemented for querying and managing relationships.

For example

Alternatives to explicit "relationships" URIs could be

Modifications could be specified by

Additionally, or alternatively, "writeable methods" could be provided as a generic mechanism to implement this, e.g. PUT a SPARQL Update to /objects/{pid}/methods/{sDefPid}/relationships?datastream=RELS-EXT
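As a rough sketch of what such a request might carry, the helpers below build the request path from the pattern above and a simple INSERT DATA body. No HTTP call is made. The function names, the service definition PID demo:relsSDef and the example object PIDs are hypothetical, and the exact update syntax accepted would depend on which version of SPARQL Update the endpoint implements.

```python
from urllib.parse import urlencode


def relationships_path(pid, sdef_pid, datastream="RELS-EXT"):
    """Path following /objects/{pid}/methods/{sDefPid}/relationships."""
    query = urlencode({"datastream": datastream})
    return "/objects/" + pid + "/methods/" + sdef_pid + "/relationships?" + query


def insert_data(subject, predicate, obj):
    """Build an INSERT DATA request for a single triple of URIs."""
    return "INSERT DATA { <%s> <%s> <%s> . }" % (subject, predicate, obj)


if __name__ == "__main__":
    # demo:relsSDef is a made-up service definition PID.
    print("PUT", relationships_path("myns:somepid", "demo:relsSDef"))
    print(insert_data(
        "info:fedora/myns:somepid",
        "info:fedora/fedora-system:def/relations-external#isMemberOf",
        "info:fedora/myns:collection",
    ))
```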

All of the relationship API methods should operate directly on Fedora objects to remove dependency on the resource index - relationship GET methods should query the object directly rather than issuing RI queries.

[DEV:1] SPARQL Update - A language for updating RDF graphs: http://www.w3.org/Submission/SPARQL-Update/

Questions and issues

6 Support for dereferenceable http URI resource identifiers in relationships

Fedora resources are currently identified using the info:fedora namespace. If resource identifiers are exposed as dereferenceable http URIs using the REST API URIs, it would be useful to support these identifiers in relationships. That is, it should be possible to query and manipulate relationships using both the info:fedora identifiers for Fedora resources and the http REST URIs.
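Supporting both forms implies a mapping between them. A minimal sketch follows, assuming the http form is built from a repository base URL plus /objects/{pid} and /objects/{pid}/datastreams/{dsID}; the base URL and function names are hypothetical, and the canonical URI pattern is itself one of the things this document proposes to define.

```python
INFO_PREFIX = "info:fedora/"


def info_to_http(uri, base):
    """Map info:fedora/{pid}[/{dsID}] to an assumed REST URI under base."""
    if not uri.startswith(INFO_PREFIX):
        raise ValueError("not an info:fedora URI: " + uri)
    pid, _, ds_id = uri[len(INFO_PREFIX):].partition("/")
    url = base.rstrip("/") + "/objects/" + pid
    if ds_id:
        url += "/datastreams/" + ds_id
    return url


def http_to_info(url, base):
    """Inverse mapping; rejects URLs that do not match the assumed patterns."""
    prefix = base.rstrip("/") + "/objects/"
    if not url.startswith(prefix):
        raise ValueError("not under " + prefix + ": " + url)
    parts = url[len(prefix):].split("/")
    if len(parts) == 1 and parts[0]:
        return INFO_PREFIX + parts[0]
    if len(parts) == 3 and parts[1] == "datastreams" and parts[2]:
        return INFO_PREFIX + parts[0] + "/" + parts[2]
    raise ValueError("unrecognised resource path: " + url)


if __name__ == "__main__":
    base = "http://localhost:8080/fedora"  # assumed deployment URL
    print(info_to_http("info:fedora/myns:somepid/RELS-EXT", base))
    print(http_to_info(base + "/objects/myns:somepid", base))
```

A mapping of this kind would let relationships stay stored in one form while being queried or returned in the other.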

REST API

RISearch

A Spanner (Wrench) in the works...

Fedora repositories generally sit behind some form of user interface application.
These applications will (in some cases) expose their own URLs for accessing Fedora resources.
Should we instead be providing mechanisms to support exposing these URLs as the canonical http URIs for Fedora resources?