Update: This page is outdated. For the latest information on this topic, read the annotated version of the RESTful Fedora Proposal. A PDF of the original proposal is available at http://www.yourmediashelf.com/reference/fedora/webservices/REST%20Proposal%20-%20v1.1.pdf

Introduction

Concept

The fundamental concept in making web service interfaces RESTful is that almost every method we expose can be reduced to a CRUD (Create, Retrieve, Update, or Delete) action. Based on this observation, it is possible to expose an intuitive and easy-to-maintain REST interface by breaking up all of your methods into a number of separate controllers, each of which handles a set of CRUD operations for some aspect of the underlying model.

At this level, we are using a running definition of a controller as "something that receives an HTTP request, triggers an action within your model, and returns a response." Each controller in a RESTful interface gets its own URL. This same URL can be used to perform all CRUD operations because HTTP methods (POST, GET, PUT, DELETE) easily map to the different CRUD actions. The end result is an interface with URLs exposing CRUD controllers whose actions are triggered by HTTP methods or verbs. This is the ideal RESTful interface.
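
As a rough illustration of the idea, the sketch below dispatches the four HTTP methods to CRUD actions on a single controller. The class name, the in-memory store, and the status codes are assumptions made for the example; none of them come from the proposal.

```python
# Hypothetical sketch: one controller, one URL, four HTTP methods.
CRUD_BY_METHOD = {
    "POST": "create",
    "GET": "retrieve",
    "PUT": "update",
    "DELETE": "delete",
}


class ObjectController:
    """Toy in-memory controller for a single kind of resource."""

    def __init__(self):
        self.store = {}

    def create(self, key, body):
        self.store[key] = body
        return 201, body

    def retrieve(self, key, body=None):
        if key not in self.store:
            return 404, None
        return 200, self.store[key]

    def update(self, key, body):
        if key not in self.store:
            return 404, None
        self.store[key] = body
        return 200, body

    def delete(self, key, body=None):
        if key not in self.store:
            return 404, None
        del self.store[key]
        return 204, None

    def handle(self, method, key, body=None):
        # The HTTP method alone selects the CRUD action; the URL stays the same.
        action = CRUD_BY_METHOD.get(method.upper())
        if action is None:
            return 405, None
        return getattr(self, action)(key, body)
```

With this shape, `handle("GET", "demo:1")` and `handle("DELETE", "demo:1")` address the same resource and differ only in the verb.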

An added innovation that further simplifies a RESTful interface is to interpret file extensions on a REST URL as requests for a specific response format. This means that requests sent to a URL ending in .xml will receive XML responses, requests sent to a URL ending in .html will receive (X)HTML responses, and so on for any response type you wish to support (RSS, MOBILE, ATOM, etc.).

Comment from Chris: Wouldn't HTTP content negotiation be considered more RESTful? I do have a bias toward indicating the desired format right in the URI, but I just wonder what a REST purist would say about it.
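
For comparison, here is a minimal sketch of the content negotiation alternative, where the format is chosen from the HTTP Accept header rather than from the URL. The media-type table and the function name are assumptions, and the parsing is deliberately simplified (wildcards such as */* are ignored).

```python
# Hypothetical media-type table; the proposal does not define one.
FORMAT_BY_MEDIA_TYPE = {
    "text/html": "html",
    "application/xhtml+xml": "html",
    "application/xml": "xml",
    "text/xml": "xml",
}


def negotiate_format(accept_header, default="xml"):
    """Pick a response format from an HTTP Accept header.

    Simplified: honours q-values but ignores wildcards such as */*.
    """
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if media_type in FORMAT_BY_MEDIA_TYPE and quality > 0:
            # Sort key: highest quality first, then order of appearance.
            candidates.append((-quality, position, media_type))
    if not candidates:
        return default
    return FORMAT_BY_MEDIA_TYPE[min(candidates)[2]]
```

The two approaches are not mutually exclusive: a server could honour an explicit extension first and fall back to the Accept header when none is given.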

Further Information

For those of you who would like to learn more about the notion of a RESTful interface, or if you want to get excited about the idea, take a look at David Heinemeier Hansson's keynote speech from RailsConf 2006. The video is available at http://www.scribemedia.org/2006/07/09/dhh/ and the slides can be downloaded from http://www.loudthinking.com/lt-files/worldofresources.pdf

If you watch the video, go ahead and skip the first section of the presentation; it's all conference business and DHH tooting Rails' horn. If you're only interested in the fundamental concept of RESTful interfaces, watch section 2, but the remainder of the talk presents some compelling ideas about a number of topics that anyone using Fedora has to grapple with. The code examples are Ruby/Rails specific, but the general concepts shine through regardless.

Proposed API

The proposed API should be described here.

Notes/Thoughts

Notes and thoughts about applying the RESTful approach to Fedora.

Unpack the domain model

In order to achieve a RESTful interface, we have to fully unpack our domain model, asking ourselves how we can reduce the interface to a set of CRUD controllers. This is relatively easy with Fedora.

Controllers

Full (expose all CRUD actions)

  • pidObject (pid)
    • ex: host:port/fedora/pid
  • datastream (dsId)
    • ex: host:port/fedora/pid/dsId
    • might actually require two controllers. See "Topic: Datastream Controllers" below.
  • disseminator (?)
    • ex: ?

Limited (only expose GET/Retrieve action)

  • user
  • repository

Special

  • disseminators exposed as controllers

Topic: Datastream Controllers

Fleshing out the problem

The current SOAP API includes two types of methods for modifying datastreams. These are modifyDatastreamByReference and modifyDatastreamByValue. These reflect a distinction that is a bit fuzzy within our domain model. Fundamentally, we have to figure out whether there are actually two types of datastream, in which case each should get its own controller.

Speculation: It seems like the methods you end up using to modify a datastream are dictated by the controlGroup.

  • I and M datastreams are always modified by value
  • E and R datastreams are always modified by reference
  • Don't you get a SOAP fault if you try to violate this rule?
  • Is there ever crossover? The only crossover might be when you create the datastream, but this is partially because there isn't currently an addDatastreamByValue method.
  • Does this mean that there are really two types of datastream, which would indicate the need for two controllers?
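
If the speculation holds, the choice of modify method is a pure function of the controlGroup. A minimal sketch, using the control group letters exactly as written in these notes and the two SOAP method names mentioned above (the helper name is hypothetical):

```python
# Assumption: the rule speculated above always holds.
# Control group letters are taken as written in these notes.
MODIFY_METHOD_BY_CONTROL_GROUP = {
    "I": "modifyDatastreamByValue",
    "M": "modifyDatastreamByValue",
    "E": "modifyDatastreamByReference",
    "R": "modifyDatastreamByReference",
}


def modify_method_for(control_group):
    """Return the SOAP modify method implied by a datastream's controlGroup."""
    try:
        return MODIFY_METHOD_BY_CONTROL_GROUP[control_group.upper()]
    except KeyError:
        raise ValueError(f"unknown control group: {control_group!r}") from None
```

If a lookup like this is all that separates the two cases, a single datastream controller that branches on controlGroup may be enough; if the request bodies differ substantially, two controllers are easier to justify.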

Comment from Chris: I think we do have two classes of datastream: those that are stored internally (whose content and attributes can be set), and those that are stored externally (whose attributes and location can be set). As a separate issue, when adding or modifying a datastream's content (which is only applicable to I and M datastreams), there is the question of how that data is passed in. Multipart POST? By a URI that the repository is then expected to dereference? The former seems desirable. The latter may also be valuable.

Questions

  • Is it possible/practical to condense these under one controller?

Wishlist

  • addByValue: However we implement it, there should be a method that allows you to pass datastream content to the controller along with a request to create the datastream (i.e. addDatastreamByValue). Currently, you have to pass that content by URL reference.
  • base64 encoding should be optional in byValue methods
  • clients should be able to provide URL for a schema to validate XML against

Extra Methods (Non-CRUD)

Often, when a method does not immediately map to a CRUD action, it is simply because we have not completely fleshed out our domain model (meaning that we probably need to add another controller that wraps some neglected aspect of the model). However, there are always exceptions to a rule. Ruby on Rails solves this problem by allowing you to add actions to a CRUD controller using a semicolon, i.e. an HTTP GET sent to host:port/fedora/pid/dsId;edit should return an editable form of the datastream rather than just a representation of the datastream.
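
A small sketch of how a server might separate the semicolon action from the resource path. The function name is hypothetical and the behaviour for an empty action (a trailing semicolon) is an assumption.

```python
def split_action(path):
    """Split 'pid/dsId;edit' into ('pid/dsId', 'edit').

    The action is None when no semicolon suffix is present
    (or, by assumption, when the suffix is empty).
    """
    resource, sep, action = path.partition(";")
    return resource, (action if sep and action else None)
```

For example, `split_action("fedora/pid/dsId;edit")` yields the resource path `fedora/pid/dsId` and the action `edit`, while a plain `fedora/pid/dsId` yields no action and falls through to the normal CRUD handling.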

Comment by Matthias: We had the very same discussion for the eSciDoc API and came to the conclusion to not use actions attached to GET methods. Instead, we decided to use POST for all methods which do not fit into the CRUD schema, e.g. for filter methods (see next section).

History & Version Rollback

The predominant methods that don't map directly to CRUD actions are those for tracking and working with object/datastream histories and rolling back/forward between versions.

Questions:

  • Is it possible to represent histories as a CRUD controller?
  • Is it better to use special methods? i.e. GET host:port/fedora/pid/dsid.xml;history
  • What are the actions we want to expose? How are they related? Do they bear any resemblance to CRUD operations? (ie. purge = DELETE?)

Exporting

A simple export operation can be exposed as a GET to a URL with the .foxml or .mets extension, but how do you distinguish between exporting an entire object and getting just its FOXML?

  • Should we use something like host:port/fedora/pid;export?

Mapping file extensions to response format

  • .../*.foxml returns foxml
  • .../*.xml returns xml
  • .../*.html returns (x)html
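
The mapping above can be sketched as a simple lookup on the URL's extension. The table contains only the three extensions listed here (others such as .mets could be added the same way); the function name and the choice to leave unknown extensions untouched are assumptions.

```python
import posixpath

# Extensions from the list above; anything else is treated as part of the name.
FORMAT_BY_EXTENSION = {
    ".foxml": "foxml",
    ".xml": "xml",
    ".html": "html",
}


def split_format(path, default=None):
    """Return (resource_path, format) for a URL path.

    Unknown or missing extensions leave the path unchanged and yield `default`.
    """
    base, ext = posixpath.splitext(path)
    fmt = FORMAT_BY_EXTENSION.get(ext.lower())
    if fmt is None:
        return path, default
    return base, fmt
```

Leaving unrecognised extensions in place matters if identifiers can themselves contain dots, since those should not be mistaken for a format request.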

Condensing URLs

There are a couple of things we can do to simplify the URLs in the interface.

Use HTTP methods as CRUD verbs

Currently, Fedora's REST API includes verbs in its URLs (i.e. host:port/fedora/get/pid). By pushing our verbs out into HTTP methods, we can use a single URL (i.e. host:port/fedora/pid) for all CRUD operations.

Implied controller names

Since our domain model is relatively simple, we can probably get away with using implied controller names.

Example:

Full URL: host:port/fedora/objects/pid/datastreams/dsId

Condensed URL: host:port/fedora/pid/dsId
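
To make the equivalence concrete, the sketch below resolves both URL styles to the same (pid, dsId) pair. The function name and error handling are hypothetical, and the host:port prefix is assumed to have been stripped already. Note that the second segment of a condensed URL is treated as a datastream ID here, although it could equally name a disseminator.

```python
def parse_resource_path(path):
    """Return (pid, ds_id) for a full or condensed URL path.

    ds_id is None when the path addresses the object itself.
    Assumption: a leading 'objects' segment always signals the full form.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments or segments[0] != "fedora":
        raise ValueError(f"not a fedora path: {path!r}")
    rest = segments[1:]
    if rest and rest[0] == "objects":
        # Full form: objects/<pid>[/datastreams/<dsId>]
        if len(rest) == 2:
            return rest[1], None
        if len(rest) == 4 and rest[2] == "datastreams":
            return rest[1], rest[3]
    elif len(rest) == 1:
        # Condensed form: <pid>
        return rest[0], None
    elif len(rest) == 2:
        # Condensed form: <pid>/<dsId>
        return rest[0], rest[1]
    raise ValueError(f"unrecognised path: {path!r}")
```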

This works quite well when we only expose retrieve actions (HTTP GET) because datastreams and disseminators are treated in much the same way on retrieval. This is what the current REST API does. It gets a bit more complicated when you need to expose create and update actions, which differ substantially between disseminators and datastreams. The same complication arises when exposing CRUD operations for datastreams, as is reflected in the current SOAP methods modifyDatastreamByReference and modifyDatastreamByValue.

DBXML-type Operations

(this is fodder for the wishlist)

In the API for Fedora 2.2, some new API methods like setDatastreamVersionable were added. When it comes down to it, this is just a request to change the value of an XML node. It would be nice if we had the option of accessing and modifying the contents of any (inline) XML datastream in this way. This would probably only be possible if Fedora stored its XML in an XML database like DBXML or eXist, which would introduce its own perks and drawbacks. Nonetheless, this would be a nice feature to add to the API. In addition to allowing low-level calls to the API, it would also provide the opportunity to use an approach to versioning that is based on journaling changes rather than caching a new copy of the entire XML datastream with every change.
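
As a toy illustration of journal-based versioning, the sketch below changes the text of a single XML node and records only that change, so it can be rolled back without keeping a full copy of the document. It uses Python's standard xml.etree.ElementTree in place of an XML database, and the class name and the sample document structure are invented for the example; they do not reflect Fedora's actual storage format.

```python
import xml.etree.ElementTree as ET


class JournaledXml:
    """Keeps one XML document plus a journal of node-level text changes."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.journal = []  # entries are (path, old_text, new_text)

    def set_text(self, path, new_text):
        """Change one node's text and record the change in the journal."""
        node = self.root.find(path)
        if node is None:
            raise KeyError(path)
        self.journal.append((path, node.text, new_text))
        node.text = new_text

    def undo(self):
        """Roll back the most recent change using its journal entry."""
        path, old_text, _ = self.journal.pop()
        self.root.find(path).text = old_text
```

A call such as `doc.set_text("versionable", "false")` on a document like `<datastream><versionable>true</versionable></datastream>` then plays the role that a dedicated method like setDatastreamVersionable plays today, and each journal entry is far smaller than a new copy of the datastream.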
