DSpace Documentation


Old Release

This documentation relates to an old version of DSpace, version 4.x.


What is DSpace REST API

The REST API module exposes machine readable representations of content in Communities, Collections, Items, and Bitstreams.

The DSpace 4.0 REST API (Jersey) allows external systems to reuse the data held in DSpace. It provides READ-ONLY access, via JSON or XML, to publicly accessible Communities, Collections, Items, and Bitstreams. Only non-hidden item metadata is exposed at the Item endpoint (provenance, for example, is hidden by default). We intend that future DSpace releases will grow and evolve the REST API to support a greater set of features, based on community input and support. This Jersey implementation of a REST API for DSpace is not related to other add-on modules providing REST API support for DSpace, such as GSOC REST API, Wijiti REST API, Hedtek REST API, or SimpleREST.

REST Endpoints

We have modeled the DSpace entities of Communities, Collections, Items, and Bitstreams. The API is not a straight database schema dump of these entities, but provides some wrapping that makes it easy to follow relationships in the API output.

HTTP Header: Accept

Note: You must set your request header's "Accept" property to either JSON (application/json) or XML (application/xml) depending on the format you prefer to work with. 

Example usage from command line in XML format with pretty printing:

curl -s -H "Accept: application/xml" http://localhost:8080/rest/communities | xmllint --format -

This documentation assumes that the URL of the "REST" webapp is http://localhost:8080/rest/. On production systems the address will be slightly different, such as http://demo.dspace.org/rest/. The path to an endpoint goes after /rest/, such as /rest/communities; altogether this is http://localhost:8080/rest/communities
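As a rough sketch of the Accept header requirement described above, the following Python snippet builds a request for either format using only the standard library. The names BASE_URL, build_request, and fetch_json are illustrative and not part of DSpace; BASE_URL assumes the local deployment used throughout this documentation.

```python
import json
import urllib.request

# Assumed local deployment, as used throughout this documentation.
BASE_URL = "http://localhost:8080/rest"


def build_request(path, fmt="json"):
    """Build a GET request for a REST endpoint with the matching Accept header."""
    accept = {"json": "application/json", "xml": "application/xml"}[fmt]
    url = BASE_URL + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_json(path):
    """Send the request and decode the JSON body (needs a running DSpace)."""
    with urllib.request.urlopen(build_request(path, "json")) as response:
        return json.loads(response.read().decode("utf-8"))


# Building a request does not contact the server.
request = build_request("/communities", "xml")
print(request.full_url, request.get_header("Accept"))
```

Calling fetch_json("/communities") against a running instance should return the decoded list of communities; the exact fields in that output are not described here.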

Endpoints also accept query parameters that you can append to request extra behaviour. The most commonly used one in this API is "?expand". Rather than returning every possible piece of information by default, each API call returns only the most commonly used set, and returns the more "expensive" information only when you deliberately request it. Each endpoint lists its available expands in its output. To get started, you can use ?expand=all to make the endpoint provide all of its information (parent objects, metadata, child objects). You can include multiple expands separated by commas, such as: ?expand=collections,subCommunities
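A minimal sketch of composing the "expand" query parameter in Python follows. The helper name with_expand is hypothetical; it only joins the expand names with commas, as in the example above, and does not check them against what an endpoint actually supports.

```python
from urllib.parse import urlencode


def with_expand(endpoint, *expands):
    """Append ?expand=a,b,... to an endpoint path; return it unchanged if no expands are given."""
    if not expands:
        return endpoint
    # safe="," keeps the comma separator readable instead of encoding it as %2C
    return endpoint + "?" + urlencode({"expand": ",".join(expands)}, safe=",")


print(with_expand("/communities/5", "collections", "subCommunities"))
print(with_expand("/items/12", "all"))
```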

Communities

Communities in DSpace are used for organization and hierarchy, and are containers that hold sub-Communities and Collections. (ex: Department of Engineering)

  
List Communities      /communities/
Specific Community    /communities/:communityID
Community Expands     parentCommunity, collections, subCommunities, logo, all

Collections

Collections in DSpace are containers of Items. (ex: Engineering Faculty Publications)

  
List Collections      /collections/
Specific Collection   /collections/:collectionID
Collection Expands    parentCommunityList, parentCommunity, items, license, logo, all

You can access all the collections in a specific community through: /communities/:communityID?expand=all

Items

Items in DSpace represent a "work" and combine metadata and files, known as Bitstreams.

  
Specific Item         /items/:itemID
Item Expands          metadata, parentCollection, parentCollectionList, parentCommunityList, bitstreams, all

You can access all the items in a specific collection through: /collections/:collectionID?expand=items

Bitstreams

Bitstreams are files. They have a filename, a size (in bytes), and a file format. Typically in DSpace, the Bitstream will be the "full text" article, or some other media. Some files are the actual file that was uploaded (tagged with bundleName:ORIGINAL); others are DSpace-generated derivatives or renditions, such as extracted text or thumbnails. You can download files/bitstreams. DSpace doesn't really limit the type of files that it takes in, so this could be PDF, JPG, audio, video, zip, or other. The logo for a Collection or a Community is also a Bitstream.

  
Specific Bitstream    /bitstreams/:bitstreamID
Download a Bitstream  /bitstreams/:bitstreamID/retrieve
Bitstream Expands     parent, all
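Downloading through the /retrieve endpoint could look roughly like the sketch below. BASE_URL, retrieve_url, and download_bitstream are illustrative names, the base URL assumes the local deployment used in this documentation, and the download function only works against a running DSpace with a publicly accessible bitstream.

```python
import shutil
import urllib.request

# Assumed local deployment, as used throughout this documentation.
BASE_URL = "http://localhost:8080/rest"


def retrieve_url(bitstream_id):
    """Return the download URL for a bitstream."""
    return "{}/bitstreams/{}/retrieve".format(BASE_URL, bitstream_id)


def download_bitstream(bitstream_id, target_path):
    """Stream the file to disk without loading it fully into memory."""
    with urllib.request.urlopen(retrieve_url(bitstream_id)) as response, \
            open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return target_path


print(retrieve_url(42))
```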

You can access all the Bitstreams in a specific Item through: /items/:itemID?expand=bitstreams

You can access the parent object of a Bitstream (normally an Item, but possibly a Collection or Community when it is its logo) through: /bitstreams/:bitstreamID?expand=parent
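The parent and child navigation described in the sections above can be summarised in a small helper. This is only a sketch: CHILD_EXPAND, children_url, and bitstream_parent_url are hypothetical names, and for communities it requests expand=collections (one of the listed Community expands) rather than expand=all.

```python
# Which expand returns the direct children of each kind of object.
CHILD_EXPAND = {
    "communities": "collections",
    "collections": "items",
    "items": "bitstreams",
}


def children_url(kind, object_id):
    """Path that returns an object together with its direct children."""
    return "/{}/{}?expand={}".format(kind, object_id, CHILD_EXPAND[kind])


def bitstream_parent_url(bitstream_id):
    """Path that returns a bitstream together with its parent object."""
    return "/bitstreams/{}?expand=parent".format(bitstream_id)


print(children_url("collections", 60))
print(bitstream_parent_url(3))
```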

Introduction to Jersey for developers

The REST API for DSpace 4.0 is implemented using Jersey, the reference implementation of the Java standard for building RESTful Web Services (JAX-RS 1, JSR 311). That means this API should be easier to expand and maintain than other API approaches, as this approach has been widely adopted in the industry.

Below is some sample Jersey code showing how you wire up resources, how you choose to serialize to HTML, JSON, or XML, and how you distinguish between displaying a single entity and displaying a list of entities.

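import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// Root resource: handles every request under /collections
@Path("/collections")
public class CollectionsResource {
    // List of collections, rendered as HTML
    @GET
    @Path("/")
    @Produces(MediaType.TEXT_HTML)
    public String listHTML() {...}

    // List of collections, serialized to JSON or XML depending on the Accept header
    @GET
    @Path("/")
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public org.dspace.rest.common.Collection[] list(@QueryParam("expand") String expand) {...}

    // Single collection, selected by the ID in the URL path
    @GET
    @Path("/{collection_id}")
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public org.dspace.rest.common.Collection getCollection(@PathParam("collection_id") Integer collection_id, @QueryParam("expand") String expand) {...}
}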

There is no central ProviderRegistry in which you have to declare your path. You're free to use @annotations to get your code to respond to requests, and there are parameter helpers to extract parameters into Java variables.

Configuration for DSpace REST

Property              stats
Example Value         true
Informational Note    Boolean value indicates whether statistics should be recorded for access via the REST API; Defaults to 'false'.

Recording Proxy Access by Tools

For the purpose of more accurate statistics, a web-based tool may specify who is using it, by adding parameters to the request:

http://localhost:8080/rest/items/:ID?userIP=ip&userAgent=userAgent&xforwarderfor=xforwarderfor

If no parameters are given, the details of the HTTP request's sender are used in statistics. This enables tools to record the details of their user rather than themselves.
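A tool that proxies requests on behalf of its own users might build the item URL roughly as follows. The function name item_url_for_user and its arguments are hypothetical; the query parameter names (userIP, userAgent, xforwarderfor) are copied from the example URL above.

```python
from urllib.parse import urlencode


def item_url_for_user(base_url, item_id, user_ip=None, user_agent=None, forwarded_for=None):
    """Build an item URL that carries the end user's details for statistics."""
    params = {}
    if user_ip:
        params["userIP"] = user_ip
    if user_agent:
        params["userAgent"] = user_agent
    if forwarded_for:
        params["xforwarderfor"] = forwarded_for
    url = "{}/items/{}".format(base_url.rstrip("/"), item_id)
    # With no parameters, the details of the HTTP request's sender are used instead.
    return url + ("?" + urlencode(params) if params else "")


print(item_url_for_user("http://localhost:8080/rest/", 7,
                        user_ip="203.0.113.7", user_agent="ExampleTool/1.0"))
```

Note that urlencode percent-encodes reserved characters, so the "/" in a user agent string is sent as %2F.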

Deploying the DSpace REST API in your Servlet Container

The dspace-rest module is automatically configured to compile and build with DSpace 4.0, so the usual mvn + ant build process will create the webapp. To make it work in your environment, you just need to add a context entry for it in your servlet container. For example, in Tomcat, one might edit $CATALINA_HOME/conf/server.xml and add:

<Context path="/rest" docBase="/dspace/webapps/rest"/>

Additional Information

Additional information can be found in the README for dspace-rest, and in the GitHub Pull Request for DSpace REST (Jersey).

 


9 Comments

  1. The section describing how to create new Jersey endpoints doesn't really belong to User Docs. We don't yet have a separate section in the official docs for Developer Docs, but we should create one. Also, what about describing this in JavaDoc?

  2. We should also add some documentation on the following:

    1) How someone can use more than one expand in the same request (i.e. concatenation with a comma)

    2) Examples of responses in both formats. What the user should expect. Are the output fields kept in place when no value applies to them, or are they omitted from the output?

    3) New config file (rest.cfg) and the statistics option

     

    1. ad 2): also error reporting - what should the user expect in case of error (erroneous input, server errors)

      4) deploying the REST webapp (IMHO we should encourage people to deploy it by default so that we can later rely on it being present and avoid duplicating functionality like e.g. RSS that is common to both UIs by putting it into REST instead), controlling access to the REST endpoint

      1. (In response to Ivan's comments)

        2) This isn't settled ground yet for error reporting, but you can expect to trust your HTTP Status Codes, especially since I haven't produced pretty/json/xml error responses, mostly just a stack-trace in HTML, which is the Jersey default WebAppException. I think we're throwing:

        • 200-OK, 
        • 401 (or 403 not sure...)-Unauthorized, can't see this restricted content, 
        • 404 - Object doesn't exist, 
        • 406-API can't serialize in the requested format, 
        • 500-InternalServerError

        4) I've added a small section with an example of how to add a context for /rest. I think refactoring internal DSpace logic to make use of REST (for search / rss / etc) is premature. Making RSS go through REST would make the UIs depend on REST, which I'm not ready to do in this initial release.

        We will want to refactor internal DSpace business logic to simplify how each webapp (JSPUI, XMLUI, OAI, LNI, ..., REST) does certain things. i.e. Move some of that bloated logic out of webapp, and into central DSpace-Core. Once the refactor goes through, then we can determine if things get "outsourced" to REST API.

    2. 1) I've added an example of multiple expands with a comma.

      2) Haven't added examples of response data yet, but there are tools designed exactly for this purpose. Bram has shown me an example of "API DOCUMENTATION" from miredot, and Bill McKinney has generated swagger documentation for a different flavor of DSpace Jersey REST. So, at this point, I'm not sure how much should be in DSpace Documentation, vs how much should be in a DSpace Developer Documentation, if there were such a thing...

      3) I have just added rest.cfg and statistics information, from Anja's latest PR.

  3. The documentation here should also note additional Query Parameters like

    ?limit= and ?offset=

    The default limit on the /collections/:ID?expand=items query appears to be 100, which means the API probably isn't useful in production without using these additional parameters.

    1. I'd love for someone to teach me how to use ?limit= and ?offset=, because the limit of 100 items is very detrimental for me. Can someone help me?

      1. Diego -

        For my project, I just needed to keep track of the offset into the collection, so that I could get all the items from it.  Here's a Python snippet that does that.  I haven't played with 'limit', but I'd assume you could change it from default if you wanted to bring fewer or more items back on every query.

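        import requests

        item_count = 250  # placeholder: set this to the real number of items in the collection
        # go through every item in collection 60, 100 at a time, starting with the first page at offset 0
        for i in range(0, item_count, 100):
            iqry = requests.get('http://192.168.1.5:8080/rest/collections/60?expand=items&offset='+str(i))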

        Hope this helps.  You can also google around and find a few other mentions of the use of limit and offset.

        Matt

        1. Matt,

          Thank you! I was trying to use limit=  with expand=all and getting nothing. Now I have just what I need.

          d
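The offset-based paging discussed in the comments above can be sketched as a loop that keeps requesting pages until a short page comes back. This is only an illustration based on those comments: fetch_all_items is a hypothetical helper, the fetch argument is any function that takes a URL and returns the decoded JSON response as a dict, and the assumption that the expanded items appear under an "items" key has not been checked against the actual output.

```python
def fetch_all_items(collection_url, fetch, page_size=100):
    """Collect every item of a collection, one page at a time.

    fetch(url) must return the decoded JSON response as a dict.
    Assumption: the expanded items are found under the "items" key.
    """
    items = []
    offset = 0
    while True:
        url = "{}?expand=items&limit={}&offset={}".format(collection_url, page_size, offset)
        batch = fetch(url).get("items") or []
        items.extend(batch)
        # A page shorter than page_size means there is nothing left to request.
        if len(batch) < page_size:
            return items
        offset += page_size
```

Passing the fetch function in as an argument keeps the paging logic independent of the HTTP library, so it can be tried against canned data before pointing it at a live server.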