Title (Goal): Amherst - JSON-LD compaction service
Primary Actor: Developer
Scope: Component
Level:
Author: Unknown User (acoburn)
Story (A paragraph or two describing what happens): In order to improve front-end (read) performance, it would be useful to store Fedora resources as JSON in a key-value store (Riak, MongoDB, CouchDB, etc.). That way, objects can be delivered to a web-based framework more efficiently, without needing to access Fedora at all. Fedora already generates JSON-LD in expanded form, but for application-specific use (applications that don't necessarily understand RDF), a compact form would be preferable. Producing it simply involves applying a context file to the expanded JSON-LD.

A sample implementation is available here: https://github.com/acoburn/repository-extension-services/, in particular the acrepo-jsonld-service and acrepo-jsonld-cache modules.
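To illustrate the core idea of "applying a context file to the expanded JSON-LD form", here is a minimal sketch in Python. The real services use a full JSON-LD processor (the sample implementation is Java-based); this hand-rolled version handles only the simplest case, and the context and resource URIs shown are illustrative assumptions.

```python
# Minimal sketch of JSON-LD compaction: replace full predicate IRIs in an
# expanded document with short terms from a context. A real implementation
# follows the complete JSON-LD API compaction algorithm; this shows only
# the core substitution step. Context and IRIs below are illustrative.

EXAMPLE_CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}

def compact(expanded, context):
    """Compact a single expanded JSON-LD node using a term -> IRI context."""
    iri_to_term = {iri: term for term, iri in context.items()}
    result = {"@context": context}
    for node in expanded:                     # expanded form is a list of nodes
        for key, values in node.items():
            if key == "@id":
                result["@id"] = values
                continue
            term = iri_to_term.get(key, key)  # fall back to the full IRI
            literals = [v.get("@value", v) for v in values]
            result[term] = literals[0] if len(literals) == 1 else literals
    return result

expanded = [{
    "@id": "http://localhost:8080/rest/object1",
    "http://purl.org/dc/terms/title": [{"@value": "An example resource"}],
}]

print(compact(expanded, EXAMPLE_CONTEXT)["title"])  # → An example resource
```

The compacted output is what an RDF-unaware web application would consume: plain keys like "title" instead of full predicate IRIs.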

Web Resource interaction

This service would expose an HTTP endpoint that converts Fedora resources into a compact JSON-LD representation.

Deployment or Implementation notes

This service would be deployed separately from Fedora, possibly on a separate machine. I envision it being implemented as a combination of OSGi services and Camel routes, written in Java and Blueprint XML, that can be deployed in any OSGi container. The implementation would require access to Fedora's HTTP API.

API-X Value Proposition

The primary use of this service in the context of API-X would be to allow for service discovery.

5 Comments

  1. Ah, so does this encompass two potential use cases?

    1. Providing a means to expose a representation of Fedora resources as compacted JSON 
      • Maybe filtering responses and translating them on the fly, perhaps in response to a Prefer header, or some other indicator that compact form is desired
      • Maybe exposing a URI to a compacted representation
    2. Directing requests to the cache where appropriate
      1. Filter incoming requests for simple GETs. If a request is deemed satisfiable by a cache lookup, poll the cache for the object and return it.

    If (2), then would this be in addition to (and behind) a caching proxy such as squid?
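A rough sketch of the routing decision described in (2) might look like the following. The function name, header choices, and defaults here are assumptions for illustration, not part of the sample implementation.

```python
# Illustrative sketch of cache routing: only simple GET requests with no
# cache-busting headers are candidates for a cache lookup; everything else
# passes through to Fedora. Header handling here is a simplified assumption.

def satisfiable_by_cache(method, headers):
    """Decide whether a request may be answered from the JSON-LD cache."""
    if method != "GET":
        return False                        # writes must reach Fedora
    if "no-cache" in headers.get("Cache-Control", ""):
        return False                        # client demanded a fresh copy
    # Only the compact JSON-LD representation lives in the cache.
    accept = headers.get("Accept", "application/ld+json")
    return "json" in accept

print(satisfiable_by_cache("GET", {"Accept": "application/ld+json"}))  # → True
print(satisfiable_by_cache("POST", {}))                                # → False
```

Requests that fail this predicate would be proxied to Fedora unchanged, so the cache stays transparent to clients.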

  2. Unknown User (acoburn)

    Yes, this does encompass both cases. See implementations here: https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-jsonld-cache and https://gitlab.amherst.edu/acdc/repository-extension-services/tree/master/acrepo-jsonld-service

    Related to your question about (2), one could use a caching proxy as part of this, but I see it as unnecessary. In our current (Fedora 3) repository, we use Riak as a kind of cache. Riak (like other such systems) has several advantages over a simple caching proxy (Squid, Varnish, etc.), including the ability to shard and replicate data over an arbitrary number of back-end nodes (providing both higher throughput and better fault tolerance) and support for map-reduce operations over arbitrary sets of data in the cluster, which a simple proxy cache cannot do.

    In my experience, Riak's read performance is so good that an additional proxy is really unnecessary.

  3. Fascinating that performance is so good! Your implementation (from what I understand, having quickly looked through the code) could be deployed on an arbitrary Karaf instance in someone's back-end infrastructure, with the caching service available via requests to http://${some.host}:${some.port}/jsonld. Maybe you have several of these services running on different hosts.

    How would you envision API-X making cached representations of objects available to the public? Would it be by filtering incoming GET requests to the repository and polling the caching service (as speculated in my initial comment above), so that it happens transparently? By providing additional representations of the object, at their own URIs, backed by the cache? Both?

  4. Unknown User (acoburn)

    Performance is excellent, and if you need to handle higher throughput, you just add more back-end nodes. Typically, with Riak, you have an arbitrary number of nodes (it's masterless and can scale up or down easily) and one or more reverse proxies (e.g. HAProxy) pointing at that cluster, so your service points to that single location (I've never needed more than one instance of HAProxy running). So yes, you can have one or more instances of Karaf running, each pointing to its own local instance of HAProxy (which points to the Riak cluster). To start, I don't imagine needing more than a single instance of Karaf for this, but this architecture is embarrassingly easy to scale, even with a single instance of Fedora.

    For API-X, I'd have incoming requests pull the data directly from Riak. If that fails (a 404 or otherwise), the request would fall back to fetching the resource directly from Fedora. But yes, I believe your earlier speculation about how it works is correct. (I also store thumbnails and other small binary objects there, since throughput is so much better than Fedora 3; that does change with Fedora 4, but I will probably still cache small binaries like this.)
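The cache-first read path described above (pull from Riak, fall back to Fedora on a miss) can be sketched roughly as follows. The class and function names are assumptions, and the fetch function stands in for real HTTP calls to Riak and to Fedora's REST API.

```python
# Sketch of a cache-first read with Fedora fallback, as described in the
# comment above: try the cache, go to Fedora on a miss, and write the
# result through so the next reader gets a hit. The names here are
# illustrative assumptions, not the sample implementation's API.

class CacheFirstReader:
    def __init__(self, cache, fetch_from_fedora):
        self.cache = cache                    # e.g. a dict, or a Riak client
        self.fetch_from_fedora = fetch_from_fedora

    def get(self, uri):
        doc = self.cache.get(uri)
        if doc is not None:                   # cache hit: Fedora is never touched
            return doc
        doc = self.fetch_from_fedora(uri)     # miss (404 or otherwise): fall back
        self.cache[uri] = doc                 # write through for the next reader
        return doc

# Stand-in for an HTTP GET against Fedora, recording how often it is called.
calls = []
def fake_fedora(uri):
    calls.append(uri)
    return {"@id": uri, "title": "from fedora"}

reader = CacheFirstReader({}, fake_fedora)
reader.get("http://localhost:8080/rest/object1")  # miss: hits Fedora
reader.get("http://localhost:8080/rest/object1")  # hit: served from cache
print(len(calls))  # → 1
```

The second request never reaches Fedora, which is the whole point of the cache: repeated reads are absorbed by the key-value store.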

  5. This seems like a special case of a more general idea: "Use sophisticated caching (equipped with minimizing abilities) in front of Fedora." I'm not sure in what way it "extends the Fedora API"? There are no new functions here...