Title (Goal): Amherst - ID service
Primary Actor: Developer
Scope: Component
Level: 
Author: Unknown User (acoburn)
Story (A paragraph or two describing what happens): It would be convenient to separate internal Fedora IDs from the external IDs used in a public front-end application. That way, public IDs can persist even when Fedora IDs change; migrating from F3 to F4 is an excellent example of why these identifiers need to be decoupled. This service would handle the mappings between internal and external identifiers.

A sample OSGi-based implementation is available here: https://github.com/acoburn/repository-extension-services, particularly the acrepo-idiomatic, acrepo-idiomatic-pgsql and acrepo-mint-service modules.

Web Resource interaction

This service would interact with Fedora in two ways. First, it would react to Fedora's event stream (JMS or other) in order to populate an external database of identifiers. It would also expose its own HTTP endpoint to make it possible to create, manage and resolve these external identifiers.
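The two interactions described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the implementation in the linked repository: the class name `IdMappingSketch`, the event-type strings `CREATED` and `DELETED`, and the method names are all hypothetical, and a `ConcurrentHashMap` stands in for the external database. How the public ID is obtained (minted by the service or supplied by a client) is left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: keeps public-to-Fedora ID mappings in sync with repository events. */
final class IdMappingSketch {

    // Stand-in for the external database: public ID -> internal Fedora path.
    private final Map<String, String> publicToFedora = new ConcurrentHashMap<>();

    /**
     * Called once per message from Fedora's event stream.
     * The event-type names used here are placeholders, not Fedora's actual values.
     */
    void onEvent(String eventType, String fedoraPath, String publicId) {
        switch (eventType) {
            case "CREATED":
                publicToFedora.put(publicId, fedoraPath);
                break;
            case "DELETED":
                // Assumption: a deleted resource drops its mapping.
                publicToFedora.remove(publicId);
                break;
            default:
                // Other events do not affect the mapping in this sketch.
                break;
        }
    }

    /** Backs the "resolve" operation of the HTTP endpoint. */
    Optional<String> resolve(String publicId) {
        return Optional.ofNullable(publicToFedora.get(publicId));
    }

    public static void main(String[] args) {
        IdMappingSketch ids = new IdMappingSketch();
        String fedoraPath = "/e6/57/75/21/e6577521-7ad4-434a-baa2-4693d8626c2c";

        ids.onEvent("CREATED", fedoraPath, "/gx8n4w1");
        System.out.println(ids.resolve("/gx8n4w1").orElse("not found"));

        ids.onEvent("DELETED", fedoraPath, "/gx8n4w1");
        System.out.println(ids.resolve("/gx8n4w1").orElse("not found"));
    }
}
```

In a real deployment the `onEvent` method would presumably be driven by a message consumer and `resolve` would sit behind an HTTP handler; both are omitted here to keep the sketch self-contained.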

Deployment or Implementation notes

This service would be deployed separately from Fedora, possibly on a separate machine. I envision it as a combination of OSGi services and Camel routes, written in Java and Blueprint XML, that can be deployed in any OSGi container. The implementation would require access to Fedora's HTTP API, its event stream and an external database.

API-X Value Proposition

The primary use of this service in the context of API-X would be to allow for service discovery.

 

5 Comments

  1. This implies registry-resolver components, which have many useful applications. Private ones are permitted by the Web architecture, so this should fit well into general patterns.

  2. Wasn't the identifier translation machinery in the kernel intended for exactly this purpose? I'm not saying that it's necessary to use it, but I am curious as to why it doesn't meet your needs.

    1. Unknown User (acoburn)

      Perhaps I don't fully understand how the identifier translation machinery works in the kernel, but the way I had been envisioning the ID service was to separate this completely from fedora (rather than running it inside the fcrepo implementation). That is, say there is a fedora resource with this internal (fedora) identifier: /e6/57/75/21/e6577521-7ad4-434a-baa2-4693d8626c2c but with this (public-facing) identifier: /gx8n4w1

      I would expect all public requests for that resource to arrive at <host>/gx8n4w1. Some number of internal systems would then be touched (e.g. mapping a cookie value to a set of user/group agents, identifying which collections the resource belongs to, rendering the RDF as HTML or JSON, etc.). This does not necessarily involve touching fedora, as many of these intermediate representations can be served more efficiently from caches that live in external layers. So, if a request doesn't actually touch fedora, how else would I resolve that mapping to the fedora identifier? Plus, this mapping will have to be resolved for every request, so speed is of utmost importance, and a primary key-based lookup in an RDBMS will be very, very fast. Once the public identifier is translated to the fedora identifier, that can be used to query the various ancillary systems involved in handling the request. If that resolution happens as soon as possible after the web server receives the request, then fewer of these internal systems need to know anything about the public URL; they can simply use the fedora URL.

      And as such, this service can easily be parallelized, if necessary, using a simple master-slave topology for the RDBMS and with any number of front ends handling user requests.
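The request flow described in this comment (resolve the public identifier once, at the edge, then hand only the fedora path to downstream systems) might look roughly like the sketch below. Everything here is an assumption made for illustration: the class `EdgeResolverSketch`, its methods, and the use of a `HashMap` in place of the RDBMS. With a real database, the lookup would presumably be a single primary-key query against a hypothetical table such as `SELECT fedora_id FROM ids WHERE public_id = ?`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: translate the public ID first, then call internal systems. */
final class EdgeResolverSketch {

    // Stand-in for an RDBMS table whose primary key is the public ID.
    private final Map<String, String> idTable = new HashMap<>();

    void register(String publicId, String fedoraPath) {
        idTable.put(publicId, fedoraPath);
    }

    /**
     * Resolves the public ID once, then passes only the fedora path downstream,
     * so downstream systems never need to know the public URL.
     */
    String handle(String publicId, Function<String, String> downstream) {
        String fedoraPath = idTable.get(publicId);
        if (fedoraPath == null) {
            return "404 Not Found";
        }
        return downstream.apply(fedoraPath);
    }

    public static void main(String[] args) {
        EdgeResolverSketch edge = new EdgeResolverSketch();
        edge.register("/gx8n4w1", "/e6/57/75/21/e6577521-7ad4-434a-baa2-4693d8626c2c");

        // The lambda stands in for any internal system (cache, renderer, ACL check).
        System.out.println(edge.handle("/gx8n4w1", path -> "render " + path));
        System.out.println(edge.handle("/unknown", path -> "render " + path));
    }
}
```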

      1. The scenarios you describe sound like a combination of caching ("many of these intermediate representations can be served more efficiently from caches that live in external layers") and a sophisticated implementation of Fedora's ID-minting SPI ("the public identifier is translated to the fedora identifier" is exactly the purpose of that SPI). There's no reason you can't put an RDB (or anything else) behind that.