Amherst: embargo handling

Created by Unknown User (acoburn), last modified on Oct 30, 2015

Title (Goal)	Amherst - embargo handling
Primary Actor	Developer
Scope	Component
Level
Author	Unknown User (acoburn)
Story (A paragraph or two describing what happens)	For objects in the repository with embargos, an external process should be able to periodically check on those resources (e.g. weekly, monthly), updating the access controls (e.g. using WebAC) when the access restrictions go out of effect.

Web Resource interaction

As captured in the discussion below, there are two possible paths for this component. One approach would be to capture (filter) incoming requests and, if a resource's embargo status differs from the authorization mode in effect, have the filter change that authorization mode to align with the embargo. This has the advantage that any embargo that has gone into or out of effect will be "immediately" translated into an authorization policy (by "immediately" I mean upon the next filtered request for the resource). The disadvantage is that every request would carry the additional overhead of handling embargo dates; furthermore, a user request would require privilege escalation in order to change that resource, which is a situation that gives me pause.

The alternative approach would be to have a background batch process that is initiated either by an HTTP request or by running a script. That process would check embargoed resources to verify whether the (WebAC) authorization mode should be changed and, if so, make that change. This script should not have to do a full repository scan: it should be able to generate a list of candidate resources to check based on some index (triplestore, database, Solr, etc.). That index, presumably, would be generated by a listener on Fedora's event stream.
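The batch approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a proposed implementation: the function names, the shape of an index row (URI plus embargo end date), and the `lift` callback are all hypothetical, and the sketch assumes an embargo stops applying on its end date. The index query and the actual WebAC update are deliberately left outside the sketch.

```python
from datetime import date
from typing import Callable, Iterable, List, Tuple

# Hypothetical index row: (resource URI, embargo end date). In practice the
# rows would come from a triplestore, database or Solr query; they are passed
# in directly here so the sketch stays self-contained.
IndexRow = Tuple[str, date]


def lapsed_embargoes(rows: Iterable[IndexRow], today: date) -> List[str]:
    """Return the URIs whose embargo end date is on or before `today`."""
    return [uri for uri, ends in rows if ends <= today]


def run_batch(rows: Iterable[IndexRow], today: date,
              lift: Callable[[str], None]) -> int:
    """Call `lift` once per lapsed embargo and return how many were handled.

    `lift` stands in for whatever updates the authorization (assumption:
    some call against Fedora's HTTP API made with suitable credentials).
    """
    uris = lapsed_embargoes(rows, today)
    for uri in uris:
        lift(uri)
    return len(uris)
```

Because the candidate list comes from the index, the cost of a run depends on the number of embargoed resources, not on the size of the repository.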

Deployment or Implementation notes

This service would be deployed separately from Fedora. I envision that this would require access to Fedora's HTTP API and event stream. It would probably make sense to implement this in a scripting language: Python, Ruby or Go.

API-X Value Proposition

The primary use of this service would be for supporting an asynchronous background worker process for updating repository resources.

uc-api-ext

18 Comments

Elliot Metsger
Would another mode of operation be to check the embargo at request time, and update the WebAC rules (I'm not familiar with WebAC or its FF implementation)? So one mode of operation could be a periodic scan, and another mode of operation is to check the embargo at the time the resource is requested?
- Sep 23, 2015
1. Andrew Woods
  WebAC has no notion or facility for handling time-based authorization, hence the phrasing of this use case. I do not believe "standard" WebAC would be able to handle embargo checking at request time.
  Sep 24, 2015
  1. Elliot Metsger
    Ok, I was thinking that the extension would intercept the request, and check to see if the resource is still embargoed. If the resource isn't embargoed, then the extension could lift it by modifying the WebAC rules, and then return the resource.
    
    Sep 24, 2015
    1. Andrew Woods
      
      Yes, true. There would need to be an "admin" account associated with the extension for updating ACL/Authorizations to remove the embargo, but a request-time check/update of Authorizations would be possible.
      Maybe this "use case" should be stated more generally to indicate user-facing support for embargoing resources, and leave the implementation (scanning or upon request) up to deeper analysis of performance, etc.
      
      Sep 24, 2015
      1. Aaron Birkland
        
        I like the idea of generalizing as well, but do think that, as a potential API-X use case, we'd need to consider possible ways such an extension could be implemented, and what API-X's role would be under those scenarios. For example, I can see API-X involved in 'support for embargoes', as described here and in comments, in a couple ways:
        Exposing an endpoint that provides information related to embargoes for any object; say http://example.org/path/to/object/ext:embargo. An extension could implement this endpoint by providing a document that relays (a) whether the resource is subject to an embargo, (b) possibly the details/definition of the embargo, and (c) whether the resource is subject to the embargo right now, at the time of the request, perhaps even for the user making the request.
        This would support asynchronous use cases where an external tool scans the repository, looks at ext:embargo for relevant objects, and acts upon that object's ACL as appropriate. This external tool would not be involved with API-X per se, other than consuming a service that is exposed by API-X and using it to make a decision.
        Intercepting requests to objects and, for objects with defined embargoes, filtering the request via an extension which determines whether access is allowed based on the object's embargo status.
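To make points (a) to (c) concrete, the document returned by such an endpoint might look something like the output of the sketch below. Everything here is an assumption for illustration: the JSON serialization, the field names, and the function name are hypothetical, and the per-user variant of (c) is not attempted.

```python
import json
from datetime import date
from typing import Optional


def embargo_document(resource_uri: str, embargo_until: Optional[date],
                     today: date) -> str:
    """Build a JSON document describing a resource's embargo status.

    All field names are hypothetical; they map onto points (a), (b), (c).
    """
    doc = {
        "resource": resource_uri,
        # (a) is the resource subject to an embargo at all?
        "hasEmbargo": embargo_until is not None,
        # (b) the definition of the embargo, reduced here to an end date
        "embargoUntil": embargo_until.isoformat() if embargo_until else None,
        # (c) is the embargo in effect at the time of the request?
        "embargoedNow": embargo_until is not None and today < embargo_until,
    }
    return json.dumps(doc, sort_keys=True)
```

An external tool could then read `embargoedNow` for each candidate object and decide whether the object's ACL needs attention.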
        
        Sep 24, 2015
        
        Stefano Cossu
        
        In regard to supporting embargo, see this other thread that does not involve API-X [1].
        In any case, I would avoid having the API-X scan the repository for any purpose. A triplestore or Solr index query is much more efficient and flexible as well as relieving the core repository from potentially intensive operations.
        If you need stats about embargos, an API-X route that sends a SPARQL query to an index seems to be in line with other use cases.
        [1] Re: 2015-09-16 - WebAccessControl Authorization Delegate Planning Meeting
        
        Sep 25, 2015
        
        Unknown User (acoburn)
        
        Yes, querying an index will always be better than scanning the entire repository. I would agree that a full repository scan should almost always be avoided.
        
        Sep 25, 2015
Unknown User (acoburn)
Yes, in order to handle these intercepts, the user request would need to be escalated to an "admin" account. I am generally wary of scenarios in which privilege escalation plays a central role: there are so many ways things can go really wrong that I would be uncomfortable with a service that implements that sort of behavior.
- Sep 24, 2015
1. Aaron Birkland
  Is this "admin" escalation necessary for the purpose of updating an arbitrary resource's ACL for intercepted requests (as Andrew suggests), or for the act of intercepting a request and reading the information required to evaluate an embargo policy in the first place?
  Sep 24, 2015
2. Elliot Metsger
  "I am generally wary of scenarios in which privilege escalation plays a central role"
  Me too, no disagreement here. If embargos are enforced using WebAC, privilege escalation to remove those rules would need careful consideration.
  Sep 24, 2015
Elliot Metsger
I am at a disadvantage because I'm not familiar with WebAC. We've had discussions at JHU in the past regarding embargos, and this use case is often presented:
We want the general public to be able to see that a resource exists, implying that some metadata about the resource is exposed and potentially indexed for search. But we don't want the actual content to be retrievable. Instead, we'd like to see a note that says: please contact the author of this resource if you'd like to obtain a copy.
Would WebAC be able to enforce this "nuanced" version of embargo? Or would an extension provide this information upon request of an embargoed resource?
- Sep 24, 2015
1. Aaron Birkland
  Think of WebAC as partially analogous to UNIX permissions. You can define users or groups for various permutations of read and write. Embargo evaluation is out of scope for WebAC, as is providing additional information related to access control evaluation (like specific explanatory text, URI to a representation of policy, etc). So it sounds like an extension may be necessary for handling nuance.
  Sep 24, 2015
2. Unknown User (acoburn)
  I believe that WebAC could be used to handle this scenario. For example, if the "public" group had read access to the containing resource (where the descriptive metadata is stored), a separate rule can be applied to the binary (e.g. the "public" group cannot access the resource).
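As a rough illustration of the container/binary split, the sketch below builds Turtle for a read-only authorization using terms from the W3C ACL vocabulary (`acl:Authorization`, `acl:accessTo`, `acl:agentClass`, `acl:agent`, `acl:mode`). The function name and all example URIs are hypothetical, and how an authorization is attached to a resource in a given Fedora release is not shown. One assumption worth stating: WebAC rules only grant access, so "the public cannot access the binary" would be expressed by granting read on the binary to a narrower set of agents and omitting any public grant.

```python
ACL = "http://www.w3.org/ns/auth/acl#"
FOAF = "http://xmlns.com/foaf/0.1/"


def read_authorization(auth_uri: str, target_uri: str, agent_uri: str = "") -> str:
    """Return Turtle for an acl:Authorization granting acl:Read on target_uri.

    With no agent_uri the grant goes to everyone (foaf:Agent); otherwise it
    goes only to the named agent.
    """
    who = f"acl:agent <{agent_uri}>" if agent_uri else "acl:agentClass foaf:Agent"
    return (
        f"@prefix acl: <{ACL}> .\n"
        f"@prefix foaf: <{FOAF}> .\n\n"
        f"<{auth_uri}> a acl:Authorization ;\n"
        f"    acl:accessTo <{target_uri}> ;\n"
        f"    {who} ;\n"
        f"    acl:mode acl:Read .\n"
    )


# Public read on the container that holds the descriptive metadata ...
container_rule = read_authorization(
    "http://example.org/acls/public-metadata",
    "http://example.org/path/to/object",
)
# ... but read on the binary only for a named (hypothetical) agent.
binary_rule = read_authorization(
    "http://example.org/acls/restricted-binary",
    "http://example.org/path/to/object/file",
    agent_uri="http://example.org/users/author",
)
```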
  As for exposing information about an embargoed resource (e.g. at a /ext:embargo endpoint), that seems like a very appropriate API-X use case. That endpoint, in principle, could also provide the machinery to act on the embargo: DELETE to remove the embargo, PUT to update/create the embargo.
  Sep 24, 2015
  1. Elliot Metsger
    So if you wanted to create an embargo, would you first PUT/POST the resource, and then PUT/POST the embargo? Or would you be able to submit all of the resources at once?
    In the scenario where the resource exists at /path/to/object without an embargo, and the user wishes to embargo the resource, would the user
    1. POST an embargo resource to /path/to/object to create the embargo,
    2. POST the embargo resource to /path/to/object/ext:embargo, or
    3. PUT the embargo resource to /path/to/object/ext:embargo?
    I'm guessing that it wouldn't be option 1.
    I guess I'm trying to understand if ext:embargo is a resource representing the embargo extension, or if ext:embargo is identifying the embargo resource itself. Or am I overthinking it?
    
    Sep 24, 2015
    1. Unknown User (acoburn)
      
      I was thinking that an embargo would be defined in terms of RDF properties on a resource. So, any client with write access to that resource could add, remove or update those embargo-related properties through the existing Fedora API. Therefore, in one sense, the /ext:embargo endpoint would be completely unnecessary. However, for a client designed specifically to handle embargos, such a simplified endpoint might be easier to work with. One advantage of a separate endpoint is that it could provide additional information about the status of a resource (e.g. "unavailable until 2025") without simply responding with a 401 or returning the entire resource, which might otherwise need to remain inaccessible.
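If the embargo lives in RDF properties, a client could manage it with an ordinary SPARQL Update body, which Fedora 4 accepts on PATCH with the `application/sparql-update` content type. The sketch below only builds the request bodies. The predicate `http://example.org/ns#embargoUntil` and the function names are hypothetical; no embargo vocabulary is specified in this discussion.

```python
from datetime import date

# Hypothetical predicate, used only for illustration.
EMBARGO_PREDICATE = "http://example.org/ns#embargoUntil"


def set_embargo(until: date, predicate: str = EMBARGO_PREDICATE) -> str:
    """SPARQL Update that replaces any existing embargo date with `until`.

    `<>` is the relative IRI for the resource being patched. The OPTIONAL
    pattern lets the INSERT run even when no previous value exists.
    """
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        f"DELETE {{ <> <{predicate}> ?old }}\n"
        f'INSERT {{ <> <{predicate}> "{until.isoformat()}"^^xsd:date }}\n'
        f"WHERE {{ OPTIONAL {{ <> <{predicate}> ?old }} }}\n"
    )


def remove_embargo(predicate: str = EMBARGO_PREDICATE) -> str:
    """SPARQL Update that removes every embargo date from the resource."""
    return f"DELETE WHERE {{ <> <{predicate}> ?old }}\n"
```

A dedicated endpoint would mostly wrap requests like these, while adding the status reporting described above.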
      
      Sep 24, 2015
    2. Elliot Metsger
      
      1. "POST an embargo resource to /path/to/object to create the embargo"
      This would allow the API-X framework to filter the request based on the type of the object in the request (an embargo) and route it to the embargo extension, which would then create the embargo resource where it sees fit (say it decided to create the embargo resource at /path/to/embargo/resource). So basically the user is saying, I want to embargo the resource at /path/to/object using the embargo parameters I've supplied in my POST request, but I'll let the repository decide what URI the embargo resource receives.
      2. Performing a GET on ext:embargo could provide a reference to the embargo resource (which may be /path/to/embargo/resource), and perhaps other metadata about the embargo? Or perhaps performing a GET on ext:embargo returns an embargoed version of the resource at /path/to/object.
      3. Using the reference obtained from the previous GET, you could PUT directly to the embargo resource (/path/to/embargo/resource) to update it, or DELETE to remove it.
      
      Sep 24, 2015
Elliot Metsger
Another thought/comment is: is there overlap between an embargo and a tombstone? And if so, does that insight provide hints with regard to implementing embargos with API-X?
For example, could an embargo be considered a temporary tombstone? Or could a tombstone be considered a permanent embargo?
What is the user interaction when tombstoning /path/to/object, and does that provide any insight w.r.t. managing an embargo?
- Sep 24, 2015
A. Soroka
If the implementation here is not "dynamic" (in other words, if it relies on action that isn't triggered by a client request) then I have to ask how it is "extending the Fedora API"? It seems instead to be asking for a recipe for some message-driven workflow infrastructure, which is interesting and important stuff, but not any part of API extension.
- Jan 26, 2016