Title (Goal): Amherst - embargo handling
Primary Actor: Developer
Scope: Component
Level:
Author: Unknown User (acoburn)
Story (A paragraph or two describing what happens): For objects in the repository with embargos, an external process should be able to periodically check on those resources (e.g. weekly, monthly), updating the access controls (e.g. using WebAC) when the access restrictions go out of effect.

Web Resource interaction

As captured in the discussion below, there are two possible paths for this component. One approach would be to capture (filter) incoming requests and, if the embargo status differs from the authorization mode in effect, change that authorization mode to align with the embargo. This has the advantage that any embargo that has gone into or out of effect will be "immediately" translated into an authorization policy (by "immediately" I mean upon the next (filtered) request for the resource). The disadvantage is that every request would carry the additional overhead of handling embargo dates; furthermore, a user request would require privilege escalation in order to change that resource, which is a situation that gives me pause.
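The decision such a request filter would have to make can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `required_action`, the `embargo_until` value, and the `publicly_readable` flag are hypothetical names, standing in for however the embargo date and the current WebAC state are actually represented. It is not a Fedora or API-X API.

```python
from datetime import datetime, timezone
from typing import Optional


def required_action(embargo_until: Optional[datetime],
                    publicly_readable: bool,
                    now: datetime) -> str:
    """Return the ACL change needed to align access with the embargo.

    Hypothetical helper: the result is 'restrict', 'release', or 'none'.
    """
    if embargo_until is None:
        # No embargo defined: the filter has nothing to reconcile.
        return "none"
    if now < embargo_until:
        # Embargo in effect: the resource must not be publicly readable.
        return "restrict" if publicly_readable else "none"
    # Embargo has lapsed: the resource should be readable again.
    return "none" if publicly_readable else "release"
```

Any result other than 'none' is the point at which the filter would need elevated privileges to modify the authorization.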

The alternative approach would be to have a background batch process that is initiated either by an HTTP request or by running a script. That process would check on embargoed resources to verify whether the (WebAC) authorization mode should be changed. If so, that change would be made. This script should not have to do a full repository scan: it should be able to generate a list of candidate resources to check based on some index: triplestore, database, Solr, etc. That index, presumably, would be generated by a listener on Fedora's event stream.
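The core of such a batch run might look like the sketch below. It assumes the index has already returned candidate rows as (URI, embargo end date) pairs; `lapsed_embargoes` and the row shape are hypothetical, and the actual WebAC update over Fedora's HTTP API is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def lapsed_embargoes(candidates: Iterable[Tuple[str, datetime]],
                     now: datetime) -> List[str]:
    """Return the URIs whose embargo end date is at or before `now`.

    `candidates` is assumed to come from an index query (triplestore,
    database, Solr), never from a full repository scan.
    """
    return [uri for uri, embargo_until in candidates if embargo_until <= now]


# Hypothetical index rows; a real run would fetch these from the index.
rows = [
    ("http://example.org/path/to/object-a", datetime(2020, 1, 1, tzinfo=timezone.utc)),
    ("http://example.org/path/to/object-b", datetime(2999, 1, 1, tzinfo=timezone.utc)),
]
to_release = lapsed_embargoes(rows, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Each URI in `to_release` is a resource whose authorization the script would then update.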

Deployment or Implementation notes

This service would be deployed separately from Fedora. I envision that this would require access to Fedora's HTTP API and event stream. It would probably make sense to implement this in a scripting language: Python, Ruby or Go.

API-X Value Proposition

The primary use of this service would be for supporting an asynchronous background worker process for updating repository resources.  

 

18 Comments

  1. Would another mode of operation be to check the embargo at request time, and update the WebAC rules (I'm not familiar with WebAC or its FF implementation)?  So one mode of operation could be a periodic scan, and another mode of operation is to check the embargo at the time the resource is requested? 

    1. WebAC has no notion or facility for handling time-based authorization, hence the phrasing of this use case. I do not believe "standard" WebAC would be able to handle embargo checking at request time.

      1. Ok, I was thinking that the extension would intercept the request, and check to see if the resource is still embargoed.  If the resource isn't embargoed, then the extension could lift it by modifying the WebAC rules, and then return the resource.

        1. Yes, true. There would need to be an "admin" account associated with the extension for updating ACL/Authorizations to remove the embargo, but a request-time check/update of Authorizations would be possible.

          Maybe this "use case" should be stated more generally to indicate user facing support for embargoing resources, and leave the implementation (scanning or upon request) up to deeper analysis of performance, etc.

           

          1. I like the idea of generalizing as well, but do think that, as a potential API-X use case, we'd need to consider possible ways such an extension could be implemented, and what API-X's role would be under those scenarios.   For example, I can see API-X involved in 'support for embargoes', as described here and in comments, in a couple ways:

            • Exposing an endpoint that provides information related to embargoes for any object; say http://example.org/path/to/object/ext:embargo. An extension could implement this endpoint by providing a document that relays (a) whether the resource is subject to an embargo, (b) possibly the details/definition of the embargo, and (c) whether the resource is subject to the embargo right now, at the time of the request, perhaps even for the user making the request.
              • This would support asynchronous use cases where an external tool scans the repository, looks at ext:embargo for relevant objects, and acts upon that object's ACL as appropriate.  This external tool would not be involved with API-X per se, other than consuming a service that is exposed by API-X and using it to make a decision
            • Intercepting requests to objects and, for objects with defined embargoes, filtering the request via an extension which determines whether access is allowed based on the object's embargo status.  
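As a rough illustration of the first option, the document served at ext:embargo could carry the three pieces of information (a), (b) and (c) listed above. The sketch below is an assumption throughout: the function name, the dictionary keys, and the use of a single end date as the embargo definition are all hypothetical, and the per-user evaluation is not modeled.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def embargo_status(embargo_until: Optional[datetime],
                   now: datetime) -> Dict[str, Any]:
    """Build a hypothetical ext:embargo status document."""
    return {
        # (a) whether the resource is subject to an embargo at all
        "embargoed": embargo_until is not None,
        # (b) the definition of the embargo, here just its end date
        "embargo_until": embargo_until.isoformat() if embargo_until else None,
        # (c) whether the embargo is in effect at the time of the request
        "in_effect": embargo_until is not None and now < embargo_until,
    }
```

An external scanning tool could then read `in_effect` for each relevant object and decide whether the object's ACL needs to change.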

             

            1. In regard to supporting embargo, see this other thread that does not involve API-X [1].

              In any case, I would avoid having the API-X scan the repository for any purpose. A triplestore or Solr index query is much more efficient and flexible as well as relieving the core repository from potentially intensive operations.

              If you need stats about embargos, an API-X route that sends a SPARQL query to an index seems to be in line with other use cases.

              [1] Re: 2015-09-16 - WebAccessControl Authorization Delegate Planning Meeting

              1. Unknown User (acoburn)

                Yes, querying an index will always be better than scanning the entire repository. I would agree that a full repository scan should almost always be avoided.
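For the index-based approach discussed in this thread, the query sent to a triplestore could be generated along these lines. This is a sketch only: the predicate `http://example.org/ns#embargoUntil` is a made-up placeholder, since no embargo vocabulary is defined on this page, and `lapsed_embargo_query` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Hypothetical predicate; substitute whatever property records the embargo date.
EMBARGO_UNTIL = "http://example.org/ns#embargoUntil"


def lapsed_embargo_query(now: datetime) -> str:
    """Return a SPARQL SELECT for resources whose embargo date has passed.

    `now` should be timezone-aware; it is normalized to UTC before formatting.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT ?resource WHERE {\n"
        f"  ?resource <{EMBARGO_UNTIL}> ?until .\n"
        f'  FILTER (?until <= "{stamp}"^^xsd:dateTime)\n'
        "}"
    )
```

The result set is the list of candidate resources to check, so the repository itself is never scanned.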

  2. Unknown User (acoburn)

    Yes, in order to handle these intercepts, the user request would need to be escalated to an "admin" account. I am generally wary of scenarios in which privilege escalation plays a central role: there are so many ways things can go really wrong, that I would be uncomfortable with a service that implements that sort of behavior.

    1. Is this "admin" escalation necessary for the purpose of updating an arbitrary resource's ACL for intercepted requests (as Andrew suggests), or for the act of intercepting a request and reading the information required to evaluate an embargo policy in the first place?

    2. "I am generally wary of scenarios in which privilege escalation plays a central role"

      Me too, no disagreement here.  If embargos are enforced using WebAC, privilege escalation to remove those rules would need careful consideration.

  3. I am at a disadvantage because I'm not familiar with WebAC.  We've had discussions at JHU in the past regarding embargos, and this use case is often presented:

    We want the general public to be able to see that a resource exists, implying that some metadata about the resource is exposed and potentially indexed for search.  But we don't want the actual content to be retrievable.  Instead, we'd like to see a note that says: please contact the author of this resource if you'd like to obtain a copy.

    Would WebAC be able to enforce this "nuanced" version of embargo?  Or would an extension provide this information upon request of an embargoed resource?

    1. Think of WebAC as partially analogous to UNIX permissions.  You can define users or groups for various permutations of read, write.  Embargo evaluation is out of scope for WebAC, as is providing additional information related to access control evaluation (like specific explanatory text, URI to a representation of policy, etc).   So it sounds like an extension may be necessary for handling nuance.

    2. Unknown User (acoburn)

      I believe that WebAC could be used to handle this scenario. For example, if the "public" group had read access to the containing resource (where the descriptive metadata is stored), a separate rule can be applied to the binary (e.g. the "public" group cannot access the resource).

      As for exposing information about an embargoed resource (e.g. at a /ex:embargo endpoint), that seems like a very appropriate API-X use case. That endpoint, in principle, could also provide the machinery to act on the embargo: DELETE to remove the embargo, PUT to update/create the embargo.

      1. So if you wanted to create an embargo.  Would you first PUT/POST the resource, and then PUT/POST the embargo?  Or would you be able to submit all of the resources at once?

        In the scenario where the resource exists at /path/to/object without an embargo, and the user wishes to embargo the resource, would the user

        1. POST an embargo resource to /path/to/object to create the embargo,
        2. POST the embargo resource to /path/to/object/ext:embargo, or 
        3. PUT the embargo resource to /path/to/object/ext:embargo?   

        I'm guessing that it wouldn't be option 1.

        I guess I'm trying to understand if ext:embargo is a resource representing the embargo extension, or if ext:embargo is identifying the embargo resource itself.  Or am I overthinking it?

        1. Unknown User (acoburn)

          I was thinking that an embargo would be defined in terms of RDF properties on a resource. So, any client with write access to that resource could add, remove or update those embargo-related properties through the existing Fedora API. Therefore, in one sense, the /ext:embargo endpoint would be completely unnecessary. However, for a client designed specifically to handle embargos, such a simplified endpoint might be easier to work with. One advantage of a separate endpoint is that it could provide additional information about the status of a resource (unavailable until 2025), without simply responding with a 401 or returning the entire resource – a resource that might otherwise need to remain inaccessible.
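If an embargo is just an RDF property on the resource, a client could set or replace it with a SPARQL Update body sent as a PATCH (Content-Type application/sparql-update) to the resource. The sketch below only builds that body. The predicate is a made-up placeholder, `set_embargo_update` is a hypothetical helper, and whether a given Fedora version accepts this exact form of update has not been tested here.

```python
from datetime import datetime, timezone

# Hypothetical predicate; no embargo vocabulary is defined on this page.
EMBARGO_UNTIL = "http://example.org/ns#embargoUntil"


def set_embargo_update(embargo_until: datetime) -> str:
    """Return a SPARQL Update that replaces any existing embargo date.

    `<>` is the empty relative IRI, which resolves to the patched resource.
    `embargo_until` should be timezone-aware; it is normalized to UTC.
    """
    stamp = embargo_until.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        f"DELETE {{ <> <{EMBARGO_UNTIL}> ?old }}\n"
        f'INSERT {{ <> <{EMBARGO_UNTIL}> "{stamp}"^^xsd:dateTime }}\n'
        # OPTIONAL lets the INSERT run even when no earlier value exists.
        f"WHERE {{ OPTIONAL {{ <> <{EMBARGO_UNTIL}> ?old }} }}"
    )
```

Removing the embargo would be the same request with only the DELETE clause, which is the kind of operation a dedicated /ext:embargo endpoint could wrap for clients.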

          1. POST an embargo resource to /path/to/object to create the embargo

          This would allow the API-X framework to filter the request based on the type of the object in the request (an embargo), route it to the embargo extension, which would then create the embargo resource where it sees fit (say it decided to create the embargo resource at /path/to/embargo/resource).  So basically the user is saying, I want to embargo the resource at /path/to/object using the embargo parameters I've supplied in my POST request, but I'll let the repository decide what URI the embargo resource receives.

          2. Performing a GET on ext:embargo could provide a reference to the embargo resource (which may be /path/to/embargo/resource), and perhaps other metadata about the embargo? Or perhaps performing a GET on ext:embargo returns an embargoed version of the resource at /path/to/object.

          3. Using the reference obtained from the previous GET, you could PUT directly to the embargo resource (/path/to/embargo/resource) to update it, or DELETE to remove it.

           

           

  4. Another thought/comment is: is there overlap between an embargo and a tombstone?  And if so, does that insight provide hints with regard to implementing embargos with API-X?

    For example, could an embargo be considered a temporary tombstone?  Or could a tombstone be considered a permanent embargo?  

    What is the user interaction when tombstoning /path/to/object, and does that provide any insight w.r.t. managing an embargo?

  5. If the implementation here is not "dynamic" (in other words, if it relies on action that isn't triggered by a client request) then I have to ask how it is "extending the Fedora API"? It seems instead to be asking for a recipe for some message-driven workflow infrastructure, which is interesting and important stuff, but not any part of API extension.