Use case 1 - Fedora managing access conditions

Title (goal): Fedora managing access conditions
Primary Actor: Librarian/archivist/curator
Scope:
Level: User goal
Story

The producer of Fedora content wants to be able to set access conditions that would allow for the following scenarios:

  1. Content can only be accessed by a specific IP address or list of addresses
  2. Data streams in the object can have different access conditions, e.g. TIF restricted to a single user, JPG open to the campus, PDF open to the world.
  3. Curator can set authorization in an external system which Fedora can access.
    1. EZProxy
    2. homegrown authorization systems
  4. Curator can adjust access conditions per object or for thousands/millions of objects at a time
  5. Curator can set restrictions on the application permitted to open an object. (focus on born digital for this)
    1. example: Disk image created for a Mac SE, the curator indicates in the access condition the emulation environment required to open the file.
    2. note 1: I'm not expecting Fedora to enforce the restriction, only to store it.
    3. note 2: I know emulation information can be stored in PREMIS, but information about the required emulation settings is different from requiring a specific software title.
  6. Curators can apply multiple access schemes to an object or to a data stream within an object. For example, a curator could specify that a set of IP addresses, an Active Directory group and a special group in our identity management software may all access the materials.
  7. Curators can set an access restriction flag for an external patron registration system or other complex authorization system such as Aeon (Atlas Systems). In this case, we only need Fedora to know that the restriction exists. The code that reacts to this requirement would live in Hydra or some other system. That system would send the patron off to register or log in to a patron tracking system, and once the patron meets the requirements, Hydra/Blacklight would release the objects in question.
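
As a rough illustration of scenarios 1 and 6, here is a minimal Python sketch of how a consuming application might evaluate several access schemes stored for one data stream, granting access if any one of them matches. The `Requester` class, the `may_access` function and the `"ip"`/`"group"` labels are hypothetical names made up for this example; nothing here is an existing Fedora or Hydra API.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Requester:
    """Hypothetical description of whoever is asking for the content."""
    ip: str
    groups: set = field(default_factory=set)


def ip_allowed(ip, allowed):
    """True if ip equals a listed address or falls inside a listed range."""
    addr = ipaddress.ip_address(ip)
    # strict=False lets a bare address such as "192.0.2.10" act as a /32 range
    return any(addr in ipaddress.ip_network(entry, strict=False)
               for entry in allowed)


def may_access(requester, conditions):
    """conditions is a list of (kind, values) pairs; any match grants access."""
    for kind, values in conditions:
        if kind == "ip" and ip_allowed(requester.ip, values):
            return True
        if kind == "group" and requester.groups & set(values):
            return True
    return False


# Example: an IP range OR an Active Directory group may open the stream.
conditions = [("ip", ["192.0.2.0/24"]), ("group", ["ManuscriptCurators"])]
on_campus = Requester(ip="192.0.2.44")
curator = Requester(ip="203.0.113.5", groups={"ManuscriptCurators"})
outsider = Requester(ip="203.0.113.5")
```

Treating the schemes as "any of" matches the wording of scenario 6; a stricter "all of" rule would be a different design choice.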

As for implementation, I can offer some examples of what we use now. We specify the file type/size, the authorization type and then any values associated. Some examples:

TIF - Active Directory Group - ManuscriptCurators

TIF - Aeon

JPG 600px - IP - list of IP values or ranges

PDF - external authentication

JPG 1200px - Yale only

JPG 150px - open access

TIF - NetID - yale\mf438 (or a list of NetIDs)

DSK - Active Directory Group - ManuscriptDirectors

DSK - Emulation - AppleWin v1.1.8
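
The examples above could be held as structured records rather than free text. The following Python sketch shows one possible shape, assuming a hypothetical `AccessCondition` record with a file type, an optional size, an authorization type and any associated values. The field names are illustrative only, and values that are not spelled out above (such as the IP list) are left empty.

```python
from typing import NamedTuple, Optional, Tuple


class AccessCondition(NamedTuple):
    """Hypothetical record: file type/size, authorization type, then values."""
    file_type: str
    size: Optional[str]
    auth_type: str
    values: Tuple[str, ...] = ()


EXAMPLES = [
    AccessCondition("TIF", None, "Active Directory Group", ("ManuscriptCurators",)),
    AccessCondition("TIF", None, "Aeon"),
    AccessCondition("JPG", "600px", "IP"),  # IP values or ranges not given here
    AccessCondition("PDF", None, "external authentication"),
    AccessCondition("JPG", "1200px", "Yale only"),
    AccessCondition("JPG", "150px", "open access"),
    AccessCondition("TIF", None, "NetID", ("yale\\mf438",)),
    AccessCondition("DSK", None, "Active Directory Group", ("ManuscriptDirectors",)),
    AccessCondition("DSK", None, "Emulation", ("AppleWin v1.1.8",)),
]


def conditions_for(file_type, conditions=EXAMPLES):
    """Return every condition recorded for one file type."""
    return [c for c in conditions if c.file_type == file_type]
```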

Basically, our need is for very granular levels of permission to be stored with the object in Fedora. Right now the permissions are stored as XML in a data stream. It would be beneficial to store them differently so that we could make mass changes to materials across entire collections.

Another note: we would store only a single JPG, or possibly no JPG at all and only a JP2, and would derive the JPG on the fly. So the access condition setup may include conditions for resolutions of digital formats that are not contained in the data streams. The JPG examples above indicate that a single JPG exists as a data stream and that we derive smaller images from that stream, but the access conditions differ by size range. For Yale, we stick to these sizes: 150px or less (thumb), 151-600px (medium), and more than 600px (full resolution). For TIF images we use full, half page and quarter page. Right now, all other sizes/resolutions are tied directly to the file type stored as a data stream. Being able to reference access for something that is dynamically generated would let this scale to future needs.
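
To make the size ranges concrete, here is a small Python sketch that maps a requested pixel size for a derived JPG to one of the three tiers and then to the access condition from the JPG examples above. The function names are hypothetical, and the text does not say which dimension the pixel figure measures, so the sketch just takes a single pixel value.

```python
# Access per tier, taken from the JPG examples above
# (150px open access, 600px IP, 1200px Yale only).
JPG_TIER_ACCESS = {
    "thumb": "open access",
    "medium": "IP",
    "full": "Yale only",
}


def size_tier(pixels):
    """Map a requested size in pixels to thumb / medium / full."""
    if pixels <= 0:
        raise ValueError("pixel size must be positive")
    if pixels <= 150:
        return "thumb"
    if pixels <= 600:
        return "medium"
    return "full"


def access_for_derived_jpg(pixels):
    """Look up the access condition for a JPG derived at the given size."""
    return JPG_TIER_ACCESS[size_tier(pixels)]
```

With this mapping, a request for a 400px derivative would fall under the "IP" condition even though no 400px JPG is stored as a data stream.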

Use case 2 - Programmers use API for access condition support in external systems, e.g. Hydra

Title (goal): Programmers use API for access condition support in external systems, e.g. Hydra
Primary Actor: IT/programming
Scope:
Level: User goal
Story

A programmer can use an API in Fedora that will provide access conditions for a requested object.

In an ideal implementation, I could send Fedora a request containing a PID and a data stream, and the API would return the requirements for access. For example, a request to a URL such as http://fedora/PID/TIF/access would return JSON or XML containing all the access information for that specific request. If the data stream type is not specified, all access information for the object would be returned.

Since we would also like to store information related to born digital/emulation environments, this output would also list the environment restrictions for accessing the materials.
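
The Python sketch below illustrates what such a response might contain, assuming a JSON payload. The PID `demo:1`, the in-memory `STORE` and all field names are hypothetical placeholders; the sketch only mimics the behaviour described above (one data stream when specified, everything otherwise, with emulation restrictions included).

```python
import json

# Hypothetical stored conditions for one object, keyed by data stream.
STORE = {
    "demo:1": {
        "TIF": [{"type": "Active Directory Group",
                 "values": ["ManuscriptCurators"]}],
        "DSK": [{"type": "Active Directory Group",
                 "values": ["ManuscriptDirectors"]},
                {"type": "Emulation", "values": ["AppleWin v1.1.8"]}],
    }
}


def access_response(pid, datastream=None):
    """Build the payload for .../PID/access or .../PID/<datastream>/access."""
    streams = STORE[pid]
    if datastream is not None:
        # Only the requested data stream is returned.
        streams = {datastream: streams[datastream]}
    return {"pid": pid, "access": streams}


# What a client might receive for http://fedora/PID/TIF/access
tif_json = json.dumps(access_response("demo:1", "TIF"), indent=2)
```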

Use case 3 - Applications use API for updating access conditions stored in Fedora

Title (goal): Applications use API for updating access conditions stored in Fedora
Primary Actor: IT/programming
Scope:
Level: User goal
Story

A programmer uses the API to add or update access conditions for a PID, a PID range or all PIDs in a namespace. This would essentially be API access to everything in Use Case 1 above. No curators at Yale will have direct access to Fedora; all interaction will go through a different software front end such as Archivematica or our homegrown solution Ladybird. We will therefore need CRUD controls for access conditions that can be driven from those other software products after ingest takes place.

Bulk updates are the most important part of this feature. Working with objects one at a time is time-consuming and wasteful. Ideally this is implemented so that updating 10,000 PIDs takes about the same amount of time as updating a single PID.
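
One way the bulk-update goal could be approached is sketched below in Python. It assumes, purely for illustration, that each object points at a shared, named set of conditions instead of carrying its own copy, so that changing the shared set is a single write no matter how many PIDs use it. The `AccessStore` class, its methods and the `coll:` PIDs are hypothetical and do not describe how Fedora stores anything.

```python
class AccessStore:
    """Hypothetical store: objects reference named condition sets."""

    def __init__(self):
        self.condition_sets = {}  # set name -> list of conditions
        self.object_set = {}      # PID -> set name

    def define(self, name, conditions):
        """Create or replace a named condition set (one write)."""
        self.condition_sets[name] = list(conditions)

    def assign(self, pids, name):
        """Point each PID at a named condition set."""
        for pid in pids:
            self.object_set[pid] = name

    def conditions_for(self, pid):
        """Resolve the conditions that currently apply to a PID."""
        return self.condition_sets[self.object_set[pid]]


store = AccessStore()
store.define("manuscripts-jpg-full", ["Yale only"])
store.assign((f"coll:{n}" for n in range(10000)), "manuscripts-jpg-full")

# A single update now changes what all 10,000 objects resolve to.
store.define("manuscripts-jpg-full", ["open access"])
```

The trade-off is that the initial assignment still touches every PID once; only later changes to the shared set are constant-time.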

 

 


2 Comments

  1. These use cases are very illustrative and I'd like to understand them better. It sounds like use case 1 involves categorizing data streams by type and additional metadata. Then there are a number of user groups that can be identified in various ways.

    I'd imagine that you'd like to avoid defining that whole ACL per object (JPG+1200px => Yale only). Per-object access controls add the burden of adjusting 10,000 settings for a simple change of, say, 1200px to 2400px. Perhaps instead you'd want higher-level policies that say things like "High Resolution Images are accessible to Yale people", with "High Resolution Image" and "Yale people" defined separately. That way a specific group of people is assigned a less specific (and therefore adjustable) kind of access.

    You may want to give Yale people different kinds of access inside different collections. That tends towards role-based access control (RBAC) and it is more flexible. Yale people could have one role in collection A and a different role in Collection B. Access controls would be tied to the role and the role would have some kind of higher level meaning. This makes it easier for people to manage access controls, since they only have to memorize the meaning of roles and not every granular restriction that is possible. Later, you can change the meaning of roles in one place, instead of on every ACL. I plan to build one AuthZ option that is RBAC, so that will be an option.

    Roles can help a bit with your use case 2, since they give a simple name to an access policy. Since external apps will enforce security very differently from Fedora, the name is perhaps more useful than a set of granular permissions that don't translate well into application rules. For instance, I can more easily render an app page for a "content owner" than I can for "someone with permissions A through P, but not H, I, or J".

    Let me know what you think. AuthZ will definitely be a pluggable module, with some support for custom external systems. However, I personally have doubts about the manageability of systems that do not aggregate permissions into higher level policies or roles.

    1. You are correct. The image resolutions would map directly to low, medium and high resolution. We come from Luna so we used to speak in size 1, size 3 and size 4 in this same manner. This is how I arrive at the 150px, 600px and more than 600px. 

      For TIF images, it is standard for us to offer full, half or quarter page resolutions. 

      Either way, I think your suggestion of the A-Z list of permission types would work. What I know does not work is what we did on an earlier project, where we had to build a matrix in which A might mean public access for thumb and restricted for anything better, while B means public access through high res and restricted for everything above. What I mean is that an A-Z list of permissions should focus on specific file types and not try to create some blanket permission. The matrix would become unwieldy for us and just too complicated to use. So I'd be interested in hearing more about how that would look.

      A future use case, which is covered above, involves finding aids. In some cases we have two versions: the public one with names redacted and the private one with names intact. This is one of the main reasons we need this. I also realize I am missing a permission type, which I will edit in above. It involves the use of Aeon, but in this case I would rather think of it as an external patron registration system that may or may not be automated, meaning you may be required to register online and then appear at the circ desk with two forms of ID in hand before being granted access. I offer this to say that when I say "external authorization system", I actually mean "systems", plural: we will have multiple external systems providing authorization.