
This page lays out the considerations and activities surrounding the Fedora 4 Audit Service.

Actions

  1. Define required Audit Service queries
  2. Perform comparative analysis of PROV-O vs. PREMIS-RDF
    1. Reference: http://dcpapers.dublincore.org/pubs/article/view/3709
  3. Define repository events that should be recorded by the Audit Service
  4. Define event agents that should be supported by the Audit Service
  5. Define capability of the Audit Service REST-API

Unresolved Questions

The following questions need to be resolved by the next audit service meeting (to be scheduled between March 4-6). In each case, a default answer has been provided in case there is insufficient community input in the allotted time. The default answers are highlighted in green.

Should there be support for adding external events to the Audit Service?

  1. If yes, what restrictions, if any, should be enforced on this capability? (e.g. only when migrating from Fedora 3, or only by administrators?)
  2. If yes, what should the import format be?
Answer: Yes. By default, no restrictions will be enforced.
Submitted by: David Wilcox
  
  

For event tracking, where is the user principal expected to come from?

Answer: Fedora will use HttpServletRequest#getUserPrincipal to get the principal. This means that applications will need to pass user principals to Fedora in order for them to be recognized by the audit service.
Submitted by: David Wilcox
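
To illustrate what this implies for event records, here is a minimal Python sketch (it is not the servlet API itself). In the Java Servlet API, getUserPrincipal returns null for an unauthenticated request, which the sketch models as None. The function name `audit_agent` and the `anonymous` fallback label are assumptions made for illustration; how Fedora would record events that arrive without a principal is not specified on this page.

```python
from typing import Optional

# Hypothetical label for events that arrive without a principal.
ANONYMOUS_AGENT = "anonymous"


def audit_agent(principal_name: Optional[str]) -> str:
    """Return the agent name to record for an audit event.

    principal_name models the name of the principal supplied by the
    calling application, or None when no principal was passed.
    """
    if principal_name is None or not principal_name.strip():
        return ANONYMOUS_AGENT
    return principal_name.strip()
```

The point of the sketch is that the audit trail can only be as specific as the principal the client application passes along.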
  
  

How will user principals be mapped to persistent user identifiers?

This is related to the previous question, and need not be resolved as quickly as the other questions.

Answer: (none submitted yet)
Submitted by: (none)
  
  
  

Proposed Requirements

  1. Audit service should automatically record who updated which resource when and with which action.
  2. Audit service should be able to include/import events that were performed external to the repository. (open question)
    1. Ideally, we would use a system in which an object is assigned a globally unique identifier (GUID) at the point of digitization. At that point we could trigger the CREATE event in a Fedora 4 system, likely a separate one intended only to store event data and to provide the API/REST services and a SPARQL interface for queries. The GUID would then stay with the digital object throughout its lifecycle, which could span a significant amount of time before the object is prepared for digital preservation. This is why we would use a separate Fedora 4 to store this data, and would continue to do so even after ingest into the digital repository.
      The problem at hand is simply that our workflow is not to create and immediately ingest digital content into our Hydra repository. We could configure other systems to store this data, and in some cases this is already taking place. But this leaves us with some event data stored in plain text files, some in Microsoft Excel/Access, and some in various SQL instances. So, as others have indicated, we do need the ability to bring in event data for events that occurred before the object was ingested into Fedora.
      It could also be possible to use this separate Fedora 4 as the generator of the GUID, which seems like a smooth integration point between the instance that only handles event data and the instance that handles digital preservation. It also avoids a potential infinite loop: an event saying that we updated the record effectively updates the record, which triggers another event saying that we updated the record, and so on. For very sensitive materials, the level of event logging we would perform may be just that granular.
      The reason for using a separate system to log events is the fundamental principle that a system should not audit itself. Using a separate instance helps maintain this separation. In my eyes it is separating the prison guards from the inmates: we should not trust the inmates to count themselves. More importantly, we may want to track inmates that are on their way into the system, not just after their arrival.
    2. One of the primary external use cases at UCSD is the transfer of objects to preservation management systems such as Chronopolis and Merritt. This will be triggered and performed external to Fedora, but the resulting Event metadata should be captured and linked to each Object for future querying. The common workflow would be as follows:
      1. Query Fedora for all Objects that have been created or modified since the last preservation transfer date.
      2. Attach Event metadata to each transferred Object in Fedora that includes: event type (PREMIS Event Types), Date, Agent, and optional outcome notes.
  3. Audit service should be able to purge events. (open question)
  4. Audit service should be RDF-based and use PATCH semantics for updates (the use of PATCH is an open question).
  5. The PROV-O ontology may be better suited than PREMIS.
  6. Audit service would ideally support map-reduce-style analytics.
  7. Evidence of fixity checking on a "routine basis" should be available, with logs "stored separately or protected separately from the AIPs themselves".
  8. Fedora 4 REST API should support dissemination of event/audit information.
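
As a rough illustration of requirements 1 and 2, the following Python sketch models an event record (who, which resource, when, which action) and the UCSD preservation-transfer workflow: find objects modified since the last transfer, then attach an external event to each. It is a sketch under stated assumptions, not a proposed design. The names `AuditEvent`, `Repository`, `AuditStore`, and `record_preservation_transfer` are hypothetical, and the in-memory classes stand in for Fedora and for whatever event store is eventually chosen. The audit store is kept separate from the repository, so recording an event never modifies a resource and cannot trigger a further event.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuditEvent:
    resource_id: str                    # which resource
    event_type: str                     # which action, e.g. a PREMIS event type
    agent: str                          # who
    timestamp: datetime                 # when
    outcome_note: Optional[str] = None  # optional outcome notes
    external: bool = False              # True if performed outside the repository


@dataclass
class Repository:
    """Hypothetical stand-in for Fedora: resource id -> last-modified time."""
    last_modified: Dict[str, datetime] = field(default_factory=dict)

    def modified_since(self, cutoff: datetime) -> List[str]:
        # Step 1 of the workflow: objects created or modified after the cutoff.
        return sorted(r for r, t in self.last_modified.items() if t > cutoff)


@dataclass
class AuditStore:
    """Hypothetical event store, separate from the repository it audits."""
    events: List[AuditEvent] = field(default_factory=list)

    def record(self, event: AuditEvent) -> None:
        # Appending here touches no repository resource, so no event loop.
        self.events.append(event)

    def events_for(self, resource_id: str) -> List[AuditEvent]:
        return [e for e in self.events if e.resource_id == resource_id]


def record_preservation_transfer(
    repo: Repository,
    store: AuditStore,
    last_transfer: datetime,
    event_type: str,
    agent: str,
    when: datetime,
    outcome_note: Optional[str] = None,
) -> List[str]:
    """Attach one external event to every object modified since last_transfer."""
    transferred = repo.modified_since(last_transfer)
    for resource_id in transferred:
        # Step 2 of the workflow: link event metadata to each transferred object.
        store.record(
            AuditEvent(resource_id, event_type, agent, when, outcome_note, external=True)
        )
    return transferred
```

A caller would pass the date of the previous transfer as `last_transfer` and a PREMIS event type label such as `replication` as `event_type`; the stored events can then be queried per object with `events_for`.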

Supplementary Documentation

UCSD
