This page lays out the considerations and activities surrounding the Fedora 4 Audit Service.

Guiding Principles

  1. Any Fedora4 feature should be available through an API which is an implementation of LDP or an optional extension (ideally an existing standard)
  2. Fedora4 features should favor existing tools over custom code
  3. Fedora4 features should establish integration patterns where an implementation is not a part of the core code

Actions

Proposed Requirements

Legend: (question) - Needs refinement, consensus, or removal

Functional - Write/Import

  1. Audit service MUST ensure that all events minimally include the following information
    1. Event Agent
      1. When connecting through a service account, the Agent should use the standard header principal provider to pass the actual agent information.
    2. Event Date/Time
    3. Event Activity
    4. Event Entity
  2. Audit service MUST be able to include/import events that were performed external to the repository.
    1. External events should be clearly labeled so they can be easily filtered from internal events
    2. Ideally, an object would be assigned a globally unique identifier (GUID) at the point of digitization. At that point we could trigger the CREATE event in a Fedora 4 system, likely a separate one intended only to store event data and to provide the API/REST services and a SPARQL interface for queries. The GUID would then stay with the digital object through its lifecycle, which could be a significant amount of time before it is prepared for digital preservation. This is why we would use a separate Fedora 4 to store this data, and would continue to do so even after ingest into the digital repository.
      The problem at hand is simply that our workflow is not to create and immediately ingest digital content into our Hydra repository. We could configure other systems to store this data, and in some cases this is taking place. But this leaves us with some event data stored in plain text files, some in Microsoft Excel/Access, and some in various SQL instances. So, as others have indicated, we do need the ability to bring in event data for events that occurred before the object was ingested into Fedora.
      We could also use this separate Fedora 4 as the generator for the GUID, which seems like a smooth integration point between the instance that only handles event data and the instance that handles digital preservation. It also avoids a potential infinite loop: an event saying that we updated a record itself updates the record, which triggers another event saying that we updated the record. For very sensitive materials, the level of event logging we would perform may be just that granular.
      The reason for using a separate system for logging events is the fundamental principle that a system should not audit itself. Using a separate instance helps to maintain this separation. In my eyes it is separating the prison guards from the inmates: we should not trust the inmates to count themselves. More importantly, we may want to track inmates that are on their way into the system, not just after their arrival.
    3. One of the primary external use cases at UCSD is the transfer of objects to preservation management systems such as Chronopolis and Merritt. This will be triggered and performed external to Fedora, but the resulting Event metadata should be captured and linked to each Object for future querying. The common workflow would be as follows:
      1. Query Fedora for all Objects that have been created or modified since the last preservation transfer date.
      2. Attach Event metadata to each transferred Object in Fedora that includes: event type (PREMIS Event Types), Date, Agent, and optional outcome notes.
    4. Examples of external events:

      1. During ingest, audit service should accept audit log of an external application scanning a file for viruses.
      2. During ingest, audit service should accept audit log of an external application validating a file's content against an external schema, profile, or using domain-specific validation tools.
      3. Periodically, audit service should accept audit log of an external application, or an internal service provided by the repository itself, verifying a file's checksum.
    5. Audit service should accept audit log of an external application that moves a resource file.

  3. Audit service MUST be able to maintain events for purged resources
  4. Audit service MUST be able to perform with a large number of audit events
  5. Audit service MUST NOT be able to remove events (question)
  6. Audit service MUST allow events to be stored separately from the repository resources themselves
  7. Audit service MUST import events with RDF triples drawn from the specified ontologies
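
The write/import requirements above can be sketched roughly as follows. This is a minimal illustration only, not a Fedora 4 API: the names `AuditEvent`, `AuditLog`, and the `external` flag are hypothetical, the sample values are invented for the example, and a real implementation would express events as RDF triples drawn from the specified ontologies rather than as Python objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical event record carrying the four minimum fields."""
    agent: str              # Event Agent
    when: datetime          # Event Date/Time
    activity: str           # Event Activity
    entity: str             # Event Entity (resource identifier)
    external: bool = False  # True for events performed outside the repository


class AuditLog:
    """Hypothetical append-only store, kept apart from repository resources."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        # Reject events that lack any of the required fields.
        if not (event.agent and event.activity and event.entity):
            raise ValueError("agent, activity and entity are required")
        if not isinstance(event.when, datetime):
            raise ValueError("a date/time is required")
        self._events.append(event)

    def events(self) -> Tuple[AuditEvent, ...]:
        # Return an immutable snapshot; no removal method is offered.
        return tuple(self._events)


# Example values below are illustrative only.
log = AuditLog()
log.record(AuditEvent("virus-scanner",
                      datetime(2014, 9, 1, tzinfo=timezone.utc),
                      "virus check", "/objects/1", external=True))
log.record(AuditEvent("alice",
                      datetime(2014, 9, 2, tzinfo=timezone.utc),
                      "create", "/objects/1"))

# External events are labeled, so they are easy to filter out.
internal_only = [e for e in log.events() if not e.external]
```

Because the log lives outside the repository resources in this sketch, events for a purged resource would remain available, and the absence of a removal method reflects the (still open) requirement that events cannot be removed.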

Functional - Read/Export

  1. Audit service MUST export and answer queries in RDF format
  2. Audit service MUST be able to export all events in the repository
  3. Audit service MUST service queries that vary by:

    1. Single or all resources
    2. Date range
    3. Event type
    4. Agent
  4. Audit service MUST provide a single search endpoint for all repository resource-related events
  5. Audit service MUST provide a SPARQL-Query search endpoint (question)
  6. Audit service MUST be able to limit the number of audit events returned by a query, e.g., the first and most recent fixity check events
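
As a hedged illustration of the query dimensions listed above, the sketch below filters an in-memory list of events by resource, date range, event type, and agent, and caps the number of results. The names `Event` and `query_events` and the sample data are assumptions made for this example; an actual service would more likely answer such queries through its RDF or SPARQL endpoint.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical audit event used only for this example."""
    agent: str
    when: datetime
    event_type: str
    resource: str


def query_events(events: Iterable[Event],
                 resource: Optional[str] = None,
                 start: Optional[datetime] = None,
                 end: Optional[datetime] = None,
                 event_type: Optional[str] = None,
                 agent: Optional[str] = None,
                 limit: Optional[int] = None,
                 newest_first: bool = False) -> List[Event]:
    # A criterion left as None places no restriction on that dimension.
    matches = [
        e for e in events
        if (resource is None or e.resource == resource)
        and (start is None or e.when >= start)
        and (end is None or e.when <= end)
        and (event_type is None or e.event_type == event_type)
        and (agent is None or e.agent == agent)
    ]
    # Order by date so that a limit yields the earliest or latest events.
    matches.sort(key=lambda e: e.when, reverse=newest_first)
    return matches if limit is None else matches[:limit]


# Sample data, invented for illustration.
events = [
    Event("fixity-service", datetime(2014, 1, 1), "fixity check", "/objects/1"),
    Event("alice", datetime(2014, 1, 15), "update", "/objects/1"),
    Event("fixity-service", datetime(2014, 2, 1), "fixity check", "/objects/1"),
    Event("fixity-service", datetime(2014, 3, 1), "fixity check", "/objects/1"),
]

# The first and the most recent fixity check for one resource.
first = query_events(events, resource="/objects/1",
                     event_type="fixity check", limit=1)
latest = query_events(events, resource="/objects/1",
                      event_type="fixity check", limit=1, newest_first=True)
```

Omitting `resource` queries across all resources, which corresponds to the "single or all resources" criterion.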

Non-Functional

  1. Scale?
  2. Security?
  3. Performance?

Role Commitments

Development

Stakeholder

Supplementary Documentation

UCSD

10 Comments

  1. "For event tracking, where is the user principal expected to come from? servlet-request#getUserPrincipal?"

    Servlet-sourced identity is only going to work for servlet-based action: that might not play well with anyone doing integration directly via Java.

  2. "Fedora 4 REST API should support dissemination of event/audit information."

    It's not a killer point, but if the audit info is either in the repo, or projected over by the repo, then this gets done automatically.

  3. "Audit service should be able to export full logs in formats that can be ingested by: Tableau "

    This seems really product-specific. Wouldn't an RDF exposure be a better, more general facility?

  4. I do think an RDF exposure is better and would work with my use case with Tableau.

  5. "Audit service MUST be able to purge events."

    Is this going to conflict with using this service to support trusted repository certifications?

    1. In principle an audit log should be tamper-proof to some extent. But from an operations perspective it makes sense to "garbage collect" unneeded records from any kind of database, and the same applies to audit records. A requirement would then be to be able to hand over stale audit logs to some archive and remove them from the repository, leaving behind an audit log record of the archiving activity, of course.

      1. That is a little bit tricky, Ralf Claussnitzer. It sounds like you are suggesting that on the one hand,

        • the audit service must be tamper-proof, but on the other hand
        • the audit service must provide for export and removal of entries

        The service can certainly support either being tamper-proof or purgeable. I am not sure how the service would be able to guarantee that a purge/export was indeed going to an external archive.

        Maybe the requirement is along the lines of: "The Audit Service will limit purge operations to system administrators"?
        ... although this brings the service into a gray area. 

        1. "The service can certainly support either being tamper-proof or purgeable"

          Depending on how we define tamper-proof.

          "I am not sure how the service would be able to guarantee that a purge/export was indeed going to an external archive"

          It cannot guarantee this. However, it can guarantee to log this operation and the responsible agent.

          "Maybe the requirement is along the lines of: 'The Audit Service will limit purge operations to system administrators'?"

          It certainly should limit such operations to authorized agents. I think allowing system administrators to purge is fine.

  6. "Audit service MUST store logs separately or protected separately from the repository resources themselves"

    This seems like an implementation concern, not a concern of the specification. Certainly it's important, but this may be the wrong place to discuss it. It's definitely not clear to me that everyone who wants an audit service would subscribe to this, and putting it in as a requirement up front seems awfully expensive.

    1. Possibly the wrong place, but the right time.  Perhaps not everyone interested in an audit service is specifically looking to reach ISO 16363/TDR compliance, but the certification requirements are nevertheless an appropriate guide for this development, I think.  

      I will add though that if the audit records could be easily copied over to a second location (i.e. with each new write), that would probably satisfy the need, with ISO-interested implementers choosing this optional configuration.  But I'd rather see the default architecture of the service meet the certification requirements.