Audit Service Implementation Proposal

The proposed implementation of the audit service uses the existing eventing system, the Camel workflow engine, and an external triplestore.

Phase 1

The first phase of implementation will use the existing event system to emit messages about audit events, process those messages with Camel, and create RDF for the events in an external triplestore. The primary goal of this phase is to satisfy the audit service requirements with minimal impact on the repository.

Make sure that all internal audit events generate JMS messages
Make sure that generated messages contain enough information to create event RDF
Create Camel workflow to process messages and create event RDF in an external triplestore
Propose RDF classes and properties that event RDF should use
Document recipe for creating event RDF for external events in an external triplestore using SPARQL Update
Document recipe for preventing deletion of event triples from the external triplestore
Document end-to-end recipe for configuring event service
Verify that all audit service requirements are satisfied
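
The step from an audit message to event RDF could look roughly like the following. This is a minimal Python sketch, not the proposed Camel route: the dictionary keys (event_uri, event_type, resource_uri, timestamp) and the helper names are assumptions made for illustration, and the predicates are taken from the RDF Vocabulary example on this page. It only builds the SPARQL Update text; sending it to the triplestore is left out.

```python
# Prefixes copied from the vocabulary example on this page.
PREFIXES = (
    "PREFIX fedora: <http://fedora.info/definitions/v4/repository#>\n"
    "PREFIX premis: <http://www.loc.gov/premis/rdf/v1#>\n"
    "PREFIX prov:   <http://www.w3.org/ns/prov#>\n"
    "PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>\n"
)


def check_iri(iri):
    """Reject strings that cannot be safely written between < and >."""
    if any(ch in iri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in iri):
        raise ValueError(f"unsafe IRI: {iri!r}")
    return iri


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def event_to_sparql_update(event):
    """Render one audit event (a plain dict, hypothetical keys) as INSERT DATA."""
    event_uri = check_iri(event["event_uri"])
    event_type = check_iri(event["event_type"])
    resource_uri = check_iri(event["resource_uri"])
    timestamp = escape_literal(event["timestamp"])
    triples = (
        f"<{event_uri}> a prov:InstantaneousEvent, premis:Event ;\n"
        f"  premis:hasEventType <{event_type}> ;\n"
        f"  fedora:hasParent <{resource_uri}> ;\n"
        f'  prov:atTime "{timestamp}"^^xsd:dateTime .\n'
    )
    return PREFIXES + "INSERT DATA {\n" + triples + "}\n"
```

The same function would also cover the external-event recipe, since an external client could generate the identical INSERT DATA request itself. Using only INSERT DATA also fits the goal of never deleting event triples, although enforcing that would still have to happen on the triplestore side.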

Phase 2

The second phase of the implementation will be to create an optional component for persisting audit information in the repository. The primary goal of this phase is to improve the durability of audit records by persisting them in the repository itself.

Create a REST API endpoint for audit events attached to each resource, which allows creating external events and retrieving all events
Update the repository to create audit event records in this container for internal events
Create configurable option to allow or disallow deleting events in the repository
Make sure that other repository functionality is not impacted by enabling or disabling in-repository audit event persistence
Document end-to-end recipe for configuring event service with in-repository audit event persistence
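
To make the per-resource endpoint more concrete, here is a rough Python sketch of how a client might build requests against it. The endpoint does not exist yet, so the path segment fcr:audit, the use of POST for creating an external event, and the text/turtle content type are all assumptions for illustration, not a proposed API. The functions only construct the requests and do not send them.

```python
import urllib.request

# Hypothetical path segment for the audit container of a resource.
AUDIT_SUFFIX = "fcr:audit"


def audit_endpoint(resource_uri):
    """Return the (assumed) audit endpoint URL for a repository resource."""
    return resource_uri.rstrip("/") + "/" + AUDIT_SUFFIX


def build_external_event_request(resource_uri, event_turtle):
    """Build a POST request that would create an external audit event."""
    return urllib.request.Request(
        audit_endpoint(resource_uri),
        data=event_turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def build_list_events_request(resource_uri):
    """Build a GET request that would retrieve all events for a resource."""
    return urllib.request.Request(
        audit_endpoint(resource_uri),
        headers={"Accept": "text/turtle"},
        method="GET",
    )
```

Under these assumptions, passing either request to urllib.request.urlopen would perform the call. Whether deleting events is allowed would be decided by the configurable option on the server, not by the client.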

RDF Vocabulary

A typical event encoded in RDF would look like this:

@prefix fedora: <http://fedora.info/definitions/v4/repository#> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix premis: <http://www.loc.gov/premis/rdf/v1#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .


<event1> a prov:InstantaneousEvent, premis:Event ;
  premis:hasEventRelatedAgent <agent1> ;
  premis:hasEventType <http://id.loc.gov/vocabulary/preservationEvents/cre> ;
  fedora:hasParent <http://localhost:8080/rest/55/59/ec/05/5559ec05-6ab1-4d61-905a-a5f3da360b23> ;
  prov:atTime "2012-04-30T20:40:40"^^xsd:dateTime .


<agent1> a premis:Agent ;
  premis:agentType <http://id.loc.gov/vocabulary/preservation/agentType/sof> ;
  foaf:name "Client Software v1.2.3"^^xsd:string ;
  prov:actedOnBehalfOf <agent2> .


<agent2> a premis:Agent ;
  premis:agentType <http://id.loc.gov/vocabulary/preservation/agentType/per> ;
  foaf:nick "jquser"^^xsd:string .

Is fedora:hasParent the right predicate to use to link to the resource being acted on? Is there a more appropriate predicate to use?
Should we use prov:atTime or premis:hasEventDateTime for recording the event timestamp?
Should we simplify the agents down to strings, e.g.:

<event1> a prov:InstantaneousEvent ;
  premis:hasEventRelatedAgent "jquser"^^xsd:string, "Client Software v1.2.3"^^xsd:string .

Should we include checksums produced by fixity checks, e.g.:

<event1> a prov:InstantaneousEvent ;
  premis:hasFixity <event1#fixity1> ;
  premis:EventOutcomeInformation "SUCCESS" .

<event1#fixity1> a premis:Fixity ;
  premis:hasMessageDigest "cf23df2207d99a74fbe169e3eba035e633b65d94"^^xsd:string ;
  premis:hasMessageDigestAlgorithm "SHA1"^^xsd:string .
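
If checksums are included, the fixity triples could be produced along these lines. This is a small Python sketch under stated assumptions: the function name and the #fixity1 fragment convention are illustrative, the premis: and xsd: prefixes are assumed to be declared as in the example above, and SHA-1 is used only because it matches the digest shown here.

```python
import hashlib


def fixity_turtle(event_uri, content):
    """Compute a SHA-1 digest of content and render it as premis:Fixity triples."""
    digest = hashlib.sha1(content).hexdigest()
    return (
        f"<{event_uri}#fixity1> a premis:Fixity ;\n"
        f'  premis:hasMessageDigest "{digest}"^^xsd:string ;\n'
        f'  premis:hasMessageDigestAlgorithm "SHA1"^^xsd:string .\n'
    )
```

For example, fixity_turtle("event1", data) yields a block shaped like the <event1#fixity1> description above, with the digest computed from the bytes in data.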
