Todo
  • ocfl library links
  • ocfl wrapper implementation
  • otm links
    • bridge
    • deletion workflow

At the start of all workflows in Chronopolis we have our tools to package data into a file layout. Typically this will be handled by one of our Intake services, but it is occasionally done in an ad-hoc fashion if data needs to be ingested manually. This action is taken once so that the file layout can be distributed throughout the Chronopolis network. Currently we only supply our own packaging library for the BagIt format.

...

  • Creating an initial OCFL Object
  • Updating with a partial OCFL Object
  • Purging of data from an OCFL Object
  • Validating an OCFL Object, a specific version of an OCFL Object, and the metadata for an OCFL Object (e.g. making sure the inventory.json contains all files, the sidecar has the correct checksum, etc)
  • Handling additional metadata (ACE Tokens, application logs)
  • Understanding different addressing schemes
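
As an illustration of the validation item above, the following is a minimal sketch (in Python for brevity) of two of the metadata checks: that the sidecar digest matches inventory.json, and that every content file on disk is listed in the inventory manifest. The function names are hypothetical, and the sketch assumes the sidecar is named inventory.json.<algorithm> and holds the digest followed by the file name; it is not a full OCFL validator.

```python
import hashlib
import json
from pathlib import Path


def sidecar_matches(object_root: Path, algorithm: str = "sha512") -> bool:
    """Return True if the sidecar digest equals the digest of inventory.json."""
    inventory = object_root / "inventory.json"
    sidecar = object_root / f"inventory.json.{algorithm}"
    # Assumed sidecar layout: "<digest> inventory.json"; only the digest is compared.
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    actual = hashlib.new(algorithm, inventory.read_bytes()).hexdigest()
    return expected == actual


def missing_from_manifest(object_root: Path) -> list[str]:
    """List content files found on disk that the manifest does not mention."""
    inventory = json.loads((object_root / "inventory.json").read_text(encoding="utf-8"))
    content_dir = inventory.get("contentDirectory", "content")
    # The manifest maps each digest to a list of paths relative to the object root.
    listed = {path for paths in inventory["manifest"].values() for path in paths}
    on_disk = set()
    for version_dir in object_root.iterdir():
        base = version_dir / content_dir
        if version_dir.name.startswith("v") and base.is_dir():
            for file in base.rglob("*"):
                if file.is_file():
                    on_disk.add(file.relative_to(object_root).as_posix())
    return sorted(on_disk - listed)
```

A packager could run both checks after writing a version and refuse to distribute the object if either one fails.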

OCFL Storage Roots
Status: Decision

For Chronopolis, we have a decision to make here in terms of where the Storage Roots will exist. It is important to keep in mind that a Storage Root can contain zero or more OCFL Objects. 

...

Finally we can place the Storage Root at the package level. While this creates some extra data which needs to be transferred, it provides the opportunity to create the Storage Root at Intake and distribute that through the network. This also provides the opportunity to explore multi-part OCFL packages by creating multiple OCFL Objects under a single Storage Root for a given deposit. It also allows for a more staggered approach to having OCFL versions in the Chronopolis repository, as new versions of OCFL layouts could be propagated as work is done in order to support them.

OTM Considerations

In the OTM specification, a versioning scheme is used which allows a Repository to pass data to a DDP such that it can have multiple versions. In the Deposit DDP Workflow, a version-id is specified as an extra JSON field in order to store this information. In order to be compatible with the OTM specification, we will need to ensure that we can read and write arbitrary JSON fields in the inventory.json of an OCFL Object.
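
A minimal sketch of that read/write requirement follows (in Python for brevity). It treats the inventory as an opaque JSON object so that fields the packager does not know about survive a round trip. The helper name set_extra_field is hypothetical, and placing version-id at the top level of the inventory is an assumption made only for illustration; where OTM expects the field is not decided here.

```python
import json
from pathlib import Path


def set_extra_field(inventory_path: Path, key: str, value) -> dict:
    """Set one field in inventory.json while preserving every other field."""
    inventory = json.loads(inventory_path.read_text(encoding="utf-8"))
    # Assumption: the extra field lives at the top level of the inventory.
    inventory[key] = value
    inventory_path.write_text(json.dumps(inventory, indent=2) + "\n", encoding="utf-8")
    return inventory
```

Because the sidecar holds a digest of inventory.json, any such rewrite would have to be followed by regenerating the sidecar.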

The OTM specification also includes a workflow for deleting data from a DDP. Depending on how the Chronopolis team plans on implementing this workflow for the DDP, we may want to include some knowledge about this in the packager.

Homegrown

If we opt to create our own OCFL packaging library, there are many things which we need to be aware of and make decisions on. Some of these are not necessarily technical decisions, but things which we can drive through our own idea of what constitutes best practices for OCFL in Chronopolis.

We should keep the implementation notes from the OCFL team in mind to keep things such as version numbering consistent with what is expected from the specification.
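
As one example of keeping version numbering consistent, the sketch below (in Python for brevity) picks the next version directory name from the names already present in an object. It reflects our reading of the specification: names are "v" followed by a positive integer, unpadded names are preferred, and an object that already uses zero-padded names must keep the same width and a leading zero. The function name next_version is hypothetical.

```python
import re

VERSION_NAME = re.compile(r"v(\d+)")


def next_version(existing: list[str]) -> str:
    """Return the next version directory name, following the object's existing style."""
    digits = []
    for name in existing:
        match = VERSION_NAME.fullmatch(name)
        if match is None:
            raise ValueError(f"not a version directory name: {name}")
        digits.append(match.group(1))
    if not digits:
        return "v1"  # new objects use unpadded names
    following = max(int(d) for d in digits) + 1
    if not digits[0].startswith("0"):
        return f"v{following}"
    # Zero-padded object: keep the established width and the leading zero.
    width = len(digits[0])
    candidate = f"{following:0{width}d}"
    if len(candidate) != width or not candidate.startswith("0"):
        raise ValueError("zero-padded version numbers are exhausted for this object")
    return "v" + candidate
```

For example, next_version(["v1", "v2"]) gives "v3", while next_version(["v0001", "v0009"]) gives "v0010".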

OCFL Storage Root

The first piece which our OCFL library would need is creating and listing OCFL Storage Roots. This is a fairly basic operation: it consists of writing the Namaste declaration file and, optionally, some files describing the OCFL spec. If the Storage Root is going to be part of the preservation object, then these files would need to be generated, and directories created for each OCFL Object to be created.
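
A minimal sketch of both operations follows (in Python for brevity). It assumes the Namaste declaration is a file named 0=ocfl_<version> whose content is ocfl_<version> followed by a newline, and it leaves out the optional spec and layout files. The function names, and the idea of listing Storage Roots as the immediate children of a base directory, are hypothetical choices made for illustration.

```python
from pathlib import Path


def create_storage_root(root: Path, spec_version: str = "1.0") -> Path:
    """Create the directory and write its Namaste declaration file."""
    root.mkdir(parents=True, exist_ok=True)
    declaration = f"ocfl_{spec_version}"
    (root / f"0={declaration}").write_text(declaration + "\n", encoding="utf-8")
    return root


def is_storage_root(path: Path) -> bool:
    """A directory counts as a Storage Root if it holds a 0=ocfl_* declaration."""
    return path.is_dir() and any(p.is_file() for p in path.glob("0=ocfl_*"))


def list_storage_roots(base: Path) -> list[Path]:
    """List the immediate children of base that are Storage Roots."""
    return sorted(p for p in base.iterdir() if is_storage_root(p))
```

With package-level Storage Roots, create_storage_root would be called once per deposit at Intake, and list_storage_roots could be used to enumerate the packages staged under a shared directory.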

...

The packager itself should not be concerned with creating forward deltas, and as such should not have to understand any semantics of different content addressing schemes. Therefore, the amount of addressing the packager should have an understanding of is limited to knowing the following:

...

This would be part of a larger workflow, and the steps handled by the packager would be centered around creating a new OCFL Object and inserting versions for each existing version of the object. If a stub object is to be used in place of what was purged, this would be included in order to overwrite any existing data at a Chronopolis node.

...

External Library

Currently there are a few libraries which are attempting to implement the OCFL specification:

Many of the decisions we make on applying and creating OCFL Storage Roots and Objects would apply to any existing libraries as well. If choosing to go this route, we would need to make sure all operations are supported and that we can handle any errors gracefully in the event of failure. In the case of the Go library, executing external processes in Java does not always make for the prettiest workflow, but can be managed. The Java library would integrate easily with our existing codebase and could be forked or modified in order to suit our workflow needs.

Evaluation

Each of the two libraries should go through an evaluation phase in order to determine

  • if our workflow needs are met and if not how they can be resolved
  • ease of integration with our codebase
    • including additional development we may wish to consider, e.g. creating an API which exposes the library for Chronopolis workflows
  • ease of contribution to the library

Integration

Once a library is chosen for use, incorporating it into the codebase is the next step. How this gets exposed to other Chronopolis services will be up to the developers implementing the changes, as it can be done in multiple ways.