Unlike the functionality migration, migrating Fedora 3 data to Fedora 4 can be lossless (and therefore reversible).  This page discusses strategies and tooling for accomplishing that migration.

Considerations that shape these strategies include:

  • need to preserve Fedora 3 content, history and audit trail
  • ability to leverage Fedora 4 features
  • need to make data accessible and functional in the new environment
  • desire to make migration easier, faster and less error-prone

 

Proposal 1:

Develop a framework for a pluggable migration tool that is based on processing FOXML.

This approach has several strengths:

  • FOXML (when exported in the "archive" context, or persisted in the low-level store) is a complete representation of the object
  • FOXML is compatible with a wide range of Fedora versions
  • FOXML migration doesn't require the Fedora 3 repository software to be running
  • a large number of existing frameworks can process XML efficiently
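
As a rough illustration of the last two points, the sketch below reads a FOXML document with nothing but Python's standard library and without contacting a repository. The sample object (`demo:1`) and the helper name `list_datastreams` are invented for this example; the element and attribute names are those of FOXML 1.1.

```python
import xml.etree.ElementTree as ET

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical, heavily trimmed FOXML 1.1 object used only for this example.
SAMPLE_FOXML = """<foxml:digitalObject VERSION="1.1" PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
    <foxml:datastreamVersion ID="DC1.0" MIMETYPE="text/xml" LABEL="Dublin Core"/>
  </foxml:datastream>
  <foxml:datastream ID="content" STATE="A" CONTROL_GROUP="M" VERSIONABLE="true">
    <foxml:datastreamVersion ID="content.0" MIMETYPE="image/tiff" LABEL="Master"/>
    <foxml:datastreamVersion ID="content.1" MIMETYPE="image/tiff" LABEL="Master v2"/>
  </foxml:datastream>
</foxml:digitalObject>"""


def list_datastreams(foxml_text):
    """Return (pid, [(dsid, control_group, version_count), ...])."""
    root = ET.fromstring(foxml_text)
    pid = root.get("PID")
    datastreams = []
    for ds in root.findall("foxml:datastream", FOXML_NS):
        versions = ds.findall("foxml:datastreamVersion", FOXML_NS)
        datastreams.append((ds.get("ID"), ds.get("CONTROL_GROUP"), len(versions)))
    return pid, datastreams


pid, datastreams = list_datastreams(SAMPLE_FOXML)
print(pid, datastreams)
```

Because every datastream version is present in the document, the version history is available to the migration tool in the same pass.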

Considerations:

  • migration of data that's not in the repository (such as configuration and global XACML policies) will require special handling
  • writing and using plugins (special configurations) that map complex metadata or Fedora 3 constructs into Fedora 4 must be as easy as possible, since most institutions will need to write their own or adapt existing ones

 

Pluggable

The main framework will take FOXML from a Fedora repository as its source.  The source may be the Fedora data store directory, or a running repository from which each record is fetched through the export API call.  What happens during the processing of each object must be highly configurable.  For the purpose of this proposal, the term "processing plugin" refers to a piece of code or an algorithm that handles one part of a Fedora object.
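
One possible shape for such a framework is sketched below, purely for discussion. The class and function names (`ProcessingPlugin`, `LoggingPlugin`, `foxml_from_directory`, `migrate`) are hypothetical, and the sketch assumes that every file under the data store directory is a FOXML document. A source backed by the export API could be another generator yielding the same parsed roots.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical, trimmed FOXML object used only for the demonstration below.
SAMPLE_FOXML = """<foxml:digitalObject VERSION="1.1" PID="demo:1"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="DC" CONTROL_GROUP="X"/>
  <foxml:datastream ID="content" CONTROL_GROUP="M"/>
</foxml:digitalObject>"""


def foxml_from_directory(directory):
    """Yield the parsed root of every file found under the directory."""
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            yield ET.parse(path).getroot()


class ProcessingPlugin:
    """Hypothetical base class: a plugin handles one part of a Fedora 3 object."""

    def process(self, pid, datastream):
        raise NotImplementedError


class LoggingPlugin(ProcessingPlugin):
    """Example plugin that only records what it was asked to handle."""

    def __init__(self):
        self.seen = []

    def process(self, pid, datastream):
        self.seen.append((pid, datastream.get("ID")))


def migrate(objects, plugins):
    """Pass every datastream of every object to every configured plugin."""
    for root in objects:
        pid = root.get("PID")
        for datastream in root.findall("foxml:datastream", FOXML_NS):
            for plugin in plugins:
                plugin.process(pid, datastream)


# Demonstration against a temporary directory standing in for the data store.
with tempfile.TemporaryDirectory() as store:
    Path(store, "demo_1").write_text(SAMPLE_FOXML, encoding="utf-8")
    plugin = LoggingPlugin()
    migrate(foxml_from_directory(store), [plugin])

print(plugin.seen)
```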

Identifier plugins

An identifier plugin implements your institution's PID migration strategy.  This could range from something as simple as "store the PID as a DC identifier and mint a new Fedora 4 id for this item" to something more complex like "escape the existing PID into a Fedora 4 path".
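
The two strategies quoted above might look something like the following. This is only a sketch: the function names are hypothetical, the random path is one arbitrary way to mint an id, and splitting the PID into a namespace segment and a local id segment is an assumed escaping scheme, not something this proposal prescribes.

```python
import uuid
from urllib.parse import quote

DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"


def mint_new_id(pid):
    """Strategy 1: mint a fresh path and keep the old PID as a DC identifier."""
    new_path = "/" + uuid.uuid4().hex  # assumption: any unique path is acceptable
    return new_path, {DC_IDENTIFIER: pid}


def escape_pid_into_path(pid):
    """Strategy 2: derive the new path from the PID itself."""
    namespace, _, local_id = pid.partition(":")
    # Percent-encode each segment so reserved characters cannot alter the path.
    path = "/" + quote(namespace, safe="") + "/" + quote(local_id, safe="")
    return path, {}


print(escape_pid_into_path("demo:1"))
print(mint_new_id("demo:1"))
```

Both functions return the new path plus any extra properties to store, so the framework could treat them interchangeably.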

Datastream plugins

I envision many datastream plugins whose applicability is based on characteristics of the datastream such as control group, MIME type, dsid, or even the content model that defines it.  Precedence for such rules should be simple and well-defined.
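
To make "simple and well-defined" concrete, here is one candidate precedence scheme, offered as an assumption rather than a decision: the matching rule with the most criteria wins, and ties go to the rule listed first. The `Rule` and `Datastream` classes and all handler names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CRITERIA = ("dsid", "control_group", "mime_type", "content_model")


@dataclass
class Datastream:
    dsid: str
    control_group: str
    mime_type: str
    content_model: Optional[str] = None


@dataclass
class Rule:
    handler: str
    dsid: Optional[str] = None
    control_group: Optional[str] = None
    mime_type: Optional[str] = None
    content_model: Optional[str] = None

    def matches(self, ds):
        # A criterion left as None matches any datastream.
        return all(getattr(self, name) in (None, getattr(ds, name)) for name in CRITERIA)

    def specificity(self):
        return sum(getattr(self, name) is not None for name in CRITERIA)


def select_handler(rules, ds, default="copy_as_binary"):
    """Pick the most specific matching rule; earlier rules win ties."""
    matching = [rule for rule in rules if rule.matches(ds)]
    if not matching:
        return default
    # max() returns the first maximal element, which gives the tie-break.
    return max(matching, key=Rule.specificity).handler


RULES = [
    Rule("dc_to_rdf", dsid="DC"),
    Rule("generic_xml", mime_type="text/xml"),
    Rule("image_content", dsid="content", content_model="cmodel:images"),
]

dc = Datastream("DC", "X", "text/xml")
image = Datastream("content", "M", "image/tiff", "cmodel:images")
other = Datastream("content", "M", "application/pdf", "cmodel:documents")
print(select_handler(RULES, dc), select_handler(RULES, image), select_handler(RULES, other))
```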

You should be able to express strategies like some of the following examples:

  • the DC datastream should be translated into RDF assertions
  • datastreams called descMetadata should be translated to RDF assertions using a given template
  • the content datastream on objects with the cmodel:images content model should be handled X
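
As a sketch of the first strategy in the list above, the following turns the simple Dublin Core elements of a DC datastream into N-Triples statements. The function name `dc_to_ntriples`, the sample record and the subject URI are all invented for illustration, and a real plugin would also have to decide how to handle repeated elements, language tags and the like.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical DC datastream content used only for this example.
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample image</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:identifier>demo:1</dc:identifier>
</oai_dc:dc>"""


def escape_literal(value):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def dc_to_ntriples(dc_xml, subject_uri):
    """Return one N-Triples line per non-empty DC element."""
    root = ET.fromstring(dc_xml)
    lines = []
    for element in root:
        if not element.tag.startswith("{" + DC_NS + "}"):
            continue  # ignore anything outside the DC namespace
        value = (element.text or "").strip()
        if not value:
            continue
        predicate = DC_NS + element.tag.split("}", 1)[1]
        lines.append(f'<{subject_uri}> <{predicate}> "{escape_literal(value)}" .')
    return lines


# The subject URI is a made-up example of where the migrated object might live.
triples = dc_to_ntriples(SAMPLE_DC, "http://localhost:8080/rest/demo/1")
print("\n".join(triples))
```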

Access Control plugins

  • POLICY datastreams should be mapped as follows

Reporting plugins

In addition to processing plugins, the framework allows for no-op plugins that, instead of migrating data, perform some sort of check, gather statistics or otherwise generate a report that may be useful in developing and planning a migration.
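
A reporting plugin could be as small as the sketch below, which counts datastreams by control group and MIME type without writing anything to the target repository. The class name `MimeTypeReport` and the sample objects are hypothetical, and using the last listed datastream version as the current one is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

FOXML_NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Hypothetical, trimmed FOXML objects used only for this example.
SAMPLE_OBJECTS = [
    """<foxml:digitalObject PID="demo:1" xmlns:foxml="info:fedora/fedora-system:def/foxml#">
      <foxml:datastream ID="DC" CONTROL_GROUP="X">
        <foxml:datastreamVersion ID="DC1.0" MIMETYPE="text/xml"/>
      </foxml:datastream>
      <foxml:datastream ID="content" CONTROL_GROUP="M">
        <foxml:datastreamVersion ID="content.0" MIMETYPE="image/tiff"/>
      </foxml:datastream>
    </foxml:digitalObject>""",
    """<foxml:digitalObject PID="demo:2" xmlns:foxml="info:fedora/fedora-system:def/foxml#">
      <foxml:datastream ID="DC" CONTROL_GROUP="X">
        <foxml:datastreamVersion ID="DC1.0" MIMETYPE="text/xml"/>
      </foxml:datastream>
    </foxml:digitalObject>""",
]


class MimeTypeReport:
    """No-op plugin: gathers statistics and migrates nothing."""

    def __init__(self):
        self.counts = Counter()

    def process(self, root):
        for ds in root.findall("foxml:datastream", FOXML_NS):
            versions = ds.findall("foxml:datastreamVersion", FOXML_NS)
            if not versions:
                continue
            latest = versions[-1]  # assumption: last listed version is current
            key = (ds.get("CONTROL_GROUP", "?"), latest.get("MIMETYPE", "unknown"))
            self.counts[key] += 1

    def summary(self):
        return [f"{group}\t{mime}\t{count}"
                for (group, mime), count in sorted(self.counts.items())]


report = MimeTypeReport()
for foxml_text in SAMPLE_OBJECTS:
    report.process(ET.fromstring(foxml_text))
print("\n".join(report.summary()))
```

A report like this would show early on which datastream types need a dedicated plugin and which can fall through to a default.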

 

 
