Attendees

  • Andrew Woods
  • Danny Bernstein
  • David Wilcox
  • Scott Prater

Agenda

Specifying the validation tool

  1. What else is needed in order to complete this specification?

Developing the validation tool

  1. What programming language will be used?
  2. Who can contribute to this effort?
  3. What are our timelines?

Testing the validation tool

  1. Who can test the tool?
  2. How will we know when the tool is ready for use in production?

Wrap-up and next steps

Notes

Specifying the validation tool

  1. Specification is based on going from coarse- to fine-grained validation
  2. Doesn't deal with disseminators, just standard Fedora 3 objects
  3. Fedora 6 does generate some files of its own, so we'll need to check whether any of the migrated information ends up there
  4. Making some assumptions about versioning, checksums, etc.
  5. Could we also validate relationships?
    1. Fedora 3 doesn't do this so this might not be appropriate
  6. Possible issues
    1. XML attributes can appear in a different order post-migration. How could we handle this?
      1. Size would be the same but the checksum would be different
      2. Could run both files through a sort and compare checksums of the sorted output
    2. Is there a 1:1 mapping between a Fedora 3 object and a Fedora 6 object?
      1. Yes, F3 objects are created as F6 objects in archival groups
  7. Output
    1. How should the results be laid out?
      1. A high-level report could be generated that says "I looked at 10,000 objects and they look ok"
      2. A second level could indicate how many objects have problems and what those problems are
      3. Need to also report on the number of objects that were examined to ensure nothing was skipped
    2. Levels of validation
      1. Coarse- to fine-grained
      2. Objects, object content, datastream content, versions
      3. Select level(s) to validate each run
    3. Validation based on a list of IDs rather than the entire repository
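
The attribute-order issue under "Possible issues" can be illustrated with a minimal sketch. It is written in Python for brevity and is not the tool's planned implementation. It assumes that canonicalizing the XML (C14N 2.0, which emits attributes in a fixed order) is an acceptable stand-in for the "sort" idea in the notes. The function name and the sample element are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_checksum(xml_text: str) -> str:
    """Return a SHA-256 hex digest that ignores XML attribute order.

    Assumption: comparing canonical forms is acceptable for validation.
    ET.canonicalize (Python 3.8+) writes attributes in a fixed order,
    so two documents that differ only in attribute order hash the same.
    """
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical sample: same element, attributes in a different order.
before = '<datastream ID="DC" STATE="A"/>'
after = '<datastream STATE="A" ID="DC"/>'

raw_match = (hashlib.sha256(before.encode("utf-8")).hexdigest()
             == hashlib.sha256(after.encode("utf-8")).hexdigest())
canonical_match = canonical_checksum(before) == canonical_checksum(after)
```

Here the raw checksums differ while the canonical ones agree, which is the behaviour the notes ask for. A real change to an attribute value still produces a different checksum.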

Developing the tool

  1. Is this a command line tool?
    1. migration-utils is a command line tool so this should be fine
  2. Can it be multi-threaded?
    1. This would help performance for large repositories
  3. What programming language would be most appropriate?
    1. This will be file-system to file-system validation
    2. We already have good Java tools based on migration-utils
    3. Use F3 libraries to read F3 and OCFL libraries to read F6 content
  4. Danny and Andrew will do most of the development
  5. Tool should be delivered by December or January
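
As a rough illustration of the multi-threading point, the sketch below fans a list of object IDs out to a thread pool. It is in Python only to keep the example short, since the notes lean toward Java for the actual tool. The names `validate_ids` and `check`, and the sample IDs, are hypothetical; `check` stands in for whatever per-object validation is eventually specified.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def validate_ids(object_ids: Iterable[str],
                 check: Callable[[str], List[str]],
                 max_workers: int = 4) -> Dict[str, List[str]]:
    """Run `check` on every ID in parallel.

    `check` is a hypothetical callback that returns a list of problem
    descriptions for one object (empty list means no problems found).
    The result maps each ID to its problems, in input order.
    """
    ids = list(object_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `ids` in its results.
        results = list(pool.map(check, ids))
    return dict(zip(ids, results))


def failures(results: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the objects that reported at least one problem."""
    return {pid: problems for pid, problems in results.items() if problems}
```

Because every ID appears in the result, the number of objects examined can be reported alongside the failures, and the same function works for a hand-picked list of IDs or for the whole repository.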

Testing the tool

  1. Need a set of test fixtures that contain errors we want the tool to detect
  2. Need to validate against objects in a variety of repositories
  3. Also need to validate exported F3 objects

Next Steps

  1. Come up with an initial design
  2. Set up basic infrastructure
    1. Basic validation to start: number of objects
  3. Create JIRA tickets for the work
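
A minimal sketch of the coarsest check, the number of objects, might look like the following. It is in Python for illustration only. It assumes the object IDs on each side have already been listed by some other means; the class, function, and field names are hypothetical. The summary line follows the report layout discussed in the notes: how many objects were examined and how many have problems.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ObjectCountReport:
    examined: int                # number of Fedora 3 objects looked at
    missing_in_f6: List[str]     # in Fedora 3 but not found in Fedora 6
    unexpected_in_f6: List[str]  # in Fedora 6 but not in Fedora 3

    @property
    def ok(self) -> bool:
        return not self.missing_in_f6 and not self.unexpected_in_f6

    def summary(self) -> str:
        status = "OK" if self.ok else "PROBLEMS FOUND"
        return (f"{status}: examined {self.examined} objects, "
                f"{len(self.missing_in_f6)} missing in Fedora 6, "
                f"{len(self.unexpected_in_f6)} unexpected in Fedora 6")


def compare_object_ids(f3_ids: Iterable[str],
                       f6_ids: Iterable[str]) -> ObjectCountReport:
    """Compare the two ID sets; assumes a 1:1 object mapping."""
    f3, f6 = set(f3_ids), set(f6_ids)
    return ObjectCountReport(examined=len(f3),
                             missing_in_f6=sorted(f3 - f6),
                             unexpected_in_f6=sorted(f6 - f3))
```

The high-level report is `summary()`, and the two lists provide the second level of detail about which objects have problems.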