...

These mechanisms use a variety of formats (MODE/ISPN/LevelDB binary, JCR XML, RDF, JSON) and a variety of workflows (externally-triggered, event-driven, automatic).

Questions:

  1. What is the impact of using the different options on a running repository?
    1. How does each of the methods scale?
    2. What performance impact does each have?
    3. What additional resources (disk space, memory, etc.) are needed?
  2. How suitable for preservation are the different formats?
    1. Can the datastream contents be accessed as/exported to files on disk?
    2. Can the metadata be accessed in/exported to a human- and machine-readable format?
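
A first pass at the format questions above could be automated by tallying what kinds of files an approach writes out. The sketch below is only an illustration: the function names `classify_file` and `summarize_export` are hypothetical, the export directory is whatever a given mechanism produces, and "parses as JSON or XML" is a rough proxy for machine-readability, not a preservation assessment. It reads each file fully into memory, so it is suited to samples rather than the full data set.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def classify_file(path):
    """Return 'json', 'xml', 'text', or 'binary' for one exported file.

    Hypothetical helper: a rough heuristic, not a format validator.
    """
    data = Path(path).read_bytes()
    # NUL bytes or invalid UTF-8 are treated as a sign of binary content.
    if b"\x00" in data:
        return "binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    try:
        json.loads(text)
        return "json"
    except ValueError:
        pass
    try:
        ET.fromstring(data)
        return "xml"
    except ET.ParseError:
        pass
    return "text"


def summarize_export(root):
    """Count files under root by the category classify_file assigns."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[classify_file(path)] += 1
    return counts
```

Calling `summarize_export` on a sample of each approach's output directory gives a per-format file count. A high share of `binary` files would suggest the output needs further tooling before it can be inspected by hand.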

Testing Plan:

  1. Ingest UCSD DAMS repository content (50K objects, ~8 TB) into Fedora 4.
  2. Run each bulk-copying approach while a performance test suite runs against the repository.
  3. Examine output files from each approach to assess preservation value.
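
One way to collect comparable numbers across approaches during the testing plan above is to wrap each run in a small harness that records elapsed time and the disk space added to the output location. This is a minimal sketch under stated assumptions: `measure_backup` and `directory_size_bytes` are hypothetical names, and the command passed in stands for whatever invokes a given bulk-copying approach (no specific Fedora 4 command is assumed). It does not capture memory use or the impact on concurrent repository requests, which the performance test suite would need to measure separately.

```python
import os
import subprocess
import time


def directory_size_bytes(root):
    """Sum the sizes of regular files under root (0 if root does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def measure_backup(command, output_dir):
    """Run command (a list of arguments) and report time and disk growth.

    Hypothetical helper: `command` is a placeholder for the tool that
    triggers one bulk-copying approach; `output_dir` is where it writes.
    """
    before = directory_size_bytes(output_dir)
    start = time.monotonic()
    completed = subprocess.run(command, check=False)
    elapsed = time.monotonic() - start
    after = directory_size_bytes(output_dir)
    return {
        "exit_code": completed.returncode,
        "seconds": elapsed,
        "bytes_added": after - before,
    }
```

Running the same harness once per approach, against the same ingested content, yields results that can be compared side by side. For event-driven or automatic workflows, which are not triggered by a single command, the size measurement still applies but the timing would have to come from elsewhere.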