Comment: Migrated to Confluence 4.0

...

The Chain of Responsibility pattern is a popular way to decide which command in a set should be responsible for handling a given request.

Limitations in current Package Ingest framework...

  1. Packagers rely on being designated by name, either from the command line or by SWORD.
    1. In the face of not knowing the appropriate packager, we would like the repository to offer an option to allow it to guess the packager to use.
  2. No graceful degradation exists if the package is just a file or simple zip package
    1. We would like to devise a graceful degradation when a specific "sub-type" can still be determined.
  3. A series of default Packagers should be able to be constructed that take sane values and use a simple heuristic to determine which packagers can be applied:
    1. In the face of errors in a subtype packager, parent packagers could be set into a debug state and place the item in the submission workflow, allowing the user to review the result prior to accepting or rejecting it.
    2. Treats Submission Workflow as a "Staging tool" prior to pushing content into the repository.
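
The "guess the packager" heuristic from the list above could look roughly like the following. This is only a sketch under stated assumptions: the `Upload` and `Candidate` types, the candidate names, and the use of a `mets.xml` entry as the manifest marker are all hypothetical and are not part of the current Package Ingest framework. Candidates are ordered from most specific to least specific, so the plain file case is the graceful fallback.

```java
import java.util.List;
import java.util.function.Predicate;

final class PackagerGuessSketch {

    /** Hypothetical view of an upload: its leading bytes and, for archives, its entry names. */
    static final class Upload {
        final byte[] head;
        final List<String> entryNames;

        Upload(byte[] head, List<String> entryNames) {
            this.head = head;
            this.entryNames = entryNames;
        }
    }

    /** A named packager plus the test that decides whether it applies. */
    static final class Candidate {
        final String name;
        final Predicate<Upload> accepts;

        Candidate(String name, Predicate<Upload> accepts) {
            this.name = name;
            this.accepts = accepts;
        }
    }

    /** Non-empty ZIP archives start with the local file header signature 'P' 'K' 0x03 0x04. */
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    /** Most specific first; the last candidate accepts anything (assumed ordering). */
    static final List<Candidate> CANDIDATES = List.of(
        new Candidate("manifest", u -> looksLikeZip(u.head) && u.entryNames.contains("mets.xml")),
        new Candidate("zip", u -> looksLikeZip(u.head)),
        new Candidate("file", u -> true));

    /** Returns the name of the first candidate that accepts the upload. */
    static String guess(Upload upload) {
        for (Candidate candidate : CANDIDATES) {
            if (candidate.accepts.test(upload)) {
                return candidate.name;
            }
        }
        throw new IllegalStateException("unreachable: the file candidate accepts everything");
    }

    public static void main(String[] args) {
        byte[] zipHead = {'P', 'K', 3, 4};
        byte[] pdfHead = {'%', 'P', 'D', 'F'};
        System.out.println(guess(new Upload(zipHead, List.of("mets.xml", "a.pdf")))); // manifest
        System.out.println(guess(new Upload(zipHead, List.of("a.pdf"))));             // zip
        System.out.println(guess(new Upload(pdfHead, List.of())));                    // file
    }
}
```

A name given on the command line or by SWORD would simply bypass `guess` and select the packager directly.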

...

  1. FileIngester: Takes any file and stores it in a bitstream, attempts to match BitstreamFormat where possible.
    1. Custom implementations can support format-specific tasks (extract metadata from PDF or DOC, acquire metadata from MPEG).
  2. ZipIngester: Takes any Zip file and expands it into attached bitstreams, named by the directory path found in the archive.
    1. Custom implementations may handle different types of archives (ZIP, RAR, TAR, TAR.GZ)
  3. ManifestBasedIngester: Takes a Manifest and processes it to embellish the Item and Bitstreams from the manifest file. (Either placed inside a ZIP or standalone)
    1. Custom types to support websites and other types saved as zip packages
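
To make the ZipIngester idea concrete, here is a minimal sketch of expanding an archive into bitstreams named by their path inside the archive. It uses only the standard `java.util.zip` classes; the class name `ZipExpandSketch`, the `Map<String, byte[]>` stand-in for bitstreams, and the in-memory sample archive are assumptions made for illustration and are not DSpace API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipExpandSketch {

    /** Expands a ZIP stream into contents keyed by each entry's path inside the archive. */
    static Map<String, byte[]> expand(InputStream in) {
        Map<String, byte[]> bitstreams = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content of their own
                }
                // readAllBytes() stops at the end of the current entry
                bitstreams.put(entry.getName(), zip.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bitstreams;
    }

    /** Builds a small archive in memory so the sketch runs without files on disk. */
    static byte[] sampleZip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("docs/report.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("images/"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("images/logo.txt"));
            zip.write("logo".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> bitstreams = expand(new ByteArrayInputStream(sampleZip()));
        for (Map.Entry<String, byte[]> e : bitstreams.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue().length + " bytes)");
        }
    }
}
```

Custom implementations for RAR or TAR would keep the same contract (a stream in, named contents out) and swap only the archive reader.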

Challenges with current METS packager design.

  1. The problem is that the current packagers always assume you are processing a manifest file. I simply want to process the contents of the zip and name the item and contents solely on those features by default, then build customized packagers on top of this. This will make creating new packagers much easier because the basic functionality for creating the bitstreams will be separated from the evaluation of the manifest.

...

I'd like to write something that uses a chaining or filtering model, where each packager operates like a filter on the content.

InputFile -> FileIngester -> ZipIngester -> ManifestBasedIngester

InputFile -> FileIngester -> ManifestBasedIngester
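
As a rough illustration of the filtering model, each ingester below refines the same working item and simply does nothing when it does not apply. Everything here is a hypothetical sketch: the `InputFile`, `WorkingItem`, and `Ingester` types, the `manifest.txt` file name, and its `key=value` format are invented for the example and stand in for the real Item, Bitstream, and manifest handling. Replacing the archive bitstream with its contents in the zip stage is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IngestChainSketch {

    /** Hypothetical in-memory package: a file name plus archive entries (empty for a plain file). */
    static final class InputFile {
        final String name;
        final Map<String, String> entries;

        InputFile(String name, Map<String, String> entries) {
            this.name = name;
            this.entries = entries;
        }
    }

    /** Hypothetical stand-in for an Item under construction. */
    static final class WorkingItem {
        final List<String> bitstreams = new ArrayList<>();
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    /** Each ingester is a filter: it refines the item if it applies, otherwise leaves it alone. */
    interface Ingester {
        void apply(InputFile input, WorkingItem item);
    }

    /** Stores the file itself and names the item after it. */
    static final Ingester FILE = (input, item) -> {
        item.bitstreams.add(input.name);
        item.metadata.put("title", input.name);
    };

    /** Replaces the archive with its contents, named by path (an assumed policy). */
    static final Ingester ZIP = (input, item) -> {
        if (input.entries.isEmpty()) {
            return; // not an archive: nothing to refine
        }
        item.bitstreams.clear();
        item.bitstreams.addAll(input.entries.keySet());
    };

    /** Reads a hypothetical key=value manifest and embellishes the item metadata. */
    static final Ingester MANIFEST = (input, item) -> {
        String manifest = input.entries.get("manifest.txt");
        if (manifest == null) {
            return; // no manifest: keep the defaults from earlier stages
        }
        item.bitstreams.remove("manifest.txt");
        for (String line : manifest.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                item.metadata.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
    };

    /** Runs every stage in order against one working item. */
    static WorkingItem run(InputFile input, List<Ingester> chain) {
        WorkingItem item = new WorkingItem();
        for (Ingester stage : chain) {
            stage.apply(input, item);
        }
        return item;
    }

    public static void main(String[] args) {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("docs/a.pdf", "(pdf bytes)");
        entries.put("manifest.txt", "title=My Report\nauthor=A. Writer");
        WorkingItem item = run(new InputFile("pkg.zip", entries), List.of(FILE, ZIP, MANIFEST));
        System.out.println(item.bitstreams + " " + item.metadata);
    }
}
```

Because every stage leaves a usable item behind, the second chain above is just the same list without the zip stage.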

...

Because the ingest chain fails gracefully, they know at which stage the failure occurred, can review the state of the current item, and can review their SWORD output to determine what went wrong in that packager.
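
One way to get that behaviour is for the chain runner to stop at the first failing stage, keep the item as the last successful stage left it, and record which stage failed and why. The sketch below assumes hypothetical `Stage`, `Action`, and `Result` types and represents the item as a plain list of bitstream names; how the result would actually be surfaced through SWORD or the submission workflow is left open.

```java
import java.util.ArrayList;
import java.util.List;

final class GracefulChainSketch {

    /** Work done by one stage; it may throw to signal that the stage failed. */
    interface Action {
        void apply(List<String> bitstreams) throws Exception;
    }

    /** Hypothetical named stage in the ingest chain. */
    static final class Stage {
        final String name;
        final Action action;

        Stage(String name, Action action) {
            this.name = name;
            this.action = action;
        }
    }

    /** Outcome of a run: the item state reached, plus where and why the chain stopped. */
    static final class Result {
        final List<String> bitstreams;
        final List<String> completedStages;
        final String failedStage; // null when every stage succeeded
        final String error;       // null when every stage succeeded

        Result(List<String> bitstreams, List<String> completedStages, String failedStage, String error) {
            this.bitstreams = bitstreams;
            this.completedStages = completedStages;
            this.failedStage = failedStage;
            this.error = error;
        }

        /** A failed run is a candidate for review in the submission workflow. */
        boolean needsReview() {
            return failedStage != null;
        }
    }

    static Result run(List<Stage> chain) {
        List<String> bitstreams = new ArrayList<>();
        List<String> completed = new ArrayList<>();
        for (Stage stage : chain) {
            List<String> lastGood = new ArrayList<>(bitstreams);
            try {
                stage.action.apply(bitstreams);
                completed.add(stage.name);
            } catch (Exception e) {
                // Discard partial changes from the failed stage; keep the earlier result for review.
                return new Result(lastGood, completed, stage.name, e.getMessage());
            }
        }
        return new Result(bitstreams, completed, null, null);
    }

    public static void main(String[] args) {
        Result result = run(List.of(
            new Stage("FileIngester", b -> b.add("package.zip")),
            new Stage("ZipIngester", b -> { b.clear(); b.add("docs/report.pdf"); }),
            new Stage("ManifestBasedIngester", b -> { throw new IllegalStateException("manifest not found"); })));
        System.out.println("failed at: " + result.failedStage + " (" + result.error + ")");
        System.out.println("item so far: " + result.bitstreams);
    }
}
```

In this sketch a failure in the manifest stage still leaves the expanded zip contents attached to the item, which is the state a reviewer would see.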

Use Cases: Areas where this will be useful

Submission Workflow

Adding Packager Processing into the Submission File Upload Step so that an Item can be easily viewed and adjusted after being processed at upload. Example Requirements:

  1. Upload Step
    1. Initially processes the file or package into Item + Bitstreams and attempts to determine the optimal packager type.
    2. The user is offered a selection of processing options.
    3. Based on the choice, the packager is executed.
      1. For instance, if PDF metadata is available, it is inserted into the Item metadata.
      2. For instance, if a Softchalk package is detected, index.html or other files are processed to attach the Item title and/or other details extracted from the source.
  2. Edit Metadata Step: gives the user a chance to review and add metadata; required fields that may have been absent on upload can now be added.
  3. CC License Step: if a license was detected in the Upload step, it is now attached to the Item and can be viewed.
  4. Final Repo License Step
  5. Final review and Submission