2016-12 Import - Export Sprint 03 Meetings

Sprint 4

2017-05 Import - Export Sprint 04 Meetings

Use cases

  1. Transfer between Fedora and external preservation systems, such as APTrust, MetaArchive, LOCKSS, DPN, Archivematica, etc.

  2. Package [Export] the content of a single Fedora container and all its descendant resources

  3. Transfer between Fedora instances or (more generally) from Fedora to an LDP archive

  4. Load [Import] the contents of a package into a specified container

  5. Round-tripping resources in Fedora in support of backup/restore

    1. A start has been made on this in FCREPO-1990

    2. The implementation referenced in the above ticket is not dead, though not actively being worked on at the moment; pull requests welcomed (though others may well wish to take it in a different direction).

    3. A rebuilder that:

      1. Is not solely dependent on an intact backup of the repository index

      2. Works off shredded serializations that can be supported with file preservation techniques

      3. Can recover as much as possible of a repository in the face of integrity issues (supports partial recovery)

      4. Supports gathering copies of the shreds (serializations) from multiple sources to recover a repository

  6. Round-tripping resources in Fedora in support of Fedora repository version upgrades

  7. Batch loading arbitrary sets of resources from metadata spreadsheet and binaries (may well be difficult – or not worth it – to try to generalize such a feature).

  8. Import or export containers or binaries using add, overwrite, or delete operations. Configure the data model and the source and the target for each resource that will be updated. Allow target containers to be non-empty before import and source containers to be non-empty after export. Maintain ordering, etc. Support versioning. Examples: add issues to a publication; add fragments to a manuscript; add data sets to a longitudinal study; add time-series images from telescopes; remove resources determined to be under copyright; release resources after restrictions on access have expired.

    1. Perform multiple metadata-only exports, and then restore an earlier version from an export.

Use cases yet to be rolled into requirements

  1. Import objects from an external system (such as Figshare, where a research data object might be prepared) into a Fedora preservation repository with either Hydra or Islandora on top. (Implies compliance with Hydra and/or Islandora object models)

  2. To migrate from internal content to external content, export metadata only and then import it into another repository.  The links to the new external content locations would be added afterwards.

Requirements

External Systems

  1. Phase 2: Support import from and export to a TBD list of external systems.

    1. APTrust - University of Maryland (Joshua Westgard)

    2. Archivematica - Artefactual Systems (Justin Simpson)

    3. MetaArchive - Penn State (Ben Goldman)

    4. Perseids - Tufts (Bridget Almas)

General

  1. Phase 1: Support transacting in RDF

  2. Phase 1: Support allowing the option to include Binaries

  3. Phase 1: Support references from exported resources to other exported resources

  4. Phase 2: Support transacting in BagIt bags

  5. Phase 1: Support import into a non-existing Fedora container

  6. Phase 2: Support import into an existing, empty Fedora container

  7. Phase 3: Support import into an existing, non-empty Fedora container with various policies: add, overwrite, delete, version, skip

  8. Phase 3: Support export of resource versions

  9. Phase 3: Support import of resource versions

  10. Phase 1: Support export of resource and its "members" based on the ldp:contains predicate

  11. Phase 2: Support export of resource and its "members" based on a user-provided membership predicate

  12. Support recursive RDF insert/updates with LDP Indirect Container specified POST (and PUT / PATCH?) (ref: FCREPO-2042)
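Requirements 10 and 11 both amount to walking a membership predicate outward from a starting resource. The following is a minimal sketch of that traversal, not the tool's actual implementation: it assumes the repository's triples are already available in memory as (subject, predicate, object) string tuples, and the function name collect_export_set is hypothetical.

```python
from collections import deque

# Full URI of the ldp:contains predicate named in requirement 10.
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"


def collect_export_set(triples, root, predicate=LDP_CONTAINS):
    """Return root plus every resource reachable from it via `predicate`.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    Resources are returned in breadth-first order, each listed once.
    """
    # Index the member links so each resource's members can be looked up quickly.
    members = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            members.setdefault(subject, []).append(obj)

    ordered = [root]
    visited = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for member in members.get(current, []):
            if member not in visited:  # guards against cycles in the graph
                visited.add(member)
                ordered.append(member)
                queue.append(member)
    return ordered
```

Passing a different value for predicate covers the user-provided membership predicate of requirement 11; the default covers requirement 10.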

Round-tripping

Defined as: Exporting all or a subset of a Fedora repository and importing the export artifacts into a Fedora repository.

  1. Phase 3: Support preservation of dates during round-tripping

  2. Phase 3: Support preservation of version snapshots during round-tripping

  3. Phase 1: The URIs of the round-tripped resources must be the same as the original URIs

  4. Phase 3: Support lossless round-tripping (i.e., if you export a resource, delete that resource, and then import it, the result is no different than if you had never performed any of those operations).
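One way to read the lossless requirement is that a resource's triples before export and after import form the same set. The sketch below illustrates that check under simplifying assumptions: triples are plain (subject, predicate, object) string tuples, blank nodes are not handled, and the function names are hypothetical.

```python
def round_trip_diff(before, after):
    """Compare the triples of a resource before export and after import.

    Returns a dict with the triples that were lost and the triples that
    were gained during the round trip, each as a sorted list.
    """
    before_set, after_set = set(before), set(after)
    return {
        "lost": sorted(before_set - after_set),
        "gained": sorted(after_set - before_set),
    }


def is_lossless(before, after):
    """True when the round trip neither dropped nor added any triple."""
    diff = round_trip_diff(before, after)
    return not diff["lost"] and not diff["gained"]
```

Under this reading, a timestamp that changes on re-import would show up as one lost and one gained triple, which is why preservation of dates appears as its own requirement above.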

BagIt

  1. Phase 2: Single resource bags

  2. Phase 2: The structure and scope of accepted and produced BagIt bags must be configurable (resource)

    1. Clarification: structure relates to required and optional tagfiles in the bag

    2. Clarification: scope relates to contents of the bag, e.g. single object or object and all members based on specific membership predicate

  3. Phase 3: Multi-resource bags

  4. Phase 3: Unambiguously support linking between resources within a bag, and from resources in the bag to resources outside the bag

    1. e.g. for bagged resources A and B, if A contains statement <A> myns:rel <B>, then it is unambiguous that B is a resource in the bag.  Suppose some archive ingests the bag and exposes its contents as web resources with URIs P and Q. If the archive preserves intra-bag links, resource P will have statement <P> myns:rel <Q>.  Likewise, if A contains external link <A> myns:rel2 <http://example.org/outside/the/bag>, then an archive that preserves links will have <P> myns:rel2 <http://example.org/outside/the/bag>
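The A/B to P/Q example above can be sketched as a simple URI remapping. This is an illustration only, assuming triples are (subject, predicate, object) string tuples and that the archive knows which new URI it assigned to each bagged resource; the function name rewrite_links is hypothetical.

```python
def rewrite_links(triples, uri_map):
    """Rewrite intra-bag links to the URIs an archive assigned on ingest.

    `uri_map` maps each bagged resource's identifier to its new URI.
    Subjects and objects found in the map are replaced; anything else,
    such as a link to a resource outside the bag, is left untouched.
    """
    def remap(term):
        return uri_map.get(term, term)

    return [(remap(s), p, remap(o)) for s, p, o in triples]


# The example from the text: A and B are in the bag, exposed as P and Q.
bag_triples = [
    ("A", "myns:rel", "B"),
    ("A", "myns:rel2", "http://example.org/outside/the/bag"),
]
archived = rewrite_links(bag_triples, {"A": "P", "B": "Q"})
```

Here archived holds <P> myns:rel <Q> and <P> myns:rel2 <http://example.org/outside/the/bag>, matching the outcome described for an archive that preserves links.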

Verification Tool

  1. Phase 2: Verify same number of resources on disk as in fcrepo

  2. Phase 2: Verify same number of resources in fcrepo as on disk

  3. Phase 2: Verify same checksum for binaries

  4. Phase 2: Verify same triples for containers

  5. Phase 2: Record which resources have been verified (Include checksum for binary resources)

  6. Phase 2: Verify subset of repository resources

  7. Phase 3: Verify fcrepo to fcrepo

  8. Phase 3: Verify disk to disk

  9. Phase 3: Use generated config file as sole input
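The Phase 2 checks above (presence on both sides, matching checksums for binaries, matching triples for containers, and a record of what was verified) can be sketched together. This is a rough illustration, not the verification tool itself: it assumes both sides have already been loaded into dicts keyed by resource URI, with bytes values for binaries and sets of triples for containers, and it uses SHA-1 only as an example digest.

```python
import hashlib


def verify(on_disk, in_repo):
    """Compare exported resources on disk with resources in the repository.

    Both arguments map a resource URI to either bytes (a binary) or a set
    of (subject, predicate, object) tuples (a container). Returns a report
    dict keyed by URI, recording a status for every resource and, for
    binaries present on both sides, the checksum of the on-disk copy.
    """
    report = {}
    for uri in sorted(set(on_disk) | set(in_repo)):
        if uri not in in_repo:
            report[uri] = {"status": "missing in repository"}
        elif uri not in on_disk:
            report[uri] = {"status": "missing on disk"}
        else:
            disk, repo = on_disk[uri], in_repo[uri]
            if isinstance(disk, bytes) and isinstance(repo, bytes):
                # Binaries: compare checksums and record the digest.
                disk_sum = hashlib.sha1(disk).hexdigest()
                repo_sum = hashlib.sha1(repo).hexdigest()
                status = "ok" if disk_sum == repo_sum else "mismatch"
                report[uri] = {"status": status, "checksum": disk_sum}
            elif isinstance(disk, bytes) or isinstance(repo, bytes):
                # One side is a binary and the other a container.
                report[uri] = {"status": "mismatch"}
            else:
                # Containers: compare the triples as unordered sets.
                status = "ok" if set(disk) == set(repo) else "mismatch"
                report[uri] = {"status": status}
    return report
```

Verifying a subset of the repository would then be a matter of filtering both dicts to the URIs of interest before calling verify.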

Considerations

  • Import/export should be as performant as possible, under the assumption that this work is done via the REST interface

Resources

Meetings