This page details the specification of DPN Content Packages.

DPN BagIt Bags (Content Packages)

BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding secure hash/message digest (checksum).

Specification

  1. DPN packages will conform to the BagIt packaging format (spec)
  2. DPN packages may be either:
    1. serialized (e.g. a single tar)
    2. un-serialized (e.g. exploded directory structure)

DPN BagIt Structure

DPN requires packaging content for transport, but does not specify how each replicating node incorporates digital objects into their respective repositories.

DPN Bag Structure

<DPN-Object-ID>/
         |   bagit.txt
         |   manifest-sha256.txt
         |   bag-info.txt
         |   tagmanifest-sha256.txt
         \--- data/
               |   [payload files]
         \--- dpn-tags/
               |   dpn-info.txt
         \--- [optional tag directories]/
               |   [optional node tag files]
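
As an illustration, the layout above could be checked with a short Python sketch. The function name missing_entries and the constant names are hypothetical and not part of the DPN specification. The sketch only tests that the entries shown in the tree exist; it does not validate their contents.

```python
from pathlib import Path

# Entries taken from the tree above. The names of these constants are
# illustrative only.
REQUIRED_FILES = [
    "bagit.txt",
    "manifest-sha256.txt",
    "bag-info.txt",
    "tagmanifest-sha256.txt",
    "dpn-tags/dpn-info.txt",
]
REQUIRED_DIRS = ["data", "dpn-tags"]


def missing_entries(bag_root):
    """Return the required files and directories that are absent from bag_root."""
    root = Path(bag_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means every entry from the tree is present in an un-serialized bag directory.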

Description

DPN-Object-ID (directory)
  • Name of the root directory of the bag, as required by the BagIt spec
  • The unique DPN UUID of the object, identical to the DPN-Object-ID value in dpn-info.txt
bagit.txt
  • Required element, as listed in the BagIt spec. Contents:
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8
manifest-sha256.txt
bag-info.txt
  • BagIt spec section 2.2.2
  • Used to record additional information that helps with succession
  • Where a field would be redundant with a dpn-info.txt field, it is recommended to keep it in dpn-info.txt to avoid confusion
  • DPN requires the following fields to be present, although their values may be empty. Do not use "null" or "nil" as values. An empty field must still include the colon (:).

   Source-Organization
   Organization-Address
   Contact-Name
   Contact-Phone
   Contact-Email
   Bagging-Date
   Bag-Size
   Bag-Group-Identifier
   Bag-Count
  • Other fields are optional for use by the Ingest Node but are ignored by all common DPN processes.
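
The bag-info.txt rules above can be sketched as a small Python check. This is a minimal sketch, not a DPN tool: check_bag_info and REQUIRED_BAG_INFO_FIELDS are hypothetical names, and the parser assumes one "Label: value" pair per line, skipping indented continuation lines rather than folding them into the value.

```python
# Required labels copied from the list above.
REQUIRED_BAG_INFO_FIELDS = [
    "Source-Organization", "Organization-Address", "Contact-Name",
    "Contact-Phone", "Contact-Email", "Bagging-Date", "Bag-Size",
    "Bag-Group-Identifier", "Bag-Count",
]


def check_bag_info(text):
    """Return a list of problems found in the text of a bag-info.txt file."""
    seen = {}
    for line in text.splitlines():
        # Skip blank lines and indented continuation lines.
        if not line.strip() or line[0] in " \t":
            continue
        label, sep, value = line.partition(":")
        if not sep:
            continue  # A label without a colon does not count as present.
        seen.setdefault(label.strip(), []).append(value.strip())

    problems = []
    for field in REQUIRED_BAG_INFO_FIELDS:
        if field not in seen:
            problems.append(f"missing field: {field}")
        elif any(v.lower() in ("null", "nil") for v in seen[field]):
            problems.append(f"placeholder value in: {field}")
    return problems
```

A file in which every required label appears with an empty value passes this check, while a value such as "nil" or a label without its colon is reported.
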
tagmanifest-sha256.txt
  • BagIt spec section 2.2.1
  • Contains the secure hash of each tag file
  • This ensures that the metadata stored with the bag is preserved
  • All objects in the bag, including those in the optional tag directories, must be represented in the tag manifest.
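
A tag manifest of this kind could be verified roughly as follows. This is a sketch under stated assumptions: each manifest line is taken to be a hex SHA-256 digest, whitespace, and a path relative to the bag root, and the function names sha256_of and verify_tagmanifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tagmanifest(bag_root):
    """Return the manifest entries that are malformed, missing, or mismatched."""
    root = Path(bag_root)
    manifest = root / "tagmanifest-sha256.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            failures.append(line.strip())  # No path after the digest.
            continue
        expected, rel_path = parts[0].lower(), parts[1].strip()
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

The sketch checks only the files that the manifest lists. Confirming that every tag file appears in the manifest would need a separate directory walk.
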
fetch.txt
  • Not supported by DPN, as DPN does not support holey bags.
data (directory)
  • Required directory for payload items
  • May be encrypted for dark content.
dpn-tags (directory)
  • Directory for DPN-specific tag files (covered under optional tag directories in section 2 of the BagIt spec)
  • All DPN tag files go under this directory with the naming convention ‘dpn-<filename>.txt’, following the BagIt text tag file specifications
dpn-tags/dpn-info.txt
  • DPN tag file containing the following fields:
DPN-Object-ID: Unique ID generated by Ingest Node. 
Local-ID:  Local identifier from originating repository.
Ingest-Node-Name:  Name of the ingest node or source repository
Ingest-Node-Address:
Ingest-Node-Contact-Name:
Ingest-Node-Contact-Email:
Version-Number: Sequential positive integer
First-Version-Object-ID: Object-ID of the first version of the item
Interpretive-Object-ID: DPN UUID of Interpretive bag for this object
Rights-Object-ID: Reference to DPN and repository agreements
Bag-Type: data | interpretive | rights # Bags will be only one of these three types of objects.
  • Fields that hold DPN UUIDs use the suffix "Object-ID"
    • Alternative naming conventions under consideration include "OID", "DPN-ID", "DID", "Reference-ID", "Ref-ID", etc.
  • Every field must appear.  If a field does not have a value, it should still appear but be left blank.  
  • All fields must have a value, except for:
    • First-Version-Object-ID
    • Interpretive-Object-ID ("rights" and "interpretive" only)*
    • Rights-Object-ID ("rights" and "interpretive" only)*

  • Fields that could contain more than one value should be repeated for each value. Do not separate values with commas.
    • Currently, the only fields that may be repeated are "Interpretive-Object-ID" and "Rights-Object-ID".
    • Example:
Interpretive-Object-ID: UUID #1
Interpretive-Object-ID: UUID #2
Interpretive-Object-ID: UUID #3
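
The dpn-info.txt rules above (every field present, only some fields blank, only two fields repeatable) can be sketched in Python. The names parse_dpn_info and check_dpn_info are hypothetical. As a simplifying assumption, the sketch lets the three exempt fields be blank for every bag type, since the bag-type qualifier on the exemptions is not spelled out above.

```python
# Field names copied from the list above.
DPN_INFO_FIELDS = [
    "DPN-Object-ID", "Local-ID", "Ingest-Node-Name", "Ingest-Node-Address",
    "Ingest-Node-Contact-Name", "Ingest-Node-Contact-Email", "Version-Number",
    "First-Version-Object-ID", "Interpretive-Object-ID", "Rights-Object-ID",
    "Bag-Type",
]
MAY_BE_BLANK = {"First-Version-Object-ID", "Interpretive-Object-ID", "Rights-Object-ID"}
REPEATABLE = {"Interpretive-Object-ID", "Rights-Object-ID"}
BAG_TYPES = {"data", "interpretive", "rights"}


def parse_dpn_info(text):
    """Map each label to the list of its values, in file order."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")  # Split on the first colon only.
        if sep:
            fields.setdefault(label.strip(), []).append(value.strip())
    return fields


def check_dpn_info(fields):
    """Return a list of rule violations for parsed dpn-info.txt fields."""
    problems = []
    for name in DPN_INFO_FIELDS:
        values = fields.get(name)
        if values is None:
            problems.append(f"{name}: field is missing")
            continue
        if len(values) > 1 and name not in REPEATABLE:
            problems.append(f"{name}: field may not be repeated")
        if name not in MAY_BE_BLANK and not all(values):
            problems.append(f"{name}: value is required")
    version = fields.get("Version-Number", [""])[0]
    if version and not (version.isdecimal() and int(version) > 0):
        problems.append("Version-Number: must be a positive integer")
    bag_type = fields.get("Bag-Type", [""])[0]
    if bag_type and bag_type not in BAG_TYPES:
        problems.append("Bag-Type: must be data, interpretive, or rights")
    return problems
```

Repeated labels accumulate into a list, so three Interpretive-Object-ID lines yield three values under one key.
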
optional node tag directory and files
  • As with the convention used for the DPN tags, we recommend the directory naming convention `<node name>-tags` and the file naming convention `<node name>-<filename>.txt`, following the BagIt specification
  • If included, the files must be represented in tagmanifest-sha256.txt.

DPN Bag Transfer Protocol

  1. DPN will transfer valid DPN bags that have been serialized with tar.
  2. Upon finishing the transfer of a tarred bag, the replicating node will compute the SHA-256 hash of the serialized file. This hash is sent to the first node and shows that the tarred bag was transferred without errors.
  3. The originating node calculates the SHA-256 hash of the bag's tagmanifest-sha256.txt file. This hash is used as the fixity_value for the bag and is kept in the registry.
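
The two hashes in the protocol above could be computed as in the following Python sketch. The function names are hypothetical, and the sketch assumes that the tar file holds the bag under a single root directory, so that the tag manifest sits at <DPN-Object-ID>/tagmanifest-sha256.txt. How the hashes are exchanged between nodes is outside its scope.

```python
import hashlib
import tarfile


def sha256_of_file(path):
    """SHA-256 of the serialized bag, as computed after a transfer (step 2)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_value(tar_path):
    """SHA-256 of tagmanifest-sha256.txt inside a serialized bag (step 3)."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            # Ignore a leading "./" and empty path components.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            # Assumed layout: <DPN-Object-ID>/tagmanifest-sha256.txt
            if member.isfile() and len(parts) == 2 and parts[1] == "tagmanifest-sha256.txt":
                return hashlib.sha256(tar.extractfile(member).read()).hexdigest()
    raise FileNotFoundError(f"tagmanifest-sha256.txt not found in {tar_path}")
```

A replicating node would compare the result of sha256_of_file with the hash of the file that was sent. The value returned by fixity_value depends only on the tag manifest, so it is unaffected by how the tar file itself was built.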

 
