Overview

Hypatia aims to manage the born-digital components of archival collections effectively; such collections are traditionally described by finding aids in the form of EADs (Encoded Archival Descriptions). There is an established body of practice for describing the physical components of such collections at many levels of detail, including the physical artifacts (e.g., diskettes) associated with born-digital materials. Describing and delivering the digital content itself poses many challenges. When applied to born-digital materials, the phrase "unprocessed collection" means that description extends only down to the physical media, with a requirement to associate digital artifacts, such as a raw binary disk image and photos, with that description. The phrase "processed collection" means there is more detailed description, down to the file level, supporting an intellectual arrangement of the content as well as access to the individual files.

This analysis addresses both processed and unprocessed collections, assuming the Hypatia solution of mapping an EAD into Hydra/Fedora objects, then managing born-digital content as Fedora objects through the Hypatia interface.

Processing born-digital archival materials into Hypatia involves several discrete considerations.

EAD element reference

EAD hierarchical structure mapping and content mapping

This section addresses how EADs map into Hypatia set and item objects. See Hypatia content and layout for Sets and Items for more details.

Skeletal structure of sample EADs. Each entry gives the site and collection, then the EAD structure and location of the born digital materials (with the type of Hydra object in brackets), then notes. Entries with only a name have no structure or notes recorded.

Hull: Gallagher

collection [set]

Hull: Socialist Health Assoc.

Stanford: Xanadu

collection [set]
- <c> file (10 of 10) -- unittitle "Born Digital" [set]
- - <c> item -- unitid: CM01 [item]
- - <c> item -- unitid: CM02 [item]
- - <c> item -- unitid: CM03 [item]

1. Target "born digital" sub-level identified by <unittitle>.
2. Collection only described to the container level (hard drives).
3. EAD "item" level corresponds to target Fedora "item".
4. Item <unitid> is used as a filename stem to bind content files to Hypatia objects.

Stanford: Gould

collection -- unittitle "Stephen Jay Gould papers" unitid: M1437 [set]
- <c> series -- unittitle "Series 6: Born Digital Materials" [set]

The EAD goes down only to the single "Born digital" series description, with no details expressed at lower levels. A rationalized directory structure and FTK output are intended to support a direct translation into Hypatia objects, for both unprocessed and processed views, without an intermediary EAD.

Virginia: Cheuse

Yale: Conn. Oral Histories

Yale: Love Makes a Family

Yale: Pelli

Yale: Tobin

collection [set]
- <c01> series (3 of 7) -- unittitle "Accession 2004-M-088" [set]
- - <c02> file (28 of 29) -- unittitle "Computer diskettes [3.5 inch]" [set]
- - - <c03> file -- unitid: 2004-M-088.0001 [item]
           :
- - - <c03> file -- unitid: 2004-M-088.0027 [item]

1. Target sub-level identified by <unittitle>.
2. Collection only described to the container level (diskettes).
3. EAD "item" level corresponds to target Fedora "item".
4. Item <unitid> is used as a filename stem to bind content files to Hypatia objects.

Yale: Turner

Yale: Welch

Assumptions:

More assumptions:

Sets vs items

A "collection" in Hypatia is the primary set established by the information at the initial <archdesc> level of the EAD, regardless of its "level" designation, e.g.,

Intermediate levels defined by the hierarchical arrangement of <c> or <c0n> tags form a structural hierarchy of Hypatia sets. The "level" vocabulary, while retained, is not itself significant except as an aid in determining the boundaries between sets and individual Hypatia objects. In practice, levels might be any one of

In general, "item" level entries in an EAD will map to individual Hypatia objects. The "file" level may also be treated as an item node for a specific EAD. Unless otherwise indicated for a specific conversion, other levels will translate to set objects in Hypatia, even if they are empty sets because the EAD had no item-level description.
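A minimal Python sketch of the set/item rule, using the standard library's xml.etree.ElementTree. It is illustrative only: the function names are hypothetical, the EAD is assumed to carry no XML namespace, and the levels that count as items are passed in per EAD (default "item"; add "file" for an EAD such as Tobin). The <archdesc> element itself is taken to be the collection set and is not yielded.

```python
import xml.etree.ElementTree as ET

# EAD component tags: unnumbered <c> plus numbered <c01> ... <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def classify(component, item_levels=("item",)):
    """Return "item" or "set" for one EAD component, based only on its level attribute."""
    return "item" if component.get("level") in item_levels else "set"


def walk(element, item_levels=("item",), depth=0):
    """Yield (depth, kind, unittitle) for each component below element, in document order."""
    for child in element:
        if child.tag == "dsc":
            # <dsc> only wraps the component tree, so descend without adding a level.
            yield from walk(child, item_levels, depth)
        elif child.tag in COMPONENT_TAGS:
            title = (child.findtext("did/unittitle") or "").strip()
            yield depth, classify(child, item_levels), title
            yield from walk(child, item_levels, depth + 1)


sample = """<archdesc level="collection">
  <dsc>
    <c level="file"><did><unittitle>Born Digital</unittitle></did>
      <c level="item"><did><unittitle>CM01</unittitle></did></c>
    </c>
  </dsc>
</archdesc>"""

for depth, kind, title in walk(ET.fromstring(sample)):
    print("  " * depth + f"{title} [{kind}]")
```

Run as shown, it prints "Born Digital [set]" followed by an indented "CM01 [item]". Passing item_levels=("item", "file") would make the file-level component an item instead.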

Stanford FTK-backed Digital Object creation

Stanford uses the Forensic Toolkit (FTK) software to analyze and characterize the contents of computer media. Starting with the Gould collection, we will provide only a single series node in the EAD to represent "Born Digital Materials". Conversion routines will be able to auto-generate objects representing the unprocessed collection (the media artifacts themselves, e.g., hard drives and floppy disks) as well as detailed file content objects from a modified form of the FTK output. See Stanford FTK to Hypatia object mapping.

EAD-to-MODS - general information

There is a scarcity, dare I say dearth, of tools that do this mapping or offer an existing, implemented conversion, so the mappings here are based on data encountered in the Hypatia sample EADs and can be augmented as we go along. They are informed by two sources:

Assumption: all descriptive metadata, for collections, intermediate sets (levels), and digital objects, will be MODS.

Issue: the mapping from EAD to MODS is not perfect and is not fully reversible:

Issue: The EAD schema makes extensive use of complex XML types with mixed content, a pattern in which an XML element contains free text interleaved with other sub-elements.

We will refer below to this set of embedded-element translations as "embedded element conversion".
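The individual embedded-element translations are not listed here, so the following Python sketch assumes the simplest possible one: keep the text of inline sub-elements in place, drop their markup, and treat each <p> as a paragraph. The function names are hypothetical, and <head> is skipped by default on the assumption that it is handled separately as a label.

```python
import re
import xml.etree.ElementTree as ET


def inline_text(element):
    """All text inside element with the markup dropped and whitespace collapsed."""
    return re.sub(r"\s+", " ", "".join(element.itertext())).strip()


def flatten(element, skip=("head",)):
    """Flatten EAD mixed content to plain text, one paragraph per <p>.

    If the element has no <p> children, its own mixed content is flattened as
    a single paragraph. Children named in `skip` (by default <head>) are left out.
    """
    paragraphs = [inline_text(p) for p in element.findall("p")]
    if not paragraphs:
        parts = [element.text or ""]
        for child in element:
            if child.tag not in skip:
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")  # text after a child belongs to the parent
        paragraphs = [re.sub(r"\s+", " ", "".join(parts)).strip()]
    return "\n\n".join(p for p in paragraphs if p)


scope = ET.fromstring(
    "<scopecontent><head>Summary</head>"
    "<p>Files from <corpname>XOC</corpname>, VHS tapes.</p></scopecontent>"
)
print(flatten(scope))  # Files from XOC, VHS tapes.
```

A real conversion may need richer handling (e.g., lists or emphasis), which this sketch deliberately discards.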

Note that for Stanford, a lossy translation is not an issue as long as the affected parts of the EAD are still sourced and maintained externally, e.g., in Archivists' Toolkit. Eventually, the transition away from EAD's support for markup will have to be addressed.

Issue: Tags that have no mapping into MODS

With one exception, we will map these into Notes, using displayLabel to let them appear with specific labels in the Hypatia display.

Conversion rule (Stanford): Use of <head> at the beginning of text fields as a labeling convention ...

<scopecontent id="ref13">
   <head>Collection Scope and Content Summary</head>
   <p>The collection includes files from XOC, VHS tapes, and Drexler drafts and galley proofs.</p>
</scopecontent>
<bioghist id="ref11">
   <head>Biography</head>
   <p>Keith Henson and his wife Arel Lucas founded XOC (Xanadu Operating Company).</p>
</bioghist>

Issue: Tag attributes that have no mapping into MODS

These are numerous and will not be enumerated in full. Some examples:

<repository encodinganalog="3.1.2">Hull University Archives</repository>

<unitid encodinganalog="3.1.1" label="Reference" countrycode="GB" repositorycode="50">U DGA/1/2/5/a</unitid>

<unitid label="Call Number:" countrycode="US" repositorycode="US-CtY">MS 1746</unitid>

<physdesc encodinganalog="3.1.5" label="Extent">

<accessrestrict id="ref5"> ...
<userestrict id="ref4"> ...
<prefercite id="ref6"> ...

<origination label="creator">
     <persname rules="aacr" source="ingest">Gould, Stephen Jay</persname>
</origination>

<persname rules="aacr" source="naf">Shearer, Rhonda Roland, 1954- </persname>

<unitdate normal="1951/1996" type="inclusive" calendar="gregorian" era="ce">1951-1996</unitdate>

<note type="bpg">
     <p>This encoded finding aid is compliant with the Yale EAD Best Practice Guidelines, Version 1.0.</p>
</note>

Conversion rule: Attributes not specifically targeted for conversion will be ignored/lost.

Issue: retaining ref and level information, do these map to appropriate container descriptions?

Issue: "otherlevel" levels -- <c level="otherlevel" otherlevel="SubSeries"> (Hull)

Issue: Stanford <container> conventions and mapping into a MODS "Location" note (revised 10/24/11 to split out Collection title in item record and nest this information in a relatedItem):

We will create a concise representation of the physical/logical location (as appropriate) of the materials in the context of the collection and its hierarchy. It will be a MODS <relatedItem><location><physicalLocation type="location">, formed by concatenating the following information:

For example, this assembles as "Series 6: Born Digital Materials - Box 11 - Folder 3".
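A Python sketch of the concatenation, under assumptions inferred from the Gould and Henson examples rather than from a stated rule: parts are joined with " - ", each container is rendered as "Type value" with ": label" appended when a label attribute is present, and containers are taken in document order. The series title is passed in because it comes from an ancestor series component; that lookup is not shown. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def containers_from_did(did):
    """Read (type, value, label) tuples from the <container> elements of an EAD <did>."""
    return [
        (c.get("type"), (c.text or "").strip(), c.get("label"))
        for c in did.findall("container")
    ]


def location_string(series_title, containers):
    """Join the series title and each container as "Type value[: label]" with " - "."""
    parts = [series_title] if series_title else []
    for ctype, value, label in containers:
        part = f"{ctype} {value}"
        if label:
            part += f": {label}"
        parts.append(part)
    return " - ".join(parts)


did = ET.fromstring(
    '<did><unittitle>Gardner, Howard</unittitle>'
    '<container id="cid57883022" type="Box" label="Mixed materials">4</container>'
    '<container parent="cid57883022" type="Folder">27</container></did>'
)
print(location_string("Series 1: Correspondence", containers_from_did(did)))
# Series 1: Correspondence - Box 4: Mixed materials - Folder 27
```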

Is this generalizable across Stanford collections? Across institutions?

Examples (for each collection, the EAD source, then the resulting MODS, then after "was:" the earlier form it replaced):

Gould

EAD:
<c id="ref432" level="file">
    <did>
       <unittitle>Gardner, Howard</unittitle>
       <container id="cid57883022" type="Box" label="Mixed materials">4</container>
       <container parent="cid57883022" type="Folder">27</container>

MODS:
<mods:relatedItem type="host">
   <mods:titleInfo>
      <mods:title>Stephen Jay Gould papers</mods:title>
   </mods:titleInfo>
   <mods:typeOfResource collection="yes"/>
</mods:relatedItem>
<mods:relatedItem type="host">
   <mods:location>
      <mods:physicalLocation type="location">Series 1: Correspondence - Box 4: Mixed materials - Folder 27</mods:physicalLocation>
   </mods:location>
</mods:relatedItem>

was:
<mods:location>
   <physicalLocation displayLabel="Located in">Stephen Jay Gould papers - Series 1: Correspondence - Mixed materials - Box 4</physicalLocation>
</mods:location>

Henson

EAD:
<c id="ref50" level="item">
     <did>
         <unittitle>CM01</unittitle>
         <unitid>CM01</unitid>
         <container id="cid59523001" type="Carton" label="Computer disks / tapes">11</container>

MODS:
<mods:relatedItem type="host">
   <mods:titleInfo>
      <mods:title>Keith Henson. Papers relating to Project Xanadu, XOC and Eric Drexler</mods:title>
   </mods:titleInfo>
   <mods:typeOfResource collection="yes"/>
</mods:relatedItem>
<mods:relatedItem type="host">
   <mods:location>
      <mods:physicalLocation type="location">Series 6: Born-Digital Materials - Carton 11: Computer disks / tapes</mods:physicalLocation>
   </mods:location>
</mods:relatedItem>

was:
<mods:location>
   <physicalLocation displayLabel="Located in">Keith Henson. Papers relating to Project Xanadu, XOC and Eric Drexler - Born-Digital Materials - Computer disks / tapes - Carton 11</physicalLocation>
</mods:location>

Issue: Derived <mods:location> information

Where all item objects are derived from FTK information about files in a directory, how is this logical/physical location information assembled and presented?

Example (collection, FTK source, resulting MODS):

Gould

FTK: (no example recorded)

MODS:
<mods:relatedItem type="host">
   <mods:titleInfo>
      <mods:title>Stephen Jay Gould papers</mods:title>
   </mods:titleInfo>
   <mods:typeOfResource collection="yes"/>
</mods:relatedItem>
<mods:relatedItem type="host">
   <mods:location>
      <mods:physicalLocation type="location">Series 6: Born Digital Materials - [directoryname]</mods:physicalLocation>
   </mods:location>
</mods:relatedItem>

was:
<mods:location>
   <physicalLocation displayLabel="Located in">Stephen Jay Gould papers - Series 6: Born Digital Materials - [directoryname]</physicalLocation>
</mods:location>
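The two host relatedItem elements can be generated with ElementTree. This is a sketch only: the directory name is a placeholder for whatever the FTK output supplies, and the function names are hypothetical. It writes the collection title using MODS's titleInfo/title nesting and qualifies every tag with the mods namespace, as the item mapping calls for.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # serialize with the mods: prefix


def m(tag):
    """MODS tag in ElementTree's {namespace}local form."""
    return f"{{{MODS_NS}}}{tag}"


def add_host_items(mods_root, collection_title, location):
    """Append the collection-title and location relatedItem elements to a MODS record."""
    coll = ET.SubElement(mods_root, m("relatedItem"), type="host")
    title_info = ET.SubElement(coll, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = collection_title
    ET.SubElement(coll, m("typeOfResource"), collection="yes")

    loc_item = ET.SubElement(mods_root, m("relatedItem"), type="host")
    loc = ET.SubElement(loc_item, m("location"))
    ET.SubElement(loc, m("physicalLocation"), type="location").text = location
    return mods_root


# Hypothetical FTK directory name; the real value would come from the FTK output.
directory_name = "disk01"
record = add_host_items(
    ET.Element(m("mods")),
    "Stephen Jay Gould papers",
    "Series 6: Born Digital Materials - " + directory_name,
)
print(ET.tostring(record, encoding="unicode"))
```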

Issue: Recursively nested <descgrp>

Virginia, Yale:

<descgrp id="ai" type="admininfo">
     <head>Administrative Information</head>
     <descgrp id="prov" type="provenance">
         <head>Provenance</head>
         <acqinfo id="acq">
             <p>Purchased from Ken Lopez on various funds, 1994-2007.</p>
         </acqinfo>
     </descgrp>
     <userestrict id="urest">
         :
     <prefercite id="cite">
         :
     <accessrestrict id="arest">
         :
     <processinfo id="pi">
         :
</descgrp>

So far, other, more complex examples have not been found in the samples, e.g., nested <bioghist> to partition a biography with separate headings.

Conversion: Ignore the wrapping <descgrp> of type="admininfo"
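A short Python sketch of this rule: walk the descriptive children in document order and, on meeting a <descgrp type="admininfo">, yield its children instead of the wrapper. The function name is hypothetical; the wrapper's own <head> is assumed to be discarded along with it, and any other <descgrp> (such as type="provenance") is passed through unchanged for the note mapping to handle. Callers still filter out elements such as <did> or <dsc>.

```python
import xml.etree.ElementTree as ET


def note_sources(parent):
    """Yield the children of parent in document order, unwrapping <descgrp type="admininfo">."""
    for child in parent:
        if child.tag == "descgrp" and child.get("type") == "admininfo":
            for inner in child:
                if inner.tag != "head":  # drop "Administrative Information" with its wrapper
                    yield inner
        else:
            yield child


archdesc = ET.fromstring(
    '<archdesc><descgrp id="ai" type="admininfo"><head>Administrative Information</head>'
    '<descgrp id="prov" type="provenance"><head>Provenance</head>'
    '<acqinfo id="acq"><p>Purchased from Ken Lopez on various funds, 1994-2007.</p></acqinfo>'
    '</descgrp><userestrict id="urest"/><prefercite id="cite"/></descgrp>'
    '<bioghist/></archdesc>'
)
print([e.tag for e in note_sources(archdesc)])
```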

EAD-to-MODS mapping for an individual item

Note: this specifies the conversion from EAD metadata to MODS for an individual item. The conversion should use a "mods" namespace declaration and qualified tags, e.g.,

<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.3"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
   <mods:titleInfo>
      <mods:title>Keith Henson. Papers relating to Project Xanadu, XOC and Eric Drexler</mods:title>
   </mods:titleInfo>

Each mapping below is given in four parts, in this order: EAD element, MODS element, notes, and example (when one is available).

<unittitle>

<titleInfo>
   <title>

• Requires embedded element conversion

 

<origination>
   <persname>
or
   <corpname>
    

<name type="...">
   <namePart>

• EAD <persname> maps to MODS <name type="personal">
• EAD <corpname> maps to MODS <name type="corporate">
• EAD <origination label="creator"> (case-insensitive) maps to a MODS <role> sub-element of <name>; ignore any other label value (no other values occur in the sample data)
• The EAD source attribute on <persname> or <corpname> maps to the MODS <name> authority attribute

<origination label="creator">
    <persname rules="aacr" source="ingest">Gould, Stephen Jay</persname>
</origination>

<mods:name type="personal" authority="ingest">
     <mods:namePart>Gould, Stephen Jay</mods:namePart>
     <mods:role>
         <mods:roleTerm authority="marcrelator" type="text">creator</mods:roleTerm>
     </mods:role>
</mods:name>
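A Python sketch of the origination row, assuming an un-namespaced EAD. MODS tags are left unqualified to keep it short; a real conversion would qualify them with the mods namespace as described above. The function name is hypothetical, and only <persname> and <corpname> children are handled, since those are the two this row covers.

```python
import xml.etree.ElementTree as ET

NAME_TYPES = {"persname": "personal", "corpname": "corporate"}


def origination_to_names(origination):
    """Map each <persname>/<corpname> in an EAD <origination> to a MODS <name>."""
    is_creator = (origination.get("label") or "").strip().lower() == "creator"
    names = []
    for child in origination:
        if child.tag not in NAME_TYPES:
            continue  # only persname and corpname are covered by this row
        name = ET.Element("name", type=NAME_TYPES[child.tag])
        if child.get("source"):
            name.set("authority", child.get("source"))
        ET.SubElement(name, "namePart").text = (child.text or "").strip()
        if is_creator:
            role = ET.SubElement(name, "role")
            term = ET.SubElement(role, "roleTerm", authority="marcrelator", type="text")
            term.text = "creator"
        names.append(name)
    return names


origination = ET.fromstring(
    '<origination label="creator">'
    '<persname rules="aacr" source="ingest">Gould, Stephen Jay</persname></origination>'
)
for name in origination_to_names(origination):
    print(ET.tostring(name, encoding="unicode"))
```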

<repository>

<location>
   <physicalLocation>

Map <repository><corpname> to
<location><physicalLocation type="repository">

Ignore other embedded elements, e.g., <address>

<repository>
      <corpname>Stanford University. Department of Special Collections and University Archives</corpname>
</repository>

<mods:location>
    <mods:physicalLocation type="repository">Stanford University. Department of Special Collections and University Archives</mods:physicalLocation>
</mods:location>

was (revised 10/24/11):
<mods:name type="corporate">
   <mods:namePart>Stanford University. Department of Special Collections and University Archives</mods:namePart>
   <mods:role>
      <mods:roleTerm authority="local" type="text">repository</mods:roleTerm>
   </mods:role>
</mods:name>

No corresponding EAD element

<typeOfResource>

For the Hypatia set representing the top-level collection, create an entry indicating a collection.
(Mark the top collection only; intermediate series are sets but not collections.)

The following values are applicable to born digital materials: "sound recording", "still image", "moving image", "software, multimedia".  Attempt to generate based on format?

<mods:typeOfResource collection="yes"/>

<controlaccess>
   <genreform>

<genre>

• EAD <genreform> source attribute maps to the MODS <genre> authority attribute
• DLF guidelines suggest creating separate MODS <genre> elements from this <controlaccess> subelement: one at the document level and one as a subject entry.

<controlaccess>
    <genreform source="aat">Videorecordings.</genreform>

<mods:genre authority="aat">Videorecordings</mods:genre>

<unitdate>

<originInfo>
   <dateCreated>

If only one <unitdate> is present for a <did>, add the attribute keyDate="yes". If there is more than one <unitdate>, add keyDate="yes" only where the EAD type="inclusive".

<mods:originInfo>
     <mods:dateCreated keyDate="yes">1977-1997</mods:dateCreated>
</mods:originInfo>
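The keyDate rule in Python, as a sketch with a hypothetical function name and unqualified MODS tags. Note that with several inclusive dates the rule, as stated, marks each of them as a key date.

```python
import xml.etree.ElementTree as ET


def unitdates_to_origin_info(did):
    """Map the <unitdate> elements of an EAD <did> to a MODS <originInfo>, or None."""
    unitdates = did.findall("unitdate")
    if not unitdates:
        return None
    origin = ET.Element("originInfo")
    for unitdate in unitdates:
        date = ET.SubElement(origin, "dateCreated")
        date.text = (unitdate.text or "").strip()
        # A lone unitdate is always the key date; otherwise only inclusive ones are.
        if len(unitdates) == 1 or unitdate.get("type") == "inclusive":
            date.set("keyDate", "yes")
    return origin


did = ET.fromstring(
    '<did><unitdate type="inclusive">1951-1996</unitdate>'
    '<unitdate type="bulk">1970-1990</unitdate></did>'
)
print(ET.tostring(unitdates_to_origin_info(did), encoding="unicode"))
```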

<langmaterial>
   <language>

<language>
   <languageTerm>

For <langmaterial>
• If there is no <language> sub-element, map <langmaterial>#PCDATA to <languageTerm type="text">#PCDATA
• Otherwise, map each <language> sub-element to a separate MODS <languageTerm> element and ignore any #PCDATA

For <language>
• If there is a langcode attribute and no #PCDATA, create
        <languageTerm type="code" authority="iso639-2b"> (Stanford: prefer converting this to text)
• If there is #PCDATA and no langcode attribute, create
        <languageTerm type="text">
• If both, create only the text form
• Ignore scriptcode

<langmaterial label="Language(s):">The materials are in <language langcode="eng" scriptcode="Latn">English</language>.</langmaterial>

<mods:language>
    <mods:languageTerm type="text">English</mods:languageTerm>
</mods:language>
-----
<langmaterial>
    <language langcode="eng"/>
</langmaterial>

<mods:language>
    <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
</mods:language>
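A Python sketch of the language rules, with a hypothetical function name and unqualified MODS tags. One design choice departs from the wording above: each resulting <languageTerm> is wrapped in its own MODS <language>, because a MODS <language> is meant to describe a single language. The Stanford preference for converting codes to text is not implemented, since it would need a code-to-name table not given here.

```python
import xml.etree.ElementTree as ET


def langmaterial_to_languages(langmaterial):
    """Return a list of MODS <language> elements for one EAD <langmaterial>."""
    langs = langmaterial.findall("language")
    if not langs:
        # No <language> children: the whole #PCDATA becomes one text term.
        text = " ".join("".join(langmaterial.itertext()).split())
        terms = [("text", None, text)] if text else []
    else:
        terms = []
        for lang in langs:  # any other #PCDATA in <langmaterial> is ignored
            text = (lang.text or "").strip()
            code = lang.get("langcode")  # scriptcode is ignored
            if text:  # text only, or both text and code: keep the text form
                terms.append(("text", None, text))
            elif code:
                terms.append(("code", "iso639-2b", code))
    result = []
    for term_type, authority, value in terms:
        language = ET.Element("language")
        term = ET.SubElement(language, "languageTerm", type=term_type)
        if authority:
            term.set("authority", authority)
        term.text = value
        result.append(language)
    return result
```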

No corresponding EAD element

<physicalDescription>
   <digitalOrigin>

Add a "born digital" indication only for the born digital items in the collection, else omit.

<mods:physicalDescription>
      <mods:digitalOrigin>born digital</mods:digitalOrigin>
</mods:physicalDescription>

<physdesc>
   <extent>

<physicalDescription>
   <extent>

• Each EAD <extent> subelement will become a MODS/extent element
• If EAD <physdesc> has no sub-elements, map its #PCDATA into MODS/extent

<physdesc>
      <extent>1.0 computer media</extent>
      <extent>hard drive</extent>
</physdesc>

<mods:physicalDescription>
      <mods:extent>1.0 computer media</mods:extent>
      <mods:extent>hard drive</mods:extent>
</mods:physicalDescription>
-----
<physdesc label="Physical Characteristics">This collection consists of ca. 3,200 items</physdesc>

<mods:physicalDescription>
      <mods:extent>This collection consists of ca. 3,200 items</mods:extent>
</mods:physicalDescription>
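A Python sketch of the extent rule, with a hypothetical function name and unqualified MODS tags. The case of a <physdesc> that has sub-elements but no <extent> is not covered by the rule; the sketch simply produces no extent for it.

```python
import xml.etree.ElementTree as ET


def physdesc_to_physical_description(physdesc):
    """Map an EAD <physdesc> to a MODS <physicalDescription> with one <extent> per value."""
    extents = physdesc.findall("extent")
    if extents:
        values = [(e.text or "").strip() for e in extents]
    elif len(physdesc) == 0:
        # No sub-elements at all: the #PCDATA itself is the extent.
        values = [" ".join((physdesc.text or "").split())]
    else:
        values = []  # sub-elements other than <extent> are not covered by this rule
    description = ET.Element("physicalDescription")
    for value in values:
        if value:
            ET.SubElement(description, "extent").text = value
    return description


physdesc = ET.fromstring(
    "<physdesc><extent>1.0 computer media</extent><extent>hard drive</extent></physdesc>"
)
print(ET.tostring(physdesc_to_physical_description(physdesc), encoding="unicode"))
```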

<abstract> or <scopecontent>

<abstract>

Map EAD label attribute to MODS displayLabel attribute

Note: the DLF guidelines suggest that the first paragraph of <scopecontent> could be used as an abstract, but they do not otherwise map <scopecontent>. We recommend a simple, clean mapping of each, as described here.

<abstract label="Summary:">The papers consist of correspondence, subject files, and writings, primarily documenting the professional career and personal life of James Tobin as an economist and educator.</abstract>

<mods:abstract displayLabel="Summary:">The papers consist of correspondence, subject files, and writings, primarily documenting the professional career and personal life of James Tobin as an economist and educator.</mods:abstract>

<descgrp>
<bioghist>
<acqinfo>
<prefercite>
<userestrict>
<processinfo>
<note>

<note>

• Requires embedded element conversion
• Ignore a wrapping <descgrp type="admininfo">
• They should be converted to notes in the order encountered in the EAD.
• A leading <head> value should map to the MODS displayLabel attribute, else provide a default displayLabel as follows:

  • <bioghist> = "Biography"
  • <acqinfo> = "Acquisition Information"
  • <prefercite> = "Preferred Citation"
  • <userestrict> = "Use restrictions"
  • <processinfo> = "Processing information"
  • <note> = "Note"

<prefercite id="ref6">
    <head>Cite As</head>
    <p>James Tobin Papers. Manuscripts and Archives, Yale University Library.</p>
</prefercite>

<mods:note displayLabel="Cite As">James Tobin Papers. Manuscripts and Archives, Yale University Library.</mods:note>
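A Python sketch of the note mapping, with a hypothetical function name and unqualified MODS tags. Two assumptions are made that the rule leaves open: a "leading" <head> means the element's first child, and multiple <p> paragraphs are joined with a single space. Elements without a default label (e.g., <descgrp>) get no displayLabel unless they carry a <head>.

```python
import re
import xml.etree.ElementTree as ET

DEFAULT_LABELS = {
    "bioghist": "Biography",
    "acqinfo": "Acquisition Information",
    "prefercite": "Preferred Citation",
    "userestrict": "Use restrictions",
    "processinfo": "Processing information",
    "note": "Note",
}


def to_note(element):
    """Map one EAD note-like element to a MODS <note>."""
    children = list(element)
    head = children[0] if children and children[0].tag == "head" else None
    label = (head.text or "").strip() if head is not None else ""
    label = label or DEFAULT_LABELS.get(element.tag, "")
    note = ET.Element("note")
    if label:
        note.set("displayLabel", label)
    paragraphs = [
        re.sub(r"\s+", " ", "".join(p.itertext())).strip() for p in element.findall("p")
    ]
    note.text = " ".join(p for p in paragraphs if p)
    return note


cite = ET.fromstring(
    '<prefercite id="ref6"><head>Cite As</head>'
    "<p>James Tobin Papers. Manuscripts and Archives, Yale University Library.</p></prefercite>"
)
print(ET.tostring(to_note(cite), encoding="unicode"))
```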

<arrangement>

<tableOfContents>

Mapping per DLF guidelines, with default displayLabel of "Arrangement".

<arrangement id="ref206">
     <head>Arrangement note</head>
     <p>The records are arranged in three series: I. Administrative Records,1991-2010. II. Audiovisual Recordings ...</p>
</arrangement>

<mods:tableOfContents displayLabel="Arrangement note">The records are arranged in three series: I. Administrative Records,1991-2010. II. Audiovisual Recordings</mods:tableOfContents>

No corresponding EAD element

<targetAudience>

mapping not applied to sample EADs

 

<odd>

<note>

not found in sample EADs

 

<controlaccess> with
   <corpname>
   <famname>
   <function>
   <genreform>
   <geogname>
   <name>
   <occupation>
   <persname>
   <subject>,
   <title>

<subject> with
   <topic>
   <geographic>
   <temporal>
   <titleInfo>
   <name>
   <genre>
   <hierarchicalGeographic>
   <cartographics>
   <geographicCode>
   <occupation>

Mappings of EAD <controlaccess> subelements to MODS's <subject> subelements:
• EAD <corpname> = MODS <name type="corporate">;
• EAD <famname> = MODS <name type="personal">;
• EAD <function> = MODS <topic> with no @authority attribute on <subject>;
• EAD <genreform> = MODS <genre>;
• EAD <geogname> = MODS <geographic>;
• EAD <name> = MODS <name>;
• EAD <occupation> = MODS <occupation>;
• EAD <persname> = MODS <name type="personal">;
• EAD <subject> = MODS <topic>; and
• EAD <title> = MODS <titleInfo>.

Map EAD source attribute for any <controlaccess> subelement to MODS authority attribute on <subject>.

<controlaccess>
    <persname rules="aacr">Lucas, Arel</persname>
    <corpname rules="dacs" source="ingest">Xanadu Operating Company (XOC)</corpname>
    <subject source="lcsh">Electronic publishing.</subject>
    <genreform source="aat">Videorecordings.</genreform>
    <subject source="lcsh">Word processing.</subject>
</controlaccess>

<mods:subject>
    <mods:name type="personal">
        <mods:namePart>Lucas, Arel</mods:namePart>
    </mods:name>
</mods:subject>
<mods:subject authority="ingest">
    <mods:name type="corporate">
        <mods:namePart>Xanadu Operating Company (XOC)</mods:namePart>
    </mods:name>
</mods:subject>
<mods:subject authority="lcsh">
    <mods:topic>Electronic publishing.</mods:topic>
</mods:subject>
<mods:subject authority="aat">
    <mods:genre>Videorecordings.</mods:genre>
</mods:subject>
<mods:subject authority="lcsh">
    <mods:topic>Word processing.</mods:topic>
</mods:subject>
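A table-driven Python sketch of the <controlaccess> mapping, with a hypothetical function name and unqualified MODS tags. It assumes a flat, un-namespaced <controlaccess> (nested <controlaccess> elements are not handled), reads "no @authority attribute" for <function> as dropping its source, and wraps name and title text in <namePart> and <title> as MODS requires.

```python
import xml.etree.ElementTree as ET

# EAD subelement -> (MODS child of <subject>, attributes on that child)
SUBJECT_MAP = {
    "corpname": ("name", {"type": "corporate"}),
    "famname": ("name", {"type": "personal"}),
    "function": ("topic", {}),
    "genreform": ("genre", {}),
    "geogname": ("geographic", {}),
    "name": ("name", {}),
    "occupation": ("occupation", {}),
    "persname": ("name", {"type": "personal"}),
    "subject": ("topic", {}),
    "title": ("titleInfo", {}),
}


def controlaccess_to_subjects(controlaccess):
    """Return one MODS <subject> per mapped <controlaccess> subelement, in order."""
    subjects = []
    for child in controlaccess:
        if child.tag not in SUBJECT_MAP:
            continue
        mods_tag, attrs = SUBJECT_MAP[child.tag]
        subject = ET.Element("subject")
        if child.get("source") and child.tag != "function":
            subject.set("authority", child.get("source"))
        inner = ET.SubElement(subject, mods_tag, dict(attrs))
        text = (child.text or "").strip()
        if mods_tag == "name":
            ET.SubElement(inner, "namePart").text = text
        elif mods_tag == "titleInfo":
            ET.SubElement(inner, "title").text = text
        else:
            inner.text = text
        subjects.append(subject)
    return subjects
```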

No corresponding EAD element

<classification>

No mapping in samples

 

No corresponding EAD element

<relatedItem>

No mapping in samples

 

<unitid>

<identifier>

• All mapped to identifier of type=unitid
• EAD label attribute mapped to MODS displayLabel attribute

<unitid>M1437</unitid>

<mods:identifier type="unitid">M1437</mods:identifier>
-----
<unitid label="Call Number:" countrycode="US" repositorycode="US-CtY">MS 1746</unitid>

<mods:identifier type="unitid" displayLabel="Call Number:">MS 1746</mods:identifier>

No corresponding EAD element

<location><url>

No candidate sample data, though conversions could provide useful additions for born digital materials.

 

<accessrestrict>

<accessCondition>

• Requires embedded element conversion
• Map <head> subelement to MODS attribute displayLabel
• Apply attribute type="restrictionOnAccess"

<accessrestrict id="ref5713">
    <head>Access to Collection</head>
    <p>Open for research. Audio-visual materials are not available in original format...</p>
</accessrestrict>

<mods:accessCondition type="restrictionOnAccess" displayLabel="Access to Collection">Open for research. Audio-visual materials are not available in original format...</mods:accessCondition>

<userestrict>

<accessCondition>

• Requires embedded element conversion
• Map <head> subelement to MODS attribute displayLabel
• Apply attribute type="useAndReproduction"

<userestrict id="ref5">
    <head>Publication Rights</head>
    <p>Property rights reside with the repository. Literary rights reside with the creators of the document....</p>
</userestrict>

<mods:accessCondition type="useAndReproduction" displayLabel="Publication Rights">Property rights reside with the repository. Literary rights reside with the creators of the document....</mods:accessCondition>
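Both access rows share one shape, so a single Python sketch covers them: pick the type value from the element name, take a leading <head> as displayLabel, and flatten the paragraphs. The type values are the ones given in the two rows above; the function name is hypothetical, MODS tags are unqualified, and joining paragraphs with a space is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# type attribute values as given in the accessrestrict and userestrict rows
ACCESS_TYPES = {
    "accessrestrict": "restrictionOnAccess",
    "userestrict": "useAndReproduction",
}


def to_access_condition(element):
    """Map an EAD <accessrestrict> or <userestrict> to a MODS <accessCondition>."""
    condition = ET.Element("accessCondition", type=ACCESS_TYPES[element.tag])
    children = list(element)
    if children and children[0].tag == "head":
        condition.set("displayLabel", (children[0].text or "").strip())
    paragraphs = [
        re.sub(r"\s+", " ", "".join(p.itertext())).strip() for p in element.findall("p")
    ]
    condition.text = " ".join(p for p in paragraphs if p)
    return condition


restriction = ET.fromstring(
    '<accessrestrict id="ref5713"><head>Access to Collection</head>'
    "<p>Open for research.</p></accessrestrict>"
)
print(ET.tostring(to_access_condition(restriction), encoding="unicode"))
```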