Introduction

Metadata remediation, or cleanup, is a common (though not essential) component of migration projects. This document outlines possible goals, considerations, tools, and timelines for metadata remediation in the context of a repository migration.


Goals

Before embarking on a metadata remediation project, it is important to define clear goals. The following are examples of goals you may wish to pursue.

  1. Standardization of metadata elements. This may include date encoding and date conventions, as well as the presentation of proper names (e.g. last, first), which can improve sorting and faceting.
  2. Standardization of metadata fields. This may include:
    1. Field consolidation: converting functionally similar fields to a single standardized field
    2. Elimination of duplicate or redundant fields
    3. Introduction of new fields to suit particular use cases
  3. Documentation. Things like controlled vocabularies, requirements for field use, and spelling and capitalization conventions for metadata elements are all useful to document.
  4. Re-evaluation of titles, descriptions, or collection structures. This could include improving titles and descriptions to be more accurate or complete, as well as addressing the meaning or usage of an object or collection as it has evolved over time.
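
As an illustration of the first goal, the following is a minimal Python sketch of date and name standardization. It assumes dates should be encoded as YYYY-MM-DD and personal names presented as "Last, First"; the list of input date formats and the function names are hypothetical and would need to be adapted to your own metadata. The name inversion is deliberately naive: it will not handle multi-word family names or corporate names correctly, so those values should be reviewed by hand.

```python
from datetime import datetime

# Hypothetical list of date formats found in the source records; adjust to your data.
KNOWN_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y"]


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = value.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    # No format matched: return None so the value can be reviewed manually
    # instead of being guessed at.
    return None


def normalize_name(value):
    """Rewrite 'First [Middle] Last' as 'Last, First [Middle]'.

    Names that already contain a comma are assumed to be inverted and are
    only whitespace-cleaned. Single-word names are returned unchanged.
    """
    text = " ".join(value.split())  # collapse repeated whitespace
    if "," in text or " " not in text:
        return text
    given, family = text.rsplit(" ", 1)
    return f"{family}, {given}"


if __name__ == "__main__":
    for raw in ["03/15/1921", "March 15, 1921", "circa 1921"]:
        print(raw, "->", normalize_date(raw))
    for raw in ["Jane Q. Doe", "Doe, Jane"]:
        print(raw, "->", normalize_name(raw))
```

Returning None for unrecognized dates, rather than a best guess, keeps ambiguous values visible so they can be resolved through the documented decision-making process.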


Resources

Metadata remediation is best considered in the context of existing standards and best practices. A lot of work has already been done in this area, so it is a good idea to review these resources before making any decisions. This list is not exhaustive, but it should serve as a good starting point.

Standards

  1. Library of Congress Subject Headings
  2. Describing Archives: A Content Standard (DACS)
  3. Resource Description and Access (RDA)
  4. Dublin Core (DC)
  5. VRA Core

Community Best Practices

  1. Sunshine State Digital Network Metadata Participation Guidelines
  2. DPLA Metadata Quality Guidelines
  3. DPLA Metadata Application Profile
  4. DigiNole Metadata Guidelines
  5. UMass Amherst Libraries Metadata Guidelines
  6. California Revealed Guidelines For Participation
  7. Northern Illinois University Data Dictionary
  8. Digital Culture of Metropolitan New York (DCMNY) Metadata Requirements and Guidelines
  9. University of Washington Libraries Data Dictionaries
  10. Louisiana State University Libraries Metadata Guidelines for “Free People of Color in Louisiana”
  11. California Digital Library Guidelines for Digital Objects
  12. Emory University Core Metadata
  13. Metadata Guidelines for Digital Resources at Texas A&M University Libraries
  14. Metadata Application Profile, University of Notre Dame, Hesburgh Libraries


Tools

There are a number of tools that may prove useful for metadata remediation work. These include:

  1. Tools for the evaluation and decision-making process
    1. Google Docs and Sheets 
    2. Trello
    3. Islandora Metadata Interest Group Documentation
    4. Protégé
    5. Desktop notepad application (e.g. Notepad++)
  2. Tools to perform the remediation work
    1. Google Docs and Sheets
    2. Microsoft Excel
    3. OpenRefine

Process

It is important to establish and document a clear process for making decisions so that the remediation effort does not stall. This documentation will also help anyone who later needs to review the process and determine how and why particular decisions were made. Ideally, the decision-making process will involve multiple people who represent stakeholder interests and have relevant expertise. Forming a subcommittee is one way to achieve the appropriate representation and expertise.

The decision-making process should be established and documented before the work begins. In the case of a subcommittee, reaching consensus might be preferred, with a majority-rules vote if consensus cannot be reached. Additionally, if a decision falls within a subcommittee member's area of expertise, the subcommittee could defer to that member for that decision. Sometimes consultation with outside groups will be appropriate, in which case it is important to communicate with these groups as much as possible, including providing an outline of options for remediating their metadata and timelines for doing so. It is also helpful to set expectations for the scope of remediation. All decisions should be documented, along with any areas where the subcommittee does not have authority to make decisions.

For an example of a metadata remediation process, including links to historical documents and spreadsheets, please see the Whitman College case study.


Information Management

A lot of documentation will be generated during the remediation process, so a good information management strategy is important. We recommend establishing a shared location for all documentation (e.g. Google Drive) along with a clear structure based on your needs. This may include folders for each collection along with field-specific documentation and a record of all decisions that were made.


Timelines

Metadata remediation efforts tend to be time-consuming, so it is important to allocate sufficient time for the project. The time required will depend on the scope of the effort, but 8 to 12 months is a good starting point. This should allow enough time for decisions to be made before remediation is implemented; otherwise, changing decisions may require multiple remediation attempts. The process may be much faster if sufficient metadata guidelines, standards, workflows, and how-to documentation already exist. If these resources are lacking, they can be generated as part of the remediation effort, which will reduce the time required to complete future efforts.


Templates

The following templates have been adapted from the Whitman College metadata remediation project. For more information on their process, and for examples of the specific spreadsheets and documents they used, please see the case study.

  1. IMI_template_use: This document can be copied and associated with each field or cluster of related fields to be remediated. It allows field usage to be compared across collections.
  2. IMI_template card: This document can be copied and associated with each field or cluster of related fields to be remediated. After evaluating a field, this document can be used to describe what it should look like moving forward.
  3. Repository Fields Template: This spreadsheet keeps track of which fields are used in which collection. If no collection uses a field, that field may be a candidate for removal, pending discussion.
  4. Field Maintenance Template: This spreadsheet can be used to track required maintenance for fields.
  5. Field Card Information Template: This spreadsheet can be used to document decisions that were made for future reference.
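
The kind of field-by-collection tracking described for the Repository Fields Template can also be generated from exported records. The following is a minimal Python sketch, not part of the Whitman College templates. It assumes each collection has been exported as a list of records, where a record is a dictionary mapping field names to values; the function names, collection names, and field names are hypothetical.

```python
def field_usage(collections, all_fields):
    """Map each field to the sorted list of collections that use it.

    A field counts as used in a collection if at least one record in that
    collection has a non-empty value for it.
    """
    usage = {field: set() for field in all_fields}
    for collection_name, records in collections.items():
        for record in records:
            for field, value in record.items():
                if value is not None and str(value).strip():
                    usage.setdefault(field, set()).add(collection_name)
    return {field: sorted(names) for field, names in usage.items()}


def unused_fields(usage):
    """Return fields that no collection uses (candidates for discussion)."""
    return sorted(field for field, names in usage.items() if not names)


if __name__ == "__main__":
    # Hypothetical sample data standing in for exported repository records.
    sample = {
        "Yearbooks": [{"title": "Yearbook 1921", "date": "1921", "notes": ""}],
        "Letters": [{"title": "Letter to Jane", "creator": "Doe, Jane"}],
    }
    fields = ["title", "date", "creator", "notes", "rights"]
    usage = field_usage(sample, fields)
    for field in fields:
        print(field, usage[field])
    print("Unused:", unused_fields(usage))
```

The list of unused fields is only a starting point: as noted above, whether to remove a field should still be decided through discussion.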