
This page collects materials for an environmental scan of the literature on software upgrades and migrations, as well as planned or recommended Fedora 3.x to Fedora 4.x upgrade projects.

Sources

Summary

Repository upgrades and migrations are quite common, and the literature covers several important aspects of this process: motivations for undertaking a migration, the difficulty of migrations, the possible benefits of a migration, and advice for those looking to undertake a migration in the future.

A common motivation for repository migrations is the cost of a commercially licensed product. Gilbert and Mobley faced an increase in the cost of their CONTENTdm license after reaching the item limit of their current tier, and Stein and Thompson, drawing on survey data, cited license and maintenance fees as one of the main drivers of repository migrations. Issues with the commercial platform itself, from performance and scale limitations (Neatrour et al., Witkowski et al.) to a lack of flexibility with regard to file and metadata formats (Gilbert and Mobley, Wu et al.), were also key motivators. Finally, better support for digital preservation (Stein and Thompson, Berghaus et al., Fallaw et al.) and linked data (Wu et al., Stein and Thompson) rounded out the top motivators in the literature.

Many factors make migrations difficult, but one problem category dominates the literature: metadata. Van Tuyl et al. cite metadata remediation as the biggest time sink during their migration project, and many others (Bridge2Hyku Team, Gilbert and Mobley, Neatrour et al.) present case studies that involve significant time spent on metadata normalization, deduplication, and remediation. This speaks to a related difficulty often cited in the literature: inconsistent or “messy” source data. Mapping metadata from one repository system to another would be much simpler if legacy systems did not so often have metadata quality problems in the form of custom local fields, duplicate fields, and misspelled entries.
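The kinds of cleanup described above (custom local fields, duplicate fields, misspelled entries) can be illustrated with a minimal Python sketch. It is not drawn from any of the cited projects: the field names, the FIELD_MAP table, and the normalize_record helper are all hypothetical, and a real migration would need a mapping built from an audit of its own source data.

```python
# Hypothetical mapping from legacy field names to target field names.
# Every entry here is an assumption made for illustration only.
FIELD_MAP = {
    "title": "title",
    "titel": "title",        # misspelled field name
    "local_title": "title",  # custom local field
    "creator": "creator",
    "author": "creator",     # duplicate field with the same meaning
}


def normalize_record(record):
    """Normalize one legacy record.

    record maps a field name to a list of string values.
    Returns (normalized, unmapped): fields that matched FIELD_MAP,
    and fields left over for manual review.
    """
    normalized = {}
    unmapped = {}
    for field, values in record.items():
        target = FIELD_MAP.get(field.strip().lower())
        # Collapse runs of whitespace and drop empty values.
        cleaned = [" ".join(v.split()) for v in values if v and v.strip()]
        bucket = normalized if target else unmapped
        existing = bucket.setdefault(target or field, [])
        for value in cleaned:
            if value not in existing:  # deduplicate, keeping first-seen order
                existing.append(value)
    return normalized, unmapped


legacy = {
    "Title": ["  A  Study "],
    "titel": ["A Study"],
    "Author": ["Smith, J."],
    "creator": ["Smith, J.", ""],
    "box_no": ["12"],
}
normalized, unmapped = normalize_record(legacy)
print(normalized)
print(unmapped)
```

Returning the unmapped fields separately, rather than discarding them, reflects the advice in the literature to combine scripted cleanup with manual review.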

The literature offers a great deal of migration advice, based primarily on lessons learned from migration projects. Tripp summarizes much of this advice in four categories: planning, metadata normalization, migration, and verification. Each of these categories is represented in the rest of the literature; Nowak et al. undertook a great deal of planning for their migration project, while Simic and Seymore invested considerable time in large-scale metadata normalization prior to migration. The migration phase itself was often accomplished with a combination of scripts and manual intervention, and the same is true of the verification step.

Common Themes

  • Motivations for migration
    • Commercial license costs
    • Lack of flexibility
    • Staff investment vs. licensing fees
    • Performance and scale issues
    • Better integration with other applications/services
    • Support for linked data
    • Support for digital preservation
  • Migration difficulty
    • Custom metadata fields
    • Inconsistent data
    • Different data models
    • Metadata mapping: e.g. MODS XML to RDF
    • OSS documentation is not always complete/accurate
  • Migration benefits
    • Metadata improvement/enrichment
    • Skills development
    • Streamlined workflows
    • Enhanced discovery via metadata enrichment
  • Migration advice
    • Importance of communication
      • Engaging with stakeholders, collecting feedback, reporting on progress
    • Working with a representative sample
    • Requirements, scope
    • Normalize metadata before migration
      • But carefully scope this effort
    • Iterate, spot check
    • Agile methodologies
    • Contingency planning: staff turnover, learning curve, no single points of failure
    • Need for clear roles and responsibilities
  • Repository requirements
    • Flexible object types and metadata
    • Batch ingest (e.g. from a spreadsheet)
    • Large community
    • Modularity
  • Status of Fedora
    • Most still using Fedora 3
    • Plans to migrate but few timelines
    • Samvera/Fedora has major performance issues
    • Is the value of Fedora worth the complexity it introduces in the Samvera stack?
  • Tools
    • Several examples of tools developed to aid/automate migration activities