Stanford is proposing two projects

conversion of four key technical services workflows to linked data-based workflows (called Tracer Bullets)

According to the Agile Dictionary, a tracer bullet is “a set of work where interfaces are developed from beginning to end of a process. These interfaces may be very simplified or may just pass through. The purpose of the tracer bullet is to examine how an end-to-end process will work and examine feasibility.” Key to any successful transition of technical services functions to linked data will be the transition of its traditional workflows, as these workflows account for the dominant part of a department’s throughput. Stanford will be applying the tracer bullet principles to its fundamental workflows.

development of a performed music ontology

Tracer Bullets

Overview

As part of Stanford’s participation in the first Linked Data for Libraries project, much effort was put into the conversion of its MARC data to linked data, in this case, BIBFRAME. The bulk of a library’s metadata is captured in MARC, and its conversion to linked data will be key to any successful shift in technology. This conversion process revealed a number of problem areas, however. MARC was intended to capture not only metadata about a resource, but also metadata about the metadata (for instance, when the metadata was created). These different types of metadata are difficult to sort out in the conversion to linked data. Likewise, many relationships, such as the link between a subject heading and a specific work, or the link between a performer and the particular work they performed, are not captured in MARC. It is assumed that the person viewing the data at a computer screen will make those connections. Because of this lack of internal connection within a MARC record, these relationships, and many others, will not be expressed in the linked data after conversion. All of them must be added by hand after conversion. It became clear that, in order to avoid this extra labor, libraries would need to begin creating their metadata as linked data directly. This realization drove Stanford to choose the conversion of its workflows as its LD4P institutional project. Although Stanford’s production workflows will be unique to that institution, the elements in the production chain can be generalized and shared with other members of LD4P. In addition, support agreements with common vendors such as Casalini Libri or Backstage will benefit all members.
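The loss of relationships described above can be illustrated with a minimal sketch. Everything in it is hypothetical and greatly simplified: the record layout, the names, the example.org URIs, and the "performedBy" predicate are placeholders, not real MARC parsing, BIBFRAME terms, or Stanford data. A flat record lists works and performers side by side, so nothing states who performed what, whereas triples assert each link explicitly.

```python
# Hypothetical, simplified illustration; not real MARC parsing or BIBFRAME output.

# A flat, MARC-like record for one recording: works and performers sit in
# separate fields, with nothing tying a performer to a particular work.
flat_record = {
    "works": ["Sonata A", "Sonata B"],
    "performers": ["Pianist X", "Violinist Y"],
}

# The same recording as explicit (subject, predicate, object) statements.
# The example.org URIs and the "performedBy" predicate are placeholders.
EX = "http://example.org/"
triples = [
    (EX + "work/sonata-a", EX + "performedBy", EX + "agent/pianist-x"),
    (EX + "work/sonata-b", EX + "performedBy", EX + "agent/violinist-y"),
]


def candidate_pairings(record):
    """Every work/performer pairing the flat record is consistent with."""
    return [(w, p) for w in record["works"] for p in record["performers"]]


def performers_of(work_uri, statements):
    """Return the performers explicitly linked to one work."""
    return [o for s, p, o in statements
            if s == work_uri and p == EX + "performedBy"]


# The flat record allows several pairings but asserts none of them.
print(len(candidate_pairings(flat_record)))
# The triples answer the question directly.
print(performers_of(EX + "work/sonata-a", triples))
```

A converter working only from the flat record has no basis for choosing among the candidate pairings, which is why such links would otherwise have to be added by hand.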

The individual elements in this new production workflow are tantalizingly familiar. Stanford has had a many-year history with the conversion of MARC data, both bibliographic and authority, to linked data. They have had experience with a variety of triple-stores and the ingest of data. They have in-depth knowledge of RDF and extensive experience with identifiers. They have evaluated the Library of Congress’ linked-data production tools. And the Library of Congress tools are sophisticated enough at this point that LC will be training 40 catalogers to use them at that institution. But even given the experience with the elements of this new workflow, concerted effort will be needed to bind them together into a workflow that will function in Stanford’s environment.

Building upon the roadmap developed by BIBFLOW and its analysis of needs in a linked data environment, Stanford will focus these requirements on four key production pathways. Each pathway will be examined, from acquisition to discovery, as a tracer bullet. All key elements in those workflows will be converted to a process rooted in linked data, but in a basic way. Emphasis will be on the completeness of the pathway. The workflows themselves will be expanded in the future to account for additional complexities once the initial pathway has been established.

Stanford will be creating a parallel LOD processing stream to accomplish these goals. Resources flowing through these pathways will first be processed in the traditional way with traditional MARC or MODS metadata. This traditional metadata will be used for discovery purposes in the university’s discovery environment and for contributions to cooperative cataloging programs such as the Program for Cooperative Cataloging. For LD4P, however, a parallel linked-data workflow will be created and duplicate metadata produced. This metadata will feed into a parallel discovery environment as well, so that we can mimic the entire processing stream. This parallel metadata can also be sent to various library vendors and programs so that they can begin to adjust their businesses to incorporate linked data. Although this proposed solution will require duplicative effort, it will allow Stanford to experiment with an alternative pathway without being dependent on the results for discovery. It also has the benefit of testing the new pathway with actual library resources and staff, so that the true effort and cost of implementing the new paradigm can be measured.

In preparing for this grant proposal, Stanford held a series of meetings involving its Acquisitions Department, Metadata Department and Digital Library Systems and Services (DLSS). Key to the analysis was the mapping of its MARC workflows, MODS workflows, and Original linked data workflows (including data from vendors, self-created linked data using LC’s BIBFRAME Editor, etc.). The elements of Stanford’s workflows and how they interact with these data types, data stores (Symphony (ILS), Digital Repository (SDR3)), and the triple store (Store Cache) were carefully mapped out. An analysis was done of what tools and environmental changes were needed in order to allow the individual workflows to proceed as a tracer bullet and the amount of effort needed to finish the work.

The resultant analysis led Stanford to choose four key workflows for conversion to linked data, two for traditional materials and two for digital: copy cataloging through the Acquisitions Department, original cataloging, deposit of a single item into the Digital Repository, and the deposit of a collection of resources into the Digital Repository. The tracer bullet will follow the life cycle of a resource, from its acquisition to discovery. Each process along the way will be converted to a linked data strategy. These processes simply need to be good enough to support an experimental workflow. Requirements for a full production workflow will be gathered iteratively and passed on through regular meetings to LD4L-Labs for development. The development of this skeletal architecture, however, will still demand a sizeable amount of effort. Through workflow analysis, we have determined eight key areas for initial development: implementation and enhancement of the LC MARC2BIBFRAME converter, functional installation of the LC BIBFRAME Editor, development of storage and caching mechanisms, development of a BIBFRAME bridge to the ILS, a BIBFRAME to Solr mapping for discovery in Blacklight, the publishing of the linked data output to the web, integration of the BIBFRAME Editor with the Digital Repository, and development of the systems architecture to answer such questions as which system is the database of record for resource metadata.

Objectives

Pathway 1 - Traditional Vendor-supplied Cataloging

Currently, 80% of Stanford’s monographs come with some form of MARC copy. The material is received by lower-level paraprofessional staff in Acquisitions and cataloged on receipt. Because of the large volume of materials, throughput must remain high. As the transition of the entire library ecosystem to linked data will be slow, having at least one flow that begins with MARC data will be inevitable. And because of its volume, the transition of this copy-cataloging workflow will be essential. The effort needed to convert this workflow has been analyzed. Of equal importance to the conversion of the workflow itself will be the broader questions that the Project Co-Manager can explore based on data gathered as resources are processed. The answers to these questions are not necessary for the successful completion of this phase of LD4P, but will prove invaluable for the planning of the next iteration of LD4P after these first two years are completed. For example:

what is the best configuration of the MARC to BIBFRAME converter for this level of staff?
is it possible to automate the creation of identifiers for controlled access points?
what reconciliation processes are available during the automated process?
what linked-data elements are most desirable post-conversion for highly functioning/integrable linked data?
how much additional cost does this workflow require?
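As a rough illustration of what the identifier and reconciliation questions above involve, the following sketch matches heading strings against a small lookup table and flags the ones it cannot match. It is a sketch under stated assumptions, not a description of Stanford's process or of any existing tool: the authority table, its example.org identifiers, and the normalization rules are all hypothetical.

```python
import unicodedata

# Hypothetical local authority table: normalized heading -> identifier.
# The example.org URIs are placeholders, not real authority identifiers.
AUTHORITY_INDEX = {
    "twain, mark, 1835-1910": "http://example.org/authority/0001",
}


def normalize(heading):
    """Lowercase, drop diacritics, collapse whitespace, trim end punctuation."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_marks.lower().split()).rstrip(" .,")


def reconcile(headings, index):
    """Split headings into those with a known identifier and those without."""
    matched, unmatched = {}, []
    for heading in headings:
        uri = index.get(normalize(heading))
        if uri is not None:
            matched[heading] = uri
        else:
            unmatched.append(heading)  # left for staff review
    return matched, unmatched


matched, unmatched = reconcile(
    ["Twain, Mark,  1835-1910.", "Unknown Composer"], AUTHORITY_INDEX
)
print(matched)
print(unmatched)
```

The share of headings that end up unmatched is one possible measure of how much manual work remains for staff at this level.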

Pathway 2 - Traditional Original Cataloging

Stanford produces original cataloging in all formats (books, serials, sound recordings, etc.) according to national standards for its own internal needs and to share with the broader library community. Although this pathway will make use of many of the same tools as Pathway 1 (e.g., a BIBFRAME Editor), the approach will be quite different because the work is done by professional staff. These staff will be creating new linked data directly, not converting MARC data and enhancing it. As in Pathway 1, of equal importance to the conversion of the workflow itself will be the broader questions that the Project Co-Manager can explore based on data gathered as resources are processed. For example:

what is the best configuration of the BIBFRAME editor for the creation of new data?
how best to create new authorities and identifiers to support new controlled data
how best to integrate these identifiers with the local identity management
how best to expand the BIBFRAME ontology to cover multiple formats

Pathway 3 - Self-Deposit to the Digital Repository

Another major flow of metadata is for born-digital objects deposited into Stanford’s digital repository and from there into the discovery environment. This metadata is currently stored and maintained in the MODS format. Pathway 3 explores the self-deposit of a single digital resource into the digital repository. Issues to be explored:

conversion of MODS metadata to BIBFRAME
automated assignment of identifiers for controlled headings
storage of linked data within the digital repository

Pathway 4 - Ingestion of a Collection into the Digital Repository

Stanford’s digital repository also hosts an increasing number of large collections of digital objects. Metadata is often received in the form of a spreadsheet and is converted for deposit and typically enriched or remediated afterwards. All processes must be automated as the collections are large. Issues to be explored:

conversion of a large collection of metadata to linked data
automated remediation of the metadata either before or after processing
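The two issues above can be sketched together for a spreadsheet exported as CSV. This is a minimal illustration under stated assumptions rather than the proposed implementation: the column names "id" and "title", the example.org base URI, and the use of the Dublin Core title property as a stand-in for BIBFRAME's fuller title modelling are all choices made for this example. Rows that cannot be converted are set aside for remediation instead of being dropped silently.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF_WORK = "http://id.loc.gov/ontologies/bibframe/Work"
# Assumption: Dublin Core title used as a simple stand-in for BIBFRAME titles.
DCT_TITLE = "http://purl.org/dc/terms/title"
# Assumption: placeholder base URI for minted resource identifiers.
BASE = "http://example.org/resource/"


def escape_literal(value):
    """Escape backslash, quote, and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(row):
    """Turn one spreadsheet row into two N-Triples statements."""
    subject = f"<{BASE}{quote(row['id'].strip(), safe='')}>"
    title = escape_literal(row["title"].strip())
    return [
        f"{subject} <{RDF_TYPE}> <{BF_WORK}> .",
        f'{subject} <{DCT_TITLE}> "{title}" .',
    ]


def convert(csv_text):
    """Convert all usable rows; collect incomplete rows for remediation."""
    triples, needs_remediation = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not (row.get("id") or "").strip() or not (row.get("title") or "").strip():
            needs_remediation.append(row)
            continue
        triples.extend(row_to_ntriples(row))
    return triples, needs_remediation


sample = "id,title\nitem1,  Map of the Bay \nitem2,\n"
triples, needs_remediation = convert(sample)
print("\n".join(triples))
print(len(needs_remediation), "row(s) flagged for remediation")
```

Here light remediation (trimming stray whitespace) happens before conversion, while rows with missing values are deferred; whether remediation belongs before or after processing is exactly the open question this pathway is meant to explore.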

The Tracer Bullet projects will make use of common tools and environments as they proceed. A number of these tools, such as the BIBFRAME Editor, have already been developed and tested by the Library of Congress and will be used by them in their LD4P projects. Other elements such as the triple store or MARC conversion flow have been developed by LD4L and are already in place. The Tracer Bullet Technologist will be responsible for integrating these tools into a single flow within the local Stanford environment. As the tools are used, optional enhancements will be uncovered that would improve their functionality outside of the simple, tracer-bullet workflows.

Performed Music Ontology (PMO)

The performed music ontology project is divided into four phases: use case development, MARC conversion remediation, ontology modelling, and profile development in association with the Program for Cooperative Cataloging (PCC). BIBFRAME development can be conceived of as a core ontology with additional domain-specific ontologies added for special communities. The library community maintains itself by creating metadata to very specific standards, so that the metadata can be exchanged with little or no additional effort. The danger in the BIBFRAME development model is that conflicting ontologies may be developed for the same domain, hindering data exchange. The PMO hopes to establish a model by which these extensions can be created and maintained by the library community without conflict. Key to this will be the involvement of the major domain stakeholders and the PCC to officially endorse the new ontology for all of its members. In this particular domain extension, the key domain stakeholders are the Music Library Association (MLA) and the Association for Recorded Sound Collections (ARSC). Both of these associations, and the PCC, have been contacted and are eager to participate. The work will be accomplished by a committee of five members drawn from these institutions, the Library of Congress, and Stanford. As refinements to the PMO are established, they will be fed directly to both Cornell and the Library of Congress, supporting their efforts in the transition of their music cataloging workflows. Most work will be conducted electronically, but two face-to-face meetings are scheduled.
