Summary

Linked Data for Production: Pathway to Implementation (LD4P Phase 2) builds upon the foundational work of Linked Data for Production (LD4P) Phase 1 and Linked Data for Libraries Labs (LD4L Labs) to begin the implementation phase of the cataloging community’s shift to linked data for the creation and manipulation of its metadata.

A collaborative project among four institutions (Cornell, Harvard, Stanford, and the University of Iowa) and the Program for Cooperative Cataloging (PCC), this phase of LD4P will have seven goals:

the creation of a continuously fed pool of linked data expressed in BIBFRAME from a core group of academic libraries
the development of a cloud-based sandbox editing environment in support of an expanded cohort of libraries to create and reuse linked data
the development of policies, techniques, and workflows for the automated enhancement of MARC data with identifiers to make its conversion to linked data as clean as possible
the development of policies, techniques, and workflows for the creation and reuse of linked data and its supporting identifiers as libraries’ core metadata
better integration of library metadata and identifiers with the Web through collaboration with Wikidata
the enhancement of a widely adopted library discovery environment (Blacklight) with linked-data-based discovery techniques
the orchestration of continued community collaboration through the development of an organizational framework called LD4, ensuring continued exchange of ideas and techniques across a distributed developing community.

These seven goals will be realized through six work packages:

the collaborative creation of new metadata in a cloud environment in partnership with the PCC
the development of techniques for the reuse of pre-existing metadata
linkage to external authorities and web context (e.g., Wikipedia)
enhancing discovery
production workflows for native linked data descriptions
transferring our accumulated experience with this technology to a community of collaborators

Collaboration, both internal and external, will be key in this phase of LD4P. The partners will collaborate on the development of the cloud environment and Blacklight; the Library of Congress and the Program for Cooperative Cataloging will collaborate with the project through training in the use of the BIBFRAME Editor; Harvard will foster a close relationship with the PCC in the development of policy decisions; and the core institutions will collaborate with Wikidata in publishing, linking, and enriching linked data through the Wikimedian-In-Residence program.

Background/Context and Rationale

LD4P (Phases 1 and 2) is focused on the transition of basic Technical Services workflows from their current infrastructure built upon MARC to linked open data and the Web. The importance of the transition away from MARC to linked data can be elucidated by two questions.

First, why not continue with MARC? MARC was a revolution in its day. It allowed data from library card catalogs to be encoded in machine-readable form, enabling the catalog cards to be reproduced on the computer screen and the data to be exchanged freely among libraries. It is, however, a fifty-year-old technology, originally designed for magnetic tape-based computers and now understood only by library systems. In addition, the MARC formats are semantically inexpressive and have isolated libraries from the development of the Web.

And second, why linked data? It is apparent that library patrons have preferred searching for information on the Web for quite some time. By integrating library data into the Web in a semantic way, our patrons can find well-formed library data there as well as in library catalogs. By taking advantage of the semantic web, library patrons can directly benefit from other important data sources on the Web. A third advantage is that the Web is an international environment. By shifting to linked data, libraries worldwide can take advantage of the bibliographic and authoritative data many national libraries create and make available now as linked data. And last, the Web is a continually evolving environment. Without a doubt, linked data will evolve into some other standard with time. But in order to move along with this evolution, libraries will need to make that first important step in the transition to a Web environment.

Current Work: Linked Data for Production Phase 1 and LD4L Labs

In Linked Data for Production Phase 1, the partners proposed the development of a communal work environment based in linked data; the strengthening and expansion of the BIBFRAME ontology to cover the multiple formats (e.g., books, music, maps, etc.) that libraries must catalog; the tools needed to perform the work itself; and the development of lightweight workflows (Tracer Bullets) to prove that the transition to linked data was both possible and practical. In the LD4L-Labs work, the partners piloted the development of an editing tool to support cataloging using BIBFRAME and variations, together with selected extension ontologies; the integration of linked data authority lookup and management into cataloging; and the use of linked data for discovery and visualization.

Communal Work Environment: The partners were fortunate in the development of a communal work environment in support of the project: SHARE-VDE, or the SHARE-Virtual Discovery Environment (http://share-vde.org/sharevde/). SHARE-VDE is the product of a partnership between Casalini Libri and a software development company called @Cult, with the cooperation of academic library partners such as the University of California Berkeley, Stanford, and Duke. The environment includes a semantically enhanced MARC to BIBFRAME converter that allows not only for the simple conversion of MARC fields to BIBFRAME but also for the extraction of additional free-text data from MARC fields (such as role) and its conversion to linkable identifiers. It also includes an advanced reconciliation system that goes beyond simple text-string matching to equate two entities: it gathers multiple points of information about an entity (such as birth date, co-editors, institutional affiliation, etc.) so that matches can be made with confidence even if the text strings are not identical. The availability of this environment will promote the reuse of metadata in the second phase of Linked Data for Production.
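The idea behind reconciliation on multiple points of information can be illustrated with a minimal Python sketch. This is not the SHARE-VDE implementation; the `Entity` fields, the integer weights, and the threshold are all hypothetical choices made only to show how evidence beyond the name string could be combined.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Entity:
    """A hypothetical, simplified description of a person entity."""
    name: str
    birth_year: Optional[int] = None
    affiliations: FrozenSet[str] = frozenset()
    co_editors: FrozenSet[str] = frozenset()


def normalize(name: str) -> str:
    """Lowercase a name and drop commas, periods, and extra whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def match_score(a: Entity, b: Entity) -> int:
    """Sum illustrative (assumed) weights for each piece of agreeing evidence."""
    score = 0
    if normalize(a.name) == normalize(b.name):
        score += 40  # identical normalized name strings
    if a.birth_year is not None and a.birth_year == b.birth_year:
        score += 30  # same birth year
    if a.affiliations & b.affiliations:
        score += 15  # at least one shared institutional affiliation
    if a.co_editors & b.co_editors:
        score += 15  # at least one shared co-editor
    return score


def same_entity(a: Entity, b: Entity, threshold: int = 50) -> bool:
    """Treat two descriptions as one entity when the evidence reaches the threshold."""
    return match_score(a, b) >= threshold
```

With these assumed weights, two descriptions whose name strings differ can still be matched when birth year, affiliation, and co-editor agree, while a name match alone stays below the threshold.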

BIBFRAME Ontology: The Library of Congress (LC) has been very open to working with LD4P in the refinement and expansion of the BIBFRAME ontology over the past year. The partners proposed a number of changes to BIBFRAME, some of which have been incorporated in the current version, and some that may be adopted in future versions. In addition, the partners have made extensions to BIBFRAME in the areas of performed music, rare books, art, and cartographic materials. These extensions to BIBFRAME will allow libraries to catalog all materials passing through their traditional workflows.

Tooling: Tooling will be key to any linked data transition, and one of the most important tools will be an “editor” to both create and edit data. LD4P and LD4L Labs are currently experimenting with three linked-data editors. The first is the LC BIBFRAME editor. This editor is receiving a thorough shake-down as LC prepares to train over seventy staff to use it in linked data creation. The second is a new editor being developed at the Biomedical Informatics Research Lab at Stanford called CEDAR.

Last, during the LD4L-Labs grant, the partners began exploring how to extend and customize Vitro (the platform behind VIVO, see https://github.com/vivo-project/Vitro) to serve as a cataloging editor (called VitroLib). VitroLib can support editing using BIBFRAME as well as extension and variation ontologies. While the partners do not expect to directly use the VitroLib editing environment in LD4P Phase 2, the user feedback from catalogers in the use of all three editors in such areas as UI development, transition to a recordless environment, and the use of multiple ontologies will significantly inform the creation of the planned communal work environment.

Workflows: In LD4P Phase 1, Stanford focused on the conversion of four key workflows to a linked data strategy: two related to the traditional ILS (copy cataloging and original cataloging) and two to the digital repository (deposit of a single item through self-deposit and the deposit of a collection of items through bulk loading). The ILS workflows have been established in a lightweight fashion and tested, and we are now ready to expand them both in depth and in participants for the next phase of LD4P. The expansion of these workflows will be the next critical development as they cover the predominant resources libraries must handle in their day-to-day production.

Authorities: Current cataloging practice uses authority strings (e.g., “Twain, Mark, 1835-1910”) that include disambiguating properties (e.g., birth and death dates) as well as the name. The shift to linked data, and the use of unique authority URIs (e.g., http://id.loc.gov/authorities/names/n82045653), has many benefits but requires additional tooling for lookup and matching across systems. The LD4L-Labs partners created Questioning Authority, a lookup system that integrates with VitroLib and other systems such as VIAF to support authority lookup in multiple authority resources, with the ability to add new authority sources to the system.
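The general pattern of looking up a heading across several pluggable authority sources and getting URIs back can be sketched as follows. This is an illustrative sketch only and does not reflect the actual Questioning Authority API: `AuthorityRegistry`, `Hit`, and `make_in_memory_source` are hypothetical names, the sources are in-memory dictionaries rather than live services, and the `example.org` URI is a placeholder.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Hit(NamedTuple):
    label: str   # human-readable authority string
    uri: str     # unique identifier for the entity
    source: str  # name of the authority source that returned the hit


# A lookup function takes a query string and returns matching hits.
Lookup = Callable[[str], List[Hit]]


class AuthorityRegistry:
    """Hypothetical registry that fans a query out to registered sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Lookup] = {}

    def register(self, name: str, lookup: Lookup) -> None:
        """Add a new authority source under the given name."""
        self._sources[name] = lookup

    def search(self, query: str, sources: Optional[List[str]] = None) -> List[Hit]:
        """Query the named sources (all registered sources by default)."""
        names = sources if sources is not None else list(self._sources)
        hits: List[Hit] = []
        for name in names:
            hits.extend(self._sources[name](query))
        return hits


def make_in_memory_source(name: str, records: Dict[str, str]) -> Lookup:
    """Build a stand-in source from a {label: uri} dict (no network access)."""
    def lookup(query: str) -> List[Hit]:
        q = query.casefold()
        return [Hit(label, uri, name) for label, uri in records.items()
                if q in label.casefold()]
    return lookup


registry = AuthorityRegistry()
registry.register("loc", make_in_memory_source("loc", {
    "Twain, Mark, 1835-1910": "http://id.loc.gov/authorities/names/n82045653",
}))
registry.register("local", make_in_memory_source("local", {
    # Placeholder URI for illustration only.
    "Twain, Mark": "http://example.org/authorities/twain-mark",
}))

for hit in registry.search("twain"):
    print(hit.source, hit.label, hit.uri)
```

Registering sources as plain callables keeps the sketch small and mirrors the stated requirement that new authority sources can be added without changing the lookup code.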

Rationale

The focus of the second phase of LD4P is implementation. Building upon the expertise, structure, and workflows developed during the first phase of LD4P and LD4L-Labs, the four partners (Cornell, Harvard, Stanford, and the University of Iowa) will implement a prototype environment, from metadata acquisition/creation through to discovery. An important enhancement in this phase will be collaborating with the Program for Cooperative Cataloging (PCC) and the Library of Congress to expand the number of libraries moving to implementation of linked data. Sub-grants for committed libraries will help them defray transition costs.

The choice of working with the PCC was deliberate. Within the United States, we work within the concept of a virtual, distributed “national library” for the creation of high-level metadata. The PCC provides the community with a forum for the development of policy and training programs for member libraries. The full buy-in of the PCC, along with their ability to provide training and support, will be key to expanding the transition to linked data from the core libraries within LD4P to the broader academic library community.

Discovery will also be a key development in LD4P Phase 2. By focusing on linked-data enhancements to current discovery systems such as Blacklight, LD4P hopes to take immediate advantage of linked data for library patrons through such developments as the addition of knowledge panels, authority-based browse, and semantic search (see Work Package 4: Discovery).
