Stanford’s linked data production project focuses on technical services workflows. For each of four key production pathways, we examine every step in the workflow, from acquisition to discovery, to determine how best to transition to a linked data production environment. Our emphasis is on following each workflow from start to finish to demonstrate an end-to-end linked data production process and to highlight areas for future work. The four pathways are: copy cataloging through the Acquisitions Department, original cataloging, deposit of a single item into the Stanford Digital Repository, and deposit of a collection of resources into the Stanford Digital Repository.
Overview Presentations
- Overview of Tracer Bullet project, with details on Tracer Bullet 1: Conversion of Vendor-Supplied MARC Records (PowerPoint)
- Screenshots of Tools used in Tracer Bullets 2-4: local instance of Library of Congress BIBFRAME Editor, Blazegraph triple store (PowerPoint)
Tracer Bullet 1: Conversion of vendor-supplied MARC records to BIBFRAME
Existing MARC workflows
Overview of existing MARC workflows
Firm orders
Approval orders
Receiving
Automatic record enhancements
MARC-to-BIBFRAME conversion workflows
Pilot workflow: parallel processing in MARC and BIBFRAME, MARC record primary
Implementation workflow: operational record in MARC, discovery data in BIBFRAME, BIBFRAME data primary
Conversion workflows by process
Dataflow for conversion workflow
Reactive Pipeline for Larger Scale Conversion of MARC Records
- Reactive pipeline for converting MARC (github repository)
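The reactive pipeline itself lives in the linked GitHub repository. As a rough illustration of the stage structure only (the function names and record fields below are hypothetical placeholders, not the repository's actual API), the convert–load–index flow can be sketched as a chain of functions:

```python
# Sketch of a staged MARC-to-BIBFRAME pipeline: convert, load, index.
# Stage names and record fields are illustrative placeholders; the real
# pipeline uses the LC marc2bibframe2 converter, Blazegraph, and Solr.

def convert_marc(record):
    """Stand-in for the MARC-to-BIBFRAME conversion step."""
    return {"id": record["id"], "bibframe": f"<{record['id']}> a bf:Instance ."}

def load_triples(graph_store, converted):
    """Stand-in for loading converted triples into a triple store."""
    graph_store.append(converted["bibframe"])
    return converted

def index_record(solr_index, converted):
    """Stand-in for indexing the converted description into Solr."""
    solr_index[converted["id"]] = {"id": converted["id"]}

def run_pipeline(records):
    """Push each incoming record through all three stages in order."""
    graph_store, solr_index = [], {}
    for record in records:
        index_record(solr_index, load_triples(graph_store, convert_marc(record)))
    return graph_store, solr_index

graph_store, solr_index = run_pipeline([{"id": "a123"}, {"id": "b456"}])
```

In the actual pipeline these stages run reactively as records arrive, rather than in a simple loop, but the stage boundaries are the same.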
Supporting Material
- User stories for converting MARC records to BIBFRAME (pdf)
- Plans for a Minimum MARC Bibliographic Record & Attachments (pdf)
- BF to Solr mapping (Google doc)
- SPARQL queries for Solr mapping (pdf)
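The BF-to-Solr mapping and its SPARQL queries are linked above. To illustrate the general shape of that step (the query text, predicates, and field names below are simplified examples, not the project's actual mapping), a SPARQL SELECT result can be folded into a single multivalued Solr document:

```python
# Illustrative reduction of SPARQL SELECT bindings to one Solr document.
# The query and field names are simplified examples, not the project's
# actual BF-to-Solr mapping.

SAMPLE_QUERY = """
SELECT ?title ?contributor WHERE {
  ?work a bf:Work ;
        bf:title/rdfs:label ?title .
  OPTIONAL { ?work bf:contribution/bf:agent/rdfs:label ?contributor }
}
"""

def bindings_to_solr_doc(doc_id, bindings):
    """Collapse one-row-per-binding SELECT results into multivalued fields."""
    doc = {"id": doc_id}
    for row in bindings:
        for var, value in row.items():
            doc.setdefault(var, [])
            if value not in doc[var]:
                doc[var].append(value)
    return doc

# Rows as a SPARQL endpoint might return them: one row per contributor.
bindings = [
    {"title": "Example Work", "contributor": "Author, First"},
    {"title": "Example Work", "contributor": "Editor, Second"},
]
doc = bindings_to_solr_doc("work:1", bindings)
```

The deduplication step matters because property-path queries return one row per combination of bound variables, so single-valued fields like the title repeat across rows.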
Tracer Bullet 2: Original Cataloging in BIBFRAME
See overview presentations above for screenshots
- Workflow for original cataloging
Tracer Bullet 3: Deposit item to digital repository with RDF metadata; deposit by item creator
See overview presentations above for screenshots
Existing deposit workflows
Metadata created in self-deposit interface (non-SUL users)
This diagram represents metadata only (not file management or other administrative tasks) for objects described entirely through the self-deposit tool Hydrus. The boxes under "Self-deposit interface" represent the metadata-related tasks a user can perform through that interface. The left column of boxes contains metadata tasks handled within the self-deposit tool; the right column involves writing data to DOR. Except where otherwise specified, these tasks apply to the description of both collections and items. Currently this model is more commonly used for deposits originating from non-SUL users.
Metadata created in Symphony/MARC (SUL users)
This diagram represents metadata only (not file management or other admin tasks) for objects that are accessioned through the self-deposit tool Hydrus by SUL staff and cataloged by the MDU in MARC. The boxes under "Self-deposit interface" represent the metadata-related tasks an internal user performs through that interface in this workflow. Currently this model is commonly used for digital files received by Acquisitions or acquired by curators. The self-deposit tool serves as a convenient way to get a file into the SDR, notify MDU staff that cataloging is needed, and pass information such as the catkey and purl for the object to MDU. New items are deposited to existing collections that have been set up with the appropriate rights, permissions, etc.
RDF-based Workflow
The tracer bullet focuses on the metadata flow rather than the file management portion of this scenario, since the upcoming adoption of Hyku will significantly change file management. For the purposes of the tracer bullet, we are working with digital objects already deposited into the SDR and described via Hydrus. The "depositor" (actually metadata staff) will describe the objects in CEDAR based on the records in SearchWorks, but will reformulate the metadata in CEDAR independently rather than performing a simple mapping from MODS. One possible step is to generate an operational MODS record for SDR use from the RDF description, but for the purposes of the tracer bullet this operational record will not actually be written to the repository.
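If the optional MODS-generation step were implemented, one plausible shape is a small serializer from the RDF-derived description. The element names below follow the MODS schema, but the field selection and input structure are assumptions for illustration only:

```python
# Hedged sketch: derive a minimal operational MODS record from an
# RDF-derived description. Only mods:titleInfo/mods:title and mods:name
# are shown; the real field selection would follow the CEDAR template.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def description_to_mods(desc):
    """Serialize a simple description dict as a minimal MODS XML string."""
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = desc["title"]
    for creator in desc.get("creators", []):
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name")
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = creator
    return ET.tostring(mods, encoding="unicode")

xml = description_to_mods({"title": "Field notebooks", "creators": ["Doe, Jane"]})
```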
Tracer Bullet 4: Deposit set of items to digital repository with RDF metadata
See overview presentations above for screenshots
| Tracer Bullet | Description | Figures |
| --- | --- | --- |
| Tracer Bullet 2 | Original cataloging in BIBFRAME | flowchart(s) |
| Tracer Bullet 3 | Deposit item to digital repository with RDF metadata; deposit by item creator | 3 figures (CEDAR tool, CEDAR to triple store, CEDAR templates) |
| Tracer Bullet 4 | Deposit set of items to digital repository with RDF metadata | diagram; sample data and mappings? |
- Modeling authority data
- Tool recommendations for conversion and original creation of linked data
- Best practices for pre- and post-conversion enhancements
- Linked data descriptions of set of Stanford library and digital repository resources
Project Co-Managers
- Philip Schreur, Assistant University Librarian for Technical and Access Services
- Tom Cramer, Assistant University Librarian, Chief Technology Strategist, and Director of Digital Library Systems and Services
Acquisitions Department
- Alexis Manheim, Head of Acquisitions Department
- Linh Chang, Receiving and Access Librarian
Metadata Department
- Nancy Lorimer, Head of Metadata Department
- Joanna Dyla, Head of Metadata Development Unit
- Vitus Tang, Head of Data Control and E-resources Unit
- Arcadia Falcone, Metadata Coordinator
Digital Library Systems and Services
- Darsi Rueda, Head of Library Systems Department
- Naomi Dushay, Digital Library Software Engineer
- Joshua Greben, Library Systems Programmer / Analyst
- Darren Weber, Digital Library Software Engineer
Analysis/Modeling
- Mapped Stanford's vendor-supplied copy cataloging and original cataloging workflows
- Mapped workflow for converting vendor-supplied records to linked data
- Generated requirements for work-based discovery environment, to take advantage of RDF
- Evaluated BIBFRAME profiles for original cataloging
Linked Data Creation
- Worked with vendor on improvements to supplied MARC data to enhance conversion to BIBFRAME
- Tracer Bullet 1: Converted set of 38,000 MARC records from Symphony to BIBFRAME using Library of Congress converter, loaded to Blazegraph triplestore, and indexed to Blacklight Solr environment via automated scripts
- Tracer Bullet 2: Created original descriptions of 50 items with local instance of BIBFRAME 2.0 Editor
- Tracer Bullet 3: Created original descriptions of about 30 digital assets using CEDAR RDF editor
- Tracer Bullet 4:
- Piloted automated pipeline approach for conversion of MARC records to BIBFRAME, loading to triplestore, and indexing to Solr
Discovery Environment Creation
- Created a Blacklight/Solr-based discovery environment whose source data is a mix of linked data and MARC data
- Developed a mapping from BIBFRAME 2.0 to Solr document for book materials
- Developed a mapping from RDF to Solr for digital assets
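As an illustration of such an RDF-to-Solr mapping for digital assets (the predicates and Solr field names below are invented for the example, not the project's actual mapping), a set of triples can be flattened into a Solr document:

```python
# Illustrative flattening of RDF triples into a Solr document for a
# digital asset. The predicate-to-field mapping is an invented example.

FIELD_MAP = {
    "dct:title": "title_display",
    "dct:creator": "author_facet",
    "dct:type": "format",
}

def triples_to_solr_doc(subject, triples):
    """Collect mapped predicate values for one subject into a Solr doc."""
    doc = {"id": subject}
    for s, p, o in triples:
        field = FIELD_MAP.get(p)
        if s == subject and field:
            doc.setdefault(field, []).append(o)
    return doc

triples = [
    ("druid:ab123cd4567", "dct:title", "Campus photographs"),
    ("druid:ab123cd4567", "dct:creator", "Doe, Jane"),
    ("druid:ab123cd4567", "dct:type", "StillImage"),
]
doc = triples_to_solr_doc("druid:ab123cd4567", triples)
```

Unmapped predicates are simply dropped, which mirrors the general pattern of these mappings: the index carries only the fields discovery needs, while the triple store remains the full description.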
Tool Exploration / Requirements Definition
- Gathered requirements for conversion and editing tools
- Set up Registry of Tools
- Evaluated CEDAR template creation and metadata editing tool
- Developed a validation suite for MARC-to-RDF converters
- Created local instance of Library of Congress BIBFRAME 2.0 Editor