Samvera (aka Hydra) Community Linked Data Support

NOTE: The Hydra Community recently rebranded as Samvera due to trademark issues with the name Hydra.

Technical Outputs

Presentations


Overview

Efforts in this area target the Samvera development community, with the goal of making linked data easier to adopt. The tools created by this work integrate with the Samvera stack (Ruby-based apps). Several tools are standalone and expose an access API that allows them to be used in other development stacks, as has been demonstrated with VitroLib (a Java-based app) and the Hip Hop Collection visualizations (a JavaScript-based app).

The following diagram gives an architectural overview of the relationship between the Authorities (e.g. Library of Congress, OCLC FAST, DBpedia, etc.), the Questioning Authority (QA) normalization layer, and the display of content for selection in an application's UI.

Authorities

Direct access to authorities is subject to outages and service delays that are outside the control of the library. To address the stability of authorities, we are exploring several back-end caching schemes. These include work at the University of Iowa on caching authority data in a Jena-Fuseki triplestore with a Lucene search layer for querying. Harvard has explored a Linked Data Fragments (http://linkeddatafragments.org) server implementation. Cornell investigated the Samvera community's Active-Triples linked data fragments implementation, which uses Marmotta or Blazegraph as the backing triplestore.

We are moving forward with extending the Jena-Fuseki-Lucene approach for caching external authority data. This approach will serve as the back-end for VitroLib installations for cataloging experimentation and Hyrax repository applications. The QA and UI components being developed will continue to work with the other back-end approaches, but we will not be exploring them further at this time.

Questioning Authority (QA) normalization layer

The original Questioning Authority (QA) code used non-linked data APIs to access authorities. This required coding a one-off process for each authority to access its API and interpret its proprietary results format. We extended the QA code to include a single process for accessing linked data based APIs and interpreting the linked data results. There are still differences among the access APIs, and results can be returned in a number of ontologies. These differences are encoded in a configuration for each authority, which includes the access URLs and identifies the predicates that serve particular roles in the UI. The results are normalized into a JSON structure, allowing front-end UIs to process all data the same way.
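
The configuration-driven normalization described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the QA gem's actual code or configuration format: `AUTHORITY_CONFIG`, the role names, the `normalize` method, the example.org URLs, and the sample triples are all hypothetical. Only the SKOS and RDFS predicate URIs are real.

```ruby
require 'json'

# Hypothetical per-authority configuration: the access URL template and the
# predicates that play each UI role. Not the QA gem's real config format.
AUTHORITY_CONFIG = {
  authority_a: {
    search_url: 'http://example.org/authority-a/search?q=%{query}',
    predicates: {
      label:    'http://www.w3.org/2004/02/skos/core#prefLabel',
      altlabel: 'http://www.w3.org/2004/02/skos/core#altLabel'
    }
  },
  authority_b: {
    search_url: 'http://example.org/authority-b/find?term=%{query}',
    predicates: {
      label: 'http://www.w3.org/2000/01/rdf-schema#label'
    }
  }
}.freeze

# Turn [subject, predicate, object] triples into one uniform list of hashes,
# using the authority's configuration to decide which predicate fills which role.
def normalize(authority, triples)
  roles = AUTHORITY_CONFIG.fetch(authority)[:predicates].invert # predicate => role
  triples.group_by(&:first).map do |uri, statements|
    entry = { 'uri' => uri, 'label' => nil, 'altlabel' => [] }
    statements.each do |_subject, predicate, object|
      case roles[predicate]
      when :label    then entry['label'] ||= object
      when :altlabel then entry['altlabel'] << object
      end
    end
    entry
  end
end

# Made-up sample results from two authorities that use different ontologies.
SAMPLE_A = [
  ['http://example.org/a/1', 'http://www.w3.org/2004/02/skos/core#prefLabel', 'Milk'],
  ['http://example.org/a/1', 'http://www.w3.org/2004/02/skos/core#altLabel', 'Dairy milk']
].freeze
SAMPLE_B = [
  ['http://example.org/b/9', 'http://www.w3.org/2000/01/rdf-schema#label', 'Milk']
].freeze

normalized = normalize(:authority_a, SAMPLE_A) + normalize(:authority_b, SAMPLE_B)
puts JSON.pretty_generate(normalized)
```

Both authorities end up in the same `uri` / `label` / `altlabel` shape even though one uses SKOS and the other RDFS, which is the property that lets a front-end treat every authority identically.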

Application UI

The UI experimentation looks at how to get linked data into an application and how to leverage linked data once it is in.

Getting Linked Data

The primary challenge we are addressing in getting linked data into an application is the process of selecting a term from an authority. The methods of selection we have explored are:

Autocomplete - The user begins typing in a form field. As they type, a list drops below the box showing potential matches. The user can select from the list. The list is driven by data accessed from an authority and normalized by QA for use in an autocomplete list. This work has been successfully demonstrated in a Hyrax application and in VitroLib.
Lookup with Context - The user chooses to do a lookup for a form field. The user types a search query that accesses an authority through a normalization process in QA. QA returns matching terms and values for predetermined context fields for the matching terms. The user is then able to select one or more terms, using the additional context to aid in their decision-making process. The additional context is expected to enable users to make more accurate selections, thus improving overall data quality. This work is just beginning. See mockups for current design.

In both cases, the URI for the selected term is stored by the application. In some cases, the label of the selected term is also stored. For Hyrax applications, the URI is stored with the repository object and the label is stored in the Solr index to allow for search results matching in the application.
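
The two selection methods and the storage step can be sketched as follows. This is a rough illustration only: the `TERMS` data, the context fields, and the `subject` and `subject_label` field names are hypothetical, and plain hashes stand in for the repository object and the Solr index document. It does not reflect the actual Hyrax or QA APIs.

```ruby
# Hypothetical normalized results; URIs, labels, and context fields are made up.
TERMS = [
  { 'uri' => 'http://example.org/term/1', 'label' => 'Example Person A',
    'context' => { 'Birth date' => '1900', 'Occupation' => 'Author' } },
  { 'uri' => 'http://example.org/term/2', 'label' => 'Example Person B',
    'context' => { 'Birth date' => '1950', 'Occupation' => 'Composer' } }
].freeze

# Autocomplete: each list entry shows only the label and carries the URI.
def autocomplete_entries(terms)
  terms.map { |term| { 'value' => term['uri'], 'text' => term['label'] } }
end

# Lookup with Context: each entry also shows the predetermined context fields.
def lookup_entries(terms)
  terms.map do |term|
    details = term['context'].map { |field, value| "#{field}: #{value}" }.join('; ')
    { 'value' => term['uri'], 'text' => "#{term['label']} (#{details})" }
  end
end

# Selection: the URI goes with the repository object and the label goes into
# the index document. Both inputs are left unmodified; new hashes are returned.
def store_selection(term, repository_object, index_document)
  [
    repository_object.merge('subject' => term['uri']),
    index_document.merge('subject_label' => term['label'])
  ]
end

object, index_doc = store_selection(TERMS.first, { 'id' => 'work-1' }, { 'id' => 'work-1' })
puts lookup_entries(TERMS).map { |entry| entry['text'] }
puts object.inspect
puts index_doc.inspect
```

Keeping the URI as the stored value and the label only in the index means the label can be refreshed from the authority later without touching the repository object.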

Leverage Linked Data

Linked data can be leveraged in several ways.
