
(*DRAFT*)

This is a guide for LD4P2 Partners (cohort and PCC affiliates included) on how to request a new dataset for QA so that corresponding lookups can be added to Sinopia (the LD4P2-supported BIBFRAME editor).

Email Steven Folsom (sf433 @ cornell dot edu) with related questions or comments on how to improve this guide.

Process for Prioritizing Requests (Coming Soon)

Please note that all requests for new data sources in QA will be prioritized by the LD4P2 project. Due to time restrictions, there is no guarantee that all requests will be added to QA during the lifetime of the LD4P2 grant; regardless of resources, it is still useful to know which datasets the community would find useful in such a lookup service.

Creating a request in the form of a GitHub issue (see Step 5) will allow the request to be prioritized and tracked.

Steps

1.) Make sure the dataset is not already in QA

Please consult the LD4P2 QA Authority Support Plan to confirm that the dataset is not already supported through QA, and that it is not currently excluded from consideration because it is already supported by the type-ahead searching available in the BIBFRAME editor.

2.) Identify the new dataset

We'll need the name of the dataset and homepage URL.

N.B. You will be asked for this information when making a formal request as a GitHub issue, see Step 5 below.

3.) Gather Data Download and/or API Information

Provide a link to where the RDF dataset can be downloaded. A link to API documentation is needed if the authority does not have an acceptable download dump that can be ingested into services.ld4l.org.

The requirements for the API are (see the sketch at the end of this step for an illustration):

  • URL that can receive a string query AND returns results as linked data
  • URI (or URL that can receive a URI) that returns linked data about the entity 

N.B. You will be asked for this information when making a formal request as a GitHub issue, see Step 5 below.
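As a rough illustration of these two requirements, here is a minimal sketch in Python using the requests library. The endpoint URLs and the "q" parameter are hypothetical placeholders, not a real authority's API; substitute the endpoints documented for the dataset you are requesting.

    # Sketch of the two API capabilities QA needs. The example.org URLs and the
    # "q" parameter are hypothetical placeholders, not a real authority's API.
    import requests

    # 1.) A URL that can receive a string query AND returns results as linked data.
    search = requests.get(
        "https://example.org/authority/search",        # hypothetical search endpoint
        params={"q": "oral history"},                  # hypothetical query parameter
        headers={"Accept": "application/ld+json"},     # request a linked data serialization
    )
    print(search.status_code, search.headers.get("Content-Type"))

    # 2.) A URI (or a URL that can receive a URI) that returns linked data about the entity.
    entity = requests.get(
        "https://example.org/authority/term/123",      # hypothetical entity URI
        headers={"Accept": "application/ld+json"},
    )
    print(entity.status_code, entity.headers.get("Content-Type"))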

4.) Decide how contextual information should be used

As you might know by now, QA has the ability to provide contextual information about an entity during the look-up experience. In order to do so, decisions need to be made about how to index the RDF descriptions of entities in the dataset.

    a.) Using this spreadsheet, add a tab for the new dataset. For each new tab, please use the following column headers and value guidelines (see the existing LCGT tab in the spreadsheet as an example, and the hypothetical sample row after the guidelines below).

N.B. You will be asked to confirm this has been done when making a formal request as a GitHub issue, see Step 5 below. 


  • Entity Type: URI for the class of entity in the lookup.
  • Property Path: URI(s) for the property or property path to get to the information to be indexed in QA.
  • Search: Use an 'X' to mark if this data should be used to search against. N.B. some data is important to display to the cataloger, but perhaps would create messy results if searched against in a lookup environment, e.g. some notes are administrative in nature.
  • Display: Use an 'X' to mark if the value should be displayed. Include a label for the field. The label may simply be the property name in the Property Path column, or you may decide another term is more appropriate.
  • Ranking: If applicable, provide notes on whether a particular property path should weigh heavier on the search rankings than others.
  • Notes: Additional notes, if any.
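
For example (a hypothetical row, not taken from the spreadsheet), a tab for a SKOS-based vocabulary might contain:

  • Entity Type: http://www.w3.org/2004/02/skos/core#Concept
  • Property Path: http://www.w3.org/2004/02/skos/core#prefLabel
  • Search: X
  • Display: X (label: "Preferred Label")
  • Ranking: preferred labels should rank higher than alternate labels
  • Notes: (none)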


5.) Create an issue to formally request the new dataset

    a.) From https://github.com/LD4P/qa_server/issues/new/choose, create an issue by clicking "Get started" for the "Request a New Dataset for QA" template. You will be asked to provide the information gathered in Steps 1-4.

This will prompt the prioritization of the request and, if prioritized, the initial setup of the dataset in QA. The requester will then be contacted and asked to create accuracy test parameters. No further action is required of the requester until the issue is updated to indicate the process is ready for Step 6.

6.) Add Accuracy test parameters in YAML file

In order to make sure the QA search behavior (recall and relevancy) meets expectations, QA uses YAML to define test parameters. These parameters make it possible to declare that, for a particular text string searched, the results should include a particular resource (identified by a URI), and the maximum position in the results at which that resource should be found.
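
As a rough illustration only, such parameters might look like the sketch below. The field names (query, subject_uri, position) and values are assumptions made for this example; follow the structure of the actual YAML file shared with you in the issue (see step a below).

    # Hypothetical accuracy test parameters; field names are illustrative
    # assumptions only. Follow the structure of the YAML file you are given.
    search:
      - query: 'oral history'                                   # text string to search
        subject_uri: 'https://example.org/authority/term/123'   # resource expected in the results
        position: 5                                             # maximum position at which it should appear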

    a.) A link to a YAML file will be shared with the requester via a comment in the issue so the Accuracy Test portion of Writing Tests for an Authority can be completed.

    b.) Edit the YAML file directly in GitHub and save.

    c.) Create a Pull Request to be reviewed. Be sure to include a meaningful commit message (e.g. adding accuracy tests for Authority X).

