
These files were produced during the LD4L project. They are all available from the server at http://draft.ld4l.org/downloads/.

RDF Files

Converter output

The MARC records from each library were converted to BIBFRAME 1.0 RDF by the Library of Congress marc2bibframe converter. LD4L's bib2lod converter was then used to produce RDF in the LD4L data model. The result is RDF in the N-Triples format.

These dumps are available:

Usage data

StackScore usage data is available for the Cornell and Harvard holdings. The scores appear as annotations on the individual bib_ids. Each file contains the usage data for the corresponding, similarly named file of converter output. Data is in N-Triples format.

These data files are available:

Additional triples

Additional triples were created to supplement the converter output. They add Work IDs to the Works and create links across institutions between corresponding Works and between corresponding Instances.

A concordance file was created, associating all known OCLC numbers with their corresponding Work IDs. This file was made with data extracted from a recent Research snapshot of WorldCat, and is structured as follows:

  • Column 1: every OCLC number found in a record, from both the 001 and 019 fields
  • Column 2: the current OCLC number for the record, from 001
  • Column 3: the current Work ID associated with the record

Fields are tab-delimited. For example:

100000569	100000569	49300684
100000668	100000668	83546218
100000767	100000767	83546282
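
As an illustration of the three-column layout described above, the following is a minimal sketch of reading the concordance into a lookup table. The function name load_concordance is hypothetical and not part of any LD4L tooling; the sample rows are the three shown above.

```python
import csv
import io

def load_concordance(lines):
    """Map every OCLC number (column 1) to (current OCLC number, Work ID)."""
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        any_oclc, current_oclc, work_id = row
        mapping[any_oclc] = (current_oclc, work_id)
    return mapping

# The three example rows from the text, as tab-delimited lines.
sample = io.StringIO(
    "100000569\t100000569\t49300684\n"
    "100000668\t100000668\t83546218\n"
    "100000767\t100000767\t83546282\n"
)
concordance = load_concordance(sample)
```

For the real file, an open file handle can be passed in place of the StringIO object. All values are kept as strings so that no leading zeros or formatting are lost.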

Using this concordance file, each Work was assigned a Work ID, based on the OCLC number of its Instances. For example:

<http://draft.ld4l.org/cornell/n556b336629626fa2> 
    <http://www.w3.org/2000/01/rdf-schema#seeAlso> 
        <http://worldcat.org/entity/work/id/57063107> .
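
A statement like the one above could be generated along the following lines. This is only a sketch under assumptions: the function name see_also_triples is hypothetical, the lookup key "OCLC-PLACEHOLDER" is invented because the text does not give the OCLC number behind this example, and the text does not say how the project handled a Work whose Instances map to more than one Work ID (this sketch simply emits one statement per distinct Work ID).

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
WORK_ID_PREFIX = "http://worldcat.org/entity/work/id/"

def see_also_triples(work_uri, instance_oclc_numbers, oclc_to_work_id):
    """Return one N-Triples line per distinct Work ID found for the instances."""
    work_ids = []
    for oclc in instance_oclc_numbers:
        work_id = oclc_to_work_id.get(oclc)
        # Keep first-seen order and drop duplicates and unknown OCLC numbers.
        if work_id is not None and work_id not in work_ids:
            work_ids.append(work_id)
    return [
        f"<{work_uri}> <{RDFS_SEE_ALSO}> <{WORK_ID_PREFIX}{work_id}> ."
        for work_id in work_ids
    ]

# "OCLC-PLACEHOLDER" is a made-up key; the real OCLC number is not in the text.
triples = see_also_triples(
    "http://draft.ld4l.org/cornell/n556b336629626fa2",
    ["OCLC-PLACEHOLDER"],
    {"OCLC-PLACEHOLDER": "57063107"},
)
```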

Although the data from the three institutions were stored in three separate triple-stores, owl:sameAs statements were created where possible to link matching works or matching instances in the separate collections.

Instances with matching OCLC identifiers were linked with owl:sameAs, as were Works with matching Work IDs.
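
The matching step can be sketched as grouping resources by a shared key (an OCLC identifier for Instances, a Work ID for Works) and emitting owl:sameAs statements between resources from different collections. The function name same_as_triples, the record layout, and the example.org URIs below are all assumptions made for illustration; the text does not describe how the project actually implemented this.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triples(records):
    """records: iterable of (collection, uri, key) tuples.

    Returns N-Triples lines linking URIs that share a key but come
    from different collections.
    """
    groups = defaultdict(list)
    for collection, uri, key in records:
        groups[key].append((collection, uri))
    triples = []
    for key in sorted(groups):
        # Compare every pair of resources that share this key.
        for (coll_a, uri_a), (coll_b, uri_b) in combinations(sorted(groups[key]), 2):
            if coll_a != coll_b:  # only link across collections
                triples.append(f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> .")
    return triples

# Hypothetical records: two share key "1", one stands alone.
records = [
    ("cornell", "http://example.org/cornell/a", "1"),
    ("harvard", "http://example.org/harvard/b", "1"),
    ("cornell", "http://example.org/cornell/c", "2"),
]
links = same_as_triples(records)
```

This sketch emits each link in one direction only; whether the project also stored the reverse statement is not stated in the text.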

These files are available:

Linked data blobs

The linked data at draft.ld4l.org is served by a Sinatra application, reading from a MySQL database. The database looks like this:

mysql> use ld4l;
Database changed

mysql> show tables;
+----------------+
| Tables_in_ld4l |
+----------------+
| lod            |
+----------------+

mysql> describe lod;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| uri   | varchar(200) | NO   | PRI | NULL    |       |
| rdf   | mediumblob   | NO   |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+

 

'uri' corresponds to the URI of the requested linked data:

mysql> select uri from lod where uri like 'http%' limit 5;
+---------------------------------------------------------------------+
| uri                                                                 |
+---------------------------------------------------------------------+
| http://draft.ld4l.org/                                              |
| http://draft.ld4l.org/cornell                                       |
| http://draft.ld4l.org/cornell/n000000d1-1ab5-4fc0-a33f-2fb4204eddb4 |
| http://draft.ld4l.org/cornell/n000000e4-e1cd-4fbb-872c-01456a8a8396 |
| http://draft.ld4l.org/cornell/n00000118-f2b5-4005-b20e-65e09af06a2a |
+---------------------------------------------------------------------+

 

'rdf' is the data that will be served, in Turtle format, zipped. As such, it is not readable until unzipped:

mysql> select substring(rdf, 1, 70) from lod where uri = "http://draft.ld4l.org/";
+------------------------------------------------------------------------+
| substring(rdf, 1, 70)                                                  |
+------------------------------------------------------------------------+
F?&D}??(I?R<p?st?6D?(?P?Dj3t?.?V
?N$!ϥsM?G?? |
+------------------------------------------------------------------------+
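
A blob read from the 'rdf' column could be turned back into Turtle text roughly as follows. The text says only that the data is "zipped", so this sketch assumes a gzip or zlib stream (not a multi-file ZIP archive) and UTF-8 text; both are assumptions, and the function name decode_rdf_blob is hypothetical.

```python
import gzip
import zlib

def decode_rdf_blob(blob: bytes) -> str:
    """Decompress a gzip- or zlib-wrapped blob and decode it as UTF-8."""
    # Adding 32 to the window size makes zlib auto-detect either header.
    return zlib.decompress(blob, zlib.MAX_WBITS | 32).decode("utf-8")

# Round-trip demonstration with made-up Turtle, not data from the database.
turtle = '<http://example.org/s> <http://example.org/p> "o" .\n'
restored = decode_rdf_blob(gzip.compress(turtle.encode("utf-8")))
```

If the blobs turn out to use a different compression format, zlib.decompress raises zlib.error rather than returning corrupted text.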
 

These dumps are available:

Solr index capture

The application at search.ld4l.org is built on Blacklight. Blacklight is a Rails app that includes a Solr search engine.

The structure of the search index is determined by both the Solr schema and the Blacklight catalog controller script.

These dumps are available:

Triple-store captures

The triple-stores were instances of Virtuoso OpenSource 7, built from the develop/7 branch. More specifically, the Virtuoso instances were built from this source:

$ git remote -v
origin	git://github.com/openlink/virtuoso-opensource.git (fetch)
origin	git://github.com/openlink/virtuoso-opensource.git (push)

$ git status
On branch develop/7
Your branch is up-to-date with 'origin/develop/7'.
nothing to commit, working directory clean

$ git log -1
commit ea51ed3b81a43250ed2e3cfa77ee6e0116388b4b
Merge: 74a23e7 8ee2cfe
Author: VOS Maintainer 
Date:   Mon Mar 7 13:44:06 2016 +0100
    Merge branch 'develop/6' into develop/7
 

These dumps capture the data directories of the three triple-stores:
