...

For examples of integration of these services into other elements of the project, please see Architecture for Authority Lookup.

Authorities

As our deployment process became routine, we were able to include a number of authority sources:

...

All of these services, including versions supporting human interaction with the results, are available at http://services.ld4l.org/ld4l_services/index.jsp. Direct human exploration of the various triplestores using SPARQL is available at http://services.ld4l.org/fuseki/.

Request Parameterization

To simplify both the creation of new services and the understanding by developers of applications consuming these services, we standardized the parameters accepted by the various services as much as possible:

...

Hence the query http://services.ld4l.org/ld4l_services/getty_batch.jsp?query=Picasso&maxRecords=10&entity=Person will return the triples relevant to up to 10 entities of class Person (i.e., from the Getty ULAN authority) in which the word Picasso appears. Note that the actual number of triples returned can vary widely, because coverage differs between entities, even within a single authority source.
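As an illustration of the parameter convention, the following minimal Python sketch assembles such a request URL. The helper name build_service_url and its argument names are hypothetical and are not part of ld4l_services; only the three parameters that appear in the example query (query, maxRecords, entity) are handled, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL taken from the example query in the text.
BASE_URL = "http://services.ld4l.org/ld4l_services/"


def build_service_url(page, query, max_records=None, entity=None):
    """Assemble a request URL for one service page (hypothetical helper).

    `page` is the JSP page name, e.g. "getty_batch.jsp". Optional
    parameters are left out of the URL when they are not supplied.
    """
    params = {"query": query}
    if max_records is not None:
        params["maxRecords"] = max_records
    if entity is not None:
        params["entity"] = entity
    # urlencode escapes the values and keeps the insertion order of the dict.
    return BASE_URL + page + "?" + urlencode(params)


url = build_service_url("getty_batch.jsp", "Picasso", max_records=10, entity="Person")
print(url)
```

Because the parameters are standardized, a consuming application could presumably reuse the same helper for other services by changing only the page name.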

Technology Stack

The overall architecture was implemented entirely with open source tools:

  • Apache HTTPD - the standard v. 2.4 web server deployed with macOS
  • ld4l_services - a JavaServer Pages (JSP) application (available at https://github.com/eichmann/ld4l_services) that relies heavily on two JSP tag libraries:
  • Apache Tomcat application container - we are specifically using version 9.0.0.M9, although pretty much any version of Tomcat should work, as we're not using any particular features of this version.
  • Apache Jena Fuseki - the SPARQL endpoint, version 2.4.0
  • Java SE Runtime Environment - version 1.8.0

Processing Flow

  • a request arriving at services.ld4l.org is routed to one of two redundant application servers (see the server configuration discussion below)
  • the relevant JSP page runs a Lucene query, receiving back a set of entity URIs specific to the particular authority
  • for each entity URI, the JSP page constructs a SPARQL query and submits it to Fuseki (using the virtual host name to allow load balancing)
  • Fuseki executes the SPARQL query and returns a set of RDF triples
  • the JSP page returns the triples to the requesting site, injecting into the result synthesized triples that represent each entity URI's rank, i.e., its position in the Lucene search results
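The last three steps of this flow can be sketched roughly as follows. This is not the project's JSP code: the CONSTRUCT query form, the rank predicate URI, and the 1-based rank numbering are all assumptions made for illustration, and the Lucene and Fuseki calls are replaced by a plain list and a caller-supplied function.

```python
# Hypothetical predicate for the synthesized rank triples; the actual
# predicate used by ld4l_services is not stated in the text.
RANK_PREDICATE = "http://example.org/vocab#rank"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def entity_query(entity_uri):
    """Build a SPARQL query for one entity (assumed form, for illustration)."""
    return "CONSTRUCT { <%s> ?p ?o } WHERE { <%s> ?p ?o }" % (entity_uri, entity_uri)


def rank_triple(entity_uri, rank):
    """Return an N-Triples line recording the entity's position in the search results."""
    return '<%s> <%s> "%d"^^<%s> .' % (entity_uri, RANK_PREDICATE, rank, XSD_INTEGER)


def assemble_response(ranked_uris, run_sparql):
    """Combine per-entity triples with rank triples, preserving search order.

    `ranked_uris` stands in for the ordered Lucene result list.
    `run_sparql` is a caller-supplied function that takes a SPARQL string
    and returns a list of N-Triples lines (a stand-in for Fuseki).
    """
    lines = []
    for rank, uri in enumerate(ranked_uris, start=1):  # 1-based rank is an assumption
        lines.extend(run_sparql(entity_query(uri)))
        lines.append(rank_triple(uri, rank))
    return lines


def fake_sparql(query):
    """Stub endpoint: returns no triples, so only rank triples appear in the output."""
    return []


for line in assemble_response(["http://example.org/a", "http://example.org/b"], fake_sparql):
    print(line)
```

Passing the SPARQL call in as a function keeps the sketch runnable without a live endpoint; in a real deployment that function would submit the query to Fuseki through the load-balanced virtual host name.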

Server Configuration

  • Mac Pro (late 2013), 3 GHz, 8 cores, 64 GB memory, macOS High Sierra (v. 10.13.6)
  • Promise Pegasus2 disk array, 8 x 4 TB RAID 5, Thunderbolt 2 connection to the Mac Pro

...

    CustomLog "/private/var/log/apache2/ld4l-access_log" combined
</VirtualHost>

A Complete List of GitHub Repositories Related to the Project

...