Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

Risk of not knowing what to search for

may be addressed by
publish starting points & examples of queries and/or canned responses
reconciliation services — not necessarily monopolies or centralized
iterative, with curation and provenance
common API for reconciliation building on the OpenRefine API — specify as much metadata as you have, get ranked results back
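The "specify as much metadata as you have, get ranked results back" idea can be sketched against the shape of the OpenRefine reconciliation API: a JSON payload of keyed queries, each with a name plus whatever type and property/value pairs the caller happens to hold, answered by scored candidates. The mock response, identifiers, and score threshold below are illustrative, not a real service's output.

```python
import json

def build_recon_queries(entries):
    """Build an OpenRefine-style reconciliation payload: each entry
    supplies a name, an optional type, and any extra property/value
    pairs the caller happens to have."""
    queries = {}
    for i, e in enumerate(entries):
        q = {"query": e["name"]}
        if "type" in e:
            q["type"] = e["type"]
        if "properties" in e:
            q["properties"] = [{"pid": p, "v": v}
                               for p, v in e["properties"].items()]
        queries[f"q{i}"] = q
    # The recon API wraps the query set in a single "queries" form field.
    return {"queries": json.dumps(queries)}

# A service would return ranked candidates per query key, e.g.:
mock_response = {
    "q0": {"result": [
        {"id": "viaf:102333412", "name": "Austen, Jane",
         "score": 97.3, "match": True},
        {"id": "viaf:310530402", "name": "Austen, J.",
         "score": 41.0, "match": False},
    ]}
}

def best_match(response, key, threshold=50.0):
    """Pick the top-ranked candidate above a score threshold."""
    candidates = sorted(response[key]["result"],
                        key=lambda c: c["score"], reverse=True)
    if candidates and candidates[0]["score"] >= threshold:
        return candidates[0]
    return None
```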
mashup tools that test connections
sameAs website
validation
RDF data shapes
DCMI RDF validation
extension mechanisms - Schema.org
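The data-shapes point can be sketched as a minimal shape check in the spirit of ShEx/SHACL (not an implementation of either): triples as tuples, a "shape" as the list of predicates required on nodes of a given type. The `ex:`/`dct:` terms and sample data are illustrative.

```python
RDF_TYPE = "rdf:type"

def validate_shape(triples, target_type, required_predicates):
    """Return {node: [missing predicates]} for nodes of target_type."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, set()).add(p)
    problems = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o == target_type:
            missing = [r for r in required_predicates
                       if r not in by_subject[s]]
            if missing:
                problems[s] = missing
    return problems

data = [
    ("ex:work1", RDF_TYPE, "ex:Work"),
    ("ex:work1", "dct:title", "Pride and Prejudice"),
    ("ex:work2", RDF_TYPE, "ex:Work"),   # no title -> flagged
]
issues = validate_shape(data, "ex:Work", ["dct:title"])
```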
query on different axes — query OCLC by VIAF id to get works
ability to push bookmarks but as small graphs of data, consumable by others
semantic web crawling
bookmark
a service where I can push the results of my search, organized by topic
a sort of Mendeley but for everything
add it to a collection I have 
similar to an annotation service
you search, you refine it, you step back — now only save as bookmarks at one level
nobody can use your bookmarks
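Pushing a bookmark as a small graph rather than an opaque URL (the note above says it resembles an annotation service) could look like the sketch below: the target URI, its topic, and its creator serialized as N-Triples using Web Annotation terms. The bookmark node URI scheme is made up for illustration.

```python
def ntriple(s, p, o, literal=False):
    """Serialize one triple as an N-Triples line."""
    obj = f'"{o}"' if literal else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."

def bookmark_graph(user, uri, topic):
    """A bookmark as a consumable mini-graph, not just a saved URL."""
    bm = f"{user}/bookmarks/{topic.replace(' ', '-')}"  # hypothetical node URI
    return [
        ntriple(bm, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                "http://www.w3.org/ns/oa#Annotation"),
        ntriple(bm, "http://www.w3.org/ns/oa#hasTarget", uri),
        ntriple(bm, "http://purl.org/dc/terms/subject", topic, literal=True),
        ntriple(bm, "http://purl.org/dc/terms/creator", user),
    ]

lines = bookmark_graph("http://example.org/me",
                       "http://example.org/doc", "linked data")
```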
2
a tool that would facilitate entity reconciliation
to put together UN and LC
a first pass, then improve that manually, then 2nd iteration
then publish — surface
manage difference of opinion
provenance
exclude some
centralized entity mapping
feedback by users on the mapping
need protocols
want to discover annotation — known servers with protocols 
collections have been done by many different places
if we do linked data, my list is a list of URIs from many sources
in the UI you won’t see that
assuming accessible SPARQL endpoints
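Given the assumption of accessible SPARQL endpoints, a bookmark list of URIs from many sources can be hydrated with one query per endpoint using a `VALUES` block. The sketch below only builds the query string; the example URIs are illustrative.

```python
def labels_query(uris):
    """Build a SPARQL query fetching rdfs:label for a list of URIs."""
    values = "\n    ".join(f"<{u}>" for u in uris)
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  VALUES ?item {{\n    {values}\n  }}\n"
        "  ?item <http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
        "}"
    )

q = labels_query([
    "http://viaf.org/viaf/102333412",
    "http://id.loc.gov/authorities/names/n79032879",
])
```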
3
other cleanup tasks —  validation? consistency of ontology use
entity recognition — text mining or analytics for tools — autotaggers
4
constant crawling graphs of linked data
semantically aware web crawling — is it worth going down this path, what’s attached, what has changed
5
provenance space — who’s made a particular assertion for that
in the library domain, could imagine a layer about who’s responsible for an assertion
unspecified.
crowd sourcing — as move up toward the general public, typically track less who did it
variable credibility
acknowledge that
nanopublications
===== group 4 =====
reconciliation services — contains no data, queries a distributed set of resources
individual libraries will become the authorities for special collections — items, people, events
queries to a central area would find a match
cache the sameAs so don’t have to re-query
everybody who consumes has the cross-links
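Treating sameAs as an equivalence relation (the degrees-of-sameAs caveat below notwithstanding), cached links can be collapsed with union-find: once two identifiers are merged, every consumer sees the whole cross-link cluster without re-querying. A minimal sketch with illustrative identifiers:

```python
class SameAsCache:
    """Cache of sameAs links, clustered with union-find (path halving)."""

    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_same_as(self, a, b):
        self.parent[self._find(a)] = self._find(b)

    def cluster(self, x):
        """All identifiers known to be sameAs x."""
        root = self._find(x)
        return {u for u in self.parent if self._find(u) == root}

cache = SameAsCache()
cache.add_same_as("viaf:102333412", "lcnaf:n79032879")
cache.add_same_as("lcnaf:n79032879", "wikidata:Q36322")
```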
the sort of thing that OCLC might end up doing — 
could be any type of object — logical to start with works 
brings up the question of degrees of sameAs-ness
when a new match is known, publish that — a notification mechanism
you would add provenance to those links to indicate where they came from
there used to be a plug-in for Netscape with a side-wiki for annotation — anybody could see what everyone else had done
now in the world of unique identifiers — a linkerator - for people to rank what they see
build up ant trails over time, around an object
how to make it in any way central — get it to the browser
how about the annotation example?
regular expressions against EAD for an object to suggest what it links to
feed into a system to validate
then give pointers to the link
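The regex-over-EAD step above might look like the sketch below: scan EAD XML for name elements (real EAD tags) whose text is worth feeding to a reconciliation service. The sample finding aid and the pipeline around the regex are illustrative.

```python
import re

# Match EAD name elements and capture their text content.
NAME_TAG = re.compile(r"<(persname|corpname|geogname)[^>]*>([^<]+)</\1>")

def candidate_links(ead_xml):
    """Yield (tag, text) pairs worth reconciling into links."""
    return [(m.group(1), m.group(2).strip())
            for m in NAME_TAG.finditer(ead_xml)]

sample = """
<ead><archdesc>
  <persname source="lcnaf">Austen, Jane, 1775-1817</persname>
  <geogname>Hampshire (England)</geogname>
</archdesc></ead>
"""
```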
other levels of relationship than sameAs
over time it would aggregate and 
a clustering algorithm — the more a link is traversed, the space reduces
emergence sorting
software crawling the graph - how do you figure out what to trust? the world according to professor X or Y
trust is very tricky
a page rank algorithm for linked data — more for asserters
strengthen the nodes to reflect repeated confidence
repeating assertions in multiple repositories — I agree with them, the +1 or thumbs up
Reddit gets a lot of traction
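The "PageRank for asserters" idea can be sketched with plain power iteration: nodes are sources, an edge A → B means A repeats or endorses an assertion of B's (the +1 above), and rank flows toward sources whose assertions are widely repeated. Node names and weights are illustrative.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = {n for e in edges for n in e}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A dangling node spreads its rank evenly over all nodes.
            targets = out[n] or list(nodes)
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# libA and libC both endorse assertions made by libB:
ranks = pagerank([("libA", "libB"), ("libC", "libB"), ("libB", "libA")])
```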
nanopublications
if you reify assertions — to add confidence where have more knowledge or curation
confidence levels
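Reifying an assertion, as suggested above, means making the statement itself a node so confidence and provenance can hang off it (the nanopublication idea in miniature). The `rdf:` reification terms below are the standard ones; the `ex:` terms, curator, and confidence value are illustrative.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, s, p, o, asserted_by, confidence):
    """Turn one (s, p, o) assertion into triples about the assertion."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
        (statement_id, "ex:assertedBy", asserted_by),   # illustrative terms
        (statement_id, "ex:confidence", confidence),
    ]

stmt = reify("ex:stmt1",
             "viaf:102333412", "owl:sameAs", "wikidata:Q36322",
             asserted_by="ex:curatorAlice", confidence=0.9)
```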
Wikipedia has a way to accept 
no confidence in semantic search engines
too siloed
visualizations have to be crafted