
Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

Risk of not knowing what to search for

  • Providing discovery endpoints
    • ‘hardened’ SPARQL endpoints may be less prone to down time – e.g., Fuseki documentation states that "authentication and control of the number of concurrent requests can be added using an Apache server"
  • publishing starting points with examples and standard extracts may help
    • emulate Social Explorer (http://socialexplorer.com) as a guided way to query the contents of a larger data source – in that case, census data
    • the linked data fragments technology (http://linkeddatafragments.org) may facilitate hosting linked data without the server-side overhead and risk of a public SPARQL endpoint
  • VIVO/Vitro 'rich export' – augmenting standard linked data responses with standard queries
    • e.g., get all of a person's publications in a single request rather than the client having to issue multiple requests (see the sketch after this list)
  • Semantic Web crawling leveraging HTML web crawler experience
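
A minimal sketch of the 'rich export' idea, assuming a publicly readable SPARQL endpoint and VIVO-ISF-style properties; the endpoint URL, person URI, and exact property names below are placeholders rather than a documented VIVO API. One CONSTRUCT query pulls a person's publications in a single request:

    from SPARQLWrapper import SPARQLWrapper, RDFXML

    # Placeholder endpoint and person URI; a real VIVO/Vitro site supplies its own.
    endpoint = SPARQLWrapper("https://example.org/vivo/sparql")
    endpoint.setQuery("""
    PREFIX vivo: <http://vivoweb.org/ontology/core#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    CONSTRUCT { ?pub rdfs:label ?title . }
    WHERE {
      # One request walks the authorship links server-side,
      # instead of the client issuing a request per publication.
      ?authorship vivo:relates <https://example.org/individual/person123>, ?pub .
      FILTER(?pub != <https://example.org/individual/person123>)
      ?pub rdfs:label ?title .
    }
    """)
    endpoint.setReturnFormat(RDFXML)
    graph = endpoint.query().convert()   # an rdflib Graph of the constructed triples
    print(graph.serialize(format="turtle"))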

Synchronizing harvested information

  • Risk of harvested or aggregated information going out of sync
    • the ResourceSync standard addresses the need to repeatedly synchronize and update (see the sketch after this list)
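
As a rough illustration of acting on ResourceSync, the sketch below reads a ResourceSync change list (Sitemap-format XML) and decides which harvested copies to refresh; the change list URL is a placeholder.

    import urllib.request
    import xml.etree.ElementTree as ET

    NS = {
        "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
        "rs": "http://www.openarchives.org/rs/terms/",
    }

    # Placeholder URL; a real source advertises its change list in its capability list.
    changelist_url = "https://example.org/dataset/changelist.xml"

    with urllib.request.urlopen(changelist_url) as resp:
        root = ET.parse(resp).getroot()

    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change in ("created", "updated"):
            print("re-harvest", loc)            # fetch and replace the local copy
        elif change == "deleted":
            print("drop local copy of", loc)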

Desire to be able to query on different axes

 

Reconciliation services

  • not necessarily centralized or run as monopolies
  • would work best in an iterative mode, with curation and provenance to manage differences of opinion (or evidence)
  • incorporate feedback from users
  • need protocols – could leverage a common API for reconciliation building on the OpenRefine reconciliation API: specify as much metadata as you have, get ranked results back (see the sketch after this list)
  • surface (publish) the results
  • sameAs.org
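
A sketch of calling a reconciliation service that follows the OpenRefine reconciliation API; the service URL, type identifier, and property names are placeholders, and real services differ in what they accept.

    import json
    import urllib.parse
    import urllib.request

    # Placeholder endpoint; many services follow this OpenRefine-style API.
    SERVICE = "https://example.org/reconcile"

    # Send as much metadata as you have; the service returns ranked candidates.
    queries = {
        "q0": {
            "query": "Twain, Mark",
            "type": "Person",                              # placeholder type id
            "properties": [{"pid": "birthDate", "v": "1835"}],
        }
    }
    body = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode()
    with urllib.request.urlopen(SERVICE, data=body) as resp:
        results = json.load(resp)

    for candidate in results["q0"]["result"]:
        # Each candidate carries an id, a label, a relevance score, and a match flag.
        print(candidate["score"], candidate["match"], candidate["id"], candidate["name"])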

Validation

  • W3C RDF Data Shapes Working Group (see the sketch after this list)
  • DCMI tutorial on RDF validation
  • Linked data needs mashup tools that test connections and illustrate bringing data together
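
A small illustration of the shape-based validation the Data Shapes work points toward, using the pySHACL library; the choice of tooling and the toy shape below are assumptions, not something specified in the session.

    from rdflib import Graph
    from pyshacl import validate

    # Toy data: a bibliographic resource that is missing its title.
    data = Graph().parse(data="""
        @prefix dct: <http://purl.org/dc/terms/> .
        @prefix ex:  <http://example.org/> .
        ex:book1 a ex:Book ; dct:creator "Somebody" .
    """, format="turtle")

    # A shape requiring every ex:Book to carry at least one dct:title.
    shapes = Graph().parse(data="""
        @prefix sh:  <http://www.w3.org/ns/shacl#> .
        @prefix dct: <http://purl.org/dc/terms/> .
        @prefix ex:  <http://example.org/> .
        ex:BookShape a sh:NodeShape ;
            sh:targetClass ex:Book ;
            sh:property [ sh:path dct:title ; sh:minCount 1 ] .
    """, format="turtle")

    conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
    print(conforms)        # False: the title is missing
    print(report_text)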

Ontology extension mechanisms

Ability to push bookmarks

  • Small graphs of data, consumable by others, to a platform similar to Mendeley but not limited to bibliographic material
  • A service where I can push the results of my search, organized by topic (see the sketch after this list)
  • Add things to a collection I have 
  • Similar to an annotation service
  • You search, you refine it, you step back – today you can only save the result as a bookmark at one level
  • Nobody else can make use of your browser bookmarks now
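
One hypothetical reading of "push small graphs of data": build a tiny RDF graph describing a saved item and POST it to a collection endpoint. The endpoint, vocabulary choice, and URIs below are invented for illustration; the point is only that a bookmark could travel as a small, consumable graph of URIs rather than stay locked in a browser.

    import urllib.request
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    # Hypothetical bookmark: a topic-organized pointer to a resource held elsewhere.
    EX = Namespace("http://example.org/bookmarks/")
    g = Graph()
    g.add((EX["b1"], DCTERMS.references, URIRef("http://example.org/authorities/person/123")))
    g.add((EX["b1"], DCTERMS.subject, Literal("linked data services")))

    # Hypothetical collection endpoint accepting Turtle via POST
    # (assumes rdflib >= 6, where serialize() returns a str).
    req = urllib.request.Request(
        "https://example.org/my-collection",
        data=g.serialize(format="turtle").encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )
    # urllib.request.urlopen(req)   # not executed here: the endpoint is imaginary
    print(g.serialize(format="turtle"))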

  1. centralized entity mapping
    • feedback by users on the mapping
    • need protocols
  2. want to discover annotations – known servers with protocols
    • collections have been built in many different places
    • if we do linked data, my list is a list of URIs from many sources; the UI won't show that
    • assumes accessible SPARQL endpoints
  3. other cleanup tasks – validation? consistency of ontology use
    • entity recognition – text mining or analytics tools – autotaggers
  4. constant crawling of linked data graphs
    • semantically aware web crawling – is it worth going down this path? what's attached, what has changed?
  5. provenance space – who has made a particular assertion?
    • in the library domain, could imagine a layer about who is responsible for an assertion; today that is unspecified
    • crowdsourcing – as you move up toward the general public, there is typically less tracking of who did it; variable credibility – acknowledge that
    • nanopublications (see the sketch after this list)
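
Nanopublications package an assertion together with who made it and how it was published; in the sketch below (vocabulary URIs from the nanopub schema, everything else invented) the assertion, its provenance, and the publication info live in separate named graphs of one dataset:

    from rdflib import Dataset, Namespace, URIRef
    from rdflib.namespace import RDF

    EX   = Namespace("http://example.org/np/")
    NP   = Namespace("http://www.nanopub.org/nschema#")
    PROV = Namespace("http://www.w3.org/ns/prov#")
    OWL_SAMEAS = URIRef("http://www.w3.org/2002/07/owl#sameAs")

    ds = Dataset()
    assertion  = ds.graph(EX["assertion"])
    provenance = ds.graph(EX["provenance"])
    pubinfo    = ds.graph(EX["pubinfo"])

    # The assertion itself: a sameAs link between two descriptions of a person.
    assertion.add((URIRef("http://example.org/personA"), OWL_SAMEAS,
                   URIRef("http://example.org/personB")))

    # Who stands behind the assertion, the layer of responsibility discussed above.
    provenance.add((EX["assertion"], PROV.wasAttributedTo, EX["cataloger42"]))

    # Publication info about the nanopublication as a whole.
    pubinfo.add((EX["np1"], RDF.type, NP.Nanopublication))
    pubinfo.add((EX["np1"], NP.hasAssertion, EX["assertion"]))

    print(ds.serialize(format="trig"))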

Group 4

  • reconciliation services – contain no data; they query a distributed set of resources
  • individual libraries will become the authorities for their special collections – items, people, events
  • queries to a central area would find a match
  • cache the sameAs links so you don't have to re-query; everybody who consumes them then has the cross-links
  • the sort of thing that OCLC might end up doing
  • could be any type of object – logical to start with works
  • brings up the question of degrees of sameAs-ness
  • when a new match is known, publish it – a notification mechanism
  • you would attach provenance to those links to indicate where they came from
  • there used to be a plug-in for Netscape that offered a side-wiki for annotation – anybody could see what everyone else had done
  • now, in the world of unique identifiers, a "linkerator" for people to rank what they see
  • build up ant trails over time around an object
  • how to make it in any way central – get it to the browser
  • as an annotation example: run regular expressions against EAD for an object to suggest what it links to, feed the candidates into a system to validate, then give pointers to the link (see the sketch below)
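
A toy version of the regular-expression idea just above; the EAD snippet and the pattern are invented for illustration. It scans EAD XML for personal names and emits link candidates for a reconciliation service or a human to validate:

    import re

    # Invented fragment of an EAD finding aid.
    ead = """
    <bioghist><p>Letters collected by <persname>Twain, Mark, 1835-1910</persname>
    and later donated by <persname>Clemens, Olivia</persname>.</p></bioghist>
    """

    # Pull out <persname> contents as candidate link targets.
    candidates = re.findall(r"<persname[^>]*>(.*?)</persname>", ead, re.DOTALL)

    for name in candidates:
        # In a fuller pipeline these would go to a reconciliation service and a
        # human validation step before any link is published.
        print("candidate for linking:", name.strip())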

  • other levels of relationship than sameAs
  • over time it would aggregate; a clustering algorithm – the more a link is traversed, the more the space reduces; emergence sorting
  • software crawling the graph – how do you figure out what to trust? the world according to professor X or Y; trust is very tricky
  • a PageRank-style algorithm for linked data, applied to asserters more than to pages (see the sketch after this list)
  • strengthen the nodes through repeated confidence: repeating an assertion in multiple repositories is the "I agree with them", the +1 or thumbs up; Reddit gets a lot of traction that way
  • nanopublications – if you reify assertions, you can add confidence where there is more knowledge or curation; confidence levels
  • Wikipedia has a way to accept …
  • no confidence in semantic search engines – too siloed
  • visualizations have to be crafted
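
A very small sketch of the PageRank-for-asserters idea; the asserters, their endorsements, and the weights are all invented. Treating "repeats or agrees with an assertion" as a link between asserters, a PageRank-style iteration concentrates weight on asserters whom other well-regarded asserters agree with:

    # Asserter -> asserters whose statements they repeat or endorse (invented data).
    endorses = {
        "libraryA": ["libraryB"],
        "libraryB": ["libraryA", "aggregatorC"],
        "aggregatorC": ["libraryA"],
        "crowdsourceD": ["aggregatorC"],
    }

    DAMPING = 0.85
    rank = {a: 1.0 / len(endorses) for a in endorses}

    # Standard PageRank power iteration over the endorsement graph.
    for _ in range(50):
        new_rank = {a: (1 - DAMPING) / len(endorses) for a in endorses}
        for source, targets in endorses.items():
            share = rank[source] / len(targets)
            for target in targets:
                new_rank[target] += DAMPING * share
        rank = new_rank

    for asserter, score in sorted(rank.items(), key=lambda kv: -kv[1]):
        print(f"{asserter:14s} {score:.3f}")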