Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

facilitator: Jon Corson-Rikert

Risk of not knowing what to search for

may be addressed by
  • Providing discovery endpoints and documenting what they hold
    • ‘hardened’ SPARQL endpoints may be less prone to downtime – e.g., the Fuseki documentation states that "authentication and control of the number of concurrent requests can be added using an Apache server"
  • Publishing starting points with examples, standard extracts, and/or canned responses may help
    • emulate Social Explorer http://socialexplorer.com as a way to query the contents of a larger data source, in that case census data
    • the linked data fragments technology (http://linkeddatafragments.org) may facilitate hosting linked data without the server-side overhead and risk of a public SPARQL endpoint
  • VIVO/Vitro 'rich export' – augmenting standard linked data responses with standard queries
    • e.g., get all a person's publications from a single request rather than client having to issue multiple requests
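
A minimal sketch of the single-request idea above, assuming a generic SPARQL endpoint rather than the actual VIVO/Vitro rich-export mechanism; the endpoint URL, person URI, and vocabulary terms are placeholders.

```python
# One SPARQL request that returns all of a person's publications, instead of the
# client walking the graph with many small linked-data requests.
import requests

ENDPOINT = "https://example.org/sparql"          # hypothetical SPARQL endpoint
PERSON = "http://example.org/individual/n1234"   # hypothetical person URI

QUERY = f"""
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?pub ?title WHERE {{
  ?pub dcterms:creator <{PERSON}> ;
       dcterms:title ?title .
}}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["pub"]["value"], "-", row["title"]["value"])
```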

Synchronizing harvested information

  • Risk of harvested or aggregated information going out of sync
    • Semantic Web crawling, leveraging HTML web crawler experience – see the sketch below
      • what's attached
      • what has changed
      • constant crawling of linked data graphs – is it worth going down this path?
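
One way a crawler could answer "what has changed" cheaply is to reuse ordinary HTTP crawler machinery. The sketch below uses conditional GETs with ETag / Last-Modified; the URLs and cache structure are illustrative, not something proposed in the session.

```python
# Re-fetch a linked-data document only if the server says it has changed:
# unchanged resources cost a single 304 Not Modified response.
import requests

cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

def fetch_if_changed(url):
    headers = {"Accept": "text/turtle"}
    prior = cache.get(url)
    if prior:
        if prior.get("etag"):
            headers["If-None-Match"] = prior["etag"]
        if prior.get("last_modified"):
            headers["If-Modified-Since"] = prior["last_modified"]
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:
        return prior["body"], False           # unchanged since the last crawl
    resp.raise_for_status()
    cache[url] = {
        "etag": resp.headers.get("ETag"),
        "last_modified": resp.headers.get("Last-Modified"),
        "body": resp.text,
    }
    return resp.text, True                    # new or changed document

body, changed = fetch_if_changed("https://example.org/data/person/n1234")
```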

Desire to be able to query on different axes

  • e.g., query OCLC by a VIAF id to get works

Reconciliation services

  • a tool to facilitate entity reconciliation – e.g., to put together UN and LC
  • not necessarily centralized or monopolies – a reconciliation service could contain no data itself, just query a distributed set of resources
  • hide the URIs behind a UI
  • would work best in an iterative mode (a first pass, then manual improvement, then a second iteration), with curation and provenance to manage differences of opinion (or evidence)
    • who's made that assertion – differentiate librarians from crowdsourcing; as you move toward the general public, who did it is typically tracked less
    • some way to express variable confidence levels
  • incorporate feedback from users on the mapping, including the ability to exclude some links
  • need protocols – could leverage a common API for reconciliation building on the OpenRefine API: specify as much metadata as you have, get ranked results back (see the sketch after this list)
  • surface (publish) the results – known servers, as with annotations – select which servers to request responses or harvest data from
    • notifications of new matches?
    • ability to +1 or thumbs-up a connection to corroborate it – Reddit gets a lot of traction that way
    • repeating assertions in multiple repositories
  • sameAs.org, but with other expressions for, and levels of confidence in, the relationship
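
A rough sketch of what a client call against an OpenRefine-style reconciliation endpoint could look like: send whatever metadata you have, get ranked candidates back. The endpoint URL and property ids are placeholders, and real services differ in detail.

```python
# Send one reconciliation query with extra metadata; print the ranked candidates.
import json
import requests

ENDPOINT = "https://example.org/reconcile"   # hypothetical reconciliation service

queries = {
    "q0": {
        "query": "Melville, Herman",
        "type": "Person",                      # service-specific type identifier
        "properties": [
            {"pid": "birthDate", "v": "1819"}  # extra metadata improves ranking
        ],
    }
}

resp = requests.post(ENDPOINT, data={"queries": json.dumps(queries)}, timeout=30)
resp.raise_for_status()
for candidate in resp.json()["q0"]["result"]:
    # each candidate carries an id, a label, a relevance score, and a match flag
    print(candidate["score"], candidate["id"], candidate["name"])
```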

Validation

  • DCMI tutorial on RDF validation
  • Measure the consistency of ontology use – a sketch of one such check follows below
  • Linked data needs mashup tools that test connections and illustrate bringing data together
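
A small sketch of one possible consistency check: list the predicates a dataset uses that its ontology never declares. The file names are placeholders, and a fuller validation pass would also look at classes, domains, and ranges.

```python
# Compare the predicates actually used in a dataset against those declared
# in the ontology, and report anything undeclared.
from rdflib import Graph, RDF, RDFS, OWL

data = Graph().parse("dataset.ttl", format="turtle")        # hypothetical data file
ontology = Graph().parse("ontology.ttl", format="turtle")   # hypothetical ontology

declared = set(ontology.subjects(RDF.type, RDF.Property))
declared |= set(ontology.subjects(RDF.type, OWL.ObjectProperty))
declared |= set(ontology.subjects(RDF.type, OWL.DatatypeProperty))
declared |= set(ontology.subjects(RDF.type, OWL.AnnotationProperty))

used = {p for _, p, _ in data}
undeclared = used - declared - {RDF.type, RDFS.label}   # ignore a few core terms

for predicate in sorted(undeclared):
    print("not declared in the ontology:", predicate)
```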

Ontology extension mechanisms

Ability to push bookmarks

  • Small graphs of data, consumable by others, pushed to a platform similar to Mendeley but not limited to bibliographic material – see the sketch after this list
  • A service where I can push the results of my search, organized by topic
  • Add things to a collection I have
  • Collections have been done by many different places – with linked data, my list is a list of URIs from many sources, though the UI won't show that (assuming accessible SPARQL endpoints)
  • Similar to an annotation service
  • You search, you refine it, you step back – right now you can only save bookmarks at one level
  • Nobody can use your web bookmarks
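
A minimal sketch of the "small graphs of data" idea above: a bookmark collection of URIs drawn from many sources, organized by topic and serialized so others can consume it. The vocabulary choices and URIs are illustrative, not a proposed standard.

```python
# Build a tiny bookmark collection as an RDF graph and print it as Turtle.
from rdflib import Graph, URIRef, Literal, Namespace, RDF

DCTERMS = Namespace("http://purl.org/dc/terms/")
DCMITYPE = Namespace("http://purl.org/dc/dcmitype/")

g = Graph()
collection = URIRef("http://example.org/me/bookmarks/whaling")   # hypothetical collection URI
g.add((collection, RDF.type, DCMITYPE.Collection))
g.add((collection, DCTERMS.title, Literal("Whaling sources")))

# Bookmarked items can be URIs minted anywhere: VIAF, id.loc.gov, a local repository...
for uri in [
    "http://viaf.org/viaf/000000000",                      # placeholder VIAF identifier
    "http://id.loc.gov/authorities/subjects/sh00000000",   # placeholder LCSH identifier
]:
    g.add((collection, DCTERMS.hasPart, URIRef(uri)))

print(g.serialize(format="turtle"))
```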

Additional ideas

  • Semantic autotagging – entity recognition via text mining or analytics tools
  • Nanopublications – breaking academic articles into independent assertions with a mechanism to agree/disagree
    • if you reify assertions, confidence can be added where there is more knowledge or curation
    • repeating assertions in multiple repositories strengthens the nodes and reinforces confidence
    • Wikipedia has a way to accept …
  • Side wikis – a plugin for the Netscape browser where a wiki could be associated with any web page and display additional, user-entered content or commentary
    • now, in a world of unique identifiers, a 'linkerator' for people to rank what they see, building up ant trails over time around an object – the open question is how to make it in any way central and get it into the browser
  • Individual libraries will become the authorities for special collections – items, people, events
    • queries to a central area would find a match
    • cache the sameAs so you don't have to re-query; everybody who consumes has the cross-links
    • when a new match is known, publish it via a notification mechanism, with provenance on the links to indicate where they came from
    • the sort of thing that OCLC might end up doing
    • could be any type of object – logical to start with works
    • brings up the question of degrees of sameAs-ness, and of other levels of relationship than sameAs
  • Regular expressions to apply against EAD to suggest what an object is linked to; feed into a system to validate, then give pointers to the link
  • A clustering algorithm to track the number of times a link between two entities is traversed, effectively shortening the distance between them – over time it would aggregate, a kind of emergence sorting
  • A better page rank algorithm for linked data (see the sketch after this list)
    • software crawling the graph raises the trust question – the world according to professor X or Y; trust is very tricky
  • Does anybody have a favorite semantic search engine? No – too siloed; little confidence in semantic search engines
  • Visualizations have to be crafted individually
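
The notes leave "a better page rank algorithm for linked data" open; as a baseline for comparison, this is ordinary iterative PageRank over a toy graph of links between entity URIs. A linked-data-aware variant might weight edges by predicate, provenance, or how often a link is traversed, as suggested above.

```python
# Plain iterative PageRank over a dictionary of outgoing links between URIs.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node URI to the list of URIs it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source, targets in links.items():
            if not targets:
                continue  # dangling nodes simply leak rank in this simplified version
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

toy_graph = {
    "http://example.org/work/moby-dick": ["http://example.org/person/melville"],
    "http://example.org/person/melville": ["http://example.org/work/moby-dick",
                                           "http://example.org/place/new-bedford"],
    "http://example.org/place/new-bedford": [],
}
for uri, score in sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]):
    print(round(score, 3), uri)
```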