Services on linked data

LD4L Workshop Breakout Session, Tuesday, February 24

facilitator: Jon Corson-Rikert

Risk of not knowing what to search for

may be addressed by
  • Providing discovery endpoints and documenting what they hold
    • ‘hardened’ SPARQL endpoints may be less prone to downtime – e.g., the Fuseki documentation states that "authentication and control of the number of concurrent requests can be added using an Apache server"
  • Publishing starting points with examples, standard extracts, and/or canned responses may help
    • emulate Social Explorer http://socialexplorer.com as a way to query the contents of a larger data source, in that case census data
    • the linked data fragments technology (http://linkeddatafragments.org) may facilitate hosting linked data without the server-side overhead and risk of a public SPARQL endpoint
  • VIVO/Vitro 'rich export' – augmenting standard linked data responses with standard queries
    • e.g., get all a person's publications from a single request rather than client having to issue multiple requests
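
A minimal sketch of the single-request idea above, assuming a generic SPARQL endpoint rather than the actual VIVO/Vitro rich-export mechanism; the endpoint URL, person URI, and vocabulary terms are placeholders.

```python
# One SPARQL request that returns all of a person's publications, instead of the
# client walking the graph with many small linked-data requests.
import requests

ENDPOINT = "https://example.org/sparql"          # hypothetical SPARQL endpoint
PERSON = "http://example.org/individual/n1234"   # hypothetical person URI

QUERY = f"""
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?pub ?title WHERE {{
  ?pub dcterms:creator <{PERSON}> ;
       dcterms:title ?title .
}}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["pub"]["value"], "-", row["title"]["value"])
```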

Synchronizing harvested information

  • Risk of harvested or aggregated information going out of sync
    • Semantic Web crawling, leveraging HTML web crawler experience – see the sketch below
      • what's attached
      • what has changed
      • constant crawling of linked data graphs – is it worth going down this path?
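
One way a crawler could answer "what has changed" cheaply is to reuse ordinary HTTP crawler machinery. The sketch below uses conditional GETs with ETag / Last-Modified; the URLs and cache structure are illustrative, not something proposed in the session.

```python
# Re-fetch a linked-data document only if the server says it has changed:
# unchanged resources cost a single 304 Not Modified response.
import requests

cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

def fetch_if_changed(url):
    headers = {"Accept": "text/turtle"}
    prior = cache.get(url)
    if prior:
        if prior.get("etag"):
            headers["If-None-Match"] = prior["etag"]
        if prior.get("last_modified"):
            headers["If-Modified-Since"] = prior["last_modified"]
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304:
        return prior["body"], False           # unchanged since the last crawl
    resp.raise_for_status()
    cache[url] = {
        "etag": resp.headers.get("ETag"),
        "last_modified": resp.headers.get("Last-Modified"),
        "body": resp.text,
    }
    return resp.text, True                    # new or changed document

body, changed = fetch_if_changed("https://example.org/data/person/n1234")
```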

Desire to be able to query on different axes

  • e.g., query OCLC by a VIAF id to get works

Reconciliation services

  • a tool to facilitate entity reconciliation – e.g., to put together UN and LC
  • not necessarily centralized or monopolies – a reconciliation service could contain no data itself, just query a distributed set of resources
  • hide the URIs behind a UI
  • would work best in an iterative mode (a first pass, then manual improvement, then a second iteration), with curation and provenance to manage differences of opinion (or evidence)
    • who's made that assertion – differentiate librarians from crowdsourcing; as you move toward the general public, who did it is typically tracked less
    • some way to express variable confidence levels
  • incorporate feedback from users on the mapping, including the ability to exclude some links
  • need protocols – could leverage a common API for reconciliation building on the OpenRefine API: specify as much metadata as you have, get ranked results back (see the sketch after this list)
  • surface (publish) the results – known servers, as with annotations – select which servers to request responses or harvest data from
    • notifications of new matches?
    • ability to +1 or thumbs-up a connection to corroborate it – Reddit gets a lot of traction that way
    • repeating assertions in multiple repositories
  • sameAs.org, but with other expressions for, and levels of confidence in, the relationship
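
A rough sketch of what a client call against an OpenRefine-style reconciliation endpoint could look like: send whatever metadata you have, get ranked candidates back. The endpoint URL and property ids are placeholders, and real services differ in detail.

```python
# Send one reconciliation query with extra metadata; print the ranked candidates.
import json
import requests

ENDPOINT = "https://example.org/reconcile"   # hypothetical reconciliation service

queries = {
    "q0": {
        "query": "Melville, Herman",
        "type": "Person",                      # service-specific type identifier
        "properties": [
            {"pid": "birthDate", "v": "1819"}  # extra metadata improves ranking
        ],
    }
}

resp = requests.post(ENDPOINT, data={"queries": json.dumps(queries)}, timeout=30)
resp.raise_for_status()
for candidate in resp.json()["q0"]["result"]:
    # each candidate carries an id, a label, a relevance score, and a match flag
    print(candidate["score"], candidate["id"], candidate["name"])
```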

Validation

  • DCMI tutorial on RDF validation
  • Measure the consistency of ontology use – a sketch of one such check follows below
  • Linked data needs mashup tools that test connections and illustrate bringing data together
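
A small sketch of one possible consistency check: list the predicates a dataset uses that its ontology never declares. The file names are placeholders, and a fuller validation pass would also look at classes, domains, and ranges.

```python
# Compare the predicates actually used in a dataset against those declared
# in the ontology, and report anything undeclared.
from rdflib import Graph, RDF, RDFS, OWL

data = Graph().parse("dataset.ttl", format="turtle")        # hypothetical data file
ontology = Graph().parse("ontology.ttl", format="turtle")   # hypothetical ontology

declared = set(ontology.subjects(RDF.type, RDF.Property))
declared |= set(ontology.subjects(RDF.type, OWL.ObjectProperty))
declared |= set(ontology.subjects(RDF.type, OWL.DatatypeProperty))
declared |= set(ontology.subjects(RDF.type, OWL.AnnotationProperty))

used = {p for _, p, _ in data}
undeclared = used - declared - {RDF.type, RDFS.label}   # ignore a few core terms

for predicate in sorted(undeclared):
    print("not declared in the ontology:", predicate)
```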

Ontology extension mechanisms

Ability to push bookmarks

  • Small graphs of data, consumable by others, pushed to a platform similar to Mendeley but not limited to bibliographic material – see the sketch after this list
  • A service where I can push the results of my search, organized by topic
  • Add things to a collection I have
  • Collections have been done by many different places – with linked data, my list is a list of URIs from many sources, though the UI won't show that (assuming accessible SPARQL endpoints)
  • Similar to an annotation service
  • You search, you refine it, you step back – right now you can only save bookmarks at one level
  • Nobody can use your web bookmarks
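
A minimal sketch of the "small graphs of data" idea above: a bookmark collection of URIs drawn from many sources, organized by topic and serialized so others can consume it. The vocabulary choices and URIs are illustrative, not a proposed standard.

```python
# Build a tiny bookmark collection as an RDF graph and print it as Turtle.
from rdflib import Graph, URIRef, Literal, Namespace, RDF

DCTERMS = Namespace("http://purl.org/dc/terms/")
DCMITYPE = Namespace("http://purl.org/dc/dcmitype/")

g = Graph()
collection = URIRef("http://example.org/me/bookmarks/whaling")   # hypothetical collection URI
g.add((collection, RDF.type, DCMITYPE.Collection))
g.add((collection, DCTERMS.title, Literal("Whaling sources")))

# Bookmarked items can be URIs minted anywhere: VIAF, id.loc.gov, a local repository...
for uri in [
    "http://viaf.org/viaf/000000000",                      # placeholder VIAF identifier
    "http://id.loc.gov/authorities/subjects/sh00000000",   # placeholder LCSH identifier
]:
    g.add((collection, DCTERMS.hasPart, URIRef(uri)))

print(g.serialize(format="turtle"))
```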

Additional ideas

  • Semantic autotagging – entity recognition via text mining or analytics tools
  • Nanopublications – breaking academic articles into independent assertions with a mechanism to agree/disagree
    • if you reify assertions, confidence can be added where there is more knowledge or curation
    • repeating assertions in multiple repositories strengthens the nodes and reinforces confidence
    • Wikipedia has a way to accept …
  • Side wikis – a plugin for the Netscape browser where a wiki could be associated with any web page and display additional, user-entered content or commentary
    • now, in a world of unique identifiers, a 'linkerator' for people to rank what they see, building up ant trails over time around an object – the open question is how to make it in any way central and get it into the browser
  • Individual libraries will become the authorities for special collections – items, people, events
    • queries to a central area would find a match
    • cache the sameAs so you don't have to re-query; everybody who consumes has the cross-links
    • when a new match is known, publish it via a notification mechanism, with provenance on the links to indicate where they came from
    • the sort of thing that OCLC might end up doing
    • could be any type of object – logical to start with works
    • brings up the question of degrees of sameAs-ness, and of other levels of relationship than sameAs
  • Regular expressions to apply against EAD to suggest what an object is linked to; feed into a system to validate, then give pointers to the link
  • A clustering algorithm to track the number of times a link between two entities is traversed, effectively shortening the distance between them – over time it would aggregate, a kind of emergence sorting
  • A better page rank algorithm for linked data (see the sketch after this list)
    • software crawling the graph raises the trust question – the world according to professor X or Y; trust is very tricky
  • Does anybody have a favorite semantic search engine? No – too siloed; little confidence in semantic search engines
  • Visualizations have to be crafted individually
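
The notes leave "a better page rank algorithm for linked data" open; as a baseline for comparison, this is ordinary iterative PageRank over a toy graph of links between entity URIs. A linked-data-aware variant might weight edges by predicate, provenance, or how often a link is traversed, as suggested above.

```python
# Plain iterative PageRank over a dictionary of outgoing links between URIs.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node URI to the list of URIs it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for source, targets in links.items():
            if not targets:
                continue  # dangling nodes simply leak rank in this simplified version
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

toy_graph = {
    "http://example.org/work/moby-dick": ["http://example.org/person/melville"],
    "http://example.org/person/melville": ["http://example.org/work/moby-dick",
                                           "http://example.org/place/new-bedford"],
    "http://example.org/place/new-bedford": [],
}
for uri, score in sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]):
    print(round(score, 3), uri)
```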