Facilitator: Steven Folsom

Themes Identified:

  • Entity resolution requires a mix of automated and manual methods
  • Tools are needed to support this, e.g. presenting users with automated matches and letting them accept or correct them; those corrections could then be used to tune the matching algorithm (see the sketch after this list)
  • There is potential to open this work up to communities beyond library professionals (crowdsourcing/niche-sourcing)
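
A minimal sketch of the second theme, assuming hypothetical candidate data and field names: automated matches above a confidence threshold are accepted outright, the rest are shown to a reviewer, and the recorded decisions are the feedback that could later tune the matcher.

```python
# Sketch of a human-in-the-loop review of automated matches; the candidate
# strings, URIs, and scores below are illustrative placeholders.

def review_matches(candidates, auto_accept_threshold=0.95):
    """Auto-accept high-confidence matches; ask a reviewer about the rest."""
    decisions = []
    for cand in candidates:
        if cand["score"] >= auto_accept_threshold:
            decisions.append({**cand, "decision": "auto-accepted"})
            continue
        answer = input(f'Link "{cand["string"]}" to {cand["uri"]} '
                       f'(score {cand["score"]:.2f})? [y/n] ')
        decisions.append(
            {**cand, "decision": "accepted" if answer.strip().lower() == "y" else "rejected"}
        )
    # Rejected or corrected matches are exactly the feedback that could be
    # fed back to tune the matching algorithm.
    return decisions

candidates = [
    {"string": "Springfield (Ill.)", "uri": "http://example.org/place/1", "score": 0.97},
    {"string": "Springfield",        "uri": "http://example.org/place/2", "score": 0.61},
]
print(review_matches(candidates))
```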

 

UNEDITED NOTES

Table 1

DPLA: placename resolution

  • matching against GeoNames (see the sketch below)
  • staff discomfort
  • lack of subject expertise in aggregated data
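
A hedged sketch of the kind of GeoNames lookup noted above, using the public searchJSON endpoint; the username parameter is a placeholder for a registered GeoNames account:

```python
import requests

def geonames_candidates(place_string, username="demo", max_rows=5):
    """Query the GeoNames search API for candidate matches to a place name string."""
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": place_string, "maxRows": max_rows, "username": username},
        timeout=10,
    )
    resp.raise_for_status()
    return [
        {
            "name": g.get("name"),
            "country": g.get("countryName"),
            "geonameId": g.get("geonameId"),
            "lat": g.get("lat"),
            "lng": g.get("lng"),
        }
        for g in resp.json().get("geonames", [])
    ]

# Ambiguous strings like "Springfield" return many candidates, which is where
# record context and human review come in.
print(geonames_candidates("Springfield"))
```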

Entity recognition

  • use entire record as context for resolution
  • points vs. shapes in geo entity resolution
  • crowdsourcing opportunity?
  • OCLC - several passes through data, information from multiple sources (ISNI, VIAF, etc.)
  • need public feedback for last 20%
  • refine algorithms based on crowdsourcing feedback
  • machine transformation and confidence rating – mark that it is machine-generated, with date (see the provenance sketch below)
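
A minimal sketch of the provenance idea in the last bullet, under assumed (not standardized) field names: each machine-generated match is flagged as such and carries its confidence score and generation date.

```python
from datetime import datetime, timezone

def machine_assertion(source_string, matched_uri, confidence, algorithm):
    """Record a machine-generated match, flagged as such, with confidence and date."""
    return {
        "source_string": source_string,
        "matched_uri": matched_uri,        # hypothetical URI supplied by the matcher
        "confidence": confidence,          # e.g. 0.0-1.0 from the matching pass
        "generated_by": algorithm,         # which pass/source produced the match
        "generated_on": datetime.now(timezone.utc).date().isoformat(),
        "status": "machine-generated",     # could flip to "reviewed" after human feedback
    }

print(machine_assertion("Twain, Mark, 1835-1910",
                        "http://example.org/agent/123",
                        0.92,
                        "example-matcher-v1"))
```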

Table 2

strings --> things

  • need string info in perpetuity (see the sketch after this list)
  • accuracy, testability of ambiguity
  • places ... think maps ...
  • people
  • dates ... map interface
  • subjects
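
One possible reading of the first bullet above, sketched with assumed field names: when a string is resolved to a thing, the original string is kept alongside the URI rather than being replaced by it.

```python
# Keep the original string alongside the resolved URI ("string info in perpetuity").
resolved_heading = {
    "original_string": "Dublin",                              # as transcribed in the source record
    "resolved_uri": "http://example.org/place/dublin-ohio",   # hypothetical URI
    "entity_type": "place",
    "resolution_note": "disambiguated from Dublin (Ireland) using record context",
}
```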

libraries divide and conquer entity cataloging

post-processing tools

  • human mediation
  • less human mediation
  • hybrid models – e.g., obit project
  • akin to OCR post-processing

accuracy tools

  • page rank algorithm
  • BibFrame converter – work accuracy?

entity extraction

  • from metadata – how structured is it?
  • lots of text – algorithms better

how to motivate users to take the tools/data for a spin?

what if we had no metadata and started only with full text?

Table 3

challenges

  • solutions – would be awesome

parsing MARC to find translators and their roles

  • roles as strings should be things

person reconciliation

  • requires human review
  • resolve ambiguity in identity, roles, contributions
  • predicates restrict detail
    • e.g., performer vs. violinist

crowd sourcing

  • are the problems simple, or too complex and requiring experts?

music parsing

image identity

Table 4

UCSD – mix of auto & manual review

CERL – name, spelling & disambiguation

HBS – URIs provided by authority vendor

Create local auth record/URI for strings with no auth?

Feed into LC or OCLC for needed authorities?

Improve cataloging tools with type-ahead entity resolution
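
As an illustration of type-ahead entity resolution, the sketch below queries the id.loc.gov name authority suggest service as a cataloger types; the endpoint is public, but the exact response shape assumed here (OpenSearch Suggestions format) should be verified against current id.loc.gov documentation.

```python
import requests

def typeahead_suggestions(partial_name):
    """Return (label, uri) suggestions from the id.loc.gov name authority suggest service.

    Assumes the OpenSearch Suggestions response format:
    [query, [labels], [descriptions], [uris]].
    """
    resp = requests.get(
        "https://id.loc.gov/authorities/names/suggest/",
        params={"q": partial_name},
        timeout=10,
    )
    resp.raise_for_status()
    _query, labels, _descriptions, uris = resp.json()
    return list(zip(labels, uris))

# A cataloging client would call this on each (debounced) keystroke, show the
# labels in a dropdown, and store the selected URI rather than just the string.
for label, uri in typeahead_suggestions("Twain, Mark"):
    print(label, uri)
```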

 
