2015-02-23 breakout: Entity resolution (strings to things)

facilitator: Steven Folsom

Requires a mix of automated and manual methods
Need tools to do this, e.g. present user with automated matches and allow them to make changes (this could then be used to tune the algorithm)
There's a potential to open this up to communities beyond library professionals (crowd-sourcing/niche-sourcing)
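
A minimal sketch of the review loop described above: rank automated matches for a string, let a reviewer accept or override the top suggestion, and log the decision so it could later be used for tuning. Everything here is an assumption for illustration: the authority list and example.org URIs are made up, the function names are hypothetical, and plain string similarity from Python's difflib stands in for whatever matching algorithm a real tool would use.

```python
from difflib import SequenceMatcher

# Hypothetical authority list: URI -> preferred label (made-up example data).
AUTHORITIES = {
    "http://example.org/auth/1": "Twain, Mark, 1835-1910",
    "http://example.org/auth/2": "Twain, Shania",
    "http://example.org/auth/3": "Austen, Jane, 1775-1817",
}

def candidates(name, authorities=AUTHORITIES, limit=3):
    """Rank authority entries by string similarity to the input name."""
    scored = [
        (SequenceMatcher(None, name.lower(), label.lower()).ratio(), uri, label)
        for uri, label in authorities.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:limit]

def review(name, accepted_uri, log):
    """Record the reviewer's choice next to the top automated suggestion."""
    top_score, top_uri, _ = candidates(name)[0]
    entry = {
        "string": name,
        "suggested": top_uri,
        "score": top_score,
        "accepted": accepted_uri,
        "agreed": top_uri == accepted_uri,  # disagreements are the tuning signal
    }
    log.append(entry)
    return entry

if __name__ == "__main__":
    decisions = []
    for score, uri, label in candidates("Austen, Jane"):
        print(f"{score:.2f}  {label}  <{uri}>")
    review("Austen, Jane", "http://example.org/auth/3", decisions)
```

The log of agreed and overridden suggestions is the part that matters for tuning: it gives a labeled set against which thresholds or matching rules could be evaluated.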

UNEDITED NOTES

DPLA: placename resolution

Entity recognition

use entire record as context for resolution
points vs. shapes in geo entity resolution
crowdsourcing opportunity?
OCLC - several passes through data, information from multiple sources (ISNI, VIAF, etc.)
need public feedback for last 20%
refine algorithms based on crowdsourcing feedback
machine transformation and confidence rating – mark that it is machine-generated, with date
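
One way the confidence rating and machine-generated mark could be carried alongside each match is sketched below. The class name, field names, and the 0.8 review threshold are all hypothetical and were not proposed in the session; the point is only that each assertion records who (or what) made it, how confident it was, and when.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MatchAssertion:
    """A string-to-thing match plus provenance (hypothetical structure)."""
    source_string: str
    target_uri: str
    confidence: float                 # 0.0 to 1.0, as reported by the matcher
    machine_generated: bool = True    # False once a person has confirmed it
    asserted_on: date = field(default_factory=date.today)

    def needs_review(self, threshold=0.8):
        """Flag low-confidence machine matches for human or crowd review."""
        return self.machine_generated and self.confidence < threshold

if __name__ == "__main__":
    match = MatchAssertion("Ithaca", "http://example.org/place/1", 0.6)
    print(match, match.needs_review())
```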

strings --> things

libraries divide and conquer entity cataloging

post-processing tools

accuracy tools

entity extraction

how to motivate users to take tools/data for a spin?

what if we had no metadata and started only with full text?

challenges

parsing MARC to find translators and roles
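
A rough sketch of the easy case, assuming the role is coded in a 700 added entry, either as a relator term in subfield $e ("translator") or as a relator code in subfield $4 ("trl"). The record representation here is a simplified, hypothetical one (a list of dicts with one value per subfield code), not the API of any MARC library. Roles that appear only as free text in a statement of responsibility are not handled, and that is likely where much of the difficulty lies.

```python
def find_translators(fields):
    """Return names from 700 fields whose $e or $4 marks a translator.

    `fields` is a simplified stand-in for a MARC record: a list of
    {"tag": ..., "subfields": {code: value}} dicts (an assumption made
    for this sketch; real records allow repeated subfields).
    """
    names = []
    for f in fields:
        if f.get("tag") != "700":
            continue
        sub = f.get("subfields", {})
        term = sub.get("e", "").strip(" .,").lower()   # relator term
        code = sub.get("4", "").strip().lower()        # relator code
        if term == "translator" or code == "trl":
            names.append(sub.get("a", "").rstrip(" ,"))
    return names

if __name__ == "__main__":
    record = [
        {"tag": "100", "subfields": {"a": "Tolstoy, Leo,"}},
        {"tag": "700", "subfields": {"a": "Garnett, Constance,", "e": "translator."}},
        {"tag": "700", "subfields": {"a": "Smith, John,", "4": "edt"}},
    ]
    print(find_translators(record))
```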

person reconciliation

crowdsourcing

music parsing

image identity

UCSD – mix of auto & manual review

CERL – name, spelling & disambiguation

HBS – URIs provided by authority vendor

Create local auth record/URI for strings with no auth?

Feed into LC or OCLC for needed authorities?

Improve cataloging tools with type-ahead entity resolution
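
As an illustration of type-ahead resolution in a cataloging tool, the sketch below does a case-insensitive prefix lookup over a sorted list of authority labels and returns label/URI pairs, so the cataloger picks a thing rather than typing a string. The class name, sample labels, and URIs are hypothetical; a production tool would more likely query an authority service or search index than hold labels in memory.

```python
from bisect import bisect_left

class TypeAhead:
    """Prefix lookup over (label, uri) pairs kept in sorted order."""

    def __init__(self, entries):
        # Sort on the lowercased label so lookups are case-insensitive.
        self._items = sorted((label.lower(), label, uri) for label, uri in entries)
        self._keys = [key for key, _, _ in self._items]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (label, uri) pairs whose label starts with prefix."""
        p = prefix.lower()
        start = bisect_left(self._keys, p)  # first key >= prefix
        results = []
        for key, label, uri in self._items[start:]:
            if not key.startswith(p) or len(results) >= limit:
                break
            results.append((label, uri))
        return results

if __name__ == "__main__":
    index = TypeAhead([
        ("Austen, Jane", "http://example.org/auth/3"),
        ("Auster, Paul", "http://example.org/auth/4"),
        ("Twain, Mark", "http://example.org/auth/1"),
    ])
    print(index.suggest("aust"))
```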
