facilitator: Steven Folsom
Themes Identified:
- Requires a mix of automated and manual methods
- Need tools to do this, e.g. present user with automated matches and allow them to make changes (this could then be used to tune the algorithm)
- There's a potential to open this up to communities beyond library professionals (crowd-sourcing/niche-sourcing)
UNEDITED NOTES
Table 1
DPLA: placename resolution
- matching against Geonames
- staff discomfort
- lack subject expertise in aggregated data
Entity recognition
- use entire record as context for resolution
- points vs. shapes in geo entity resolution
- crowdsourcing opportunity?
- OCLC - several passes through data, information from multiple sources (ISNI, VIAF, etc.)
- need public feedback for last 20%
- refine algorithms based on crowdsourcing feedback
- machine transformation and confidence rating – mark that is machine-generated, with date
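The last point above (a confidence rating plus a machine-generated mark with a date) could be recorded along these lines. This is only a minimal sketch: the class name `Reconciliation`, its fields, the example URI, and the 0.8 review threshold are all hypothetical and were not specified in the discussion.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Reconciliation:
    """One string-to-entity match with provenance (hypothetical structure)."""
    source_string: str            # original string, kept alongside the match
    entity_uri: str               # URI the string was resolved to
    confidence: float             # 0.0-1.0 score from the matching step
    machine_generated: bool = True
    generated_on: date = field(default_factory=date.today)

    def needs_review(self, threshold: float = 0.8) -> bool:
        # Assumed policy: low-confidence machine matches go to a
        # human (or crowd) review queue; the threshold is illustrative.
        return self.machine_generated and self.confidence < threshold


# Placeholder values; example.org is not a real authority service.
match = Reconciliation("Ithaca (N.Y.)", "http://example.org/place/1", 0.62)
print(match.needs_review())
```

Keeping `source_string` next to the resolved URI means a reviewer can always see what the machine started from and when the match was made.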
Table 2
strings --> things
- need string info in perpetuity
- accuracy, testability of ambiguity
- places ... think maps ...
- people
- dates ... map interface
- subjects
libraries divide and conquer entity cataloging
post-processing tools
- human mediation
- less human mediation
- hybrid models – e.g., obit project
- akin to OCR post-processing
accuracy tools
- page rank algorithm
- BibFrame converter – work accuracy?
entity extraction
- from metadata – how structured is it?
- lots of text – algorithms better
how to motivate users to take tools/data for a spin?
what if we had no metadata and started only with full text?
Table 3
challenges
- solutions – would be awesome
parsing MARC to find translators and roles
- roles as strings should be things
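As one possible illustration of the MARC point above, the sketch below looks for translators in name fields and returns the role as a URI rather than a string. It assumes a simplified, hypothetical in-memory record (a list of `(tag, subfields)` pairs) instead of any particular MARC library, and it only checks subfield `$4` for the relator code `trl` and subfield `$e` for a relator term; real data would need more cases and human review.

```python
# Relator code "trl" (translator) from the MARC relators vocabulary.
TRANSLATOR_URI = "http://id.loc.gov/vocabulary/relators/trl"
TRANSLATOR_CODES = {"trl"}
TRANSLATOR_TERMS = {"translator", "tr"}   # assumed spellings, after stripping punctuation


def find_translators(fields):
    """Return (name, role_uri) pairs for name fields marked as translator."""
    found = []
    for tag, subfields in fields:
        if tag not in ("100", "700"):      # personal-name fields only
            continue
        name = next((v for c, v in subfields if c == "a"), None)
        codes = {v.strip().lower() for c, v in subfields if c == "4"}
        terms = {v.strip(" .,").lower() for c, v in subfields if c == "e"}
        if name and (codes & TRANSLATOR_CODES or terms & TRANSLATOR_TERMS):
            found.append((name.rstrip(" ,"), TRANSLATOR_URI))
    return found


# Hypothetical record in the simplified representation described above.
record = [
    ("100", [("a", "Tolstoy, Leo,"), ("d", "1828-1910")]),
    ("700", [("a", "Garnett, Constance,"), ("e", "translator.")]),
    ("700", [("a", "Maude, Aylmer,"), ("4", "trl")]),
]
print(find_translators(record))
```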
person reconciliation
- requires human review
- resolve ambiguity in identity, roles, contributions
- predicates restrict detail
- e.g., performer vs. violinist
crowd sourcing
- simple problems or too complex, requires experts?
music parsing
image identity
Table 4
UCSD – mix of auto & manual review
CERL – name, spelling & disambiguation
HBS – URIs provided by authority vendor
Create local auth record/URI for strings with no auth?
Feed into LC or OCLC for needed authorities?
Improve cataloging tools with type-ahead entity resolution
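A rough sketch of what type-ahead entity resolution in a cataloging tool might look like: as the cataloger types, the tool suggests known entities (label plus URI) matching the prefix. The function names, the sample labels, and the example.org URIs are all hypothetical; a production tool would more likely query an authority service than an in-memory list.

```python
from bisect import bisect_left


def build_index(entities):
    """Sort (label, uri) pairs by case-folded label for prefix lookup."""
    return sorted((label.casefold(), label, uri) for label, uri in entities)


def suggest(index, prefix, limit=5):
    """Return up to `limit` (label, uri) pairs whose label starts with prefix."""
    key = prefix.casefold()
    start = bisect_left(index, (key,))     # first entry >= the typed prefix
    suggestions = []
    for folded, label, uri in index[start:]:
        if not folded.startswith(key) or len(suggestions) == limit:
            break
        suggestions.append((label, uri))
    return suggestions


# Placeholder authority data; not real identifiers.
index = build_index([
    ("Smith, John", "http://example.org/agent/2"),
    ("Jones, Mary", "http://example.org/agent/4"),
    ("Smythe, Ann", "http://example.org/agent/3"),
    ("Smith, Jane", "http://example.org/agent/1"),
])
print(suggest(index, "smi"))
```

Selecting a suggestion stores the URI rather than the typed string, so the entity is resolved at the point of cataloging instead of in post-processing.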