Date:

Attendees: Huda, Lynette, Tim, Steven, Greg, Jason

Regrets: Simeon

Discovery (WP3)

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
  • Research: how to go from knowledge graph to an index
  • DASH! (Displaying Authorities Seamlessly Here)
    • Dashboard design meeting kickoff notes
    • User reps D&A meeting: Expect next follow-up in August (Slides: from user reps meeting 2021-04-09 and result was "not no")
    • https://docs.google.com/document/d/1PgQi3xobsPhr9DUHU_YGeimL1OjNiiTdkiNWb36r3Gg/edit
    • Usability testing and followup for DASH: Usability results
      • Usability results, a few little things to finish up
      • GitHub issues
      • 2021-10-15: Feedback from User Reps, shared by Lenora: "In general, the catalog and LCSH content feels reliable, whereas some of the external data appears arbitrary and unbalanced, which makes us uncomfortable, especially for reference and instruction librarians. Exceptions to this are the photos and biographical information - everyone seemed fine with that content."  Slides with comments here: https://docs.google.com/presentation/d/12raOmGmBpG3DdLG-xqRYkcFe7HkPeehB8z9ozNSyokc/edit#slide=id.p.  Final slide has recommendations. Comments in the slides discuss what was liked, not liked, and problematic. Two classes of problems: the timeline doesn't load properly; it is unclear whether the view is zoomed in. Fixing these likely wouldn't change the overall recommendations. Other concerns: cool, but is this all of the works? Is this misleading? What's up with influences? Liked: images and biographical information. We cannot solve the data problem. Four years ago, they were not alright with images, so that change is nice, and implementing would be fairly easy, albeit needing discussion with Tim and D&A.
        • Next step: the next D&A sprint is the first 2 weeks of November, incl. 2 meetings with user groups. Show this as part of the D&A sprint during the first sprint meeting. A chunk of the work can get done before the sprint begins. Tim does not believe there is a need to do another round of review. Start to move this into production and let them comment in sprint. Knowledge panel and author page where we add image and biographical info, which includes library holdings page. Bottom wouldn't have tabs but would have general browsing. Same layout but does not include tabs with unliked content.
        • Need to work on a way to turn off individual data points/images for specific problematic data/images; per property / per URI basis for now. Possible approach: exclusions YAML file.
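
A minimal sketch of the per-URI / per-property exclusions idea noted above. Nothing here is decided: the file shape, the URIs, the property names, and the "*" wildcard are all hypothetical, and the dictionary simply stands in for whatever a YAML loader would return from an exclusions file.

```python
# Hypothetical shape of a parsed exclusions file. As YAML it might read:
#   http://example.org/entity/A: [image]
#   http://example.org/entity/B: ["*"]
# All URIs and property names are made-up placeholders.
EXCLUSIONS = {
    "http://example.org/entity/A": ["image"],  # hide one property for one entity
    "http://example.org/entity/B": ["*"],      # hide all external data for this entity
}


def is_excluded(uri, prop, exclusions=EXCLUSIONS):
    """Return True if this property should be hidden for this entity URI."""
    blocked = exclusions.get(uri, [])
    return "*" in blocked or prop in blocked


def filter_panel_data(uri, data, exclusions=EXCLUSIONS):
    """Drop excluded properties from the data gathered for a knowledge panel."""
    return {
        prop: value
        for prop, value in data.items()
        if not is_excluded(uri, prop, exclusions)
    }
```

Keying on the entity URI first keeps the common case (an entity with no exclusions) to a single dictionary lookup, and a problematic image can be turned off by adding one line to the file.
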
    • 2020-10-1: Separately, need to finish documentation of work done so far and set up demo video.
  • BANG! (Bibliographic Aspects Newly GUI'd)
    • Jamboard link
    • Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
    • 2021-10-01: Experimented with various queries.  For retrieving sets of related ISBNs, queried for the following relationship: an Opus that has two separate works, where each work has an instance with an ISBN.  Parsed the results to create a CSV where each line starts with an ISBN and is followed by all the other related ISBNs (based on the query above).  Set up front-end code that takes an ISBN from the catalog, looks for any line with that ISBN and returns the entire set, then does an ISBN field "OR" query to the Solr index to return any matches with their titles.
      • Questions to explore:
        • What is the goal of an Opus? And is this conceptually useful for users?
        • What is the data quality (both of the Opus data and connection via ISBN)?
        • What is the gap in data for CUL?
        • What is a good UI for display of this data? And should there be different UI for translations vs. based on, etc.?
      • Next steps: Huda will look for more specific relationships in the data (e.g. LCCN matching). Huda/Tim to explore UI options. Also look for definitions of opuses and hubs
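
The ISBN lookup described in the 2021-10-01 entry could be sketched roughly as follows. This is an illustration under assumptions, not the code in the application: the function names and the Solr field name `isbn_t` are hypothetical, and the sketch drops the catalog record's own ISBN from the query, which the notes do not specify.

```python
import csv
import io


def load_isbn_sets(csv_text):
    """Map each ISBN to every ISBN that shares a CSV line with it."""
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        isbns = [cell.strip() for cell in row if cell.strip()]
        for isbn in isbns:
            index.setdefault(isbn, set()).update(isbns)
    return index


def related_isbns(isbn, index):
    """Return the other ISBNs grouped with this one, sorted for stable output."""
    return sorted(index.get(isbn, set()) - {isbn})


def solr_isbn_query(isbns, field="isbn_t"):
    """Build a Solr 'OR' query over an ISBN field (field name is a placeholder)."""
    if not isbns:
        return None
    return field + ":(" + " OR ".join('"%s"' % isbn for isbn in isbns) + ")"
```

For a CSV line `111,222,333`, looking up `222` gives `111` and `333`, which would become the query `isbn_t:("111" OR "333")`. The same shape would apply to the LCCN data by swapping the field name.
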
    • 2021-10-08:
      • Steven: I began to gather properties in the PCC data for different instances of the same work. Shared with Huda, and will work to fine tune it next week. Re: PCC non-RDA Entities, the spreadsheet is almost ready to send to Kevin for feedback; we're mostly working on examples to help clarify the scope/range of our definitions.
      • Tim looked at SPARQL queries for different types of work
      • Asked on the ShareVDE Slack about an example where instances grouped together don't look equivalent (different works by Tony Oursler).
        • Answer: "When the bibliographic records do not have the tag 240 (uniform title) the matching criteria for Opus is made using tag 245 $a. So, this is the reason why your bibliographic records are grouped under the same opus "Tony Oursler" ... The matching criteria during Opus creation does not take into account values inside $b subfield of tag 245."
        • Generally, they are meant to be "functionally equivalent"
        • Our conclusion is that this grouping is an artifact of a lax query (e.g. ignoring the subtitle in $b); the MARC is appropriate. We likely need to understand how common misleading groupings like this are. Huda will follow up with additional questions about the cases where there isn't a uniform title (240).
      • LCCN data processing and display:
        • Did processing similar to that for ISBN for LCCN: Get all LCCNs where an Opus has two works and the works have instances with LCCNs.  Have incorporated that into the application in the same way.
      • To do: Make the UI look similar for the "related works" section as far as types of information returned for ISBNs, LCCNs, and maybe related works. Also look at problematic cases (e.g. "Geography" shouldn't be related to "Hamlet") to see why this is happening.
    • 2021-10-15:
      • Info display for related ISBNs and LCCNs is mostly working. Two pieces of confusion: online/at-library. Huda will reach out to Frances
      • Had examples where Work was not the object of any statements. Not many - fewer than other Opera related examples. Need to see what shows up when approaching query with other properties. Biggest challenge right now is how long the queries take and whether they'll time out. Huda has been requested to add something to a SHARE-VDE forum; might be seeking help to form the question. In data_feedback SVDE Slack channel, Steven asked questions re: Stanford data for QA work.
      • Robustness of data? In Stanford data, had to jump through a few nodes to get to data that should have been low-hanging. Opus was aggregating all labels from Instances; mostly Instance had a title... Work and Opus had rdfs:label. Didn't investigate label coverage much but was looking at relationships. Possible that we use the links/relatedness but go to the Solr index once we have the match.
  • DAG Calls
    • 2021-10-1: Next two meetings will be usability/user research focused: 10/12 with LINCS, 10/26 with Harvard image research/IIIF D4H user research.
    • 2021-10-15: Kim Martin & her students gave an interesting presentation. Reaching out to ask whether there is any archival linked data. Huda will ask Elizabeth Russey Roke.

Linked-Data Authority Support (WP2)

  • QA Sinopia Collaboration – Support and evolve QA+cache instance for use with Sinopia
    • 2021-10-15 - Met with Stanford dev with topics...
      • ShareVDE 2.0 CKB data is available.  It has been ingested into the cache and the basic analysis of extended context is done.  Dave will index, then it can be configured in QA and Sinopia.
      • Dave will regenerate the index for all cached authorities to bring in the new case-insensitivity update.
      • Several folks are working to reconcile differences in entities and subauthorities for OCLC data.
      • Publisher cities select list (created by Jim Hahn) is in QA.  There are some challenges on the Sinopia side due to the complexity of the way they access outside lookups.  Sinopia devs plan to refactor in the next work cycle (starting in about 2 weeks), but it depends on its prioritization in the context of other work.  Without this, Sinopia is not able to make use of the new select list or other simple vocabularies brought in via YAML definitions.  This is a problem for the Sinopia team to resolve.
      • We have a number of issues that are targeted for completion in the next two weeks to support Sinopia's next work cycle.  Stanford has also requested cache search documentation.
  • Best Practices for Authoritative Data working group (focus on Change Management)
    • 2021-10-15
      • We are at the point of needing to create a context document (ex. IIIF context) that can be used in activity streams.  Minimally, this will extend the activity streams context adding DEPRECATION, MERGE, and SPLIT activities.
      • The context document needs a home.  This will be LD4 in GitHub.  We also need a namespace.  That is still under discussion, but it seems to be leaning toward Entity Metadata Management.
      • Looking at rewriting the main documentation in the style of the IIIF activity streams document.
      • Still need refinement for dates and object types. 
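
As a rough illustration of what "extending the activity streams context" could look like, the sketch below builds a JSON-LD context that layers three extra activity terms on top of the standard Activity Streams 2.0 context. The namespace URL, the `emm` prefix, and the term names `Deprecate`, `Merge`, and `Split` are placeholders only, since the namespace is still under discussion and the object types still need refinement.

```python
import json

# The published Activity Streams 2.0 context.
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Placeholder namespace; the real one has not been decided.
HYPOTHETICAL_NS = "https://example.org/emm/"


def build_context(namespace=HYPOTHETICAL_NS):
    """Return a JSON-LD context document adding three activity types to AS2."""
    return {
        "@context": [
            AS2_CONTEXT,
            {
                "emm": namespace,
                # Term names are illustrative, not agreed vocabulary.
                "Deprecate": "emm:Deprecate",
                "Merge": "emm:Merge",
                "Split": "emm:Split",
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_context(), indent=2))
```

Listing the Activity Streams context first and the local terms second means existing Activity Streams terms keep their meaning and only the new activity types come from the added namespace.
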
  • Cache Containerization Plan - Develop a sustainable solution that others can deploy
    • 2021-10-15
      • Working on moving initializer setup to environment variables to avoid requiring a volume to support these customizations.
      • Wrote GitHub Actions scripts to auto-deploy the -int image with any commit to the dev branch and the -stg image with any commit to the main branch.  These are both private ECR repositories.  The production image is in a public repository which has a different login process.  Lynette believes she has a solution but needs to test it.
      • Greg and Lynette are in discussions about how to deploy the -int service.  During this, we will update the instructions.  Lynette has been taking notes on the basic process for using the templates.  One thing that has become clear is that the templates should be extracted out into their own github repo.  If you fork the current repo, it will try to run the deploy scripts which we don't want to happen.
      • The state of the plan for moving the containerization process forward so far is...
        • DONE - Lynette will clean up the images in ECR
        • PARTIALLY DONE - Lynette will get the github-actions deploying images.
          • DONE for the private repos.
          • IN PROCESS - Still working on the public repo.  I believe I have a solution but need to test it.
        • IN PROCESS - Lynette will update the env file to allow for initializers to draw their values from that file.
        • IN PROCESS - Greg can walk Lynette through setting up -int.  This should identify…
          • what can be done by a moderately privileged user and what has to be done by a sys ops user
          • make changes/additions to documentation as needed
        • Lynette will move the templates to a new github repo
        • Lynette will set up -stg except where sys ops privileges are required to proceed.

Other Topics

  • Sinolio - Sinopia-FOLIO
    • 2021-10-15: Little was done on Sinolio last sprint - mostly was Symphony integration; this sprint review is this afternoon so will know more then.
  • OCLC Linked Data / Entities Advisory Group
  • PCC 
    • 2021-10-15: Task Group on Non-RDA Entities - slight change in how POCO expects this vocab to be implemented, which affects some of the examples. Once done, will work with Kevin on specific issues converting to RDF.
  • Authorities in FOLIO
    • 2021-10-15: Frances has a database she is using for discovery. She has begun to successfully parse weekly Peter Ward files and update her database. Will then test whether that database can serve as source. Meeting every other week. Nick is investigating creating a GUI rather than spreadsheets; question about how much to invest, given that this would be a local solution and no other libraries are identified yet to do work this way.

Upcoming meetings

  • https://kula.uvic.ca/index.php/kula/announcement/view/1 .  Call for Proposals - Special Issue: "The Metadata Issue: Metadata as Knowledge".  
  • WikidataCon; suggested we present Activity Streams in that. Lynette will reach out to Lydia to see whether good fit. Happening sometime around 10/30.
  • SWIB (11/29-12/3) virtual again this year - No proposals submitted
  • Virtual Blacklight Summit on 8-10 November. Last year had an institutional update from Cornell; can do this again. And/or can propose a session on the LD4P discovery work. Informal CFP has gone out.
    • Huda Khan has filled out the form to offer an institutional demo and possibly demo work. Meeting with Melissa after this call as they did it last year. 5-6 minutes total.
    • Demo videos due by 11/3

Next Meeting(s), anyone out?:

  • 2021-10-22: Lynette; Steven & Jason will drop off at 10 for the communities of practice meeting. Check whether we can meet at 9am again.