...

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Research: how to go from knowledge graph to an index
  • BANG! (Bibliographic Aspects Newly GUI'd)
    • Jamboard link
    • Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
    • References/bibliography list (beginning)
    • BANG! preliminary design-ish/data questions link
    • 2022-04-15
      • Steven and Huda looked at IMDB-id.loc.gov-wikidata-Catalog connections
      • Huda implemented a demo bringing in movie information from IMDB, and also looked at finding related movies (e.g., derived from the Dune book). There is a lack of clear identifiers for movies in the data. Jason points out that Andy Horbal would be a good person to talk to about how best to use external movie data
      • Looked at Sinopia staging data and found that there is less useful data in there than there is in Sinopia prod. Will look a little more but expect that the best way forward is to create some examples in Sinopia that demonstrate the desired links
      • Reading "User conceptualizations of derivative relationships in the bibliographic universe" https://doi.org/10.1108/JD-10-2017-0139 
      • Steven thinking about design ideas based on cluster/work data, what different systems do with this sort of information. Working on slide deck, including idea from commercial sector
      • Sinopia prod data: Used the Sinopia API to download JSON. Each resource is represented as a JSON object with a data portion that is JSON-LD. Extracted just the JSON-LD for the resources (3208 total) and added it to a Fuseki dataset. Began querying for work-to-work relationships. There are multiple work-to-work relationships, but the number dwindles to only four when restricted to works that also have instances.
        • Next step: Do the same thing for Sinopia staging data.
      • OCLC Work Ids
        • Remembered we have an index! Used workid_facet to query the LD4P3 copy of the Cornell catalog to get all facet values and counts.
          • 884,868 Cornell records that have a work id value
          • Total number of work ids being used: 513,078
          • Number of work ids with only one match: 317,970
          • Number of work ids with two or more matches: 195,108
            • Compare to LOC Hub ISBN analyses. Extrapolating to total number of hubs: 6000
      • Proof of concept for IMDB - Wikidata - Catalog example: Can we use Wikidata to get the IMDB URL and then get data from that IMDB URL to supplement info on the page
  • DAG Calls
    • 2022-04-08 - Three folks coming to talk about various topics and seed discussions about what worked and what didn't. Have also contracted
    • 2022-04-15 - Good call and follow-up discussions. Next time will try to reverse engineer the future depicted in Astrid's pictures. May 10 will have U Ghent
  • Document started re: Comments, Questions and Suggestions offered during the myriad of presentations provided. Huda will add link here
    • Huda has some extra notes to add
    • Can perhaps refine this into topics/themes or clarification questions
    • 2022-04-15 Still planning to do this
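The work id counts noted under "OCLC Work Ids" can be reproduced from a facet response with a small tally. The sketch below is illustrative only: it assumes the facet values arrive in Solr's default flat `[value, count, value, count, ...]` layout for the `workid_facet` field, and the function name `tally_work_ids` is hypothetical, not part of the project code.

```python
def tally_work_ids(flat_facets):
    """Summarise a flat facet list of the form [value, count, value, count, ...].

    Assumption: this is the shape of facet_counts.facet_fields.workid_facet
    in the index response; adjust if a different json.nl setting is used.
    """
    values = flat_facets[0::2]
    counts = flat_facets[1::2]
    if len(values) != len(counts):
        raise ValueError("facet list must contain value/count pairs")

    # Ignore facet values that match no records (possible when facet.mincount=0).
    used = [c for c in counts if c > 0]

    return {
        # Sum of per-id counts; equals the record total only if each
        # record carries a single work id (an assumption).
        "record_matches": sum(used),
        "distinct_work_ids": len(used),
        "single_match": sum(1 for c in used if c == 1),
        "multi_match": sum(1 for c in used if c >= 2),
    }


if __name__ == "__main__":
    sample = ["w1", 3, "w2", 1, "w3", 1, "w4", 0]
    print(tally_work_ids(sample))
```

As a consistency check on the reported figures, the single-match and multi-match buckets should sum to the number of distinct work ids, which they do here: 317,970 + 195,108 = 513,078.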

Linked-Data Authority Support (WP2)

  • Qa Sinopia Collaboration
    • 2022-04-15
      • One new direct access authority (i.e. Humord) added to QA.  All 3 new authorities for the Norwegian project (i.e. Bibbi, Genre/Form) and Homosaurus are in production Sinopia.
      • Dave still needs to fix total_number_found to get pagination working again in Sinopia.  Once that is done, it will just feed through to Sinopia without any additional work.
  • Best Practices for Authoritative Data working group (focus on Change Management) 
    • 2022-04-15
      • Met with the group and had a long conversation about the complexities of identifying the edges of the graph for delete.  LOC marks the edges by using blank nodes.  Getty does not use blank nodes and returns a significant amount of data beyond the subject entity's triples.  Possible recommendations: 
        • delete only triples matching <subject_uri> ?p ?o
        • plus any triples in the same namespace that are in the return graph AND that are now orphaned (This could have unintended consequences.)
  • Containerization 
    • 2022-04-15
      • Containerizing the QaServer
        • Added GitHub actions to sync authorities and locale customizations from the GitHub repo to S3.  Fully tested with -int and working as expected.  Testing on -stg and -prod is pending finalization of the deploy process.
        • Added GitHub actions for redeploy for -stg and -prod based on the action Greg wrote for -int.  This will likely address Issue #4, but is also pending finalization of the deploy process for -stg and -prod.
        • The other part of customization is adding ENV vars to drive components included on the monitor status page.  That is being done manually in the file stored on S3.  This is completed for -int, and pending finalization of the deploy process for -stg and -prod.
        • We are doing some renaming of components to make the naming consistent.  There may need to be some adjustment to action scripts once that is complete.
        • Greg set up Pingdom to ping the monitor status page.  It correctly sent a 'site is down' message.  There is a bug in qa_server when the monitor status page is run for the first time in an app. Lynette will debug this and create a new image.  Hopefully it is a straightforward problem and will be done this afternoon. 
        • There is a problem with precompiling of assets on -int.  Presumably this is a problem in the built image.  Perhaps it will be resolved when the image is rebuilt after resolving the monitor status page bug.  If not, there will need to be some debugging on this issue too. 
        • Greg notes that we need a separate volume for customization/localization information; this has led to some naming fixes
        • Hope to build a staging environment today, should be a slight tweak to -int recipe... but likely some gotchas. Intend to work on this for a few more days in the hope of getting it over the line
      • Containerizing the Cache Indices
        • Focusing on QaServer now. After that is done, Greg will try putting a container with Dave's stuff up on AWS. Will work in Terraform as part of moving on from CloudFormation
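The two delete recommendations discussed by the Best Practices for Authoritative Data working group can be sketched over an in-memory set of triples. This is a rough illustration under stated assumptions, not the group's agreed method: triples are plain `(subject, predicate, object)` string tuples, "orphaned" is read as "no remaining triple has this subject as its object", the check is a single pass rather than transitive, and the function name `triples_to_delete` is hypothetical.

```python
def triples_to_delete(store, returned, subject_uri, namespace, include_orphans=False):
    """Pick the triples to remove from `store` when `subject_uri` is deleted.

    store    -- set of (s, p, o) tuples currently cached
    returned -- set of (s, p, o) tuples the authority returned for subject_uri
    """
    # Recommendation 1: delete only triples matching <subject_uri> ?p ?o.
    direct = {t for t in store if t[0] == subject_uri}
    if not include_orphans:
        return direct

    # Recommendation 2 (optional): also delete same-namespace subjects from
    # the returned graph that nothing else points to once `direct` is gone.
    remaining = store - direct
    still_referenced = {o for (_, _, o) in remaining}
    candidates = {
        s for (s, _, _) in returned
        if s != subject_uri and s.startswith(namespace)
    }
    orphaned = candidates - still_referenced
    extra = {t for t in remaining if t[0] in orphaned and t in returned}
    return direct | extra


if __name__ == "__main__":
    NS = "http://example.org/auth/"  # hypothetical namespace
    A, B, D = NS + "A", NS + "B", NS + "D"
    C = "http://other.example/C"
    store = {
        (A, "label", "a"), (A, "related", B), (A, "related", D),
        (B, "label", "b"), (D, "label", "d"), (C, "related", D),
    }
    returned = {t for t in store if t[0] in (A, B, D)}
    print(sorted(triples_to_delete(store, returned, A, NS, include_orphans=True)))
```

In the example, B is referenced only by A and so is removed along with it, while D survives because C still points to it. This illustrates the "unintended consequences" caveat: whether a node counts as orphaned depends entirely on what else happens to be in the store.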

Other Topics

...

  • Sinolio - Sinopia-FOLIO
    • 2021-12-17 - Work Cycle finished, sprint video out
  • OCLC Linked Data / Entities Advisory Group
    • 2021-12-10 OCLC presented at bigheads meeting this week, in testing
  • PCC 
    • 2021-01-21 Definitions and non-RDA final report to POCO (hopefully) to be submitted next week
    • 2022-01-14 Nothing new to report.
  • Authorities in FOLIO
    • 2022-03-25 Some transitions in team. Useful meeting with Jenn, Frances, Nick, and Darcy to decide what needs to be provided to build queue. Mockups look good and allow filtering on types of change (new, deleted, updated). Quite different indexing requirements for data maintenance vs discovery

...

  • KULA submission for special issue "The Metadata Issue: Metadata as Knowledge" accepted, hopefully final edits sent
  • LD4 conference - submission date is next week
    • Discovery proposal focused on project work (20-30min presentation)
    • Discussion panel to talk about experiences implementing new discovery features
    • Huda & Steven thinking about discussion on discovery
  • DCMI https://www.dublincore.org/conferences/2022/cfp/ - deadline is May 2; conference is in October, either f2f in North America or virtual
  • SWIB - later in the year

...