Time/Place

Time: 2:00pm ET

Audio Conference: 

Join from PC, Mac, Linux, iOS or Android: https://duraspace.zoom.us/j/8128353771

Or iPhone one-tap:
US: +16699006833,,8128353771# or +16468769923,,8128353771#
Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 646 876 9923
Canada: +1 647 558 0588
Australia: +61 (0) 2 8015 2088
United Kingdom: +44 (0) 20 3695 0088
Meeting ID: 812 835 3771
International numbers available: https://zoom.us/u/MO73B

Attendees

Agenda/ Notes 

Topic (Lead)

  • Review pilot guidelines and answer any questions (David)
  • Discuss NLM pilot details (David)
    • What are the desired outcomes?
    • What are the timelines?
    • What is our communication plan?
  • Next steps (All)


Notes


Desired Outcomes:

  • Success would mean that Fedora 6 is something we can use and that the steps for migration are clear
  • Something we can use: 
    • Requirements: 
      • Migration performance: how many objects, representing how much data, can be migrated in how much time
    • Are the development targets outlined by the Fedora 6 design summary aligned with NLM's needs?
    • Concern around external content
      • Current F3 stack makes heavy use of external content
      • How to use an external content model with the Fedora 6 / OCFL design?
      • Would we (NLM) have to bring all the external binaries into Fedora, or would it make sense to use external content?
        • OCFL objects can point to external content (via a URI)
        • NLM would always have the option to bring the content into OCFL later.
        • Having binaries and metadata collocated is a principle of OCFL, not a requirement
    • Concerns around RDF and LDP: 

Timelines / Responsibilities

  • Two sprints for Fedora: September and November
  • We're looking at a year-long roadmap covering Fedora 6 and migration tooling
  • Aiming to wrap up the pilot around this time next year.
  • Does that timeline seem reasonable to NLM? 
    • Busy through September
    • Overall timeframe looks good 
  • Sample data:  
    • Andrew Woods is moving the migration-utils project forward so that it writes to OCFL directly
    • It would be helpful to vet the performance characteristics of that migration approach.
    • Having a Fedora 3 tar file would be useful for us to run some tests in the short term.
    • One initial test would be to run migration-utils without writing anything to Fedora, to establish a performance baseline.
    • 60 TB and 10 million objects
  • How important is it to retain your own URIs? 
    • Currently permalinks are stored in the object metadata
    • NLM does not care so much that Fedora 3 URLs change in the migration.
      • Fedora URLs are used in NLM's architecture, so the code that resolves Fedora links will have to change, but that is not a big deal.
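
The performance questions above (how many objects, how much data, how much time) can be turned into a rough extrapolation once a baseline run exists. The sketch below is only a back-of-envelope illustration: the 60 TB and 10 million object totals come from these notes (decimal terabytes assumed), while the `SampleRun` type, the `estimate_days` helper, and the sample figures are hypothetical and do not reflect any measured result.

```python
from dataclasses import dataclass

TOTAL_BYTES = 60 * 10**12      # 60 TB from the notes, assuming decimal terabytes
TOTAL_OBJECTS = 10_000_000     # 10 million objects from the notes


@dataclass
class SampleRun:
    """Measurements from one hypothetical test run of the migration tooling."""
    objects: int        # number of objects migrated in the run
    bytes_moved: int    # total bytes migrated in the run
    seconds: float      # wall-clock duration of the run


def estimate_days(run: SampleRun) -> float:
    """Extrapolate a sample run linearly to the full repository.

    Projects the total duration from both the per-object rate and the
    per-byte rate, and returns the slower (larger) projection in days.
    """
    by_objects = TOTAL_OBJECTS / (run.objects / run.seconds)
    by_bytes = TOTAL_BYTES / (run.bytes_moved / run.seconds)
    return max(by_objects, by_bytes) / 86_400


# Hypothetical sample: 10,000 objects totalling 60 GB migrated in one hour.
sample = SampleRun(objects=10_000, bytes_moved=60 * 10**9, seconds=3600)
print(f"Average object size: {TOTAL_BYTES / TOTAL_OBJECTS / 10**6:.1f} MB")
print(f"Projected duration: {estimate_days(sample):.1f} days")
```

A linear extrapolation like this ignores parallelism, storage contention, and variation in object size, so it is at best a first sanity check against the proposed timeline.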

Communication:

  • As sprints kick off, the Fedora team will need feedback on progress.
  • NLM's participation in the weekly tech calls and the #fedora6-pilots Slack channel would be very helpful.

Actions

  • Danny Bernstein to bring up the question of external binaries in OCFL on the weekly tech call. Pilot partners should feel free to raise the issue in the #fedora6-pilots Slack channel.
  • Andrew Woods  to schedule a dedicated call to discuss URLs as URIs for digital objects.  
  • Nancy Fallgren and everyone in the current meeting  to review the thread and make sure to raise any issues that are not touched on in the thread.
  • Andrew Woods  to work with Doron Shalvi  on determining the way forward re sample data



2 Comments

  1. In reviewing the email thread about URIs and URLs, I think I dropped off because 1) the conversation veered into the topic of whether Fedora should be an LDP at all and 2) some of the tech speak was over my head and off-putting (but it was a tech forum).  So, if we make this a discussion open to developers and metadata practitioners at the same time, could we try to 1) keep it well scoped and 2) be mindful of language barriers?  I went to a meeting of developers and metadata librarians once where from the start both sides agreed that anyone could raise their hand and say "language", which would indicate we were using terms of art that others did not understand.  The meeting leaders made it less intimidating by raising their hands and yelling "Language!" frequently at the start of the meeting.  It actually worked well to break down barriers and get people to think about how they were expressing their thoughts to a mixed audience.  Not saying that would work on a call, but you get the idea.

    That aside, I stand by the accuracy of my original post.  With apologies to Doron, I would make a change to his introduction –

    <Parent URL> <has child> <Child URL> should be expressed as <Parent URI> <has child> <Child URI>, followed by <Child URI> <has web location> <URL>. 

    And further explanation (with apologies to all if I'm stating the obvious) -

    URIs in RDF should be machine dereferenceable; they are intended for machine use, not human readability although they may redirect to a human readable web page.  An example is the Wikidata Concept URI for NLM https://www.wikidata.org/entity/Q611833, which redirects in a browser to a human readable web page (URL) about NLM https://www.wikidata.org/wiki/Q611833.  A machine only accesses the Concept URI to read and analyze Wikidata's RDF graph about NLM and to get the URL to display the web page and to gather the data displayed in the web page. In Fedora4, we were seeing developers equating the web page with the Thing, and there was also an issue about being able to use and maintain established RDF URIs from external RDF sources, like MeSH RDF or id.loc.gov or local triple stores.

    Using the example above to drive home the URL/URI difference -

    The Child Thing is the RDF Subject (center) of the Child graph.  The Child URL is the RDF Object of <has web location>, i.e., it is part of the Child Thing graph, but not the Child Thing itself.  The Child URL is accessible by referencing or querying the Child Thing graph.  There could conceivably be a graph around the Child URL as well - that graph would be about the web page itself (when it was created, size, when it was updated, encoding, etc.) and that would be a separate but related graph to the Child Thing graph.  Does that help with understanding?
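
As a minimal sketch of the pattern described above, the two statements can be modelled as plain subject/predicate/object tuples. All URIs and predicate names here are hypothetical placeholders on example.org, chosen only for illustration; they are not NLM or Fedora identifiers, and no RDF library is assumed.

```python
# Hypothetical URIs and predicates, for illustration only.
PARENT = "https://example.org/id/parent-1"            # Parent Thing URI
CHILD = "https://example.org/id/child-1"              # Child Thing URI
CHILD_PAGE = "https://example.org/pages/child-1.html"  # web page URL for the Child
HAS_CHILD = "https://example.org/ns#hasChild"
HAS_WEB_LOCATION = "https://example.org/ns#hasWebLocation"

# <Parent URI> <has child> <Child URI>
# <Child URI> <has web location> <URL>
triples = [
    (PARENT, HAS_CHILD, CHILD),
    (CHILD, HAS_WEB_LOCATION, CHILD_PAGE),
]


def objects_of(subject, predicate, graph):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def web_locations_of_children(parent, graph):
    """Follow has-child links, then look up each child's web location."""
    urls = []
    for child in objects_of(parent, HAS_CHILD, graph):
        urls.extend(objects_of(child, HAS_WEB_LOCATION, graph))
    return urls


print(web_locations_of_children(PARENT, triples))
```

The parent relates to the Child Thing's URI, and the page URL is reached only by querying the Child Thing's own statements, which is the distinction the paragraph above draws.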

    RDF is a very precise and granular expression format intended for relating and re-using data, so making correct and precise RDF statements is important to creating accurate data and graphs for that purpose.  If you want to say something about a Thing, you must be able to unequivocally identify that Thing with a machine dereferenceable URI in order to create an accurate, usable, and re-usable graph about It.  The Thing URI facilitates the ability to use and re-use data in RDF graphs for the purpose of realizing granular relationships among different Things (possibly resulting in serendipitous/unforeseen findings), which is core to LD and the Semantic Web.  An LDP should support that.

    1. Thanks for the clarification, Nancy. I agree that, if we do schedule a call on this topic, we should be mindful of scope and language barriers. Andrew posted the following to Doron in the fedora6-pilots Slack channel - I don't believe you're in that channel so I wanted to make sure you saw it:

      "I expect/hope F6's less constraining relationship with RDF will make this a non-issue.
      Would you potentially be able to provide a sample, ideal RDF document with your preferred use of URIs/URLs?"

      Ahead of a potential call, it would be great to be able to review such an example if you could produce one. That would allow us to determine the gap (if any) between our high-level conception of Fedora 6 functionality with regard to RDF and the functionality you're looking for.