Date

Call-in Information

To join the online meeting:

Slack

Attendees

(star)  Indicating note-taker

  1. Andrew Woods
  2. Andy Seaborne
  3. Hunter Jarrell
  4. Taeber Rapczak
  5. Brian Lowe
  6. Don Elsborg
  7. Benjamin Gross
  8. Graham Triggs
  9. Ralph O'Flinn
  10. Alexander (Sacha) Jerabek
  11. William Welling
  12. Douglas C. Hahn
  13. Steven McCauley
  14. Kevin Hanson
  15. Mike Conlon

Objective

  1. Moving towards a decision on VIVO's default triplestore (and community recommendations)

    1. SDB
    2. TDB
    3. External triplestore

Agenda

  1. Brief introductions: What is your interest in the conversation?
  2. Pros / Cons of each option (see table in notes)
    1. Performance characteristics (benchmarks on READ?)
    2. Reliability
    3. ACID compliance
    4. Maintenance implications
    5. Future-proofing
    6. Community impact
  3. Is there a recommendation from this group?
  4. Follow-on actions

Notes 

Draft notes in Google-Doc

Recording


Goal: understand the differences between SDB and TDB (TDB1 and TDB2), recommend best practices, and set a default for VIVO.

Andy Seaborne answered questions.  Apache Jena is an open source project, with all that entails.

Andy - settling on TDB2

You can’t go directly into SDB unless you understand how its access works at the lowest level.

50 million triples (a wild guess) is a practical limit with SDB. The limiting factor is the interaction between basic graph patterns and filters.

TDB doesn’t support incremental loading. Massive parallelism is recommended: set the flags in the bulk loader, and try different ones to see which works best for your system and setup.

TDB1 is slightly better at small commits.  TDB2 currently has additional commit overhead, which is expected to be removed eventually.  TDB2 is better at large commits -- adding 200 million triples is possible.

Each index loads on a separate thread.  Load named graphs in parallel.
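The advice above about trying different bulk-loader flags could be sketched roughly as follows. This is not from the call: it assumes the `tdb2.tdbloader` command from the Apache Jena distribution is on the PATH and that its `--loader` option accepts the strategy names listed, which should be checked against the installed Jena version. The function names, database directories and data file name are hypothetical. The sketch only builds the command lines, one fresh database per strategy, so each can be run and timed separately.

```python
import shlex

# Assumption: loader strategy names accepted by tdb2.tdbloader's --loader
# option; verify against the Jena version actually installed.
LOADER_MODES = ["basic", "phased", "sequential", "parallel", "light"]


def loader_command(mode, db_dir, data_files):
    """Build (but do not run) a tdb2.tdbloader command line."""
    if mode not in LOADER_MODES:
        raise ValueError(f"unknown loader mode: {mode}")
    return ["tdb2.tdbloader", f"--loader={mode}", "--loc", db_dir, *data_files]


def trial_commands(db_prefix, data_files):
    """One command per loader mode, each targeting its own empty database."""
    return {
        mode: loader_command(mode, f"{db_prefix}-{mode}", data_files)
        for mode in LOADER_MODES
    }


if __name__ == "__main__":
    # Hypothetical file name; print the commands so they can be timed by hand.
    for mode, cmd in trial_commands("vivo-trial", ["content.nq"]).items():
        print(shlex.join(cmd))
```

Running each printed command under `time` on the target hardware gives a like-for-like comparison, since the best strategy depends on cores, memory and disk.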

Corruption is possible with any of these technologies, in bizarre cases. Record what you put in, and dump regularly.

  • However, regarding stability, TDB has the most community usage, and is therefore the most bulletproof.

Queries can affect performance greatly.  In some cases the optimizer spends measurable time evaluating the query.  It’s programming.

TDB and SDB can be used together: queries in TDB, data recovery in SDB.

Suggestion regarding "future-proofing": avoid coupling too tightly to any given technology... implement against standards
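As one illustration of implementing against standards, the sketch below builds a query request using only the SPARQL 1.1 Protocol (an HTTP POST with a URL-encoded `query` parameter), so the same client code should work against any compliant triplestore. It was not discussed on the call; the function name and the endpoint URL are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_query_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol query request (POST, URL-encoded form)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Standard media type for SELECT/ASK results in JSON.
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint; swap in whichever triplestore is being evaluated.
    req = build_sparql_query_request(
        "http://localhost:3030/vivo/sparql", "ASK { ?s ?p ?o }"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because nothing here is specific to Jena, switching between TDB, SDB or an external triplestore would only change the endpoint URL, provided the store implements the protocol.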

AWS Neptune isn’t Blazegraph. Neptune overwrote the SERVICE calls.

Next steps:

  • What are the outstanding questions at this point?
  • Should we have a follow-on call (in the new year) to reach a community recommendation?

Actions

  •  

