Date

06 Dec 2019

Time: 9:00 am, Eastern Time (New York, GMT-05:00)

Call-in Information

To join the online meeting:

Go to: https://lyrasis.zoom.us/my/vivo1
One tap mobile:
- US: +16699006833,,9358074182# or +19292056099,,9358074182#
Or Telephone:
- US: +1 669 900 6833 or +1 929 205 6099 or 877 853 5257
- Meeting ID: 935 807 4182
International numbers available: https://zoom.us/u/aeANHanzED

Slack

https://vivo-project.slack.com
- Self-register at: http://bit.ly/vivo-slack

Attendees

Indicating note-taker

Objective

Moving towards a decision on VIVO's default triplestore (and community recommendations)
1. SDB
2. TDB
3. External triplestore

Agenda

Brief introductions: What is your interest in the conversation?
Pros / Cons of each option (see table in notes)
1. Performance characteristics (benchmarks on READ?)
2. Reliability
3. ACID compliance
4. Maintenance implications
5. Future-proofing
6. Community impact
Is there a recommendation from this group?
Follow-on actions

Notes

Draft notes in Google-Doc

Recording

http://bit.ly/2019-12-vivo-sdb-tdb

Goal: understand the differences between SDB and TDB (TDB1 and TDB2), recommend best practices, and set a default for VIVO.

Andy Seaborne answered questions. Apache Jena is an open source project, with all that entails.

Andy - settling on TDB2

You can’t go directly into SDB unless you understand how the access works at the lowest levels.

50 million triples (a wild guess) is a practical limit with SDB. The limiting factor is the interaction between basic graph patterns and filters.

TDB doesn’t support incremental loading. Massive parallelism is recommended. Set the flags in the bulk loader, and try different settings to see which work best for your system and setup.

TDB1 is slightly better at small commits. TDB2 currently has additional commit overhead, which is expected to be removed eventually. TDB2 is better at large commits: adding 200 million triples is possible.

Each index loads on a separate thread. Load named graphs in parallel.

Corruption is possible with any of these technologies, usually in bizarre cases. Record what you put in. Dump regularly.
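
One way to "record what you put in" is an append-only manifest written at load time. The sketch below is illustrative only and was not discussed on the call: the function name `record_load`, the JSON Lines manifest format, and the example file and graph names are all assumptions, and the script only records the load rather than performing it.

```python
import hashlib
import json
import time
from pathlib import Path


def record_load(manifest: Path, rdf_file: Path, graph_uri: str) -> dict:
    """Append one entry describing a loaded RDF file to a JSON Lines manifest.

    Hypothetical helper: it does not load anything into a triplestore,
    it only records which file went into which named graph, and when.
    """
    entry = {
        "file": str(rdf_file),
        # The checksum lets you confirm later that a file is the one that was loaded.
        "sha256": hashlib.sha256(rdf_file.read_bytes()).hexdigest(),
        "graph": graph_uri,
        "loaded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # Append mode keeps earlier entries intact, so the manifest is a history.
    with manifest.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Calling `record_load(Path("loads.jsonl"), Path("people.nt"), "http://example.org/graph/people")` just before each load would give a replayable history to set alongside the regular dumps.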

However, regarding stability, TDB has the most community usage and is therefore the most bullet-proof.

Queries can affect performance greatly, and in some cases the optimizer spends measurable time evaluating the query. It’s programming.

TDB and SDB can be used together: queries in TDB, data recovery in SDB.

Suggestion regarding "future-proofing": avoid coupling too tightly to any given technology, and implement against standards.
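
As an illustration of implementing against standards, a client can talk to any store through the SPARQL 1.1 Protocol rather than a store-specific API. The sketch below builds a standard "query via URL-encoded POST" request using only the Python standard library. The function name `build_sparql_request` and the endpoint URL are assumptions for illustration, not part of VIVO or of anything discussed on the call.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint_url: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query request (URL-encoded POST).

    Nothing here is specific to one triplestore: the same request shape
    should work against any endpoint that implements the protocol.
    """
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Ask for the standard SPARQL JSON results format.
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Hypothetical endpoint URL; substitute the endpoint of whichever store is in use.
request = build_sparql_request(
    "http://localhost:3030/vivo/sparql",
    "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }",
)
```

Passing `request` to `urllib.request.urlopen` would send it. Switching stores then means changing only the endpoint URL, provided the queries themselves stay within standard SPARQL.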

AWS Neptune isn’t Blazegraph. Neptune overwrote the SERVICE calls.

Next steps:

What are the outstanding questions at this point?
Should we have a follow-on call (in the new year) to reach a community recommendation?

2019-12-06 - Special Topic - TDB vs SDB
