Note: Release Candidate

This document is currently being drafted. Items on this page are expected to be in the release, but this is not guaranteed.

 

What is this document?

The VIVO 1.8.1 release concentrates on fixes for certain bugs and performance issues. It also includes some minor, non-breaking additions to the ontology and one non-breaking addition to the UI.

Performance Improvements

 

Data set                                                                 Release 1.7   Release 1.8   Release 1.8.1 RC
Small data set (800,000 triples, 200 people, 3,500 articles)             100%          292%          70%
Large data set (11,000,000 triples, 4,500 people, 40,000 articles)       100%          256%          72%

Relative execution time (Release 1.7 = 100%; lower is faster). Testing by Jim Blake using the 30 slowest profiles, as reported on the development list.
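Because the figures are all expressed relative to Release 1.7, comparing 1.8.1 RC with 1.8 takes one extra division. The sketch below only reworks the percentages from the table; the class and method names are illustrative and not part of VIVO.

```java
public class RelativeTimes {

    /**
     * Speed-up of one release over another, where both arguments are
     * execution times expressed as a percentage of the Release 1.7 time.
     */
    public static double speedup(double slowerPercent, double fasterPercent) {
        return slowerPercent / fasterPercent;
    }

    public static void main(String[] args) {
        // Percentages copied from the table (Release 1.7 = 100%).
        System.out.println("Small data set, 1.8.1 RC vs 1.8: " + speedup(292, 70));
        System.out.println("Large data set, 1.8.1 RC vs 1.8: " + speedup(256, 72));
        System.out.println("Small data set, 1.8.1 RC vs 1.7: " + speedup(100, 70));
        System.out.println("Large data set, 1.8.1 RC vs 1.7: " + speedup(100, 72));
    }
}
```

On these numbers, 1.8.1 RC is roughly 4.2 times faster than 1.8 on the small data set and roughly 3.6 times faster on the large one, and about 1.4 times faster than 1.7 on both.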

 

Note that all testing reported below was performed on a MacBook Pro with a PCIe SSD, with no specific tuning applied to either the hardware or the software. Real-world performance will depend on hardware and software configuration. It is recommended that you have an SSD or other high-I/O storage layer and, if using SDB/MySQL, enough memory allocated to hold the table indexes.

 

Page Rendering

Note

AntiSamy is no longer used to filter fields before they are rendered. For a large profile in a test dataset, this was responsible for over two seconds of the execution time required to render a profile.

A simple regular expression is now used to filter out JavaScript elements instead; this is 300x faster than using AntiSamy.
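To illustrate the kind of filtering described here, the following is a minimal sketch that removes script blocks from a field value using a precompiled regular expression. The pattern, class name and method name are assumptions made for this example; they are not the expression or code that VIVO actually ships, and the sketch is not a complete HTML sanitiser.

```java
import java.util.regex.Pattern;

public class ScriptFilter {

    // Hypothetical pattern (not VIVO's own): matches <script ...>...</script>
    // blocks, ignoring case and allowing the body to span line breaks.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the input with every matched script block removed. */
    public static String stripScripts(String html) {
        if (html == null) {
            return null;
        }
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String field = "<p>Research overview</p><script>alert(1)</script>";
        System.out.println(stripScripts(field));
    }
}
```

Compiling the pattern once and reusing it for every field keeps the per-field cost to a single pass over the text, which is the general reason a regular expression can be much cheaper than a full policy-driven HTML parse.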

...

  • Reduce overhead on reinferencing - up to 30% faster on individuals with no changes (depending on configuration)
  • Reasoning on a large dataset has more consistent performance (previously it could slow down or crash because of the memory used by the search indexer)
  • Faux property resolution rewritten to greatly reduce work being repeated in the presence of multiple instances of the same property
  • RDFService has additional methods
    • CONSTRUCT that takes a Model to write into
    • SELECT that takes a ResultSetConsumer - implemented by the user - which processes each QuerySolution as it is retrieved from the ResultSet
    • Together, these avoid the latency and memory overhead of reading results into a Jena model, serialising it, and then re-reading it into a Jena model in the calling method.
      (NB: This accounts for 20 seconds of improvement to Map of Science / Temporal Graph)
  • Replace certain uses of RDFServiceDatasetGraph with RDFService (repeated calls to find() in RDFServiceDatasetGraph responsible for some overhead)
  • RDFServiceSDB always constructs queries against the graph, and not the union model (simple optional queries much faster against the graph than the dataset)
  • Clean up of many list view SPARQL queries, removing a few redundant patterns.
  • Cache list of graph lists when using a SPARQL backend for faster page loads (4 second saving on Virtuoso / 25 million triples)
  • NOTE: Some methods have changed their signatures to support the above. If you have custom Java code in your installation, you may need to make minor adjustments - typically, this will be exchanging a Dataset parameter for an RDFService.
  • NOTE: Some listview-*.xml files have changed. If you have customised your list views, you will need to resolve the conflicts.
  • NOTE: List views that return publications (e.g. authorInAuthorship) now only resolve the editor person for publications that are either bibo:Book or bibo:BookSection (includes Chapter, etc.). This is necessary for reasonable performance when you have large publication lists that involve articles with many co-authors.
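
The SELECT variant listed above hands each solution to a user-supplied consumer as it is read, rather than building the full result first. The sketch below shows that callback pattern in isolation. It uses stand-in types so that it runs without Jena or VIVO on the classpath: RowConsumer, select and LabelCollector are hypothetical names for this example and do not reproduce the RDFService or ResultSetConsumer signatures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StreamingSelectSketch {

    /** Stand-in for a per-row callback (hypothetical, not VIVO's ResultSetConsumer). */
    public interface RowConsumer {
        void processRow(Map<String, String> row);
    }

    /** Stand-in for a SELECT call: passes each row to the consumer as it is read. */
    public static void select(Iterator<Map<String, String>> results, RowConsumer consumer) {
        while (results.hasNext()) {
            consumer.processRow(results.next());
        }
    }

    /** Example consumer that keeps only the single value it needs from each row. */
    public static class LabelCollector implements RowConsumer {
        public final List<String> labels = new ArrayList<>();

        @Override
        public void processRow(Map<String, String> row) {
            String label = row.get("label");
            if (label != null) {
                labels.add(label);
            }
        }
    }

    public static void main(String[] args) {
        // Fake result rows standing in for query solutions.
        List<Map<String, String>> fakeResults = List.of(
                Map.of("person", "n1", "label", "Smith, Jane"),
                Map.of("person", "n2"),
                Map.of("person", "n3", "label", "Jones, Sam"));

        LabelCollector collector = new LabelCollector();
        select(fakeResults.iterator(), collector);
        System.out.println(collector.labels.size() + " labels collected");
    }
}
```

With this shape the caller never holds an intermediate copy of the whole result set; only what the consumer chooses to keep stays in memory.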

...