...

                                                                     Release 1.7   Release 1.8   Release 1.8.1 RC
Small data set (800,000 triples, 200 people, 3,500 articles)         100%          292%          70%
Large data set (11,000,000 triples, 4,500 people, 40,000 articles)   100%          256%          72%

Relative execution time (Release 1.7 = 100%; lower is faster). Testing by Jim Blake, as reported on the development list, using the 30 slowest profiles.
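Because every figure in the table is relative to Release 1.7, comparing 1.8.1 RC directly against 1.8 takes one extra division. The following is a minimal sketch of that arithmetic; the class and method names are illustrative only and are not part of VIVO.

```java
import java.util.Locale;

/** Sketch: derive release-to-release comparisons from times relative to Release 1.7. */
final class RelativeTiming {

    /** How many times faster the newer release is, when both times share one baseline. */
    static double speedup(double olderPercent, double newerPercent) {
        return olderPercent / newerPercent;
    }

    /** Percentage reduction in execution time going from the older to the newer release. */
    static double reductionPercent(double olderPercent, double newerPercent) {
        return 100.0 * (olderPercent - newerPercent) / olderPercent;
    }

    public static void main(String[] args) {
        // Table figures for Release 1.8 and Release 1.8.1 RC (Release 1.7 = 100%).
        String[] labels = { "Small data set", "Large data set" };
        double[][] rows = { { 292, 70 }, { 256, 72 } };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%s: 1.8.1 RC is %.1fx faster than 1.8 (%.0f%% less time)%n",
                    labels[i],
                    speedup(rows[i][0], rows[i][1]),
                    reductionPercent(rows[i][0], rows[i][1]));
        }
    }
}
```

On the reported figures this works out to roughly 4.2x (small data set) and 3.6x (large data set) faster than Release 1.8, and about 30% and 28% less time than Release 1.7.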

 

Note that all testing reported below was performed on a MacBook Pro with a PCIe SSD. No specific tuning was applied to either the hardware or the software. Real-world performance will depend on your hardware and software configuration. An SSD or other high-IO storage layer is recommended and, if you are using SDB/MySQL, enough memory should be allocated to hold the table indexes.

...

  • Reduce overhead on reinferencing - up to 30% faster on individuals with no changes (depending on configuration)
  • Reasoning on a large dataset has more consistent performance (previously it could slow down or crash because of memory used by the search indexer)
  • Faux property resolution rewritten to greatly reduce work being repeated in the presence of multiple instances of the same property
  • RDFService has additional methods
    • CONSTRUCT that takes a Model to write into
    • SELECT that takes a ResultSetConsumer (implemented by the caller), which processes each QuerySolution as it is retrieved from the ResultSet
    • Together, these avoid the latency and memory overhead of reading into a Jena model, serialising it, and then re-reading it into a Jena model in the calling method.
      (NB: Responsible for 20 seconds of improvement to Map of Science / Temporal Graph)
  • Replace certain uses of RDFServiceDatasetGraph with RDFService (repeated calls to find() in RDFServiceDatasetGraph responsible for some overhead)
  • RDFServiceSDB always constructs queries against the graph, and not the union model (simple optional queries much faster against the graph than the dataset)
  • Clean up of many list view SPARQL queries, removing a few redundant patterns.
  • Cache list of graph lists when using a SPARQL backend for faster page loads (4 second saving on Virtuoso / 25 million triples)
  • NOTE: Some methods have changed their signatures to support the above. If you have custom Java code in your installation, you may need to make minor adjustments - typically, this will be exchanging a Dataset parameter for an RDFService.
  • NOTE: Some listview-*.xml files have changed. If you have customised your list views, you will need to resolve the conflicts.
  • NOTE: List views that return publications (e.g. authorInAuthorship) now only resolve the editor person for publications that are either bibo:Book or bibo:BookSection (includes Chapter, etc.). This is necessary for reasonable performance when you have large publication lists that involve articles with many co-authors.
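
The consumer-style SELECT mentioned in the list above can be pictured with the sketch below. It is a self-contained illustration of the pattern only: `RowConsumer`, `selectAll`, `select` and `LabelCounter` are hypothetical stand-ins, a `Map<String, String>` takes the place of a Jena QuerySolution, and none of this is the actual RDFService or ResultSetConsumer signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch of consumer-style result handling, using stand-in types rather than VIVO/Jena classes. */
final class StreamingSelectSketch {

    /** Hypothetical stand-in for a result consumer; a row maps variable name to value. */
    interface RowConsumer {
        void processRow(Map<String, String> row);
    }

    /** Buffering style: every row is copied into a list before the caller sees any of them. */
    static List<Map<String, String>> selectAll(Iterator<Map<String, String>> results) {
        List<Map<String, String>> rows = new ArrayList<>();
        while (results.hasNext()) {
            rows.add(results.next());
        }
        return rows;
    }

    /** Consumer style: each row is handed to the caller as it is retrieved; nothing is buffered here. */
    static void select(Iterator<Map<String, String>> results, RowConsumer consumer) {
        while (results.hasNext()) {
            consumer.processRow(results.next());
        }
    }

    /** Example consumer that keeps only an aggregate instead of the rows themselves. */
    static final class LabelCounter implements RowConsumer {
        int count;

        @Override
        public void processRow(Map<String, String> row) {
            if (row.containsKey("label")) {
                count++;
            }
        }
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the solutions of a SELECT query.
        List<Map<String, String>> fakeResults = Arrays.asList(
                Collections.singletonMap("label", "Article A"),
                Collections.singletonMap("uri", "http://example.org/b"),
                Collections.singletonMap("label", "Article C"));

        LabelCounter counter = new LabelCounter();
        select(fakeResults.iterator(), counter);
        System.out.println("Rows with a label: " + counter.count);
    }
}
```

In the consumer style the caller decides what, if anything, to retain from each row, so the intermediate copy of the full result is never built. That is the kind of overhead the new methods are described as removing.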

...