Note that all testing has been performed on a MacBook Pro with a PCI-E SSD. No specific tuning has been applied to either the hardware or the software. Real-world performance will depend on hardware and software configuration. It is recommended that you have an SSD or another storage layer with high IO performance and, if using SDB/MySQL, enough memory allocated to hold the table indexes.

Page Rendering

Note

AntiSamy is no longer used to filter fields before they are rendered. For a large profile in a test dataset, this was responsible for over two seconds of the execution time required to render a profile.

A simple regular expression is now used to filter out JavaScript elements. This is 300x faster than using AntiSamy.
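The regular expression itself is not given in these notes. As a rough illustration of the approach only, the sketch below strips script elements with a single precompiled pattern. The class name, method name and pattern are assumptions made for this example, not the code used in the release, and a pattern like this does not cover everything a full sanitiser such as AntiSamy checks (inline event-handler attributes, for instance).

```java
import java.util.regex.Pattern;

// Hypothetical sketch: names and pattern are illustrative, not taken from the release.
final class ScriptFilterSketch {
    // Matches <script ...>...</script>, ignoring case, with content that may span lines.
    // Compiled once and reused, since compilation is the expensive part.
    private static final Pattern SCRIPT_ELEMENT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the input with every script element removed; null is passed through.
    static String stripScripts(String html) {
        if (html == null) {
            return null;
        }
        return SCRIPT_ELEMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<p>Hello</p><SCRIPT type=\"text/javascript\">alert('x');</SCRIPT><p>World</p>";
        System.out.println(stripScripts(input)); // prints <p>Hello</p><p>World</p>
    }
}
```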

  • Most pages - even large profiles - render within approximately two seconds
    • A large profile in a test dataset takes between 1.5 and 2 seconds to render (the same large profile was reported to take 4.3 seconds in v1.7, and 5.3 seconds in v1.8)
    • A large profile when logged in as the root user takes 6.5 seconds to render (reported as 7.75 seconds in v1.7, and 14.7 seconds in v1.8)
    • A large profile when logged in as a site admin user takes 2.5 seconds to render
      (Note: by default, Site Admin does not display the "related by" faux property, which is responsible for the majority of the performance hit when logged in as root)

 

Visualizations

All visualisations have been overhauled.

  • Map of Science and Temporal Graphs are significantly faster
    • Under 3 seconds for a dataset of 1,218,694 quads (previously 1 minute 20 seconds)
    • Approximately 2 minutes for a dataset of 24,647,681 quads, containing 145,000 people, 155,000 publications and 14,000 journals
    • Person-level Map of Science returns in under 2 seconds, using direct queries of the triple store
    • Person-level Map of Science uses the system-level cache once queries take longer than 2 seconds, if the system-level cache has been populated
    • Background refresh of Map of Science / Temporal Graph data: once populated, all requests are served from the cache whilst refreshes occur
  • CoAuthor and CoInvestigator visualisations use short-lived caches to prevent multiple executions of the same query in rendering a single visualisation
  • Minor tweak to CoAuthor query to improve performance
  • Sparklines use some of the under-the-hood improvements
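To illustrate the short-lived cache idea mentioned for the CoAuthor and CoInvestigator visualisations, here is a minimal sketch of a cache that lives only for the rendering of one visualisation and runs each distinct query once. The class, its methods and the string-based query and result types are all hypothetical simplifications, not the classes used in the release.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: create one instance per visualisation render, then discard it.
// Not thread-safe; intended for use within a single request.
final class ShortLivedQueryCache {
    private final Map<String, String> results = new HashMap<>();
    private final Function<String, String> queryExecutor;
    private int executions = 0;

    // queryExecutor stands in for whatever actually runs the query against the store.
    ShortLivedQueryCache(Function<String, String> queryExecutor) {
        this.queryExecutor = queryExecutor;
    }

    // Runs the query on first use and returns the stored result on later calls.
    String execute(String query) {
        String cached = results.get(query);
        if (cached == null) {
            executions++;
            cached = queryExecutor.apply(query);
            results.put(query, cached);
        }
        return cached;
    }

    // Number of times the underlying executor was actually invoked.
    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        ShortLivedQueryCache cache = new ShortLivedQueryCache(q -> "rows for " + q);
        cache.execute("coauthors-of:person1"); // placeholder query text
        cache.execute("coauthors-of:person1");
        System.out.println(cache.executions()); // prints 1
    }
}
```

Because the cache is discarded after the render, there is no invalidation to manage. That is the main trade-off against the longer-lived system-level cache described for Map of Science.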

Memory Usage

  • New data structures for Map of Science / Temporal Graphs use lightweight Java objects instead of Jena models (should use much less memory)
  • Search Indexer does not queue statements to index if paused and a full rebuild has been requested (much lower memory usage during reasoning)

Under The Hood

  • Reasoning on a large dataset has more consistent performance (it previously slowed down or crashed because of the memory used by the search indexer)
  • Faux property resolution rewritten to greatly reduce work being repeated in the presence of multiple instances of the same property
  • RDFService has additional methods
    • CONSTRUCT that takes a Model to write into
    • SELECT that takes a ResultSetConsumer - implemented by the caller - which processes each QuerySolution as it is retrieved from the ResultSet
    • These avoid the latency and memory overhead of reading into a Jena model, serialising it, and then re-reading it into a Jena model in the calling method
      (NB: Responsible for 20 seconds of improvement to Map of Science / Temporal Graph)
  • Replace certain uses of RDFServiceDatasetGraph with RDFService (repeated calls to find() in RDFServiceDatasetGraph responsible for some overhead)
  • RDFServiceSDB always constructs queries against the graph, and not the union model (simple optional queries much faster against the graph than the dataset)
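The consumer-style SELECT described above can be pictured with the following sketch, which avoids Jena so that it stays self-contained. RowConsumer and select() are hypothetical stand-ins for ResultSetConsumer and the new RDFService method, and each row is modelled as a simple map rather than a QuerySolution. The real signatures are not given in these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of streaming query results to a caller-supplied consumer.
final class ConsumerSketch {
    // Stand-in for ResultSetConsumer: invoked once per result row.
    interface RowConsumer {
        void processRow(Map<String, String> row);
    }

    // Stand-in for the SELECT method: hands each row to the consumer as it is
    // read, without first building an intermediate model or serialised copy.
    static void select(Iterator<Map<String, String>> rows, RowConsumer consumer) {
        while (rows.hasNext()) {
            consumer.processRow(rows.next());
        }
    }

    // Example caller: keeps only the "name" binding of each row.
    static List<String> collectNames(Iterator<Map<String, String>> rows) {
        List<String> names = new ArrayList<>();
        select(rows, row -> {
            String name = row.get("name");
            if (name != null) {
                names.add(name);
            }
        });
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                Collections.singletonMap("name", "Ada"),
                Collections.singletonMap("name", "Grace"));
        System.out.println(collectNames(rows.iterator())); // prints [Ada, Grace]
    }
}
```

The caller decides what to keep from each row, so only the lightweight result (here a list of strings) stays in memory once the query has finished.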

Bug Fixes

  • Pause counting on the search indexer to prevent it becoming accidentally unpaused during long-running processes (e.g. reasoning)
  • VIVO-1059 Improved parameter binding in SparqlQueryDataGetter
  • VIVO-1075 Correct use of Jena Nodes to access typed data (MarkLogic)
  • VIVO-1046 vCard authors do not display if lacking first name
  • VIVO-1047 vCard middle names displayed before first names
  • VIVO-1038 vCard grant contributor behaves as publication author
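The pause counting fix listed first above can be sketched as a counter rather than a boolean flag: each pause increments it, each unpause decrements it, and the indexer only resumes when the count returns to zero. The class and method names below are hypothetical, and the actual search indexer code may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of counted pausing: nested pause() calls must each be
// matched by an unpause() before work resumes.
final class PauseCounterSketch {
    private final AtomicInteger pauseCount = new AtomicInteger();

    void pause() {
        pauseCount.incrementAndGet();
    }

    void unpause() {
        // Never drop below zero, so an unmatched unpause() is harmless.
        pauseCount.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    boolean isPaused() {
        return pauseCount.get() > 0;
    }

    public static void main(String[] args) {
        PauseCounterSketch indexer = new PauseCounterSketch();
        indexer.pause();   // e.g. reasoning starts
        indexer.pause();   // e.g. a second process also pauses indexing
        indexer.unpause(); // the second process finishes
        System.out.println(indexer.isPaused()); // prints true
        indexer.unpause(); // reasoning finishes
        System.out.println(indexer.isPaused()); // prints false
    }
}
```

With a plain boolean, the first unpause() in this sequence would have resumed indexing while the longer-running process was still active.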

 

Additional Changes

  • TinyMCE filters out Word formatting on paste
  • TinyMCE version updated
  • Add seven US provinces to us_states.rdf
  • DOI property displays as a link
