This document is currently being drafted. Items on this page are expected to be in the release, but this is not guaranteed.
What is this document?
The VIVO 1.8.1 release contains no new features - only fixes for certain bugs and performance issues. Some minor, non-breaking additions are present in the ontology.
Performance Improvements
Note that all testing has been performed on a MacBook Pro with a PCI-E SSD. However, no specific tuning has been applied to either the hardware or software. Real-world performance will depend on hardware and software configuration - it is recommended that you have an SSD / high IO performance storage layer and, if using SDB/MySQL, enough memory allocated to read the table indexes.
Page Rendering
AntiSamy is no longer used to filter fields before they are rendered. For a large profile in a test dataset, this was responsible for over two seconds of the execution time required to render a profile. A simple regular expression is used to filter out any JavaScript elements - this is 300x faster than using AntiSamy.
- Most pages - even large profiles - render within approximately two seconds
- Large profile in a test dataset takes between 1.5 and 2 seconds to render (It has been reported that the same large profiles took 4.3 seconds in v1.7, and 5.3 seconds in v1.8)
- Large profile when logged in as root user takes 6.5 seconds to render (was reported to be 7.75 seconds in v1.7, and 14.7 seconds in v1.8)
- Large profile when logged in as site admin user takes 2.5 seconds to render
(Note: Site Admin by default does not display the "related by" faux property - this is responsible for the majority of the performance hit when logged in as root)
- Worst-case profile tested (1500 publications, high number of co-authors) renders in between 5.5 and 7 seconds
- Performance improvements to the manage publications / grants organisation pages
- Manage people in organisations page now includes the position label with each person entry so that you can disambiguate multiple person entries
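As an illustration of the regular-expression filtering described above, here is a minimal sketch in Java. It is not the expression VIVO actually uses, which these notes do not reproduce; the class name `ScriptFilter` and both patterns are assumptions made for the example. It also only strips script elements, so it should not be read as a complete replacement for a sanitising library.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: strips <script> elements from a field value before rendering.
// The patterns are illustrative assumptions, not the expression used in VIVO.
public class ScriptFilter {
    // A complete <script ...>...</script> element, case-insensitive, spanning lines.
    private static final Pattern SCRIPT_ELEMENT = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any opening or closing script tag left over without a partner.
    private static final Pattern STRAY_SCRIPT_TAG = Pattern.compile(
            "</?script\\b[^>]*>",
            Pattern.CASE_INSENSITIVE);

    public static String stripScripts(String html) {
        if (html == null) {
            return null;
        }
        String withoutElements = SCRIPT_ELEMENT.matcher(html).replaceAll("");
        return STRAY_SCRIPT_TAG.matcher(withoutElements).replaceAll("");
    }

    public static void main(String[] args) {
        // prints <p>Hi</p>
        System.out.println(stripScripts("<p>Hi</p><script>alert(1)</script>"));
    }
}
```

Compiling the patterns once as static constants keeps the per-field cost to a single pass over the text, which is in the spirit of the speed-up reported above.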
Visualizations
All visualisations have been overhauled
- Map of Science and Temporal Graphs significantly faster
- Under 3 seconds for a 1,218,694 Quad dataset (previously 1 minute 20 seconds)
- Approx 2 minutes for a 24,647,681 Quad dataset. Contains 145,000 people, 155,000 publications and 14,000 journals
- Person-level Map of Science returns in under 2 seconds, using direct queries of the triple store
- Person-level Map of Science will use the system-level cache once queries take longer than 2 seconds, if the system-level cache has been populated
- Background refresh of Map of Science / Temporal Graph data - once populated, all requests are served from the cache whilst refreshes occur
- CoAuthor and CoInvestigator visualisations use short-lived caches to prevent multiple executions of the same query in rendering a single visualisation
- Minor tweak to CoAuthor query to improve performance
- Sparklines use some of the under-the-hood improvements
- New: added AltMetric embed code to display badges on the article pages - enable via runtime.properties (see example.runtime.properties for details)
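The short-lived caches mentioned for the CoAuthor and CoInvestigator visualisations can be pictured with the sketch below. It is an assumption about the general technique (memoising query results for the lifetime of one rendering), not VIVO's implementation; the class name `ShortLivedQueryCache` and the idea of keying on the query string are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a cache created for one visualisation render and then discarded.
// Not thread-safe; it is assumed to be used by a single request thread.
public class ShortLivedQueryCache<R> {
    private final Map<String, R> results = new HashMap<>();
    private final Function<String, R> executor; // runs the query against the store
    private int executions = 0;

    public ShortLivedQueryCache(Function<String, R> executor) {
        this.executor = executor;
    }

    // Returns the cached result, executing the query only on first use.
    public R get(String query) {
        R cached = results.get(query);
        if (cached == null) {
            executions++;
            cached = executor.apply(query);
            results.put(query, cached);
        }
        return cached;
    }

    // Number of times the underlying executor was actually invoked.
    public int executions() {
        return executions;
    }

    public static void main(String[] args) {
        // String::length stands in for a real query executor.
        ShortLivedQueryCache<Integer> cache = new ShortLivedQueryCache<>(String::length);
        cache.get("SELECT 1");
        cache.get("SELECT 1");
        System.out.println(cache.executions()); // prints 1
    }
}
```

Because the cache is dropped once rendering finishes, there is no invalidation logic to maintain; the trade-off is that nothing is shared between requests.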
Memory Usage
- New data structures for Map of Science / Temporal Graphs use lightweight Java objects instead of Jena models (should use much less memory)
- Search Indexer does not queue statements to index if paused and a full rebuild has been requested (much lower memory usage during reasoning)
Under The Hood
- Reasoning on a large dataset has more consistent performance (it previously slowed down or crashed because of memory used by the search indexer)
- Faux property resolution rewritten to greatly reduce work being repeated in the presence of multiple instances of the same property
- RDFService has additional methods
- CONSTRUCT that takes a Model to write into
- SELECT that takes a ResultSetConsumer - implemented by the user - which processes each QuerySolution as it is retrieved from the ResultSet
- These methods reduce the latency and memory overhead of reading into a Jena model, serialising, and then re-reading into a Jena model in the calling method.
(NB: Responsible for 20 seconds of improvement to Map of Science / Temporal Graph)
- Replace certain uses of RDFServiceDatasetGraph with RDFService (repeated calls to find() in RDFServiceDatasetGraph responsible for some overhead)
- RDFServiceSDB always constructs queries against the graph, and not the union model (simple optional queries much faster against the graph than the dataset)
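The consumer-style SELECT described above can be sketched as follows. To stay self-contained, the example does not use Jena or the real RDFService signatures: `RowConsumer`, `select`, and `YearCounter` are hypothetical stand-ins, and each row is modelled as a plain `Map` rather than a QuerySolution. The point it tries to show is that rows are handed to user code one at a time, so no intermediate model or serialised copy needs to be held in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a streaming SELECT; none of these names are VIVO or Jena APIs.
public class StreamingSelectSketch {
    // Stand-in for a user-implemented consumer of query solutions.
    public interface RowConsumer {
        void processRow(Map<String, String> row);
    }

    // Stand-in for a SELECT that pushes each row to the consumer as it is retrieved.
    public static void select(Iterator<Map<String, String>> rows, RowConsumer consumer) {
        while (rows.hasNext()) {
            consumer.processRow(rows.next());
        }
    }

    // Example consumer: tallies rows per year without retaining the rows themselves.
    // Assumes every row binds the "year" variable.
    public static class YearCounter implements RowConsumer {
        private final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void processRow(Map<String, String> row) {
            counts.merge(row.get("year"), 1, Integer::sum);
        }

        public Map<String, Integer> counts() {
            return counts;
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(Collections.singletonMap("year", "2014"));
        rows.add(Collections.singletonMap("year", "2015"));
        rows.add(Collections.singletonMap("year", "2014"));

        YearCounter counter = new YearCounter();
        select(rows.iterator(), counter);
        System.out.println(counter.counts()); // prints {2014=2, 2015=1}
    }
}
```

In this shape, the caller decides what to keep from each row, which is what allows the aggregation for something like a temporal graph to use lightweight objects instead of a full model.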
Bug Fixes
- Pause counting on the search indexer to prevent it from becoming accidentally unpaused during long-running processes (e.g. reasoning)
- VIVO-1059 Improved parameter binding in SparqlQueryDataGetter
- VIVO-1075 Correct use of Jena Nodes to access typed data (MarkLogic)
- VIVO-1046 vCard authors do not display if lacking first name
- VIVO-1047 vCard middle names displayed before first names
- VIVO-1038 vCard grant contributor behaves as publication author
- VIVO-1081 Fix to display of training positions within an organisation entry
- VIVO-1114 Broken sparklines when more than 1000 publications, etc
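One plausible reading of the pause counting fix listed above is a nesting counter: each pause request increments a count, each unpause decrements it, and indexing resumes only when the count returns to zero. The sketch below illustrates that idea under this assumption; the class name `PauseCounter` and its methods are hypothetical and are not taken from the VIVO search indexer.

```java
// Hypothetical sketch of counted pausing; not the VIVO search indexer's actual code.
public class PauseCounter {
    private int pauseCount = 0;

    // Each caller that needs indexing suspended registers a pause.
    public synchronized void pause() {
        pauseCount++;
    }

    // Releases one pause; extra calls with no outstanding pause are ignored.
    public synchronized void unpause() {
        if (pauseCount > 0) {
            pauseCount--;
        }
    }

    // Indexing stays paused while any caller still holds a pause.
    public synchronized boolean isPaused() {
        return pauseCount > 0;
    }

    public static void main(String[] args) {
        PauseCounter indexer = new PauseCounter();
        indexer.pause();   // e.g. a long-running reasoning job starts
        indexer.pause();   // a second process also pauses
        indexer.unpause(); // the second process finishes
        System.out.println(indexer.isPaused()); // prints true
        indexer.unpause(); // the reasoning job finishes
        System.out.println(indexer.isPaused()); // prints false
    }
}
```

With a simple boolean flag, the second process finishing would have resumed indexing while the first was still running; counting avoids that.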
Additional Changes
- TinyMCE filters out Word formatting on paste
- TinyMCE version updated
- Add seven US provinces to us_states.rdf
- DOI property displays as a link