Old Release

This documentation relates to an old version of VIVO, version 1.9.x.

Types of performance

Performance can mean different things at different sites: the time it takes to render a large page (e.g., a person with 800 to 1,500 publications), to display a visualization, to load new data, to regenerate the search index or recompute inferences, or to generate an export of RDF data.

What kind of performance is normal?  How do I know if I have a problem?

This section gives some very rough guidelines for determining whether your VIVO is performing similarly to established installations.  The numbers below assume that VIVO is otherwise idle; that is, not loaded with concurrent public page requests or performing multiple background operations at the same time.

Individual page display

The time it takes to render an individual page can vary significantly depending on the types of data involved.  The page for a person with many publication citations will take longer to render than one with simple links to other individuals.  As a very general rule, your VIVO should be able to handle around 100 data items (properties) per second when displaying an individual page.  Thus, if the page for a person with 500 publication links displays in five seconds, there may be relatively little room for performance tweaking short of caching the entire page.  If the page takes 50 seconds to appear, there is very likely a serious performance bottleneck somewhere in the installation that needs to be addressed.
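The rule of thumb above can be turned into a quick check. The following is a minimal sketch, not part of VIVO: the function names are hypothetical, and the factor of ten used to flag a likely bottleneck is an assumption taken from the 5-second versus 50-second example in the text.

```python
def items_per_second(item_count: int, render_seconds: float) -> float:
    """Return the observed display rate in data items per second."""
    if render_seconds <= 0:
        raise ValueError("render_seconds must be positive")
    return item_count / render_seconds


def assess_page(item_count: int, render_seconds: float,
                baseline: float = 100.0, severe_factor: float = 10.0) -> str:
    """Compare a measured page render against the ~100 items/second guideline.

    baseline is the guideline from the text; severe_factor is an assumed
    cutoff (ten times slower than the guideline) for flagging a bottleneck.
    """
    rate = items_per_second(item_count, render_seconds)
    if rate >= baseline:
        return "in line with the guideline"
    if rate > baseline / severe_factor:
        return "slower than the guideline"
    return "likely bottleneck"


if __name__ == "__main__":
    # A person page with 500 publication links, timed with VIVO otherwise idle.
    print(assess_page(500, 5.0))
    print(assess_page(500, 50.0))
```

Measure the render time on an otherwise idle VIVO, as the guidelines in this section assume.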

RDF loading

Loading RDF through VIVO is slower than inserting it directly into the triple store because VIVO performs additional operations such as inference and search index maintenance as the data are changed.  You should still expect to see at least several hundred triple insertions per second.

Inference recomputation and search index rebuilding

These operations are important for VIVO installations that modify data directly in the triple store instead of adding or removing RDF through VIVO.  With VIVO 1.8, you should expect inference recomputation to average about 20-25 milliseconds per individual.  (You can find your values in vivo.all.log.)  Search index rebuilding is typically faster, on the order of 10 milliseconds per individual.
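To get a feel for what these per-individual figures mean for a whole installation, the arithmetic can be sketched as follows. The function names and the example count of 100,000 individuals are assumptions for illustration; the per-individual timings are the rough values quoted above.

```python
def estimate_seconds(individuals: int, ms_per_individual: float) -> float:
    """Convert a per-individual cost in milliseconds into total seconds."""
    return individuals * ms_per_individual / 1000.0


def rebuild_estimates(individuals: int) -> dict:
    """Rough expected durations, using the per-individual figures in the text."""
    return {
        "inference_low_s": estimate_seconds(individuals, 20),   # 20 ms each
        "inference_high_s": estimate_seconds(individuals, 25),  # 25 ms each
        "search_index_s": estimate_seconds(individuals, 10),    # ~10 ms each
    }


if __name__ == "__main__":
    # Hypothetical installation with 100,000 individuals.
    for name, seconds in rebuild_estimates(100_000).items():
        print(f"{name}: {seconds / 60:.1f} minutes")
```

If the times reported in vivo.all.log are far above such an estimate, that suggests a bottleneck worth investigating.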

Tuning for improved performance 

Memory

Ensure that the JVM for your VIVO has been allocated sufficient memory (heap space).  This is a critical element of the installation process, as the default Java heap setting will cause VIVO to run extremely slowly.  A production VIVO installation should typically be allocated several gigabytes of heap space.

Additionally, ensure that your server has enough memory to support the heap space you have allocated.  Otherwise, data may be swapped to disk, which can seriously degrade performance.  On a server that runs only VIVO, the available memory should be about double the Java heap space.
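The "about double" guideline can be applied in reverse to pick a heap size for a given server. This is a minimal sketch with hypothetical function names; -Xmx is the standard JVM option for maximum heap size, but where you set it (for example in Tomcat's startup environment) depends on your installation and is not shown here.

```python
def max_heap_mb(server_memory_mb: int, ratio: float = 2.0) -> int:
    """Largest heap, in MB, that keeps server memory at about `ratio` times the heap.

    The default ratio of 2.0 follows the guideline for a server that runs
    only VIVO; a server that also hosts other services needs a larger ratio.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return int(server_memory_mb / ratio)


def heap_flag(heap_mb: int) -> str:
    """Format the heap size as a JVM maximum-heap option."""
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Hypothetical server with 16 GB of memory dedicated to VIVO.
    print(heap_flag(max_heap_mb(16 * 1024)))
```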

Server connections

A production VIVO installation often involves an Apache web server, the Tomcat servlet container, and a MySQL database server.  The number of available connections between each of these servers should be set to prevent unnecessary bottlenecks.  Thus, the number of database connections should slightly exceed the number of possible concurrent Tomcat threads, which should in turn exceed the number of simultaneous Apache connections.
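The ordering described above can be checked mechanically once you have read the three limits out of your own configuration. The sketch below is illustrative only: the class and field names are hypothetical and do not correspond to actual Apache, Tomcat, or MySQL setting names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConnectionLimits:
    # Hypothetical field names; fill these in from your own server configs.
    apache_connections: int  # simultaneous Apache connections
    tomcat_threads: int      # possible concurrent Tomcat threads
    db_connections: int      # available database connections


def check_limits(limits: ConnectionLimits) -> List[str]:
    """Return a list of violations of the database > Tomcat > Apache ordering."""
    problems = []
    if limits.tomcat_threads <= limits.apache_connections:
        problems.append("Tomcat threads should exceed Apache connections")
    if limits.db_connections <= limits.tomcat_threads:
        problems.append("database connections should exceed Tomcat threads")
    return problems


if __name__ == "__main__":
    print(check_limits(ConnectionLimits(150, 200, 210)))  # consistent ordering
    print(check_limits(ConnectionLimits(256, 200, 150)))  # both rules violated
```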

MySQL configuration

Data display in VIVO often depends on complex SPARQL queries that, when using the default SDB triple store, are translated into similarly complex SQL queries.  Tuning the MySQL database server can significantly increase performance.  There are a number of tools available for assisting with this process, such as mysqltuner.pl (https://github.com/rackerhacker/MySQLTuner-perl).  There are also a few typical parameters that often require adjustment.

In-memory temporary tables

The nature of the SQL queries generated by the triple store often requires the generation of temporary tables.  Ideally these temporary tables will remain in memory;  if they exceed the threshold where MySQL writes them to disk, this can result in serious slowdowns.  Depending on the amount of data in your VIVO and your server’s available memory, you may need to increase the size limit for in-memory temporary tables.

Consult the MySQL documentation for the following parameters:

tmp_table_size

max_heap_table_size
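When reasoning about these two parameters, it helps to remember that for in-memory temporary tables using the MEMORY engine, MySQL applies the smaller of the two values, so raising only one has no effect. The sketch below illustrates that, along with the share of temporary tables that went to disk, computed from the Created_tmp_disk_tables and Created_tmp_tables status counters. The function names are hypothetical, the size parser handles only the K, M, and G suffixes, and newer MySQL versions that use a different internal temporary table engine may behave differently, so check the documentation for your version.

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse a size such as '64M' or '1G' (or a plain byte count) into bytes."""
    text = value.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def effective_tmp_limit(tmp_table_size: str, max_heap_table_size: str) -> int:
    """In-memory temporary table limit in bytes: the smaller of the two settings."""
    return min(parse_size(tmp_table_size), parse_size(max_heap_table_size))


def disk_tmp_ratio(created_tmp_disk_tables: int, created_tmp_tables: int) -> float:
    """Fraction of temporary tables that were written to disk."""
    if created_tmp_tables == 0:
        return 0.0
    return created_tmp_disk_tables / created_tmp_tables


if __name__ == "__main__":
    # Raising tmp_table_size alone does not help: the 16M value still applies.
    print(effective_tmp_limit("256M", "16M"))
    # Hypothetical counter values read from the server's status output.
    print(disk_tmp_ratio(250, 1000))
```

A high disk ratio after the server has been running under typical load suggests that the in-memory limit is too small for the queries VIVO generates.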


Key buffer size 

If your VIVO database uses MySQL’s traditional MyISAM storage engine, consult the documentation for the key_buffer_size parameter.  Increasing this value can yield significant performance benefit.

 

InnoDB buffer pool size

If your VIVO database uses MySQL’s newer InnoDB storage engine, consult the documentation for the innodb_buffer_pool_size parameter.  Setting this value as large as possible given available memory will improve performance.

 

Transaction logging 

Changing MySQL’s transaction logging settings can lead to dramatic improvements in the speed at which triples are added to or removed from the database.  For more details, see "Writing the MySQL transaction log" here: https://wiki.duraspace.org/display/VIVO/MySQL+configuration,+tuning,+and+troubleshooting

Additional discussion

Work in progress at https://docs.google.com/a/symplectic.co.uk/document/d/1ylp9HEzJiBsBP6vx1vd-Irf8o3Ff-5vDhytOVTI5_Ho/edit#heading=h.vdtjwwvnjdn7
