This page describes an effort to measure key aspects of DSpace 7's performance compared to previous versions.

The goal is to establish a feedback loop while DSpace 7 is under development, in order to identify areas worth investigating and improving.

Areas of interest

For a variety of commonly-accessed pages in a pre-populated DSpace instance, the following will be measured:

  • Server memory use
  • Server CPU use
  • Page load time
  • Browser memory use

Where it makes sense, measurements will be taken while simulated load is put on the server.
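As a minimal sketch of what "simulated load" could look like (the actual load-testing tool, URL, and request mix are not specified on this page; `BASE_URL` and the parameters below are placeholders), a simple curl loop can fire concurrent requests at the server while measurements are taken:

```shell
#!/bin/sh
# Hypothetical load generator: run N concurrent request loops against a
# DSpace front end. BASE_URL, CONCURRENCY, and REQUESTS_PER_WORKER are
# assumptions, not part of the documented test plan.
BASE_URL="${BASE_URL:-http://localhost:4000}"
CONCURRENCY="${CONCURRENCY:-10}"
REQUESTS_PER_WORKER="${REQUESTS_PER_WORKER:-100}"

simulate_load() {
    i=0
    while [ "$i" -lt "$CONCURRENCY" ]; do
        (
            n=0
            while [ "$n" -lt "$REQUESTS_PER_WORKER" ]; do
                curl -s -o /dev/null "$BASE_URL/"
                n=$((n + 1))
            done
        ) &
        i=$((i + 1))
    done
    wait   # let all workers finish before sampling ends
}
```

A dedicated tool such as JMeter or ab would give better control over think time and request mixes, but a loop like this is enough to put steady pressure on the server during a measurement window.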

Test Repository

A test repository has been generated with the following size and structure:

  • 288 collections:
    • 4 top-level communities
    • 8 subcommunities in each top-level community
    • 9 collections in each subcommunity
  • 69120 items with 705024 bitstreams:
    • 36864 "Tiny" items distributed evenly among 72 collections, each with 1 bitstream:
      • 1 single-byte bitstream in the OTHER bundle
    • 18432 "Small" items distributed evenly among 72 collections, each with 4 bitstreams:
      • 1 single-byte bitstream in the OTHER bundle
      • 1 single-page text PDF in the ORIGINAL bundle
      • 1 derived txt bitstream in the TEXT bundle
      • 1 derived jpg bitstream in the THUMBNAIL bundle
    • 9216 "Medium" items distributed evenly among 72 collections, each with 13 bitstreams:
      • 10 single-byte bitstreams in the OTHER bundle
      • 1 ten-page text PDF in the ORIGINAL bundle
      • 1 derived txt bitstream in the TEXT bundle
      • 1 derived jpg bitstream in the THUMBNAIL bundle
    • 4608 "Big" items distributed evenly among 72 collections, each with 103 bitstreams:
      • 100 single-byte bitstreams in the OTHER bundle
      • 1 hundred-page text PDF in the ORIGINAL bundle
      • 1 derived txt bitstream in the TEXT bundle
      • 1 derived jpg bitstream in the THUMBNAIL bundle
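The totals above can be sanity-checked against the per-tier breakdown with a bit of shell arithmetic:

```shell
#!/bin/sh
# Sanity-check the advertised dataset totals against the per-tier breakdown.
collections=$((4 * 8 * 9))                 # communities x subcommunities x collections
items=$((36864 + 18432 + 9216 + 4608))     # tiny + small + medium + big
bitstreams=$((36864 * 1 + 18432 * 4 + 9216 * 13 + 4608 * 103))
echo "collections=$collections items=$items bitstreams=$bitstreams"
# prints: collections=288 items=69120 bitstreams=705024
```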

To download and install this repository into a fresh install of DSpace 5, 6, or 7:

  • Visit https://drive.google.com/open?id=1MK3drQsa3KtZCObRtrKBXnE4oMo_C2Ub
  • Download the test-x.sql.gz file corresponding to your major version of DSpace and restore into postgres via:
    • gunzip test-7.sql.gz
    • dropdb dspace
    • createdb dspace
    • psql dspace -c 'create extension pgcrypto'
    • psql dspace < test-7.sql
  • Download the assetstore, containing the PDFs, text, and other bitstreams, then install via:
    • mv assetstore.tar.gz /dspace/
    • cd /dspace
    • tar -xvf assetstore.tar.gz
    • rm assetstore.tar.gz
  • Once the database and assetstore are installed, you can start DSpace and create the discovery index:
    • /dspace/bin/dspace index-discovery
    • The search index will consume about 5 GB when built
  • Note: The database has a built-in admin user: dspace@test, with password test.
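The steps above can be assembled into a single restore script. This is only a sketch mirroring the documented commands (file names and paths are as listed above; note that dropdb is destructive and should be reviewed before running):

```shell
#!/bin/sh
# Sketch: restore the DSpace 7 test database and assetstore in one pass.
# Mirrors the documented steps; adjust file names and paths for your setup.
set -e

restore_test_repository() {
    gunzip test-7.sql.gz                       # test-7.sql.gz downloaded beforehand
    dropdb dspace                              # WARNING: destroys the existing database
    createdb dspace
    psql dspace -c 'create extension pgcrypto'
    psql dspace < test-7.sql

    mv assetstore.tar.gz /dspace/              # assetstore archive downloaded beforehand
    cd /dspace
    tar -xvf assetstore.tar.gz
    rm assetstore.tar.gz

    /dspace/bin/dspace index-discovery         # search index consumes ~5 GB when built
}
```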

To generate your own test data, a script to create test content is available.
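The content-creation script itself is not reproduced here. As an unofficial sketch (not the original script), the 4 × 8 × 9 community/collection tree could be described in the XML format consumed by DSpace's structure-builder tool, with item and bitstream generation left to additional scripting:

```shell
#!/bin/sh
# Unofficial sketch: emit a structure-builder XML file describing the
# 4 x 8 x 9 community/collection tree used by the test repository.
# (This is NOT the script referenced on the original page.)
generate_structure_xml() {
    out="${1:-structure.xml}"
    {
        echo '<import_structure>'
        c=1
        while [ "$c" -le 4 ]; do
            echo "  <community><name>Community $c</name>"
            s=1
            while [ "$s" -le 8 ]; do
                echo "    <community><name>Subcommunity $c.$s</name>"
                k=1
                while [ "$k" -le 9 ]; do
                    echo "      <collection><name>Collection $c.$s.$k</name></collection>"
                    k=$((k + 1))
                done
                echo '    </community>'
                s=$((s + 1))
            done
            echo '  </community>'
            c=$((c + 1))
        done
        echo '</import_structure>'
    } > "$out"
}

generate_structure_xml structure.xml
# The file could then be loaded with (admin e-mail is a placeholder):
#   /dspace/bin/dspace structure-builder -f structure.xml -o out.xml -e admin@example.com
```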

Environment

(TBD)

Methodology

Measurements for each of the following were taken as described below.

Server memory use

  • java: jmap -histo:live (total heap)
  • node: pm2 (mem column)
  • system: free -m (used column, minus buffers/cache)

Server CPU use

  • java and node: ps -o %cpu -p <pid>
  • system: uptime (1-minute CPU load average)
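The server-side probes above could be combined into a single sampling pass (a sketch only; the java and node PIDs are placeholders you would look up with e.g. pgrep or pm2 before sampling):

```shell
#!/bin/sh
# Sketch: one sampling pass over the server-side metrics listed above.
# The two PID arguments are placeholders; find them with pgrep or pm2.
sample_server_metrics() {
    java_pid="$1"
    node_pid="$2"
    echo "--- $(date) ---"
    jmap -histo:live "$java_pid" | tail -n 1   # total live heap summary
    ps -o %cpu= -p "$java_pid"                 # java CPU
    ps -o %cpu= -p "$node_pid"                 # node CPU
    free -m                                    # system memory (used, minus buffers/cache)
    uptime                                     # 1-minute load average
}
```

Run on a fixed interval (e.g. from cron or a `while sleep 60` loop) this would produce a time series to compare across DSpace versions.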

Page load time

  • (Undecided)

Browser memory use

  • Chrome Devtools' Memory profiler

Test Results

(TBD - goal is to test periodically prior to final 7.0 release)


3 Comments

  1. Some additional suggestions which may help identify potential performance risks:

    • The use case of HTML bitstreams is not included here. This is typically an item with hundreds of files (sometimes even thousands) in the ORIGINAL bundle, and hundreds of files in the THUMBNAIL and TEXT bundles, where one HTML file is marked as the primary bitstream.
    • The impact on memory usage, load, and duration of some scripts (e.g. a forced discovery re-index, a forced filter-media re-index, …) can be verified as well.
    • Differentiating page load time for a first request vs. navigating through pages while the UI is already loaded.
    • Another frequently occurring use case for large items is items with large amounts of metadata. Some articles have thousands of authors, which can cause issues in the submission process and on item views.

    For the page load time, it is also important to identify which pages to test. Some useful pages can be:

    • The general home page
    • The community list
    • Community/collection pages
    • Item pages
    • Search/Browse pages
    • File downloads
    • Statistics
    • The submission process
    • The workflow process
    • Edit EPerson/Groups
    • Edit community/collection/item
  2. Thanks for all this work, it is great to see performance testing officially introduced! Some notes from the DSpace 7 WG meeting of 18.07.2019:

    • we should enlarge the dataset to always include more than 20 instances of each object type (collections in a community, etc.), to be sure that the pagination capabilities of DSpace 7 are properly exercised
    • we should record some crawler sessions (server-side rendering) to be sure that we can test what happens when crawler traffic overlaps with real-user sessions
    • I'm worried about browser load/performance; it would be nice to monitor what happens in the browser during a normal or long navigation session (Angular loads a lot of data into the local cache, which can make the browser unresponsive at some point)
    • please link the results of critical REST calls (anything slower than 2 seconds!) in the JIRA ticket (the projection mechanism and good defaults should be the solution for most of the issues)
  3. As discussed in the meeting of 2019-07-25, we should try to track the memory usage of ngx-translate. The main developer of the ngx-translate library has said the following (source):

    [ngx-translate] uses bindings, which means that you can change the translations at any time. The downside is that bindings take memory, so the Angular way is more performant. But if you use OnPush for your components you will probably never notice the difference

    However, we do use OnPush for most components, and I haven't got a clue how to track only the memory overhead added by ngx-translate bindings and not others.