A harness to evaluate performance for Fedora Futures platform candidates.

https://github.com/futures/ff-jmeter-madness

Test

...

Process

Set 1: Digital Corpora govdocs

Set 2: OpenPlanets

Set 3: Random binary data created from a stable set of file sizes

The govdocs dataset includes (…), (…some characteristics, e.g. N PDF documents, varying in size from X to Y)

The OpenPlanets dataset …

(Description of fixture processing, generation of bagits)

The generated binary data set

The set is created by the script https://github.com/futures/ff-fixtures/blob/master/create_random_files.sh which writes the files to objects/random and creates the necessary manifest-md5.txt file used by the JMeter Tests at https://github.com/futures/ff-jmeter-madness

The script uses standard GNU commands, including dd, rm and md5sum, and iterates over the list of integer file sizes in https://github.com/futures/ff-fixtures/blob/master/random_sizes.data, creating one file of the given size in megabytes per iteration. To a certain extent, this keeps the measurements comparable, since exactly the same number of files with the same number of bytes is created each time the data set is generated from the same input file.
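The loop described above can be sketched in Python as follows. This is an illustrative sketch and not the script itself. The `md5  path` manifest line layout, the 1-based numbering of `random_N.data`, and the `unit` and `manifest_path` parameters are assumptions made for the example.

```python
import hashlib
import os
from pathlib import Path


def write_random_file(path, n_bytes, chunk=1024 * 1024):
    """Write n_bytes of random data to path and return its MD5 hex digest."""
    md5 = hashlib.md5()
    remaining = n_bytes
    with open(path, "wb") as fh:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))  # plays the role of dd
            fh.write(block)
            md5.update(block)  # plays the role of md5sum
            remaining -= len(block)
    return md5.hexdigest()


def create_random_files(sizes, out_dir, manifest_path, unit=1024 * 1024):
    """Create one file per entry in sizes (in multiples of unit bytes).

    Returns the manifest lines and also writes them to manifest_path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for number, size in enumerate(sizes, start=1):  # numbering is assumed
        target = out_dir / f"random_{number}.data"
        digest = write_random_file(target, size * unit)
        lines.append(f"{digest}  {target.as_posix()}")
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines
```

Because the sizes come from a fixed list, two runs produce files with identical byte counts but different contents and checksums.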

To create the binary test data set, first check out the project https://github.com/futures/ff-jmeter-madness:

git clone https://github.com/futures/ff-jmeter-madness

Then initialize and update the submodules:

git submodule init && git submodule update

This checks out the fixtures submodule, which contains the script create_random_data.sh.

Switch to the fixtures subdirectory:

cd fixtures

Then run the script:

./create_random_data.sh

This creates the directory objects/random and, using dd, writes the random binaries to it as objects/random/random_N.data.

The script also generates a manifest-md5.txt file, which the JMeter tests use to find the random binaries and upload them via HTTP requests.

Now you can fire up JMeter and open the JMX file containing the test plan.

Generating random file sizes with a Gaussian distribution

To create a different input file for the file sizes, you can use Octave/Matlab:

octave:1> x = round((stdnormal_rnd(1,100) * 128) + 256);

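Here stdnormal_rnd(1,100) returns a 1-by-100 row vector of samples from the standard normal distribution (mean 0, standard deviation 1), so 100 is the number of samples. Multiplying by 128 scales the standard deviation to 128, and adding 256 (twice the scaling factor) shifts the mean to the right, so that about 95% of the values (those within two standard deviations of the mean) satisfy 0 <= x <= 512. Because the normal distribution is unbounded, occasional values can fall outside this range, including sizes of 0 or below, and may need to be removed or clamped before use.

To create a larger set of file sizes with a larger median file size, you could use round((stdnormal_rnd(1,500) * 256) + 512). This creates 500 file size entries, most of them between 0 and 1024.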

Then save the generated vector to a file:

octave:2> save "file_sizes.txt" x;

The data file needs some postprocessing (remove comments, insert line breaks). This can be achieved with some standard GNU tools:

sed '/^\#/d' file_sizes.txt | tr " " "\n" | sed '/^$/d' > tmp.txt

mv tmp.txt file_sizes.txt
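For readers without Octave, the same kind of input file can be produced in one step. The sketch below is an alternative, not the method used for the current data set. The function names, the fixed seed, and clamping to the range [1, 512] are assumptions added for illustration (the Octave snippet does not clamp).

```python
import random


def gaussian_file_sizes(count=100, mean=256, stddev=128, low=1, high=512, seed=None):
    """Return count integer sizes drawn from a normal distribution.

    Values are rounded and then clamped to [low, high]; clamping is an
    assumption of this sketch, not part of the Octave one-liner.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(count):
        value = round(rng.gauss(mean, stddev))
        sizes.append(min(max(value, low), high))
    return sizes


def write_sizes(sizes, path):
    """Write one size per line, the layout produced by the sed/tr pipeline."""
    with open(path, "w") as fh:
        fh.write("\n".join(str(size) for size in sizes) + "\n")


if __name__ == "__main__":
    # A fixed seed makes the generated list reproducible between runs.
    write_sizes(gaussian_file_sizes(seed=42), "file_sizes.txt")
```

Clamping piles up a few samples at the boundaries, so the histogram tails are slightly distorted. Discarding and redrawing out-of-range samples is another option.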

Fig. 1: Histogram of the current data set

Ingest Test

1. For each "bag", create an "object" (i.e. whatever the equivalent is for the platform candidate)

2. For each resource described in the bag's manifest, add a "datastream" (again, whatever the equivalent is for the platform candidate).
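The two ingest steps can be sketched independently of any particular platform candidate. In the sketch below, `client` and its methods `create_object` and `add_datastream` are hypothetical placeholders for whatever the candidate offers, and the manifest is assumed to hold one `checksum path` pair per line, as md5sum writes it.

```python
def parse_manifest(text):
    """Return (checksum, path) pairs from manifest text, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, path = line.split(None, 1)
        # md5sum prefixes the path with '*' in binary mode; drop that marker.
        entries.append((checksum, path.lstrip("*")))
    return entries


def ingest_bags(client, bags):
    """Ingest every bag; bags maps a bag name to its manifest text.

    client is a hypothetical adapter with create_object(name), returning an
    object identifier, and add_datastream(object_id, path, checksum).
    """
    for bag_name, manifest_text in bags.items():
        object_id = client.create_object(bag_name)  # step 1
        for checksum, path in parse_manifest(manifest_text):
            client.add_datastream(object_id, path, checksum)  # step 2
```

Keeping the platform calls behind a small adapter means the same loop can drive each candidate, so only the adapter differs between runs.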

Update Test

1. Delete each test "object"

2. Create an object

3. Retrieve the object

4. Add "datastreams"

5. Modify the datastreams a specified number of times

6. Read the datastreams
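As a rough illustration of the sequence above, the sketch below runs the six steps once for a single object and records how long each call takes. The `client` method names are hypothetical. The real measurements are taken by JMeter, not by code like this.

```python
import time


def run_update_test(client, object_id, datastreams, modifications=1):
    """Run the update sequence once and return (label, elapsed_ms) samples.

    datastreams maps a datastream name to its content. client is a
    hypothetical adapter for the platform candidate under test.
    """
    samples = []

    def timed(label, func, *args):
        # Measure wall-clock time around a single client call.
        start = time.perf_counter()
        result = func(*args)
        samples.append((label, (time.perf_counter() - start) * 1000.0))
        return result

    timed("delete object", client.delete_object, object_id)      # step 1
    timed("create object", client.create_object, object_id)      # step 2
    timed("retrieve object", client.get_object, object_id)       # step 3
    for name, content in datastreams.items():                    # step 4
        timed("add datastream", client.add_datastream, object_id, name, content)
    for _ in range(modifications):                                # step 5
        for name, content in datastreams.items():
            timed("modify datastream", client.modify_datastream, object_id, name, content)
    for name in datastreams:                                      # step 6
        timed("read datastream", client.get_datastream, object_id, name)
    return samples
```

Step 1 is simplified here to deleting a single object. Each label groups the timings for one kind of operation.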

Test Software

Apache JMeter version 2.9, installed on futures1 (see Test Platform for details).

The JMeter test script fedora.jmx implements the above process for the Fedora REST API.

Test Data

The test data is generated at run time as random binary data created from a stable set of file sizes, as explained in Test corpora - The generated binary dataset.

File generation is included in the JMeter test plan, so the files do not need to be generated separately.

Test Platform

Test Results

The JMeter test produces a CSV file (one for each repeat) containing the columns described in the table below.

(JMeter reference: http://jmeter.apache.org/usermanual/glossary.html, http://jmeter.apache.org/usermanual/listeners.html#csvlogformat)

To view a log file generated by the JMeter tests, have a look at the log files from the Fedora tests.

Column name	Description
timeStamp	time of request (in milliseconds since 1/1/1970 )
elapsed	time measured from just before sending the request to just after the last response has been received (in milliseconds). It does not include time taken to render the response (as on a browser).
label	the name given to the CRUD operation
responseCode	The HTTP response code received for the request
responseMessage	The HTTP response message received for the request
threadName	Derived from the Thread Group name and the thread within the group. The name has the format groupName + " " + groupIndex + "-" + threadIndex, where groupName is the name of the Thread Group element, groupIndex is the number of the Thread Group in the Test Plan (starting from 1) and threadIndex is the number of the thread within the Thread Group (starting from 1).
dataType	the type of data received
success	a boolean value (true or false) indicating the success or failure of the request
bytes	the response size (in bytes)
Latency	The time from just before sending the request to just after the first response has been received, including all the processing time needed to assemble the request as well as assembling the first part of the response (in milliseconds).
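A quick sanity check of a result file can be done before moving on to the R analysis. The sketch below groups the samples by label and reports the sample count, mean elapsed time and error rate. It assumes the CSV starts with a header row carrying the column names above and that `success` holds `true` or `false`. The function name `summarize` is made up for this example.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text):
    """Return per-label count, mean elapsed time (ms) and error rate."""
    totals = defaultdict(lambda: {"count": 0, "errors": 0, "elapsed": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["label"]]
        entry["count"] += 1
        entry["elapsed"] += int(row["elapsed"])
        if row["success"].strip().lower() != "true":
            entry["errors"] += 1
    return {
        label: {
            "count": entry["count"],
            "mean_elapsed_ms": entry["elapsed"] / entry["count"],
            "error_rate": entry["errors"] / entry["count"],
        }
        for label, entry in totals.items()
    }
```

To use it on a result file, pass the file contents, for example `summarize(open("results.csv").read())`, where `results.csv` is a placeholder name.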

Analyzing the test results

The statistical visualizations were produced with R (http://www.r-project.org/).

For notes on installing R, follow the links for Ubuntu, Mac OS X and Windows.

The R code uses a few libraries: ggplot2, gridExtra, tcltk, RColorBrewer and plyr.
- tcltk ships with the standard R distribution; the others may need to be installed. Installing a library is straightforward, for example:
  
  install.packages("gridExtra", repos="http://R-Forge.R-project.org")

The code used to produce the graphs is in fedora-jmx.r.

To execute the code:

$ R
> source('/path/to/the/file/fedora-jmx.r')
The program asks you to choose the directory that contains the test results (in CSV format).
It then runs through all the files, gathers the data and produces 3 graphs and a summary of the data. These are saved in your current working directory. (See Fedora's test results for an example of the plots generated.)

The plots are used to assess the robustness of the software and the time it takes to respond to requests under increasing load on the system.
