Large files can be uploaded via the REST API or projected into the repository using filesystem federation. Transfer times for uploading to the repository via the REST API are about the same as copying with NFS, and moderately faster than SCP. Uploading via the REST API to a federated filesystem is significantly slower and requires a large temporary directory.
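Uploading via the REST API only scales to these sizes if the client streams the file instead of buffering it in memory. The following is a minimal Python sketch using only the standard library. It assumes the repository accepts a raw binary body on an HTTP PUT. The endpoint URL, file path, content type and the `build_streaming_put` helper are all hypothetical, so check the repository's REST API documentation for the exact request it expects.

```python
import os
import urllib.request


def build_streaming_put(url: str, path: str) -> urllib.request.Request:
    """Build a PUT request that streams the file at `path` as its body.

    urllib sends a file object in blocks as it reads it, so a multi-gigabyte
    binary is never held in memory. Content-Length is set explicitly so the
    request is not sent with chunked transfer encoding.
    """
    size = os.path.getsize(path)
    body = open(path, "rb")  # the caller must close req.data when finished
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",  # assumed content type
            "Content-Length": str(size),
        },
    )


# Example usage (hypothetical endpoint; not executed here):
#   req = build_streaming_put("http://localhost:8080/rest/big.bin", "/data/big.bin")
#   try:
#       with urllib.request.urlopen(req) as resp:
#           print(resp.status)
#   finally:
#       req.data.close()
```

Passing the open file as `data` leaves the chunked reading to `http.client`, which keeps the client's memory use flat regardless of file size.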

...

Based on the tests below, we believe arbitrarily large files can be ingested and downloaded via the REST API (tested up to 1 TB). The only apparent limitations are the disk space available to store the files and a sufficiently large Java heap.

Note

...

To enable fast access to large files, it is necessary to set "contentBasedSha1" : "false". Otherwise, the repository computes a SHA-1 checksum of the content for identification, which can take hours for files larger than 50 GB.
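The cost described in this note comes from hashing every byte of the binary. The sketch below uses Python's standard `hashlib` to show the kind of work involved. Memory use stays constant because the file is read in chunks, but the run time grows linearly with file size. It is an illustration only, not the repository's own checksum code, and the 8 MB chunk size and `streaming_sha1` name are arbitrary choices.

```python
import hashlib
import time
from typing import BinaryIO, Tuple


def streaming_sha1(stream: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> Tuple[str, float]:
    """Return (hex SHA-1 digest, elapsed seconds) for a binary stream.

    The stream is read in fixed-size chunks, so memory use is bounded by
    chunk_size, but every byte must still be read and hashed.
    """
    digest = hashlib.sha1()
    start = time.perf_counter()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest(), time.perf_counter() - start


# Example usage (path is a placeholder):
#   with open("/data/big.bin", "rb") as f:
#       sha1, seconds = streaming_sha1(f)
```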

...

For more on this benchmarking see:

...

Design

...

- LargeFiles.

REST API Upload/Download Roundtrip

  • Platform: Linux 3.12.1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux, 16 GB RAM
  • Repository Profile: Single-File
  • Workflow Profile: Upload/Download Roundtrip
File Size | Upload | Download
256 GB | 3h51m34s (18.87 MB/sec) | 43m09s (101.25 MB/sec)
512 GB | 7h49m43s (18.60 MB/sec) | 1h29m15s (97.90 MB/sec)
1 TB | 15h41m21s (18.57 MB/sec) | 3h21m44s (86.63 MB/sec)
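The MB/sec figures in these tables are consistent with dividing the file size, counted as 1 GB = 1024 MB, by the elapsed time. For example, 256 GB uploaded in 3h51m34s (13,894 s) gives 262,144 / 13,894, or about 18.87 MB/sec. Here is a small Python helper that reproduces the figures. The function names are hypothetical.

```python
import re

# Durations as written in the tables: "3h51m34s", "43m09s", "0m8.236s".
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_duration(text: str) -> float:
    """Convert a duration such as '3h51m34s' or '0m8.236s' to seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def throughput_mb_per_sec(size_gb: float, duration: str) -> float:
    """Throughput in MB/sec, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / parse_duration(duration)


# round(throughput_mb_per_sec(256, "3h51m34s"), 2)  -> 18.87
```

The same helper also reproduces the federation figures. For example, `throughput_mb_per_sec(2, "0m8.236s")` rounds to 248.66.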

Serving Large Files via Filesystem Federation

Based on the tests below, we believe arbitrarily large files can be projected into the repository via filesystem federation and downloaded via the REST API (tested up to 1 TB). The only apparent limitations are the disk space available to store the files and a sufficiently large Java heap.

Filesystem Federation Download Tests

  • Platform: Linux 3.12.1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux, 16 GB RAM
  • Repository Profile: Single-File with an additional external source:

    "externalSources" : {
        "home-directory" : {
            "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
            "directoryPath" : "/tmp/projection",
            "projections" : [ "default:/projection => /" ],
            "readonly" : true,
            "addMimeTypeMixin" : true
        }
    }
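In our reading of ModeShape's projection syntax, each entry in "projections" has the form `<workspace>:<path in the repository> => <path in the external source>`. So "default:/projection => /" mounts the root of directoryPath at /projection in the default workspace. The hypothetical helpers below parse that form and show which local file a projected node corresponds to. They are an illustration only and are not part of ModeShape's API.

```python
import posixpath
from typing import NamedTuple


class Projection(NamedTuple):
    workspace: str
    repository_path: str
    external_path: str


def parse_projection(expr: str) -> Projection:
    """Split a projection such as 'default:/projection => /' into its parts."""
    left, sep, right = expr.partition("=>")
    if not sep:
        raise ValueError(f"missing '=>' in projection: {expr!r}")
    workspace, colon, repo_path = left.strip().partition(":")
    if not colon or not workspace or not repo_path.startswith("/"):
        raise ValueError(f"expected '<workspace>:/<path>' before '=>': {expr!r}")
    external_path = right.strip()
    if not external_path.startswith("/"):
        raise ValueError(f"expected an absolute external path: {expr!r}")
    return Projection(workspace, repo_path, external_path)


def local_file_for(node_path: str, projection: Projection, directory_path: str) -> str:
    """Map a repository node path under the projection to a local file path."""
    prefix = projection.repository_path.rstrip("/")
    if node_path != prefix and not node_path.startswith(prefix + "/"):
        raise ValueError(f"{node_path!r} is outside {projection.repository_path!r}")
    relative = node_path[len(prefix):]  # e.g. "/big.bin"
    external = posixpath.join(projection.external_path, relative.lstrip("/"))
    # The external source's root ("/") corresponds to directoryPath on disk.
    return posixpath.normpath(directory_path + "/" + external.lstrip("/"))
```

With the configuration above, the node /projection/big.bin would correspond to /tmp/projection/big.bin.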
File Size | Projection Directory Request Duration | First Projected Node Request Duration | Download Duration | Throughput
2 GB | 0m35.117s | 0m34.572s | 0m8.236s | 248.66 MB/sec

External source configuration using the Fedora filesystem connector:
"externalSources" : {
    "filesystem" : {
        "classname" : "org.fcrepo.connector.file.FedoraFileSystemConnector",
        "directoryPath" : "/mnt/isilon/fedora-dev/federated",
        "projections" : [ "default:/projection => /" ],
        "readonly" : true,
        "addMimeTypeMixin" : true,
        "contentBasedSha1" : "false"
    }
}
File Size | Download
256 GB | 1h09m26s (62.92 MB/sec)
512 GB | 2h00m15s (72.67 MB/sec)
1 TB | 3h57m25s (73.61 MB/sec)
Objects | Datastream Size | Projection Directory Request Duration | Projected Node Request Duration | Download Duration | Download Throughput
1 | 1 GB | 417 ms | 35 ms | 17,333 ms | 59.08 MB/sec
1 | 2 GB | 528 ms | 219 ms | 26,902 ms | 76.13 MB/sec
1 | 4 GB | 432 ms | 54 ms | 47,581 ms | 86.08 MB/sec
1 | 8 GB | 583 ms | 90 ms | 90,705 ms | 90.31 MB/sec
1 | 16 GB | 691 ms | 452 ms | 176,508 ms | 92.82 MB/sec
1 | 32 GB | 445 ms | 34 ms | 348,488 ms | 94.03 MB/sec
1 | 64 GB | 750 ms | 460 ms | 699,937 ms | 93.63 MB/sec
1 | 128 GB | 800 ms | 90 ms | 1,412,640 ms | 92.79 MB/sec
1 | 256 GB | 530 ms | 70 ms | 2,768,570 ms | 94.69 MB/sec
1 | 512 GB | 490 ms | 80 ms | 5,893,420 ms | 88.96 MB/sec
1 | 1 TB | 420 ms | 40 ms | 11,322,330 ms | 92.61 MB/sec
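The download durations above measure the time to stream the whole binary back out of the repository. Below is a minimal sketch of a client that does this in fixed-size chunks and reports throughput in the same MB/sec units (1 MB = 1,048,576 bytes). The `copy_with_throughput` name, the chunk size and the URL in the usage comment are hypothetical.

```python
import time
from typing import BinaryIO, Tuple

MB = 1024 * 1024  # bytes per MB, matching the tables above


def copy_with_throughput(src: BinaryIO, dst: BinaryIO,
                         chunk_size: int = 8 * MB) -> Tuple[int, float]:
    """Copy src to dst in fixed-size chunks; return (bytes copied, MB/sec)."""
    copied = 0
    start = time.perf_counter()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    elapsed = time.perf_counter() - start
    rate = copied / MB / elapsed if elapsed > 0 else float("inf")
    return copied, rate


# Example usage (hypothetical URL; not executed here):
#   import urllib.request
#   with urllib.request.urlopen("http://localhost:8080/rest/projection/big.bin") as resp, \
#           open("/data/big.bin", "wb") as out:
#       size, mb_per_sec = copy_with_throughput(resp, out)
```

Reading in bounded chunks keeps client memory flat. This matters when the response body is hundreds of gigabytes.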

 

Direct Comparison of Different Transfer Methods

Based on the tests below, we believe arbitrarily large files can be uploaded and downloaded via the REST API, using either repository storage or a federated filesystem (tested up to 1 TB). The only apparent limitations are the disk space available to store the files, temporary directory capacity, and a sufficiently large Java heap.

Comparison of Upload and Download Times for Different Transfer Methods

...

Retrieving a byte range is supported and has been tested with 1 TB files, for both repository storage and federated filesystems. The standard test suite includes an integration test that verifies range retrieval. By default, this test uses a small datastream binary to avoid slowing down the suite, but the size is configurable, so a developer can easily test files as large as local disk space allows.
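Byte ranges are requested with the standard HTTP `Range` header. A server that honors the header replies with status 206 and a `Content-Range` header. The helpers below use only the standard library and have hypothetical names. They build the request header and parse the response header, and the URL in the usage comment is a placeholder.

```python
import re
from typing import Optional, Tuple

# Content-Range value of a satisfied byte-range response, e.g. "bytes 0-1023/4096".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def range_header(first: int, last: int) -> str:
    """Range header value requesting bytes first..last, both inclusive."""
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    return f"bytes={first}-{last}"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first, last, total) from a Content-Range header; total is None for '*'."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


# Example usage (placeholder URL; not executed here):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:8080/rest/big.bin",
#                                headers={"Range": range_header(0, 1023)})
#   with urllib.request.urlopen(req) as resp:
#       assert resp.status == 206
#       first, last, total = parse_content_range(resp.headers["Content-Range"])
#       data = resp.read()  # last - first + 1 bytes
```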

...