Federated filesystem
Files in a single directory
When a single directory holds an increasing number of files, the time to list the directory scales roughly linearly up to around 3,000 files, then degrades sharply. Paging the listing so that only 1,000 files are returned per batch makes the degradation slightly less severe.
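The paging approach can be illustrated with a minimal sketch. It assumes direct access to the directory on a local filesystem and uses only the Python standard library; the tests above were presumably run through the repository's own listing interface, which is not shown here. The function name `list_in_batches` and its parameters are hypothetical.

```python
import os
from itertools import islice


def list_in_batches(directory, batch_size=1000):
    """Yield lists of at most `batch_size` file names from `directory`.

    Hypothetical helper: it pages a local directory listing and does not
    call any repository API.
    """
    with os.scandir(directory) as entries:
        # Lazily walk the directory so only one batch is held in memory.
        names = (entry.name for entry in entries if entry.is_file())
        while True:
            batch = list(islice(names, batch_size))
            if not batch:
                break
            yield batch
```

Each iteration returns one page of names, so a caller can time or process pages individually instead of waiting for the full listing.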
16.7 million files in a 3-level hierarchy
16.7 million files were generated on a federated filesystem using a three-level hierarchy (256 top-level nodes, 256 second-level nodes in each, 256 third-level nodes in each, and one 10KB datastream in each), taking 26.78 hours. Performance retrieving files and listing the second-level nodes did not degrade with larger numbers of objects. However, the time to list the top level of the repository ("toplist" in the chart below) increased roughly linearly as more objects were added, and became increasingly erratic.
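The layout described above can be sketched as a mapping from a sequential file index to a path in the hierarchy. The source does not say how the nodes were named, so the zero-padded hexadecimal names and the function `leaf_path` are assumptions for illustration only.

```python
def leaf_path(index, fanout=256, depth=3):
    """Map a sequential index to a path in a fixed-fanout hierarchy.

    Node names are hypothetical: zero-padded hexadecimal digits, one
    path segment per level.
    """
    total = fanout ** depth
    if not 0 <= index < total:
        raise ValueError(f"index must be in [0, {total})")
    width = len(format(fanout - 1, "x"))  # hex digits needed per segment
    parts = []
    for _ in range(depth):
        # Peel off one base-`fanout` digit per level, lowest level first.
        index, digit = divmod(index, fanout)
        parts.append(format(digit, f"0{width}x"))
    return "/".join(reversed(parts))


# 256 nodes at each of three levels gives 256 ** 3 = 16,777,216 leaves.
print(256 ** 3)
print(leaf_path(0), leaf_path(256), leaf_path(256 ** 3 - 1))
```

With this scheme, no node ever has more than 256 children, which keeps each individual listing small even though the total is about 16.7 million files.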
16.7 million files in a 4-level hierarchy
16.7 million files were generated on a federated filesystem using a four-level hierarchy (64 objects per level, with each leaf object having a single 10KB datastream), taking 18.19 hours. Performance retrieving files and listing the third-level nodes was essentially flat at less than 0.1 second. Time to list the top level of the repository ("toplist" in the chart below) increased linearly, but was still extremely fast even with a fully populated hierarchy of 16.7 million objects/files; even the spike at the 24th batch took less than 1 second.
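As a rough check on the figures reported for the two hierarchies, the sketch below confirms that both layouts hold the same number of leaves and derives the average ingest rate implied by each run time. The helper `ingest_rate` is hypothetical, and the result is only an average over the whole run; it says nothing about why one run was faster.

```python
def ingest_rate(total_files, hours):
    """Average files created per second over a run of `hours` hours."""
    return total_files / (hours * 3600)


three_level_files = 256 ** 3  # 256 nodes at each of 3 levels
four_level_files = 64 ** 4    # 64 objects at each of 4 levels

# Both hierarchies contain the same 16,777,216 leaves.
print(three_level_files == four_level_files, three_level_files)

# Run times taken from the text: 26.78 hours and 18.19 hours.
print(f"3-level: {ingest_rate(three_level_files, 26.78):.0f} files/s")
print(f"4-level: {ingest_rate(four_level_files, 18.19):.0f} files/s")
```

On these numbers the four-level run averaged roughly 256 files per second against roughly 174 for the three-level run.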