Title (goal): Support ingest (and export?) of large files into the repository over non-HTTP mechanisms.
Primary Actor: Repository contributor
Scope:
Level: User-goal
Story:

A repository contributor has a significant number of large (100 GB?) files in a data center. Ingesting them into the repository over HTTP would be very slow (and, from a browser, sometimes impossible?). The repository administrator is not fluent in XACML and cannot be expected to write appropriate policies for handling file:// URIs. The repository could provide a way to spin up a temporary FTP endpoint (or Samba share, etc.) for the user to upload (or download?) files, while still enforcing policies.

For export, the accessor wants to put the file onto remote storage (via FTP, an NFS share, etc.) for data analysis. The file is big enough that downloading and then re-uploading it would take prohibitively long. Instead, the accessor would like to have Fedora place the file directly into the remote storage.
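The server-side export path described above could be sketched as a direct FTP push from the repository host, so the file streams in fixed-size blocks and is never round-tripped through the client. This is a minimal illustration, not Fedora's API; the function name, parameters, and the choice of FTP are all assumptions.

```python
from ftplib import FTP

CHUNK = 1 << 20  # 1 MiB blocks keep memory flat even for 100 GB files


def export_datastream(src_path, host, user, password, remote_name):
    """Stream a repository file straight to remote FTP storage.

    Hypothetical sketch: a real implementation would resolve src_path
    from the repository's datastream and enforce access policies first.
    """
    with FTP(host) as ftp, open(src_path, "rb") as src:
        ftp.login(user, password)
        # storbinary reads `src` in blocksize chunks, so the transfer
        # cost is bounded by bandwidth, not by client disk or memory.
        ftp.storbinary(f"STOR {remote_name}", src, blocksize=CHUNK)
```

Because the repository initiates the transfer, the same policy checks that gate HTTP access can run before the connection is opened.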


Comments

  1. What we do in Oxford is pass a pointer to the file at some location, along with retrieval credentials. The file is then fetched asynchronously by a message-driven worker task, and the object is updated.

    "Large" use cases are hard drives images from personal archives (BEAM http://www.bodleian.ox.ac.uk/beam/collections) as well as research data.

  2. Neil, what are the sizes of the files that you are ingesting (the hard drive images)? Hard drives are getting pretty darn large. I'm interested in knowing how big the files are that you are having Fedora manage, or if anyone else has some large file sizes they can share info about (via this thread).

  3. If I understand Neil correctly from a side conversation, you are not actually ingesting into Fedora either. Rats. Does anyone know if Fedora Futures will support pointers to large files ("externally-managed datastreams"), and what the size threshold might be?


  4. Chris Beer, this use case should be supported as of Fedora 4.0-Alpha-4. Can you verify that this is the case?