Dial In Details

Date: Monday, August 17, 11am EDT (UTC-4)

Attendees

Items for Discussion

  1. Use Case for Indiana University Libraries: 
    1. Asynchronous Storage
    2. HPSS-based tape storage for digital preservation (Scholarly Data Archive, SDA)
    3. Fedora 4 Federation / Projection across SDA
    4. HPSS ModeShape Connector for SDA
    5. Items loaded directly into SDA then available/ingested into Fedora 4
  2. Other issues:
    1. Fixity Checks and other preservation actions
      1. Fixity checking in Fedora 4 vs in SDA
      2. hooks into fixity checking and hooks into SDA
    2. Efficiency of single file requests vs. batch requests
    3. Batch processing:
      1. ‘intent to get’ followed by the ‘get’
      2. ‘intent to stage’
      3. ‘intent to purge’
    4. Use Glacier? DPN? other sources?
  3. Software Development for Asynchronous Storage

Minutes

  • Indiana Goals for this call
    • Define parameters of development and parts of Fedora we’ll need to modify to implement asynchronous storage
    • Preliminary scoping of a focused development effort around asynchronous storage that IU would contribute to
      • Would be good to scope this as a community effort - broad participation
  • Amherst: needs regarding storage in F4
    • Putting together F4 with a number of different storage backends
      • Projected storage over local network filesystems
        • Already supported
      • Need large segments of files stored in Amazon S3
        • Need to segment portions of different repos to different S3 buckets
          • Will make it possible to charge back for storage use
        • Can already connect to an S3 bucket in ModeShape/Infinispan
        • Primarily a synchronous interaction model (ingest/retrieval)
        • Files would all be binaries; when pushed into Fedora, they would be replicated out to S3
        • There is a size limit on single uploads to S3 (5 GB per PUT) - large files would need to go in as a series of part uploads and then be recomposed into a single object (see the multipart upload sketch after the minutes)
      • Cold-storage backup
        • Assemble an object into a bundle and push it into Amazon Glacier as an individual archive
        • This would happen asynchronously, and Fedora would not be able to retrieve the object directly
        • Could retrieve and process through an external processing chain (see the Glacier sketch after the minutes)
    • Policy-driven storage use case - a resource coming into Fedora provides a hint that determines which back-end storage system it is sent to
      • For GET requests, a property on the resource or something similar would provide the hint (rather than the client, which may not know where the resource is stored)
  • Indiana use case
    • Based on HPSS tape storage (Scholarly Data Archive - SDA)
    • Large media digitization and preservation initiative will put many large files into SDA
    • Need federation/projection to work across SDA
    • Need to build HPSS ModeShape connector
    • Access via F4 - either ingested or projected
    • Some resources (containers with metadata, access binaries) would live in F4, while the large binaries themselves would be stored in HPSS
  • Other issues
    • Fixity checking
      • Already doing fixity checking within SDA - how will this work with F4's fixity checking (see the fcr:fixity sketch after the minutes)?
    • HPSS offers greater efficiency with batch requests vs. single-file requests
      • Might want to indicate desire to GET a batch of files and then execute the GET as a batch
  • Other technologies
    • Glacier, DPN
    • Ideally the connector would be adaptable to a variety of technologies
  • Need to use the Servlet 3.1 specification to support asynchronous interactions (see the async servlet sketch after the minutes)
  • Implementation
    • Multiple implementations may draw out additional community support
    • Intent to GET, followed by GET
      • Could be done with API extension architecture
      • Possible to find out when the file is available and alert Fedora - Fedora is only involved when the file is available
      • A property on the container indicates that the binary is only available asynchronously
      • May not need a connector, though we would need to account for access control
    • Indiana favours a design where the client is not aware of the serving destination of the file - this would be handled by middleware
    • Best way forward: API extension architecture?
      • Bring this use case to the next meeting
    • Reach out to the community to see who else has a similar use case
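
The sketches below are non-normative illustrations of the approaches discussed above; all bucket, vault, path, and class names are placeholders. The first is a minimal sketch of the S3 multipart upload pattern from the Amherst discussion, using the AWS SDK for Java: a large binary is uploaded as a series of parts and then recomposed into a single S3 object. The bucket name, key, and part size are assumptions.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LargeBinaryUpload {

    // Hypothetical bucket name and part size, for illustration only.
    private static final String BUCKET = "fedora-binaries";
    private static final long PART_SIZE = 100L * 1024 * 1024; // 100 MB parts

    public static void upload(File file, String key) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Start the multipart upload and remember its id.
        InitiateMultipartUploadResult init =
                s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(BUCKET, key));

        // Upload the file as a series of parts.
        List<PartETag> partETags = new ArrayList<>();
        long position = 0;
        for (int partNumber = 1; position < file.length(); partNumber++) {
            long size = Math.min(PART_SIZE, file.length() - position);
            UploadPartRequest part = new UploadPartRequest()
                    .withBucketName(BUCKET)
                    .withKey(key)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber)
                    .withFileOffset(position)
                    .withFile(file)
                    .withPartSize(size);
            partETags.add(s3.uploadPart(part).getPartETag());
            position += size;
        }

        // Recompose the parts into a single S3 object.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                BUCKET, key, init.getUploadId(), partETags));
    }
}
```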
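
A sketch of the Glacier cold-storage idea, assuming the high-level ArchiveTransferManager from the AWS SDK for Java; the vault and file names are placeholders. It illustrates why retrieval has to be asynchronous: the download call only completes after Glacier has staged the archive, typically hours later, so Fedora cannot serve the object directly.

```java
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;
import com.amazonaws.services.sns.AmazonSNSClient;
import com.amazonaws.services.sqs.AmazonSQSClient;

import java.io.File;

public class GlacierColdStorage {

    // Hypothetical vault name, for illustration only.
    private static final String VAULT = "fedora-cold-storage";

    public static void main(String[] args) throws Exception {
        ProfileCredentialsProvider credentials = new ProfileCredentialsProvider();
        AmazonGlacierClient glacier = new AmazonGlacierClient(credentials);
        ArchiveTransferManager atm = new ArchiveTransferManager(
                glacier, new AmazonSQSClient(credentials), new AmazonSNSClient(credentials));

        // Push a pre-assembled bundle (e.g. a zipped Fedora object) into the vault.
        String archiveId = atm.upload(VAULT, "object bundle", new File("object-bundle.zip"))
                              .getArchiveId();

        // Retrieval is asynchronous: Glacier stages the archive before download()
        // can complete, so this would run in an external processing chain.
        atm.download(VAULT, archiveId, new File("restored-bundle.zip"));
    }
}
```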
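
Fedora 4 exposes on-demand fixity checking for a binary through its fcr:fixity endpoint, which recomputes the digest and reports the result as RDF. The sketch below simply requests a fixity result over HTTP so it could be compared with the checks IU already runs inside SDA; the repository URL and resource path are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class FixityCheck {

    public static void main(String[] args) throws Exception {
        // Hypothetical repository URL and binary path, for illustration only.
        URL fixity = new URL("http://localhost:8080/rest/objects/demo/master-file/fcr:fixity");

        HttpURLConnection conn = (HttpURLConnection) fixity.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "text/turtle");

        // Print the RDF fixity result (status, computed digest, size).
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```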
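
A sketch of the "intent to GET" idea using the asynchronous request processing available in Servlet 3.0/3.1 containers: the first request records the intent and returns 202 Accepted, and once the storage layer has staged the file a later request streams it back while releasing the original request thread. The StagingService interface is a hypothetical stand-in for the HPSS/SDA middleware, not an existing API.

```java
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.OutputStream;

@WebServlet(urlPatterns = "/async-binary/*", asyncSupported = true)
public class IntentToGetServlet extends HttpServlet {

    /** Hypothetical interface to the asynchronous storage middleware (e.g. HPSS/SDA). */
    public interface StagingService {
        boolean isStaged(String path);
        void requestStaging(String path);               // the 'intent to get'
        void copyTo(String path, OutputStream out) throws IOException;
    }

    private StagingService staging;                     // wired in by the application

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String path = req.getPathInfo();

        if (!staging.isStaged(path)) {
            // Record the intent to GET and tell the client to come back later.
            staging.requestStaging(path);
            resp.setStatus(HttpServletResponse.SC_ACCEPTED);
            resp.setHeader("Retry-After", "3600");
            return;
        }

        // The file has been staged: stream it back via async processing so the
        // original request thread is released while the copy runs.
        AsyncContext ctx = req.startAsync();
        ctx.start(() -> {
            try {
                staging.copyTo(path, ctx.getResponse().getOutputStream());
            } catch (IOException e) {
                // Error handling omitted in this sketch.
            } finally {
                ctx.complete();
            }
        });
    }
}
```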