This page provides an overview of the DuraCloud service provided by DuraSpace.  You can also review full documentation for DuraCloud, watch video overviews, or sign up to try it yourself by creating a trial account. Pricing information is readily available, and you can use our form to request additional information or a custom quote.

 

What is DuraCloud?

DuraCloud is an open source, hosted service from DuraSpace that makes it easy to control where and how your organization preserves content in the cloud. DuraCloud enables your institution to store content with expert cloud storage providers while adding lightweight features that enable digital preservation, data access, and data sharing.

DuraCloud is designed to meet the needs of cultural heritage institutions, with features including:

  • Storage and replication of content across multiple providers via a desktop tool, REST API, command line, or web-accessible interface
  • Bit integrity health reports on all content at least twice per year
  • Multiple content transfer interfaces, with options for both novice and technical staff
  • Predictable annual billing and the option to add storage at any time
  • Flexibility to combine private storage, public access, and dark archive options 
  • Integrations to support repository backup, archival file storage, and website archiving
  • Optional administrative access controls

Storage Providers

DuraCloud provides one-click preservation backup to one or more expert cloud storage providers. The following storage provider combinations are available:

  • Amazon S3 only (Preservation and Enterprise subscriptions) 
  • Amazon S3 with automatic backup to Amazon Glacier (Preservation Plus and Enterprise Plus subscriptions) 
  • Chronopolis TRAC-certified, geographically distributed dark archive storage (DuraCloud Enterprise Chronopolis subscription)

Selecting a Subscription 

To determine which DuraCloud subscription is right for your organization, you'll need to decide the following:

  1. How many TB of storage will you require for your content? DuraCloud storage is provided in TB increments. If you need more storage than your initial subscription, additional TB can be added at any time and the cost will be prorated for the number of months left in the subscription year. DuraCloud has special pricing available for 20+ TB subscriptions.
  2. Do you need a dark archive solution? The DuraCloud Enterprise Chronopolis subscription provides geographically distributed preservation storage in a TRAC-certified dark archive system, and is the most affordable option for content that needs to be preserved but not accessed for a period of time. This is a great option for preservation masters or for materials that will not be described or made accessible for several years.
  3. Do you want to store your content with one provider (Amazon S3), or to also back up your content to Amazon Glacier? This decision may depend on your institution's preservation policies. If you are keeping two copies in locally managed systems and DuraCloud is tertiary storage, then Amazon S3 may be sufficient. However, if you keep only one local copy, we recommend a subscription that includes a backup to another provider.  Backing up to Glacier also allows DuraCloud to replace any items which are found to be missing or corrupt in Amazon S3.
  4. Will you need access to administrative features? Enterprise, Enterprise Plus, and DuraCloud Enterprise Chronopolis subscriptions include administrative features. Subscriptions of 20+ TB have an optional add-on for administrative features. Administrators can create and delete spaces, create groups of users, and implement access controls at the space level. With these controls you can provide a variety of individuals, departments, research groups, etc. access to a single DuraCloud account. 

Please see the DuraCloud pricing page for more details about the subscriptions listed below:

Preservation: 5 or fewer TB of storage, one provider, no administrative controls

Preservation Plus: 5 or fewer TB of storage, backup to a second provider, no administrative controls

Enterprise: Up to 19 TB of storage, one provider, includes administrative controls

Enterprise Plus: Up to 19 TB of storage, backup to a second provider, includes administrative controls

DuraCloud Enterprise Chronopolis: Dark archive solution with three geographically distributed copies, includes administrative controls and up to 19 TB of storage

Subscriptions of 20+ TB

Subscriptions of 20+ TB have special pricing that passes along storage savings. These subscriptions consist of a per-TB fee for storage, a per-TB fee for processing, a support fee, and optional add-ons for administrative features and additional support hours. See our 20+ TB pricing sheet (PDF) for details and contact us for a custom quote.
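
The selection questions and subscription list above can be read as a small decision procedure. The Python sketch below is illustrative only: the function `suggest_subscription` and its parameters are hypothetical names, it is not part of any DuraCloud tool, and it assumes storage is requested in whole TB. The pricing page remains the authoritative source.

```python
def suggest_subscription(storage_tb: int, dark_archive: bool,
                         second_provider: bool, admin_controls: bool) -> str:
    """Map the four selection questions to a subscription name (hypothetical helper)."""
    if storage_tb < 1:
        raise ValueError("storage is provided in whole-TB increments")
    if storage_tb >= 20:
        # Large subscriptions are priced separately via a custom quote.
        return "20+ TB subscription (custom quote)"
    if dark_archive:
        return "DuraCloud Enterprise Chronopolis"
    # Preservation tiers stop at 5 TB and exclude administrative controls,
    # so either need moves the choice to an Enterprise tier.
    if admin_controls or storage_tb > 5:
        return "Enterprise Plus" if second_provider else "Enterprise"
    return "Preservation Plus" if second_provider else "Preservation"


print(suggest_subscription(3, dark_archive=False,
                           second_provider=True, admin_controls=False))
```

Note that in this sketch a small account that needs administrative controls is steered to an Enterprise tier, since the Preservation tiers do not include them.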

Content Organization, Access, and Metadata in DuraCloud

How Content is Organized

Within DuraCloud, content is organized into containers called spaces. Each institutional account can include up to 100 spaces, with the option to increase this number if the need should arise. Access controls are at the space level, so Enterprise-level account administrators can create user groups and control who has read or write access to a given space. Content is transferred from the local system to a specific DuraCloud space using one of the transfer methods described below. You can learn more by signing up for a trial account.

When a directory of files (which can include sub-folders) is uploaded to DuraCloud, the original file structure is maintained in the name of the item. This means that when a directory is later retrieved from cloud storage, the original structure (i.e., folders) is replicated. This also makes it easier to locate specific items based on the file name, as DuraCloud does not currently allow for searching within a space. A search feature is frequently requested and is a high priority for future development.
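
As a rough illustration of how a folder structure can be carried in an item name, the Python sketch below derives a name from a file's path relative to the uploaded directory and maps it back on retrieval. The helper names and the use of "/" as the separator are assumptions made for this example, not a description of DuraCloud's internal naming rules.

```python
from pathlib import Path, PurePosixPath


def content_id_for(local_file: Path, sync_root: Path) -> str:
    """Build an item name that keeps the folder structure below sync_root."""
    relative = local_file.relative_to(sync_root)
    # Join the parts with "/" so the name is the same on every platform.
    return PurePosixPath(*relative.parts).as_posix()


def local_path_for(content_id: str, restore_root: Path) -> Path:
    """Rebuild the original folder layout under restore_root on retrieval."""
    return restore_root.joinpath(*PurePosixPath(content_id).parts)


def items_under(content_ids: list[str], folder: str) -> list[str]:
    """Locate items by name prefix, since there is no search within a space."""
    prefix = folder.rstrip("/") + "/"
    return [cid for cid in content_ids if cid.startswith(prefix)]
```

With names of this shape, listing everything that came from one sub-folder is a simple prefix match, for example `items_under(ids, "2015")`.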

Access Options

Spaces within DuraCloud are private by default, with files only accessible to authenticated and authorized users. An institution can choose to make a space public. Every item in a public space has a URL. This URL can be used to grant access to the item through a repository, CMS, or website.

Data Structure & Metadata 

There are no requirements on how your content must be structured for ingest into DuraCloud. DuraCloud is capable of storing any type of file or package (e.g., AIP, ZIP, or TAR).

DuraCloud does not require any specific metadata schema. Through the DuraCloud web interface or REST API, you can add as many different name/value pairs of metadata as you need, on a content item or DuraCloud space basis. You can also tag your content stored in DuraCloud in the same way.

Metadata Captured During Transfer

As the Sync Tool transfers files to DuraCloud, it will attempt to capture certain types of metadata about each file, and include that information as part of the content item added to DuraCloud. See this list for a full description of the metadata that is captured automatically. You have the option to add, update, or delete the properties of each file after it has been transferred to DuraCloud.

Content Transfer to & Retrieval from Cloud Storage

Web Interface

Users can interact with a browser-based graphical user interface to view and manage content in DuraCloud. The web interface offers access to all storage system capabilities, including space and content creation, updates, and deletion. It provides access to graphical depictions of the information contained in the storage reports and allows for bulk deletion of spaces and content items and for user account administration. The interface also allows an authorized user to initiate a snapshot to send to Chronopolis for storage as part of a DuraCloud Chronopolis subscription. Administrators can use the interface to designate read and/or write access to a given space for a given user or group of users.

DuraCloud SyncTool

The SyncTool provides a simple way to move files from a local file system to DuraCloud. It offers a browser-based user interface that begins with a configuration wizard and then shows a dashboard with the current status of the sync process. This interface is the default and is started by selecting any of the shortcuts created by the installer. The user can select directories and sub-directories for the SyncTool either to upload once in a single pass or to watch and sync to the space automatically.

Please review the Sync Tool minimum requirements.

Command Line

A command line interface for the SyncTool is also available. It can be executed directly, used in scripts, or used for scheduling sync activities (e.g., within a cron job). The command line interface provides access to all features of the SyncTool, some of which are not currently available in the graphical interface.

Chunking Files

The DuraCloud SyncTool will "chunk" files as they are sent to DuraCloud. If a file is over a pre-defined size limit (1 GB by default, configurable up to 5 GB in the tool settings), that file is transferred in segments. A checksum for each segment is generated and captured in a manifest for the file, which also includes the checksum for the entire file. When the entire file has been transferred to DuraCloud, you will see the list of chunks as well as the manifest file in storage.

Chunking and Stitching in Detail

To chunk a file, the SyncTool reads bytes from the source file and writes them into a temp file on the local file system until reaching the defined chunk size limit (by default, 1 GB). At that point the temp file becomes the first chunk, so the SyncTool computes its checksum and transfers it to storage. The SyncTool then reads further bytes from the source file into a new temp file, computes its checksum, and transfers it to storage, repeating until it reaches the end of the source file. As each chunk is created, the SyncTool writes the details for that chunk into a file DuraCloud calls the "chunks manifest"; these details include the name of the chunk file (the original file name plus a numbered suffix) and its checksum. When all chunks have been transferred, the chunks manifest is finalized and transferred into storage.
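
The chunking loop described above can be sketched in Python as follows. This is a simplified illustration, not the SyncTool's actual code: the `.chunk-NNNN` suffix, the dictionary layout of the manifest, and the choice of MD5 are assumptions made for the example, and the step that transfers each chunk to storage is only marked with a comment.

```python
import hashlib
from pathlib import Path

BUFFER_SIZE = 1024 * 1024  # read in 1 MiB pieces to keep memory use small


def chunk_file(source: Path, out_dir: Path, chunk_size: int = 1024 ** 3) -> dict:
    """Split source into chunk files in out_dir and return a manifest dict."""
    whole = hashlib.md5()  # running checksum of the entire file
    chunks = []
    index = 0
    with source.open("rb") as src:
        while True:
            # Hypothetical naming: original file name plus a numbered suffix.
            chunk_path = out_dir / f"{source.name}.chunk-{index:04d}"
            chunk_md5 = hashlib.md5()
            written = 0
            with chunk_path.open("wb") as out:
                while written < chunk_size:
                    data = src.read(min(BUFFER_SIZE, chunk_size - written))
                    if not data:
                        break  # end of the source file
                    out.write(data)
                    chunk_md5.update(data)
                    whole.update(data)
                    written += len(data)
            if written == 0:
                chunk_path.unlink()  # nothing left to read; drop the empty file
                break
            # A real tool would transfer chunk_path to storage at this point.
            chunks.append({"name": chunk_path.name,
                           "md5": chunk_md5.hexdigest(),
                           "size": written})
            index += 1
    return {"source": source.name, "md5": whole.hexdigest(), "chunks": chunks}
```

The manifest is returned only after the last chunk has been recorded, mirroring the point above that the manifest is finalized once all chunks have been transferred.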

Stitching files is essentially the reverse of chunking. The Retrieval Tool will first pull and read the chunks manifest to determine all the chunk files that are needed to construct the original file. It will then pull the first chunk and write it to disk, then the second chunk, which is appended to the first, then the third, and so on until all of the chunks have been appended to the end of the file. As each chunk is pulled, the system will check its checksum to verify it was downloaded correctly before appending it. Given that it is working at the byte level, this is simply constructing the same stream of bytes as the original file. A final checksum comparison with what is recorded in the chunks manifest is used to verify that the completed file is consistent with the original checksum.
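
A minimal Python sketch of the stitching steps follows. It is not the Retrieval Tool's implementation: the manifest is assumed to be a dictionary holding the whole-file checksum under "md5" and an ordered "chunks" list of name/checksum pairs, `fetch_chunk` is a hypothetical callable standing in for the download from storage, and MD5 is an assumed checksum algorithm.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ChecksumMismatch(Exception):
    """Raised when a chunk or the stitched file fails verification."""


def stitch_file(manifest: dict, fetch_chunk: Callable[[str], bytes],
                dest: Path) -> None:
    """Rebuild the original file at dest from the chunks listed in manifest."""
    whole = hashlib.md5()
    with dest.open("wb") as out:
        for entry in manifest["chunks"]:  # chunks are listed in original order
            data = fetch_chunk(entry["name"])
            # Verify each chunk before appending it to the output file.
            if hashlib.md5(data).hexdigest() != entry["md5"]:
                raise ChecksumMismatch(f"chunk {entry['name']} is corrupt")
            out.write(data)
            whole.update(data)
    # Final check: the stitched bytes must match the original file's checksum.
    if whole.hexdigest() != manifest["md5"]:
        raise ChecksumMismatch("stitched file does not match original checksum")
```

In this sketch a failed check leaves a partial file at `dest`; a caller would typically delete it and retry the download.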

REST API

Authorized users can also choose to interact with a DuraCloud account through the REST API. The API offers the same functionality as the graphical user interface. Complete REST API documentation is provided.

Chronopolis Transfer 

If you decide to use Chronopolis as your provider, there is one additional step. When you have loaded all of your content, you will select the button to create a "snapshot" of that content in Chronopolis. This starts the process of moving those files from DuraCloud to Chronopolis. Part of this transfer process is to "stitch" those files back together as they land in Chronopolis storage. The stitching process combines the chunks back into the original file. After this is complete, the manifest file is used to verify that the reconstructed file has the same checksum as the original. Once all files in the snapshot have been replicated across all of the Chronopolis nodes, the content in DuraCloud is removed.

Retrieval Tool

Regardless of which provider you choose, if you need to get your content back, you will use the Retrieval Tool to download it from DuraCloud. This tool ensures that the content you download from DuraCloud looks the same as what you sent to DuraCloud. That includes recreating the original directory structure, stitching all chunked files, and even resetting the timestamps on those files to the original values (where possible). If retrieving content from Chronopolis dark archive storage, there is a per-TB fee for retrieval.

Health Checks and Reports

Bit integrity checks of all content are conducted twice yearly. For each content item stored in Amazon S3, the file is retrieved from storage and a checksum is calculated. This checksum is compared to both the checksum stored by S3 and the checksum maintained in the DuraCloud space manifest. Files from Amazon Glacier are not retrieved, but the checksum provided by Glacier storage is compared with the DuraCloud space manifest. The primary reason for this is cost; pulling content out of Glacier adds a significant cost overhead, which reduces its promise of being a low-cost storage option. Because of this, we do not offer Glacier as a primary storage option in DuraCloud; it must be paired with S3 as primary storage.
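
The two comparisons described above can be sketched in Python as follows. The function names are hypothetical, MD5 is an assumed checksum algorithm, and the content is passed in as a generic byte stream rather than fetched from any real storage API.

```python
import hashlib
from typing import BinaryIO


def compute_md5(stream: BinaryIO, buffer_size: int = 1024 * 1024) -> str:
    """Compute a checksum over a stream without loading it all into memory."""
    digest = hashlib.md5()
    for block in iter(lambda: stream.read(buffer_size), b""):
        digest.update(block)
    return digest.hexdigest()


def check_s3_item(content: BinaryIO, provider_checksum: str,
                  manifest_checksum: str) -> bool:
    """S3 case: recompute from the retrieved bytes, then compare all three."""
    computed = compute_md5(content)
    return computed == provider_checksum == manifest_checksum


def check_glacier_item(provider_checksum: str, manifest_checksum: str) -> bool:
    """Glacier case: content is not retrieved, so only stored values are compared."""
    return provider_checksum == manifest_checksum
```

The Glacier check is weaker by design in this sketch, since it trusts the provider's recorded checksum instead of re-reading the bytes, which reflects the cost trade-off described above.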

DuraCloud provides an audit log (full list of events) and manifest for each content space listing all items and checksums. Each DuraCloud space provides a list of all content included. The DuraCloud Sync Tool provides history logs of all files transferred or updated. Additional reporting needs can be accommodated through custom development.

For content deposited in Chronopolis, an authorized user can retrieve a list of all files and checksums in both md5 and sha256 format. 

File Replacement

If any of the bit integrity checks fail, the file is added to a failure report which is sent to DuraCloud operations staff. These staff members will re-check each failed file, and if checksums still do not match, will perform a restore action from Glacier. If the file restored from Glacier correctly matches the expected checksum, the file in S3 is replaced. If the file retrieved from Glacier also fails to match the expected checksum, we notify the customer of the discrepancy.

A subscription with a second provider, such as Glacier, allows DuraCloud to replace any items which are found to be missing or corrupt in Amazon S3. If Amazon S3 is the only provider, DuraSpace staff will notify the customer if a file fails integrity checks. 
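
The handling of a failed check can be summarized as a short decision sequence. The Python sketch below only models the order of the steps described above; the review itself is carried out by operations staff, and the names `Outcome` and `resolve_failure` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASSED_ON_RECHECK = "passed on re-check"
    REPLACED_FROM_BACKUP = "S3 copy replaced from backup"
    CUSTOMER_NOTIFIED = "customer notified"


def resolve_failure(expected: str, recheck: str,
                    backup: Optional[str]) -> Outcome:
    """Decide what happens to an item that appeared on the failure report.

    expected: checksum recorded in the space manifest
    recheck:  checksum from re-checking the primary (S3) copy
    backup:   checksum of the restored backup copy, or None if the
              subscription has no second provider
    """
    if recheck == expected:
        return Outcome.PASSED_ON_RECHECK
    if backup is None:
        # S3-only subscription: there is no second copy to restore from.
        return Outcome.CUSTOMER_NOTIFIED
    if backup == expected:
        return Outcome.REPLACED_FROM_BACKUP
    # The backup copy also fails to match the expected checksum.
    return Outcome.CUSTOMER_NOTIFIED
```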

For content deposited in Chronopolis, each node in the Chronopolis network performs regularly scheduled integrity checks of all content. In the event that a file fails this test, the auditing system flags the file for review. Reviews are performed manually by Chronopolis staff in order to ascertain the cause of the audit failure. Once the cause is determined, a repair request is made to another node, which transmits a valid copy of the file for replacement at the requesting node.

Video and Audio Streaming

DuraCloud spaces can be configured to enable streaming of the content stored in the space. When enabled, files can be streamed using the RTMP protocol, which requires a Flash-based player to view. Supported file formats include MP3 and MP4, among others. Streaming can be used in either open or secure modes. Secure streaming requires an authenticated request to DuraCloud to retrieve a signed URL before the stream can be delivered. More details about media streaming in DuraCloud can be found here.

Download Costs

The fees for DuraCloud integrate the cost of bandwidth and requests, allowing for downloads up to the amount of the storage subscription. What this means is that if your DuraCloud subscription is for 5 TB of content, you are able to download (retrieve) 5 TB of content each year for no additional cost. For the vast majority of DuraCloud customers, this is sufficient and there are no additional charges for download.  If there is a need to download content in excess of your storage allotment, please discuss this with DuraSpace staff.

Integrations

Archive-It

Archive-It partner organizations can automatically perform an offsite backup to DuraCloud, allowing independent preservation and direct access to all web archive collections captured by Archive-It.

Archivematica

DuraCloud integrates with Archivematica, and complete documentation is available. If you are interested in combining DuraCloud with a hosted Archivematica instance, you may be interested in the ArchivesDirect service.

DSpace Replication Task Suite

The DSpace Replication Task Suite is a set (suite) of tasks to assist in performing replication of DSpace content to other locations (including DuraCloud). Currently, DSpace content is packaged in containers known as archival information packages (AIPs). The DSpace Replication Task Suite was released as an optional "add-on" to DSpace 1.8 and is available in all following DSpace releases.

Fedora CloudSync

Fedora CloudSync is a web-based utility for backing up and restoring Fedora 3 content in DuraCloud. It supports on-demand and scheduled backups of any content in a Fedora 3 repository, including externally managed datastreams. The project is functional but no longer receives ongoing support. Work on an integration between DuraCloud and Fedora 4 is currently in the planning stages.

Troubleshooting

If you encounter an error when running the SyncTool or using other content transfer tools, please first consult the list of error messages and suggested fixes below. If your error is not listed, please visit the support system and submit a ticket with a detailed description of the issue (including screenshots when available).

Out of Memory Error or Java Heap Space Error

Please review this page for information about addressing memory errors.

SyncTool interface display problems

Restarting your machine is often all that is needed to address issues with the display of the SyncTool interface. If a restart does not address the problem please contact support.

SyncTool "file does not exist" message for a watched directory

This message will appear if one of the folders in your watched directory has been renamed. You can delete the old watched directory and add the directory with the new name.

Large files failing to upload via the SyncTool

Increasing the chunk size in the SyncTool configuration will often address this issue. Please review the SyncTool operational notes for how to optimize chunk size for larger files.

Technical Questions

DuraCloud is an open source project and complete user and developer documentation is publicly available.

Technical Details

DuraCloud Security

DuraCloud provides multiple levels of security, including an instance firewall, encrypted transmissions, application authentication, and storage provider access control.

The instance firewall provides protection to each DuraCloud instance by blocking all access except via the standard HTTP and HTTPS ports. Data transmission to and from DuraCloud is via HTTPS encrypted requests and responses that can only be read by the intended recipient. The DuraCloud application requires users accessing their DuraCloud instance via either the web or the REST API interfaces to authenticate with credentials.

Users of a DuraCloud Enterprise or Enterprise Plus instance may have various roles with associated permission levels. Users with the Administrator role have the ability to define space access controls, which determine the users and groups that may read or write content in a space. Access to the underlying storage providers used by a DuraCloud instance is restricted to DuraCloud applications. This ensures that all actions involving content must occur through DuraCloud.

DuraCloud uses the leading cloud infrastructure vendor, Amazon Web Services (AWS), for managing systems and storage. AWS has deep compliance credentials and state-of-the-art security practices. DuraCloud leverages the capabilities of AWS to deploy load balanced and auto-scaled infrastructure components which are spread across data centers to reduce localized risk exposure. Amazon S3 storage boasts 99.999999999% durability and 99.99% availability of objects over a given year.

DuraCloud is a service of DuraSpace, a respected institution in the open source repository community and the home institution of the DSpace, Fedora, and VIVO communities.

Chronopolis Security

Chronopolis is a geographically distributed network consisting of partners from higher education and/or research institutions. Chronopolis is one of six digital preservation repositories to have received certification as a trusted digital repository by the Center for Research Libraries. This certification was based on the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC). TRAC was officially approved as an International Organization for Standardization (ISO) standard in 2012, and the resulting ISO 16363 standard is widely known as the standard by which to evaluate trustworthy digital repositories. While there are currently no auditing bodies approved to conduct ISO 16363 conformance evaluations, the Chronopolis team adheres to best practices as outlined in the standard and plans to undergo certification once auditors are available.

With DuraCloud Enterprise Chronopolis there is no concern about proprietary or vendor lock-in issues for your content.

Confidential Data

DuraCloud is one low-level component of an overall preservation strategy. It does not address fine-grained policy and access control considerations. DuraCloud is not audited for compliance with state or federal laws such as HIPAA or FERPA. Ensuring compliance with legal and institutional policies concerning data use and Personally-Identifying Information (PII) is the responsibility of the user/account holder. DuraCloud does provide basic authentication, space-level access controls, and the option to limit login access to a specific IP or IP range.

Encrypted Data

DuraCloud does not encrypt data. Customers may choose to encrypt files before storing them in DuraCloud; however, it is the responsibility of the customer to maintain any encryption keys.

Legal Compliance

Content access and copyright for content stored in DuraCloud are controlled and managed by the user/account holder.

