Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: cleanup of docs, remove references to SRB

...

Bitstreams also have an 38-digit internal ID, different from the primary key ID of the bitstream table row. This is not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative to the relevant store directory) that the bitstream is stored in traditional or SRB storage. The first three pairs of digits are the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal ID as the filename.

...

  • Using a randomly-generated 38-digit number means that the 'number space' is less cluttered than simply using the primary keys, which are allocated sequentially and are thus close together. This means that the bitstreams in the store are distributed around the directory structure, improving access efficiency.
  • The internal ID is used as the filename partly to avoid requiring an extra lookup of the filename of the bitstream, and partly because bitstreams may be received from a variety of operating systems. The original name of a bitstream may be an illegal UNIX filename.
  • When storing a bitstream, the BitstreamStorageManager DOES BitstreamStorageService DOES set the following fields in the corresponding database table row:
    • bitstream_id
    • size
    • checksum
    • checksum_algorithm
    • internal_id
    • deleted
    • store_number

...

  1. A database connection is created, separately from the currently active connection in the current DSpace context.
  2. An unique internal identifier (separate from the database primary key) is generated.
  3. The bitstream DB table row is created using this new connection, with the deleted column set to true.
  4. The new connection is _commit_ted, so the 'deleted' bitstream row is written to the database
  5. The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and filename derived from the internal ID
  6. The deleted flag in the bitstream row is set to false. This will occur (or not) as part of the current DSpace Context.

This means that should anything go wrong before, during or after the bitstream storage, only one of the following can be true:

  • No bitstream table row was created, and no file was stored
  • A bitstream table row with deleted=true was created, no file was stored
  • A bitstream table row with deleted=true was created, and a file was stored
    None of these affect the integrity of the data in the database or bitstream store.

...

The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup method of BitstreamStorageManager goes BitstreamStorageService goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the middle of a storage operation.

...

Configuring the Bitstream Store

BitStores (aka assetstores) are configured with [dspace]/config/spring/api/bitstore.xml

Configuring Traditional Storage

By default, DSpace uses a traditional filesystem bitstore of [dspace]/assetstore/

To configure normal traditional filesystem bitstore, as a specific directory, configure the bitstore like this:

...

This would configure store number 0 named localStore, which is a DSBitStore (filesystem), at the filesystem path of ${dspace.dir}/assetstore (i.e. [dspace]/assetstore/)

 

To It is also possible to use multiple local filesystems. Key0 In the below example, key #0 is localStore at ${dspace.dir}/assetstore, and key1 key #1 is localStore2 at /data/assetstore2. Note that incoming is set to store "1", which in this case refers to localStore2. That means that any new files (bitstreams) uploaded to DSpace will be stored in localStore2, but some existing bitstreams may still exist in localStore.

Code Blockcode
<bean name="org.dspace.storage.bitstore.BitstreamStorageService" class="org.dspace.storage.bitstore.BitstreamStorageServiceImpl">
    <property name="incoming" value="1"/>
    <property name="stores">
        <map>
            <entry key="0" value-ref="localStore"/>
            <entry key="1" value-ref="localStore2"/>
        </map>
    </property>
</bean>
<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>
<bean name="localStore2" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="/data/assetstore2"/>
</bean>

...

Configuring Amazon S3 Storage

To use AWS Amazon S3 as a bitstore, add a bitstore entry s3Store, using S3BitStoreService, and configure it with awsAccessKey, awsSecretKey, and bucketName. You NOTE: Before you can specify these settings, you obviously will have to create an account on in the Amazon AWS console, and create an IAM user with credentials , and privilege privileges to a an existing S3 bucket.

Code Block
<bean name="org.dspace.storage.bitstore.BitstreamStorageService" class="org.dspace.storage.bitstore.BitstreamStorageServiceImpl">
    <property name="incoming" value="1"/>
    <property name="stores">
        <map>
            <entry key="0" value-ref="localStore"/>
            <entry key="1" value-ref="s3Store"/>
        </map>
    </property>
</bean>
<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>
<bean name="s3Store" class="org.dspace.storage.bitstore.S3BitStoreService" scope="singleton">
    <!-- AWS Security credentials, with policies for specified bucket -->
    <property name="awsAccessKey" value=""/>
    <property name="awsSecretKey" value=""/>
    <!-- S3 bucket name to store assets in. example: longsight-dspace-auk -->
    <property name="bucketName" value=""/>
    <!-- AWS S3 Region to use: {us-east-1, us-west-1, eu-west-1, eu-central-1, ap-southeast-1, ... } -->
    <!-- Optional, sdk default is us-east-1 -->
    <property name="awsRegionName" value=""/>
    <!-- Subfolder to organize assets within the bucket, in case this bucket is shared  -->
    <!-- Optional, default is root level of bucket -->
    <property name="subfolder" value=""/>
</bean>

...

The incoming property specifies which assetstore receives incoming assets (i.e. when new files are uploaded, they will be stored in the "incoming" assetstore). This defaults to store 0.

Note: SRB Storage Resource Broker bitstore support was removed with DSpace 6.

Configuring S3 Storage

S3BitStore has parameters for awsAccessKey, awsSecretKey, bucketName, awsRegionName (optional), and subfolder (optional).AccessKey

  • awsAccessKey and

...

  • awsSecretKey are created from

...

...

  • console. You'll want to create

...

  • an IAM user, and generate a Security Credential, which provides you the accessKey and secret. Since you need permission to use S3, you could give this IAM user a quick & dirty policy of AmazonS3FullAccess (for all S3 buckets that you own), or for finer grain controls, you can assign an IAM user to have certain permissions to certain resources, such as read/write to a specific subfolder within a specific s3 bucket.
  • bucketName is a globally unique name that distinguishes your S3 bucket. It has to be unique among all other S3 users in the world.
  • awsRegionName is a region in AWS where S3 will be stored. Default is US Eastern. Consider distance to primary users, and pricing when choosing the region.
  • subfolder is a folder within the S3 bucket, where you could organize the assets to be in. If you wanted to re-use a bucket for multiple purposes (bucketname/assets vs bucketname/backups) or DSpace instances (bucketname/XYZDSpace or bucketname/ABCDSpace or bucketname/ABCDSpaceProduction).

Migrate BitStores

There is a command line migration tool to move all the assets within a bitstore, to another bitstore. bin/dspace bitstore-migrate

Code Block
/[dspace]/bin/dspace bitstore-migrate
usage: BitstoreMigrate
 -a,--source <arg>        Source assetstore store_number (to lose content). This is a number such as 0 or 1
 -b,--destination <arg>   Destination assetstore store_number (to gain content). This is a number such as 0 or 1.
 -d,--delete              Delete file from losing assetstore. (Default: Keep bitstream in old assetstore)
 -h,--help                Help
 -p,--print               Print out current assetstore information
 -s,--size <arg>          Batch commit size. (Default: 1, commit after each file transfer)

/[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 2 bitstreams.
store[1] == S3BitStore, which has 2 bitstreams.
Incoming assetstore is store[1]

/[dspace]/bin/dspace bitstore-migrate -a 0 -b 1


/[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 0 bitstreams.
store[1] == S3BitStore, which has 4 bitstreams.
Incoming assetstore is store[1]