This documentation space is deprecated. Please make all updates to DuraCloud documentation on the live DuraCloud documentation space.

Introduction

The Retrieval Tool is a utility that transfers (or "retrieves") digital content from DuraCloud to your local file system.

Download

Download the Retrieval Tool from the Downloads page.

How the Retrieval Tool Works

  • When the Retrieval Tool starts up, it connects to DuraCloud using the connection parameters you provide and gets a list of content items in the spaces you indicate. It will then proceed to download the files from those spaces, each into a local directory named for the space, which is placed within the content directory.
  • For each content item, the Retrieval Tool checks to see if there is already a local file with the same name. If so, the checksums of the two files are compared to determine if the local file is the same as the file in DuraCloud. If they match, nothing is done, and the Retrieval Tool moves on to the next file. If they do not match, the file from DuraCloud is retrieved.
  • By default, when a local file exists and differs from the DuraCloud copy, the local file is renamed prior to the DuraCloud file being retrieved. If you would prefer that the local file simply be overwritten, you will need to include the overwrite command-line flag when starting the Retrieval Tool.
  • As each content file is downloaded, a checksum comparison is made to ensure that the downloaded file matches the file in DuraCloud. If the checksums do not match, the file is downloaded again. This re-download will occur up to 5 times. If the checksums still do not match after the fifth attempt, a failure is indicated in the output file.
  • As each file download completes, a new line is added to the retrieval tool output file in the work directory, indicating whether the download was successful or not. Files which did not change are not included in the output file.
  • As the Retrieval Tool runs, it will print its status approximately every 10 minutes to indicate how many files have been checked and downloaded.
  • Once all files are retrieved, the Retrieval Tool will print its final status to the command line and exit.
  • As files are updated in DuraCloud, you can re-run the Retrieval Tool using the same content directory, and only the files which have been added or updated since the last run of the tool will be downloaded.
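
The per-file behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the Retrieval Tool's actual code: it assumes checksums are MD5 hex digests (the text does not name the algorithm), it reads "up to 5 times" as five download attempts in total, and the names `plan_action`, `download_with_retries`, and the `fetch` callable are hypothetical.

```python
import hashlib
from pathlib import Path

MAX_ATTEMPTS = 5  # attempt limit described in the text, read here as 5 total


def md5_of_bytes(data: bytes) -> str:
    """Hex MD5 digest of in-memory data (MD5 is an assumption)."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: Path) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_action(local_path: Path, remote_checksum: str, overwrite: bool) -> str:
    """Decide what to do for one content item.

    Returns "download" when no local file exists, "skip" when the local
    file matches, and otherwise "overwrite" or "rename-then-download"
    depending on the overwrite flag.
    """
    if not local_path.exists():
        return "download"
    if md5_of_file(local_path) == remote_checksum:
        return "skip"
    return "overwrite" if overwrite else "rename-then-download"


def download_with_retries(fetch, remote_checksum: str):
    """Call fetch() until the data matches the checksum or attempts run out.

    fetch is a hypothetical zero-argument callable returning bytes.
    Returns (data, attempts_used); data is None if every attempt failed.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        data = fetch()
        if md5_of_bytes(data) == remote_checksum:
            return data, attempt
    return None, MAX_ATTEMPTS
```

Because unchanged files resolve to "skip", re-running against the same content directory only transfers items that were added or updated.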

Operational Notes

  • Content Directory - the directory to which files will be downloaded. A new directory within the content directory will be created for each space.
  • Work Directory - the directory that holds both logs, which give granular information about the process, and output files. A new output file is created for each run of the Retrieval Tool, and it stores a listing of the files which were downloaded.
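
As a small illustration of the content directory layout, the sketch below maps a content item to its local path. The function name `local_path_for` and the sample IDs are hypothetical, and it assumes a content ID may contain "/" separators that become subdirectories.

```python
from pathlib import PurePosixPath


def local_path_for(content_dir: str, space_id: str, content_id: str) -> PurePosixPath:
    """Place each item under <content_dir>/<space_id>/<content_id>."""
    return PurePosixPath(content_dir) / space_id / content_id


print(local_path_for("content", "space1", "images/photo.jpg"))
# prints: content/space1/images/photo.jpg
```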

Prerequisites

  • You must have Java version 6 or above installed on your local system. If Java is not installed, you will need to download and install it. To determine if the correct version of Java is installed, open a terminal or command prompt and enter
    java -version
    The version displayed should be 1.6.0 or above. If running this command generates an error, Java is likely not installed.
  • You must have downloaded the Retrieval Tool. It is available as a link near the top of this page.
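
The Java check above can also be scripted. The following is a minimal sketch, assuming `java -version` prints a line containing `version "<number>"` (it traditionally writes this to standard error); the function names are hypothetical.

```python
import re
import subprocess


def java_version_ok(version_output: str, minimum=(1, 6)) -> bool:
    """Return True if the reported version is at least `minimum`.

    Handles both the old scheme ("1.6.0_45") and the newer one ("17.0.2").
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= minimum


def installed_java_is_ok() -> bool:
    """Run `java -version` and check its output; False if java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return False
    return java_version_ok(result.stderr + result.stdout)
```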

Using the Retrieval Tool

  • To run the Retrieval Tool, open a terminal or command prompt and navigate to the directory where the Retrieval Tool jar file is located.
  • To display the help for the Retrieval Tool, run
    java -jar retrievaltool-{version}-driver.jar
  • When running the Retrieval Tool, you will need to use these options:

    | Short Option | Long Option | Argument Expected | Required | Description | Default Value (if optional) |
    |---|---|---|---|---|---|
    | -h | --host | Yes | Yes | The host address of the DuraCloud DuraStore application | |
    | -r | --port | Yes | No | The port of the DuraCloud DuraStore application | 443 |
    | -u | --username | Yes | Yes | The username necessary to perform writes to DuraStore | |
    | -p | --password | Yes | Yes | The password necessary to perform writes to DuraStore | |
    | -i | --store-id | Yes | No | The Store ID for the DuraCloud storage provider | The default store is used |
    | -s | --spaces | Yes | No | The space or spaces from which content will be retrieved. Either this option or -a must be included | |
    | -a | --all-spaces | No | No | Indicates that all spaces should be retrieved; if this option is included the -s option is ignored | Not set |
    | -c | --content-dir | Yes | Yes | Retrieved content is stored in this local directory | |
    | -w | --work-dir | Yes | Yes | Logs and output files will be stored in the work directory | |
    | -o | --overwrite | No | No | Indicates that existing local files which differ from files in DuraCloud under the same path and name should be overwritten rather than renamed | Not set |
    | -t | --threads | Yes | No | The number of threads in the pool used to manage file transfers | 3 |

  • An example for running the Retrieval Tool
    java -jar retrievaltool-{version}-driver.jar -c content -h test.duracloud.org -u myname -p mypassword -w work -s space1 space2 -o
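
If the tool is launched from a script, the command line can be assembled programmatically. The sketch below is one possible wrapper, not part of the Retrieval Tool: `build_command` and its parameter names are hypothetical, and only the short options listed in the table are used. It enforces the rule that either -s or -a must be given.

```python
def build_command(jar, host, username, password, content_dir, work_dir,
                  spaces=None, all_spaces=False, overwrite=False,
                  port=None, store_id=None, threads=None):
    """Return the argument list for one Retrieval Tool run."""
    if not all_spaces and not spaces:
        raise ValueError("either spaces (-s) or all_spaces (-a) is required")

    # Required options
    cmd = ["java", "-jar", jar,
           "-h", host, "-u", username, "-p", password,
           "-c", content_dir, "-w", work_dir]

    # Optional options; when omitted, the tool's own defaults apply
    if port is not None:
        cmd += ["-r", str(port)]
    if store_id is not None:
        cmd += ["-i", str(store_id)]
    if threads is not None:
        cmd += ["-t", str(threads)]

    # -a takes precedence: the table states -s is ignored when -a is set
    if all_spaces:
        cmd.append("-a")
    else:
        cmd += ["-s", *spaces]

    if overwrite:
        cmd.append("-o")
    return cmd


# Mirrors the example above; the jar name keeps its {version} placeholder.
command = build_command(
    "retrievaltool-{version}-driver.jar", "test.duracloud.org",
    "myname", "mypassword", "content", "work",
    spaces=["space1", "space2"], overwrite=True,
)
```

The resulting list can be passed to `subprocess.run(command)` once the placeholder is replaced with the real jar file name.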