Introduction

The Sync Tool is a utility that provides a simple way to move files from a local file system to DuraCloud and then keep the files in DuraCloud synchronized with those on the local system.

Download

Download the Sync Tool from the Downloads page.

Getting Started

The Sync Tool can be installed using one of the installers on the downloads page linked above. Once installed, the Sync Tool will default to running in GUI mode. To run in command line mode, open a terminal window (or command prompt) and navigate to the Sync Tool installation directory. Once there, execute the Sync Tool JAR file using: "java -jar duracloudsync.jar --help". This will print the usage information for the tool.

How the Sync Tool Works

Operational notes

Large Datasets and Out of Memory Errors

When using the Sync Tool to transmit data sets with a large number of files (i.e. hundreds of thousands of files or more), users occasionally run into out of memory errors. Users with sufficient memory on their machines can usually remedy this problem by increasing the maximum heap space available to the Java VM. We recommend starting with a setting of at least 1 GB when working with sets over 100,000 files. If the problem persists, increase the memory value until the errors stop. To increase the heap space, use the -Xmx java option. Click for more information on setting the heap space.

An alternative solution is to upload files in smaller sets. The prefix option can be used to ensure that files are added to DuraCloud with the preferred ID values.

To run the Sync Tool in GUI mode with 1 GB of heap memory space, download the JAR version of the Sync Tool and execute the following on the command line:

java -Xmx1g -jar duracloudsync-{version}.jar

To run the Sync Tool in command line mode with 1 GB of heap memory space, download the JAR version of the Sync Tool and execute the above command followed by the command line parameter values.
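
When trying several heap sizes in turn, it can help to build the command in one place. The following is a minimal sketch, not part of the Sync Tool: the SYNC_JAR variable and the sync_command function are illustrative names, and the JAR file name is assumed to match the one you downloaded. It only prints the command, so nothing is transferred until you run the printed line yourself.

```shell
#!/bin/bash
# Sketch only: SYNC_JAR and sync_command are illustrative names, not part of the Sync Tool.
# Point SYNC_JAR at the JAR file you downloaded.
SYNC_JAR="${SYNC_JAR:-duracloudsync.jar}"

# Print the java command for a given heap size (e.g. 1g, 2g) followed by
# whatever Sync Tool parameters are passed after it.
sync_command() {
  local heap="$1"
  shift
  echo java "-Xmx${heap}" -jar "${SYNC_JAR}" "$@"
}

# Dry run: show the command for a 1 GB heap and for a 2 GB heap.
sync_command 1g --help
sync_command 2g --help
```

Replacing --help with your own parameter values gives the command line mode invocation described above, with only the heap size changing between attempts.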

Prerequisites

As of DuraCloud version 2.2.0, the Sync Tool requires Java 7 to run. As of version 3.3.0, DuraCloud is primarily tested using Java 8, and this is the recommended Java version for building and running DuraCloud tools. The latest version of Java can be downloaded from here.

Using the Sync Tool

Runtime commands

Running the Sync Tool in a server shell environment

As noted above, the Sync Tool can be run in one of two modes, one which allows it to run continually, and the other which allows it to exit once it completes transferring all current files. The mode you choose will determine the way in which you deploy the Sync Tool on a server. The following examples assume the use of the bash shell.

To start the Sync Tool in continually running mode, you would use a command like this:

nohup java -jar duracloudsync-{version}.jar {parameters} > ~/synctool-output.log 2>&1 &

In this case, the & at the end of the command runs it in the background, and the "nohup" at the beginning keeps it running even when the terminal being used is closed or when you disconnect from the server machine. The output of the Sync Tool is written to a file called "synctool-output.log" in the user's home directory.
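
A process started this way has to be found again before it can be stopped. One common approach is to record its process ID at launch using the shell's $! variable. The sketch below illustrates the pattern under stated assumptions: start_background is a hypothetical helper, the file locations are temporary stand-ins, and "sleep 30" takes the place of the java command so the example can be run without the Sync Tool installed.

```shell
#!/bin/bash
# Sketch only: start_background is an illustrative helper, not part of the Sync Tool.
# Usage: start_background LOGFILE PIDFILE COMMAND [ARGS...]
start_background() {
  local log="$1" pidfile="$2"
  shift 2
  # Same pattern as above: nohup + output redirection + trailing &.
  nohup "$@" > "$log" 2>&1 &
  # $! holds the process ID of the most recent background command.
  echo $! > "$pidfile"
}

# "sleep 30" stands in for: java -jar duracloudsync-{version}.jar {parameters}
workdir="$(mktemp -d)"
start_background "$workdir/synctool-output.log" "$workdir/synctool.pid" sleep 30

# Later: check whether the process is still alive, then stop it.
pid="$(cat "$workdir/synctool.pid")"
if kill -0 "$pid" 2>/dev/null; then
  echo "running"
fi
kill "$pid"
```

Keeping the process ID in a file means the tool can be stopped from a later login session without searching the process list.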

In order for the Sync Tool to run on startup when the server machine boots, additional settings are needed, which depend on the operating system being used. In Ubuntu, for example, an Upstart script would be used for this purpose.

Running the Sync Tool in exit on completion mode works best when the tool is run on a scheduled basis. A popular choice for handling this type of task is the cron utility. To run daily using cron, a script should be placed in /etc/cron.daily. The script would look something like:

#!/bin/bash

java -jar duracloudsync-{version}.jar -x {parameters} >> ~/synctool-output.log 2>&1

The -x parameter is included here to ensure the Sync Tool exits after completing its run.
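
Scripts in /etc/cron.daily are typically run as root and not from the Sync Tool installation directory, so relative paths and ~ may not resolve where you expect. The following variant is a sketch under stated assumptions: the install path /opt/duracloud-sync, the log path under /var/log, and the run_sync function name are all hypothetical and should be replaced with values for your system.

```shell
#!/bin/bash
# Sketch of a cron.daily wrapper. The paths below are assumptions,
# not Sync Tool defaults; adjust them to your installation.
SYNC_JAR="${SYNC_JAR:-/opt/duracloud-sync/duracloudsync.jar}"
LOG_FILE="${LOG_FILE:-/var/log/synctool-output.log}"

run_sync() {
  # Fail early with a clear message if the JAR is not where we expect it.
  if [ ! -f "$SYNC_JAR" ]; then
    echo "Sync Tool JAR not found: $SYNC_JAR" >&2
    return 1
  fi
  # -x makes the Sync Tool exit after its run; remaining arguments are
  # passed through as Sync Tool parameters. Output is appended to the log.
  java -jar "$SYNC_JAR" -x "$@" >> "$LOG_FILE" 2>&1
}

# In a real script, end with a call such as the following, replacing
# {parameters} with your Sync Tool parameter values:
# run_sync {parameters}
```

On Debian and Ubuntu, the run-parts utility that processes /etc/cron.daily skips file names containing a dot, so a name such as duracloud-sync is safer than duracloud-sync.sh. The script also needs to be executable.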