Introduction

The Sync Tool is a utility that provides a simple way to move files from a local file system to DuraCloud and then keep the files in DuraCloud synchronized with those on the local system.

Download

Download the Sync Tool from the Downloads page.

Getting Started

The Sync Tool can be installed using one of the installers on the Downloads page linked above. Once installed, the Sync Tool runs in GUI mode by default. To run it in command line mode, open a terminal window (or command prompt), navigate to the Sync Tool installation directory, and execute the Sync Tool JAR file with: "java -jar duracloudsync.jar --help". This prints the usage information for the tool.

How the Sync Tool Works

Operational notes

Prerequisites

As of DuraCloud version 2.2.0, the Sync Tool requires Java 7 or later. As of version 3.3.0, DuraCloud is primarily tested with Java 8, which is the recommended Java version for building and running DuraCloud tools. The latest version of Java is available from the official Java download site.
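To check a machine against these requirements before launching the Sync Tool, a small bash function can parse the output of "java -version". This is a minimal sketch: the helper names (java_major_version, installed_java_version, require_java) are hypothetical, and it assumes the version number is the first quoted string on the "java -version" line, which is the case for Oracle and OpenJDK builds. It handles both the legacy "1.8.0_292" scheme and the newer "11.0.2" scheme.

```shell
#!/bin/bash
# Sketch: verify that the installed Java meets a minimum major version.
# All function names here are hypothetical helpers, not part of the Sync Tool.

# Convert a Java version string to its major version:
# "1.7.0_80" -> 7, "1.8.0_292" -> 8, "11.0.2" -> 11, "17" -> 17
java_major_version() {
  local v="$1"
  if [[ "$v" == 1.* ]]; then
    v="${v#1.}"          # legacy "1.x" scheme: drop the leading "1."
  fi
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# Print the quoted version string reported by "java -version" (it goes to stderr).
installed_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 if java is on PATH and its major version is at least $1 (default 7).
require_java() {
  local min="${1:-7}" ver major
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  ver="$(installed_java_version)"
  major="$(java_major_version "$ver")"
  if ! [[ "$major" =~ ^[0-9]+$ ]] || (( major < min )); then
    echo "Java $min or later required; found '${ver:-unknown}'" >&2
    return 1
  fi
}
```

For example, "require_java 8 && java -jar duracloudsync.jar --help" checks for the recommended Java 8 before printing the Sync Tool's usage information.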

Using the Sync Tool

Runtime commands

Running the Sync Tool in a server shell environment

As noted above, the Sync Tool can run in one of two modes: continuously, or exiting once it has finished transferring all current files. The mode you choose determines how you deploy the Sync Tool on a server. The following examples assume the bash shell.

To start the Sync Tool in continuously running mode, use a command like this:

nohup java -jar duracloudsync-{version}.jar {parameters} > ~/synctool-output.log 2>&1 &
In this command, the & at the end runs the process in the background, and "nohup" at the beginning keeps it running after the terminal is closed or you disconnect from the server. The "2>&1" sends error output to the same destination as standard output, so all Sync Tool output is written to a file called "synctool-output.log" in the user's home directory.
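Because a backgrounded process is easy to lose track of, it can help to record its process ID when it starts. The following is a minimal sketch built on the same nohup command; PID_FILE and the start_synctool, synctool_running, and stop_synctool functions are hypothetical names, not part of the Sync Tool.

```shell
#!/bin/bash
# Sketch: start/stop helpers for a continuously running Sync Tool.
# PID_FILE and the function names are hypothetical, not Sync Tool features.
PID_FILE="${PID_FILE:-$HOME/synctool.pid}"
LOG_FILE="${LOG_FILE:-$HOME/synctool-output.log}"

# True if the PID recorded in PID_FILE belongs to a live process.
synctool_running() {
  [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

# Launch the given command under nohup in the background and record its PID.
start_synctool() {
  if synctool_running; then
    echo "already running (PID $(cat "$PID_FILE"))" >&2
    return 1
  fi
  nohup "$@" > "$LOG_FILE" 2>&1 &
  echo "$!" > "$PID_FILE"
}

# Send SIGTERM to the recorded process, then forget its PID.
stop_synctool() {
  if synctool_running; then
    kill "$(cat "$PID_FILE")"
  fi
  rm -f "$PID_FILE"
}

# Usage:
#   start_synctool java -jar duracloudsync-{version}.jar {parameters}
#   synctool_running && echo "Sync Tool is running"
#   stop_synctool
```

This assumes the recorded PID still belongs to the Sync Tool; if the process has died and the operating system reused its PID, synctool_running could report a false positive.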
For the Sync Tool to start when the server machine boots, additional configuration is needed, and it depends on the operating system. On Ubuntu, for example, older releases use an Upstart script for this purpose, while releases since 15.04 use a systemd unit.
Exit-on-completion mode works best when the tool is run on a schedule. A popular choice for scheduling this type of task is the cron utility. To run the Sync Tool daily with cron, place an executable script in /etc/cron.daily. The script would look something like:

 

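#!/bin/bash
# Scripts in /etc/cron.daily run as root and not from the Sync Tool
# installation directory, so give the full path to the JAR.
java -jar /path/to/duracloudsync-{version}.jar -x {parameters} >> ~/synctool-output.log 2>&1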

The -x parameter ensures that the Sync Tool exits after completing its run.
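If one daily run is still transferring files when cron starts the next, two copies of the Sync Tool could end up running at the same time. A common safeguard is an exclusive lock taken with flock(1) from util-linux, sketched below; LOCK_FILE, run_locked, and /path/to/ are hypothetical. Note also that on Debian and Ubuntu, run-parts only executes files in /etc/cron.daily whose names consist of letters, digits, underscores, and hyphens, so name the script something like duracloud-sync rather than duracloud-sync.sh.

```shell
#!/bin/bash
# Sketch: skip a scheduled Sync Tool run if the previous one is still going.
# LOCK_FILE and run_locked are hypothetical names; flock(1) is from util-linux.
LOCK_FILE="${LOCK_FILE:-/var/lock/duracloud-sync.lock}"

# Run "$@" while holding an exclusive lock on LOCK_FILE (via file descriptor 9).
# Returns 1 immediately, without running anything, if the lock is already held.
run_locked() {
  (
    flock -n 9 || { echo "previous run still in progress; skipping" >&2; exit 1; }
    "$@"
  ) 9>"$LOCK_FILE"
}

# In /etc/cron.daily/duracloud-sync (hypothetical name), use absolute paths:
#   run_locked java -jar /path/to/duracloudsync-{version}.jar -x {parameters} \
#     >> ~/synctool-output.log 2>&1
```

The lock is tied to an open file descriptor, so the kernel releases it automatically when the run ends, even if the Sync Tool is killed; no stale lock is left behind.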

Large Datasets and Out of Memory Errors

When using the Sync Tool to transfer datasets with a large number of files (i.e., more than one million items), users occasionally run into out of memory errors. Users with sufficient memory on their machines can usually remedy this problem by increasing the maximum heap space available to the JVM. We recommend starting with a maximum heap space of at least 1.5 GB when working with sets of approximately one million files. If you still run into issues, increase it in steps of 500 MB until the problem no longer occurs. To increase the heap space, use the -Xmx Java option; the reference documentation for the java launcher has more information on setting the heap space.

#!/bin/bash

# For 1 GB:
java -Xmx1024m -jar duracloudsync-{version}.jar {parameters}
# or
java -Xmx1g -jar ...
# For 1.5 GB (-Xmx takes whole numbers, so use megabytes):
java -Xmx1536m -jar ...
# For 2 GB:
java -Xmx2048m ...
# or
java -Xmx2g ...
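To see whether a dataset falls in the range described above, you can count its files before choosing a heap size. A minimal sketch follows; count_files, suggest_xmx, and the one-million threshold are assumptions drawn from the guidance in this section, not anything built into the Sync Tool.

```shell
#!/bin/bash
# Sketch: count the files to be synced and suggest a starting -Xmx value,
# following the guidance above (about 1.5 GB for around one million files).
# count_files, suggest_xmx and the 1,000,000 threshold are assumptions.

# Count regular files under a directory. File names containing newlines are
# counted more than once, which is acceptable for a rough estimate.
count_files() {
  echo $(( $(find "$1" -type f | wc -l) ))
}

# Print a starting -Xmx option for the given file count, or nothing below the
# threshold (leaving the JVM's default maximum heap size in place).
suggest_xmx() {
  if [ "$1" -ge 1000000 ]; then
    echo "-Xmx1536m"
  fi
}
```

For example: java $(suggest_xmx "$(count_files /data/to/sync)") -jar duracloudsync-{version}.jar {parameters}, where /data/to/sync is a placeholder for your content directory. The outer substitution is deliberately unquoted so that it disappears when no -Xmx value is suggested. If out of memory errors persist, raise the value in 500 MB steps as described above (for example, -Xmx2048m).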