...

If you are not yet familiar with the DuraCloud Mill, please refer to the DuraCloud Architecture document, which describes the purpose and primary components of the Mill.

Requirements

  1. Java: Version 8 required
    1. The Oracle JDK is recommended for building DuraCloud, as this is the JDK used for DuraCloud testing.
  2. Maven 3.x
  3. Git
  4. MySQL database
  5. AWS account and credentials

Download, Build, Configure

  1. To start, clone and build the latest release of the mill:

    Code Block
    git clone https://github.com/duracloud/mill.git
    cd mill
    git checkout release-2.21.0
    mvn clean install
  2. Create database
    1. Create the empty mill database
    2. Add database credentials
    3. Create the schema using <mill project root>/resources/mill.schema.sql
    4. Run schema updates in ascending order: <mill project root>/resources/schema-update-*.sql
  3. Create a configuration file.
    1. Now that you've built the mill and created the database, set up a configuration file that can be used by the various components of the system. A template of this configuration file can be found in the source tree at mill/resources/mill-config-sample.properties
      1. Copy and rename the file to mill-config.properties
      2. Configure the database connections to the mill database as well as the management console database:

        Code Block
        ###################
        # Mill Database
        ###################
        # Config for mill database.
        mill.db.host=[fill in]
        mill.db.port=[fill in]
        mill.db.name=[fill in]
        # User must have read/write permission
        mill.db.user=[fill in]
        mill.db.pass=[fill in]
        
        ###################
        # Account Management Database
        ###################
        # Config for the management console database - used to retrieve accounts and storage provider credentials
        db.host=[fill in]
        db.port=[fill in]
        db.name=[fill in]
        # User must have read permission
        db.user=[fill in]
        db.pass=[fill in]
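
The database steps in item 2 above can be sketched as a small shell helper. This is a minimal sketch, not part of the Mill distribution. It assumes the standard mysql command-line client is installed, that connection options are supplied through a hypothetical MYSQL variable, and that version sort (sort -V, available in GNU coreutils) matches the intended ascending order of the schema-update-*.sql files. The function names and the database name passed in are illustrative. Creating the database user and granting permissions is site-specific and is not shown.

```shell
# Sketch only: MYSQL, list_schema_updates and create_mill_db are
# illustrative names, not part of the Mill.

# Client command; override MYSQL to add connection options.
MYSQL="${MYSQL:-mysql}"

# Print the schema update scripts in ascending, version-aware order.
# Assumption: sort -V reflects the order the updates should be applied in.
list_schema_updates() {
  local resources_dir="$1" f
  for f in "$resources_dir"/schema-update-*.sql; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done | sort -V
}

# Create an empty database, load the base schema, then apply each update.
# Stops at the first failing step.
create_mill_db() {
  local resources_dir="$1" db_name="$2"
  $MYSQL -e "CREATE DATABASE IF NOT EXISTS \`${db_name}\`" || return 1
  $MYSQL "$db_name" < "$resources_dir/mill.schema.sql" || return 1
  list_schema_updates "$resources_dir" | while IFS= read -r update; do
    echo "Applying $update"
    $MYSQL "$db_name" < "$update" || exit 1
  done || return 1
}
```

For example, after setting MYSQL="mysql --defaults-extra-file=/path/to/client.cnf", calling create_mill_db /path/to/mill/resources mill would create and populate a database named mill. Running list_schema_updates on its own first is a cheap way to confirm the order before anything is applied.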

...

Workman is responsible for reading tasks from a set of queues, delegating them to task processors, and removing them once they have reached a completed state. In the case of failure, a task is retried three times before it is sent to the dead letter queue. A single instance of Workman can run multiple tasks in parallel; how many depends on the max-workers setting in the mill-config.properties file. It is also safe to run multiple instances of Workman, whether on a single machine or across multiple machines. We recommend running a single instance of Workman on each machine instance and setting max-workers in accordance with the available resources.
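
The retry behavior described above can be illustrated with a short shell sketch. This is not Workman's code and says nothing about how Workman is implemented. The names run_task and MAX_ATTEMPTS are hypothetical, and "retried three times" is modeled here as three attempts in total, which is an assumption about the exact count.

```shell
# Hypothetical illustration of retry-then-dead-letter handling.
# Assumption: three attempts in total before a task is given up on.
MAX_ATTEMPTS=3

# Run a task command. Report success as soon as one attempt succeeds,
# or report "dead-letter" once MAX_ATTEMPTS attempts have failed.
run_task() {
  local attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if "$@"; then
      echo "completed after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "dead-letter after $MAX_ATTEMPTS failed attempts"
  return 1
}
```

Calling run_task with a command that always succeeds reports completion on the first attempt, while a command that always fails ends in the dead-letter branch. In the Mill, tasks that land on the dead letter queue are the ones that need manual attention.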

...

Once you have an instance of Workman running, you can perform an explicit duplication run. Spaces that have been configured with duplication policies (see the Mill Overview for details) generate duplication events when the audit tasks associated with them are processed. If you add a duplication policy to a space that already has content items, you'll need to perform a duplication run to ensure that those items get duplicated. The loopingduptaskproducer fulfills this function.

Based on the set of duplication policies, it generates duplication tasks for all matching spaces. It keeps track of which accounts, spaces, and items have been processed in a given run, so it does not need to run in daemon mode. It runs until it has reached the maximum number of allowable items on the queue and then exits. The next time it is run, it picks up where it left off. You may want to lower the max queue size if you have so many items, and so little computing power to process them, that tasks could sit on the queue longer than the maximum life of an SQS message (14 days).

Items are added roughly one thousand at a time for each space in a round-robin fashion, which ensures that small spaces flanked by large spaces are processed quickly. It is also important that only one instance of loopingduptaskproducer is running at any moment in time.

Two settings to be concerned with when it comes to the looping dup task producer:

...
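
As a rough check on the max queue size discussed for the looping dup task producer above, the 14-day SQS limit can be turned into simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the processing rate is a figure you would have to measure yourself, and both inputs are assumed to be positive whole numbers.

```shell
# Maximum life of an SQS message: 14 days, expressed in seconds.
SQS_MAX_RETENTION_SECONDS=$((14 * 24 * 60 * 60))

# Largest backlog that can be drained within 14 days at a measured
# processing rate (whole items per second, assumed to be positive).
max_safe_queue_size() {
  local items_per_second="$1"
  echo $((items_per_second * SQS_MAX_RETENTION_SECONDS))
}

# Days needed to drain a backlog at that rate, rounded up.
days_to_drain() {
  local queue_size="$1" items_per_second="$2" seconds
  seconds=$(( (queue_size + items_per_second - 1) / items_per_second ))
  echo $(( (seconds + 86399) / 86400 ))
}
```

For instance, days_to_drain 2000000 1 gives 24, which is well past the 14-day limit, so a queue of two million items would be too large for a system that processes about one item per second. In practice it is worth leaving generous headroom below the value reported by max_safe_queue_size, since throughput varies.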

Code Block
 java -Dlog.level=INFO -jar manifest-cleaner-{mill version here}.jar -c /path/to/mill-config.properties
