...

  1.  Queue names refer to the AWS SQS queue names defined in your account. You must create the following queues in SQS, then configure them in the Queues section of the configuration file by replacing the brackets ([]) with the queue names.

    Code Block
    #########
    # Queues
    #########
    queue.name.audit=[]
    queue.name.bit-integrity=[]
    queue.name.dup-high-priority=[]
    queue.name.dup-low-priority=[]
    queue.name.bit-error=[]
    queue.name.bit-report=[]
    queue.name.storagestats=[]
    queue.name.dead-letter=[]
  2.  For a given instance of workman, you must specify which queues will be consumed and the order in which they will be read. In other words, a given instance of workman can focus on specific kinds of tasks and can decide which tasks have higher priority. In this way, instances of workman can be run on hardware configurations suited to the load and kinds of tasks they will need to bear. Make sure you use the keys defined above (queue.name.*) rather than the queue names themselves.

    Code Block
    ## A comma-separated, prioritized list of task queue keys (i.e. do not use the
    ## concrete AWS queue names; use the queue.name.* keys).
    ## The first items in the list have highest priority; the last the lowest.
    queue.task.ordered=[]
     
  3. As mentioned before, max-workers sets the number of task-processing threads that can run simultaneously.

    Code Block
    # The max number of worker threads that can run at a time. The default value is 5. 
    max-workers=[]
  4. The duplication policy manager writes policies to an S3 bucket.  Both the loopingduptaskproducer and workman use those policies for making decisions about duplication. 

    Code Block
    # The last portion of the name of the S3 bucket where duplication policies can be found.
    duplication-policy.bucket-suffix=duplication-policy-repo
    # The frequency in milliseconds between refreshes of duplication policies.
    duplication-policy.refresh-frequency=[]
  5. You can also set workdir, which defines where temporary data will be written, as well as notification.recipients, a comma-separated list of email addresses.

    Code Block
    # Directory that will be used to temporarily store files as they are being processed.
    workdir=[]
    # A comma-separated list of email addresses
    notification.recipients=[]
    
    
  6. Once these settings are in place, you can run workman with the following Java command:

    Code Block
     java -Dlog.level=INFO -jar workman-{mill version here}.jar -c /path/to/mill-config.properties
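
As an illustration, the settings from steps 1 through 5 might be filled in as sketched below. Every value here is hypothetical: the queue names, refresh interval, work directory, and email address are assumptions made for the example, not values shipped with the mill. Substitute the names of the SQS queues you actually created.

Code Block
#########
# Queues (hypothetical SQS queue names)
#########
queue.name.audit=example-audit
queue.name.bit-integrity=example-bit-integrity
queue.name.dup-high-priority=example-dup-high-priority
queue.name.dup-low-priority=example-dup-low-priority
queue.name.bit-error=example-bit-error
queue.name.bit-report=example-bit-report
queue.name.storagestats=example-storagestats
queue.name.dead-letter=example-dead-letter
# Keys, not queue names; the first entry has the highest priority.
queue.task.ordered=queue.name.dup-high-priority,queue.name.bit-integrity,queue.name.dup-low-priority
# 5 is the documented default.
max-workers=5
duplication-policy.bucket-suffix=duplication-policy-repo
# Hypothetical interval: 600000 ms = 10 minutes.
duplication-policy.refresh-frequency=600000
# Hypothetical directory and address.
workdir=/tmp/workman-work
notification.recipients=admin@example.org

In this sketch, queue.task.ordered lists three keys, so this instance would read only those three queues and would favor high-priority duplication tasks over the others.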

...

Code Block
#############################
# LOOPING BIT TASK PRODUCER
#############################
# The frequency for a complete run through all store policies. Specify in hours (e.g. 3h), days (e.g. 3d), or months (e.g. 3m). Default is 1m - i.e. one month
looping.bit.frequency=[]
# Indicates how large the task queue should be allowed to grow before the Looping Task Producer quits.
looping.bit.max-task-queue-size=[]
# A file containing inclusions. Expressions will be matched against the following path: /{account}/{storeId}/{spaceId} and should have the same format. You can use an asterisk (*) to indicate all.
# For example, to indicate all spaces named "space-a" in the "test" account across all providers you would add a line like this:
#    /test/*/space-a
# You may also comment lines by using a hash (#) at the beginning of the line.
looping.bit.inclusion-list-file=[]
# A file containing exclusions as regular expressions using the same format as specified for inclusions.
looping.bit.exclusion-list-file=[]

The program can be run using the following command:

Code Block
 java -Dlog.level=INFO -jar loopingbittaskproducer-{mill version here}.jar -c /path/to/mill-config.properties
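
As a sketch of what an inclusion list file might contain, the following uses the /{account}/{storeId}/{spaceId} format described in the configuration comments above. The first expression is the example from those comments; the "demo" account and the file itself are hypothetical.

Code Block
# Hypothetical inclusion list: one expression per line.
# All spaces named "space-a" in the "test" account, across all providers.
/test/*/space-a
# Every space in every store of a hypothetical "demo" account.
/demo/*/*

The looping.bit.inclusion-list-file setting would then point at wherever this file is saved. The exclusion list file uses the same line format.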

Configuring and Running Storage Statistics

The storage statistics looping task producer works similarly to the duplication runs. looping-storagestats-taskproducer has settings similar to those mentioned above, as well as two others: looping.storagestats.inclusion-list-file and looping.storagestats.exclusion-list-file. These two files let you be more surgical about what you include in and exclude from your storage stats run, and they function similarly to the duplication policies. The important thing to note here is that if there are no entries in the inclusion list, then all accounts, stores, and spaces are included.

Code Block
#############################
# LOOPING STORAGE STATS TASK PRODUCER
#############################
# The frequency for a complete run through all store policies. Specify in hours (e.g. 3h), days (e.g. 3d), or months (e.g. 3m). Default is 1m - i.e. one month
looping.storagestats.frequency=1d
# Indicates how large the task queue should be allowed to grow before the Looping Task Producer quits.
looping.storagestats.max-task-queue-size=[]
# A file containing inclusions. Expressions will be matched against the following path: /{account}/{storeId}/{spaceId} and should have the same format. You can use an asterisk (*) to indicate all.
# For example, to indicate all spaces named "space-a" in the "test" account across all providers you would add a line like this:
#    /test/*/space-a
# You may also comment lines by using a hash (#) at the beginning of the line.
looping.storagestats.inclusion-list-file=[]
# A file containing exclusions as regular expressions using the same format as specified for inclusions.
looping.storagestats.exclusion-list-file=[]

The program can be run using the following command:

Code Block
 java -Dlog.level=INFO -jar looping-storagestats-taskproducer-{mill version here}.jar -c /path/to/mill-config.properties
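
To make the inclusion and exclusion behavior concrete, here is a minimal sketch of a storage stats configuration. The queue size and file paths are hypothetical values chosen for the example.

Code Block
# Run through everything once a day.
looping.storagestats.frequency=1d
# Hypothetical limit.
looping.storagestats.max-task-queue-size=1000
# Hypothetical paths to the two list files.
looping.storagestats.inclusion-list-file=/path/to/storagestats-inclusions.txt
looping.storagestats.exclusion-list-file=/path/to/storagestats-exclusions.txt

If the inclusions file in this sketch has no entries, all accounts, stores, and spaces are included. The exclusions file could then remove specific spaces with a line such as /test/*/space-a.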

...