...
Bit Integrity Check Task Processor
A bit integrity check task operates on a single content item at a time. It downloads the content item, calculates the checksum of the downloaded file, and compares that value to the storage provider's checksum as well as to the checksums stored for the item in the audit log and the content index. The result, pass or fail, is recorded in the BitLog database. See the table below for various error conditions, how they might have come about, and how they are resolved.
# | Content | Storage | Content Index | Audit Log | Outcome | How did it happen? |
---|---|---|---|---|---|---|
1 | N/A | add bit log item: success | all went as planned | |||
2 | add bit log item: failure; add item to ResolutionTask queue (may be internally resolvable if a secondary store is available) | The content went sour | ||||
3 | add bit log item: failure; add item to ResolutionTask queue (externally resolvable: contact storage provider) | The storage provider's checksum process failed | ||||
4 | N/A | If penultimate retry, wait 5 minutes before trying again. If last retry, then generate bit error. | The storage provider's checksum process failed, the audit log is backed up, or an audit task was dropped. | |||
5 | or null | add bit log item: failure; update content index in place. If the audit log properties are null, use storage provider properties to patch the audit log item and content index. | The content index was corrupted because an update failed or the checksum itself was corrupted during the update | |||
6 | or N/A | add bit log item: failure; add item to ResolutionTask queue (internally resolvable: audit log out of sync) | The audit log item was corrupted because an insert failed or the checksum itself was corrupted during insertion into Dynamo | |||
7 | or N/A | null | add bit log item: failure; add item to audit queue | The audit index was corrupted because an insert failed silently under the AWS covers or the item was manually deleted. | ||
8 | 404 | 404 | If penultimate retry, wait 5 minutes before putting back on queue. If last retry, then generate bit error. | The item was removed in the Storage Provider, but not captured by DuraCloud (yet) | ||
9 | null | null | If penultimate retry, wait 5 minutes before putting back on queue. Otherwise log error and add to the audit queue. | The item was added in the Storage Provider, but not captured by DuraCloud (yet) | ||
10 | If penultimate retry, wait 5 minutes before putting back on queue. Otherwise log error and add to the audit queue. | The item was updated in the Storage Provider, but not captured by DuraCloud (yet) | ||||
11 | 404 | 404 | null | null | Do nothing. | Bit integrity processing is behind fully processed deletes. |
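The comparison step described above the table can be sketched roughly as follows. This is only an illustration, not the actual task processor: the function names (`compute_checksum`, `check_item`, `passed`) are hypothetical, MD5 is assumed as the checksum algorithm, and a missing (null) recorded value is simply treated as a mismatch rather than routed to one of the specific outcomes in the table.

```python
import hashlib
from typing import BinaryIO, Dict, Optional


def compute_checksum(stream: BinaryIO, chunk_size: int = 8192) -> str:
    """Hash the downloaded content in chunks (MD5 is an assumption)."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_item(stream: BinaryIO,
               storage: Optional[str],
               content_index: Optional[str],
               audit_log: Optional[str]) -> Dict[str, bool]:
    """Compare the computed checksum with each recorded checksum.

    Returns a mapping of source name -> whether it matches the content.
    A missing (None) value counts as a mismatch in this sketch.
    """
    content = compute_checksum(stream)
    recorded = {
        "storage": storage,
        "content_index": content_index,
        "audit_log": audit_log,
    }
    return {name: value == content for name, value in recorded.items()}


def passed(result: Dict[str, bool]) -> bool:
    """The check passes only when every source agrees with the content."""
    return all(result.values())
```

The per-source result makes it possible to tell which of the recorded checksums disagrees with the content, which is the information the rows of the table are keyed on.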
Worker Manager
The Worker Manager, a.k.a. Workman, is the heart of the system. Or perhaps more aptly, the digestive system. Workman manages a pool of worker threads that in turn process different kinds of tasks. Multiple instances of Workman may run at the same time, on the same machine or on separate machines. In fact, the scalability of the system depends on the ability to add worker nodes that process queue items in parallel.
The Workman process attempts to read the high priority queue first, reading up to 10 tasks at a time and distributing them to a pool of workers. If no high priority tasks are available, it attempts to read the low priority queue. Should that queue be empty as well, the DSTP backs off exponentially before retrying, waiting 1 minute initially and never longer than 8 minutes.
Once a task worker thread receives a task to process, it monitors the progress and extends the visibility timeout as necessary to prevent the item from reappearing on the queue. Once the task has been processed, the task worker deletes the item from the queue. If the task could not be processed successfully, it is placed at the back of the queue for reprocessing. If it has failed three times, the task is placed on the Dead Letter Queue for human review.
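The polling and retry behaviour described above might look roughly like the sketch below. It is a single-threaded illustration using in-memory queues, not the real Workman: the names (`read_batch`, `run_task`, `run`) and the `attempts` field are hypothetical, the worker thread pool and visibility timeout handling are left out, and resetting the back-off after a successful read is an assumption. Only the batch size, the 1 to 8 minute back-off range, and the three-failure limit come from the text.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

BATCH_SIZE = 10         # tasks read per poll (from the text)
MIN_WAIT_SECONDS = 60   # initial back-off (from the text)
MAX_WAIT_SECONDS = 480  # back-off ceiling (from the text)
MAX_ATTEMPTS = 3        # failures before dead-lettering (from the text)


def next_wait(current: int) -> int:
    """Double the wait, capped at the maximum."""
    return min(current * 2, MAX_WAIT_SECONDS)


def read_batch(high: Deque[dict],
               low: Deque[dict]) -> Tuple[Deque[dict], List[dict]]:
    """Read up to BATCH_SIZE tasks, preferring the high priority queue."""
    source = high if high else low
    count = min(BATCH_SIZE, len(source))
    return source, [source.popleft() for _ in range(count)]


def run_task(task: dict, source: Deque[dict], dead_letter: Deque[dict],
             process: Callable[[dict], None]) -> None:
    """Process one task; on failure, re-queue it or dead-letter it."""
    try:
        process(task)  # on success the task is not re-queued ("deleted")
    except Exception:
        task["attempts"] = task.get("attempts", 0) + 1
        if task["attempts"] >= MAX_ATTEMPTS:
            dead_letter.append(task)  # left for human review
        else:
            source.append(task)       # back of the queue it came from


def run(high: Deque[dict], low: Deque[dict], dead_letter: Deque[dict],
        process: Callable[[dict], None],
        sleep: Callable[[float], None] = time.sleep,
        max_empty_polls: Optional[int] = None) -> None:
    """Poll until max_empty_polls empty reads have occurred (None = forever)."""
    wait = MIN_WAIT_SECONDS
    empty_polls = 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        source, tasks = read_batch(high, low)
        if tasks:
            for task in tasks:
                run_task(task, source, dead_letter, process)
            wait = MIN_WAIT_SECONDS  # assumption: reset after finding work
        else:
            sleep(wait)
            wait = next_wait(wait)
            empty_polls += 1
```

Passing a recording function as `sleep` and a small `max_empty_polls` lets the loop be exercised without actually waiting; with both queues empty, the successive waits are 60, 120, 240, 480, and 480 seconds.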
...