Comment: Added "Changes in 1.8" section, changed org.dspace.curate to org.dspace.ctask.general in the context of task packages

...

Changes in 1.8

  • New package: The default curation task package is now org.dspace.ctask. The tasks supplied with DSpace releases are now under org.dspace.ctask.general.
  • New tasks in DSpace release: Some additional curation tasks have been supplied with DSpace 1.8, including a link checker and a translator.

Tasks

The goal of the curation system ('CS') is to provide a simple, extensible way to manage routine content operations on a repository. These operations are known to CS as 'tasks', and they can operate on any DSpaceObject (i.e. any subclass of DSpaceObject), which means the core data model objects: Communities, Collections, and Items. Tasks may elect to work on only one type of DSpace object, typically an Item, and in this case they may simply ignore other data types (tasks have the ability to 'skip' objects for any reason). The DSpace core distribution will provide a number of useful tasks, but the system is designed to encourage local extension: tasks can be written for any purpose and placed in any Java package. This gives DSpace sites the ability to customize the behavior of their repository without having to alter, and therefore manage synchronization with, the DSpace source code. What sorts of activities are appropriate for tasks?

...

Code Block
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.ctask.general.NoOpCurationTask = noop, \
org.dspace.ctask.general.ProfileFormats = profileformats, \
org.dspace.ctask.general.RequiredMetadata = requiredmetadata, \
org.dspace.ctask.general.ClamScan = vscan, \
org.dspace.ctask.general.MicrosoftTranslator = translate, \
org.dspace.ctask.general.MetadataValueLinkChecker = checklinks

For each activated task, a key-value pair is added. The key is the fully qualified class name and the value is the taskname used elsewhere to configure the use of the task, as will be seen below. Note that the curate.cfg configuration file, while in the config directory, is located under 'modules'. The intent is that tasks, as well as any configuration they require, will be optional 'add-ons' to the basic system configuration. Adding or removing tasks has no impact on dspace.cfg.
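The key-value shape of this setting can be illustrated with a small, self-contained sketch. It is not how DSpace itself resolves named plugins; the class `TaskConfigSketch` and its `parseTasks` method are hypothetical names used only to show how each entry maps a taskname to a fully qualified class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative only: DSpace loads named plugins through its own plugin
// machinery. This hypothetical class just shows the shape of the setting.
final class TaskConfigSketch {

    static final String PROPERTY = "plugin.named.org.dspace.curate.CurationTask";

    // Returns taskname -> fully qualified class name, in declaration order.
    static Map<String, String> parseTasks(String cfgText) {
        Properties props = new Properties();
        try {
            // Properties.load joins lines that end with a backslash.
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> tasks = new LinkedHashMap<>();
        for (String entry : props.getProperty(PROPERTY, "").split(",")) {
            String[] pair = entry.split("=", 2);   // class name = taskname
            if (pair.length == 2) {
                tasks.put(pair[1].trim(), pair[0].trim());
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
            "plugin.named.org.dspace.curate.CurationTask = \\",
            "org.dspace.ctask.general.NoOpCurationTask = noop, \\",
            "org.dspace.ctask.general.ClamScan = vscan");
        parseTasks(cfg).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

Keying the map by taskname mirrors how the taskname is used elsewhere (for example as the `-t` argument on the command line) to refer to the task.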

...

The CurationTask interface is almost a 'tagging' interface, and only requires a few very high-level methods be implemented. The most significant is:

Code Block
 int perform(DSpaceObject dso); 

The return value should be a code describing one of 4 conditions:

...

A simple tool 'CurationCli' provides access to CS via the command line. This tool bears the name 'curate' in the DSpace launcher. For example, to perform a virus check on collection '4':

Code Block
 [dspace]/bin/dspace curate -t vscan -i 123456789/4 

The complete list of arguments:

...

Because some tasks may consume a fair amount of time, it may not be desirable to run them in an interactive context. CS provides a simple API and a queuing system for deferring task execution. Thus, using the previous example:

Code Block
     Curator curator = new Curator();
     curator.addTask("vscan").queue(context, "monthly", "123456789/4");

would place a request on a named queue "monthly" to virus scan the collection. To read (and process) the queue, we could, for example, use the command-line tool:

Code Block
 [dspace]/bin/dspace curate -q monthly 

We could also read the queue programmatically. Any number of queues can be defined and used as needed.
In the administrative UI curation 'widget', a task can either be performed immediately or placed on a queue for later processing.

...

The status code, mentioned above, is returned to CS whenever a task is called. The complete list of values:

Code Block
      -3 NOTASK - CS could not find the requested task
      -2 UNSET  - task did not return a status code because it has not yet run
      -1 ERROR - task could not be performed
       0 SUCCESS - task performed successfully
       1 FAIL - task performed, but failed
       2 SKIP - task not performed due to object not being eligible
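As a rough illustration of how a task might choose among these values, the following sketch uses hypothetical stand-in types (`FakeObject`, `Task`) so that it compiles without DSpace on the classpath; a real task would implement the CurationTask interface and receive a DSpaceObject. The constant names simply follow the list above, and the "title required" rule is an invented example.

```java
// Hypothetical stand-ins so the sketch compiles without DSpace on the
// classpath; a real task works with DSpaceObject and CurationTask instead.
final class StatusCodeSketch {

    // Status code values as listed above.
    static final int NOTASK = -3;
    static final int UNSET = -2;
    static final int ERROR = -1;
    static final int SUCCESS = 0;
    static final int FAIL = 1;
    static final int SKIP = 2;

    // Stand-in for a DSpace object: a type name and an optional title.
    static final class FakeObject {
        final String type;
        final String title;

        FakeObject(String type, String title) {
            this.type = type;
            this.title = title;
        }
    }

    // Stand-in for the single method this sketch needs from a task.
    interface Task {
        int perform(FakeObject dso);
    }

    // Item-only task: skips other object types, fails items without a title.
    static final Task REQUIRE_TITLE = dso -> {
        if (!"ITEM".equals(dso.type)) {
            return SKIP;
        }
        return (dso.title == null || dso.title.isEmpty()) ? FAIL : SUCCESS;
    };

    // Maps a status code back to its name for display.
    static String describe(int status) {
        switch (status) {
            case NOTASK: return "NOTASK";
            case UNSET: return "UNSET";
            case ERROR: return "ERROR";
            case SUCCESS: return "SUCCESS";
            case FAIL: return "FAIL";
            case SKIP: return "SKIP";
            default: return "unknown status " + status;
        }
    }

    public static void main(String[] args) {
        FakeObject untitledItem = new FakeObject("ITEM", "");
        FakeObject collection = new FakeObject("COLLECTION", "Theses");
        System.out.println(describe(REQUIRE_TITLE.perform(untitledItem)));
        System.out.println(describe(REQUIRE_TITLE.perform(collection)));
    }
}
```

Note the distinction the sketch draws between FAIL (the task ran and the object did not pass) and SKIP (the object was not eligible, so nothing was checked).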

...

The task may define a string indicating details of the outcome. This result is displayed in the 'curation widget' described above:

Code Block
       "Virus 12312 detected on Bitstream 4 of 1234567789/3"

...

The status code and the result string are accessed (or set) by methods on the Curator object:

Code Block
     Curator curator = new Curator();
     curator.addTask("vscan").curate(coll);
     int status = curator.getStatus("vscan");
     String result = curator.getResult("vscan");

...

The task with the taskname 'profileformats' (in the admin UI it is labeled "Profile Bitstream Formats") examines all the bitstreams in an item and produces a table ("profile") which is assigned to the result string. It is activated by default, and is configured to display in the administrative UI. The result string has the layout:

Code Block
     10 (K) Portable Network Graphics
     5  (S) Plain Text

where the left column is the count of bitstreams of the named format and the letter in parentheses is an abbreviation of the repository-assigned support level for that format:

Code Block
    U  Unsupported
    K  Known
    S  Supported
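The layout above can be reproduced with a short sketch. This is not the ProfileFormats implementation; `FormatProfileSketch` and its input shape (one `{format name, support letter}` pair per bitstream) are assumptions made for illustration, and the alphabetical ordering of rows is likewise an arbitrary choice.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: builds a result string in the layout shown above
// from a list of {format name, support-level letter} pairs.
final class FormatProfileSketch {

    static String profile(String[][] bitstreams) {
        // TreeMap keeps rows in alphabetical order by format name
        // (an assumption; the real task may order rows differently).
        Map<String, Integer> counts = new TreeMap<>();
        Map<String, String> levels = new TreeMap<>();
        for (String[] b : bitstreams) {
            counts.merge(b[0], 1, Integer::sum);   // count bitstreams per format
            levels.put(b[0], b[1]);                // remember the support letter
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Count left-justified in two columns, then "(letter) name".
            sb.append(String.format("%-2d (%s) %s",
                    e.getValue(), levels.get(e.getKey()), e.getKey()));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] bitstreams = {
            {"Portable Network Graphics", "K"},
            {"Plain Text", "S"},
            {"Portable Network Graphics", "K"},
        };
        System.out.print(profile(bitstreams));
    }
}
```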

...

Code Block
### Task Class implementations
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.ctask.general.ProfileFormats = profileformats, \
org.dspace.ctask.general.RequiredMetadata = requiredmetadata, \
org.dspace.ctask.general.ClamScan = vscan
  • Optionally, add the vscan friendly name to the configuration to enable it in the administrative user interface.

...