...

Command used:

The directory and where the command is to be found.

Java class:

The actual java program doing the work.

Arguments:

The required/mandatory or optional arguments available to the user.

...

Info
titleDSpace Command Launcher

With DSpace Release 1.6, the many commands and scripts have been replaced with a simple [dspace]/bin/dspace <command> command. See the Application Layer chapter for the details of the DSpace Command Launcher.


Community and Collection Structure Importer

This CLI tool gives you the ability to import a community and collection structure directly from a source XML file.

Command used:

[dspace]/bin/dspace structure-builder

Java class:

org.dspace.administer.StructBuilder

Argument: short and long (if available) forms:

Description of the argument

-f

Source xml file.

-o

Output xml file.

-e

Email of DSpace Administrator.

...

Code Block
[dspace]/bin/dspace packager -e [user-email] -p [parent-handle] -t [packager-name] /full/path/to/package

Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [parent-handle] is the Handle of the Parent Object into which the package is ingested; [packager-name] is the plugin name of the package ingester to use; and /full/path/to/package is the path to the file to ingest (or "-" to read from the standard input).

Here is an example that loads a PDF file with internal metadata as a package:

...

Note
titleNot All Packagers Support Bulk Ingest

Because the packager plugin must know how to locate all child packages from an initial package file, not all plugins can support bulk ingest. Currently, in DSpace the following Packager Plugins support bulk ingest capabilities:

Restoring/Replacing using Packages

...

Code Block
[dspace]/bin/dspace packager -d -e [user-email] -i [handle] -t [packager-name] [file-path]

Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [handle] is the Handle of the Object to disseminate; [packager-name] is the plugin name of the package disseminator to use; and [file-path] is the path to the file to create (or "-" to write to the standard output). For example:

Code Block
 [dspace]/bin/dspace packager -d -t METS -e admin@myu.edu -i 4321/4567 4567.zip

...

As of DSpace 1.7, DSpace can back up and restore all of its contents as a set of AIP files. This includes all Communities, Collections, Items, Groups and People in the system.

...

Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (a METS-based, AIP format). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore of that content in the same or different DSpace installation).

For more information, see the section on AIP backup & Restore for DSpace.

METS packages

Since the DSpace 1.4 release, the software has included a package disseminator and matching ingester for the DSpace METS SIP (Submission Information Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined standards such as METS, MODS, and PREMIS. The plugin name is METS by default, and it uses MODS for descriptive metadata.

...

Code Block
archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml   -- metadata in another schema, the prefix is the name of the schema as registered with the metadata registry
        contents                -- text file containing one line per filename
        file_1.doc              -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...
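The layout above can be sketched in a few lines of Python. This is only an illustration of the directory structure, not a DSpace tool: the helper name make_item_dir is hypothetical, the files are written as empty placeholders, and the metadata file (dublin_core.xml) is deliberately left out.

```python
import tempfile
from pathlib import Path


def make_item_dir(archive_dir: Path, index: int, filenames: list[str]) -> Path:
    """Create item_NNN/ holding the listed files and a matching 'contents' file."""
    item_dir = archive_dir / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)
    for name in filenames:
        # Empty placeholder; a real archive would copy the actual file here.
        (item_dir / name).write_bytes(b"")
    # The contents file lists one filename per line.
    (item_dir / "contents").write_text("\n".join(filenames) + "\n", encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        item = make_item_dir(Path(tmp), 0, ["file_1.doc", "file_2.pdf"])
        print(sorted(p.name for p in item.iterdir()))
```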

The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:

  • <element> - the Dublin Core element
  • <qualifier> - the element's qualifier
  • <language> - (optional) ISO language code for element
    Code Block
    <dublin_core>
        <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
        <dcvalue element="date" qualifier="issued">1990</dcvalue>
        <dcvalue element="title" qualifier="alternate" language="fr">J'aime les Printemps</dcvalue>
    </dublin_core>
    
    
    (Note the optional language tag attribute which notifies the system that the optional title is in French.)
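As a rough illustration of the format above, the following Python sketch builds the same three entries with the standard library. The function name build_dublin_core and the tuple layout are assumptions made for this example; nothing here is part of DSpace itself.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields):
    """Build a <dublin_core> element from (element, qualifier, language, value) tuples."""
    root = ET.Element("dublin_core")
    for element, qualifier, language, value in fields:
        # An unqualified element is written with qualifier="none", as in the example above.
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if language:  # the language attribute is optional
            attrs["language"] = language
        ET.SubElement(root, "dcvalue", attrs).text = value
    return root


fields = [
    ("title", None, None, "A Tale of Two Cities"),
    ("date", "issued", None, "1990"),
    ("title", "alternate", "fr", "J'aime les Printemps"),
]
xml_text = ET.tostring(build_dublin_core(fields), encoding="unicode")
print(xml_text)
```

Building the file through an XML library rather than string concatenation means reserved characters in values are escaped automatically.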

...

The bitstream name may optionally be followed by any of the following:

  • \tbundle:BUNDLENAME
  • \tpermissions:PERMISSIONS
  • \tdescription:DESCRIPTION
  • \tprimary:true

Where '\t' is the tab character and 'BUNDLENAME' is replaced by the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream will be added to the 'ORIGINAL' bundle.

...

Without specifying the bundle, items will go into the default bundle, ORIGINAL.

'PERMISSIONS' is text with the following format: -[r|w] 'group name'

'DESCRIPTION' is text of the file's description.

Primary is used to specify the primary bitstream.
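To make the line format concrete, here is a minimal Python sketch that splits one contents line into the bitstream name and its optional tab-separated fields. It is not the parser DSpace uses; the function name parse_contents_line, the returned dictionary keys, and the 'Staff' group in the sample line are all assumptions for illustration.

```python
def parse_contents_line(line: str) -> dict:
    """Split one 'contents' line into the bitstream name and its optional fields."""
    name, *options = line.rstrip("\n").split("\t")
    # Default bundle is ORIGINAL when none is given, as described above.
    entry = {"name": name, "bundle": "ORIGINAL", "permissions": None,
             "description": None, "primary": False}
    for option in options:
        key, _, value = option.partition(":")
        if key == "primary":
            entry["primary"] = value.strip().lower() == "true"
        elif key in ("bundle", "permissions", "description"):
            entry[key] = value.strip()
        # Any other key is ignored in this sketch.
    return entry


example = "thesis.pdf\tbundle:ORIGINAL\tpermissions:-r 'Staff'\tdescription:Full text\tprimary:true"
print(parse_contents_line(example))
```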

Configuring metadata_[prefix].xml for a Different Schema

It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.

  1. Create a separate file for the other schema named "metadata_[prefix].xml", where [prefix] is replaced with the schema's prefix.
  2. Inside the xml file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema="[prefix]".
  3. Here is an example for ETD metadata, which would be in the file "metadata_etd.xml":
    Code Block
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core schema="etd">
         <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
         <dcvalue element="degree" qualifier="level">Masters</dcvalue>
         <dcvalue element="degree" qualifier="grantor">Texas A &amp; M</dcvalue>
    </dublin_core>

Importing Items

Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances.

...

Command used:

[dspace]/bin/dspace import

Java class:

org.dspace.app.itemimport.ItemImport

Arguments short and (long) forms:

Description

-a or --add

Add items to DSpace ‡

-r or --replace

Replace items listed in mapfile ‡

-d or --delete

Delete items listed in mapfile ‡

-s or --source

Source of the items (directory)

-c or --collection

Destination collection, identified by its Handle or database ID

-m or --mapfile

Where the mapfile for items can be found (name and directory)

-e or --eperson

Email of eperson doing the importing

-w or --workflow

Send submission through collection's workflow

-n or --notify

Sends the email alert that the item(s) have been imported

-t or --test

Test run; do not actually import items

-p or --template

Apply the collection template

-R or --resume

Resume a failed import (Used on Add only)

-h or --help

Command help

...

The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.

...

Command used:

[dspace]/bin/dspace export

Java class:

org.dspace.app.itemexport.ItemExport

Arguments short and (long) forms:

Description

-t or --type

Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You will actually key in the keywords in all caps. See examples below.)

-i or --id

The ID or Handle of the Collection or Item to export.

-d or --dest

The destination directory in which the exported items are to be placed. Include the path if necessary.

-n or --number

Sequence number with which to begin exporting the items. Whatever number you give will be the name of the first directory created for your export. The layout of the export is the same as the layout you would use for an import.

-m or --migrate

Export the item/collection for migration. This will remove the handle and metadata that will be re-created in the new instance of DSpace.

-h or --help

Brief Help.

...

For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadata elements. For bitstreams, 'add' and 'delete' are similarly available. All these actions can be combined in a single batch run.

...

One probable scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or down-stream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the changes.

A note on terminology: 'item' refers to a DSpace item. 'metadata element' refers generally to a qualified or unqualified element in a schema in the form [schema].[element].[qualifier] or [schema].[element] and occasionally in a more specific way to the second part of that form. 'metadata field' refers to a specific instance pairing a metadata element to a value.

DSpace simple Archive Format

...

The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate.

ItemUpdate Commands

...

Command used:

[dspace]/bin/dspace itemupdate

Java class:

org.dspace.app.itemupdate.ItemUpdate

Arguments short and (long) forms:

Description

-a or --addmetadata [metadata element]

Repeatable for multiple elements. The metadata element should be in the form dc.x or dc.x.y. The mandatory argument indicates the metadata fields in the dublin_core.xml file to be added unless already present. However, duplicate fields will not be added to the item metadata without warning or error.

-d or --deletemetadata [metadata element]

Repeatable for multiple elements. All metadata fields matching the element will be deleted.

-A or --addbitstream

Adds bitstreams listed in the contents file with the bitstream metadata cited there.

-D or --deletebitstream [filter plug classname or alias]

Not repeatable. With no argument, this operation deletes bitstreams listed in the delete_contents file. Only bitstream IDs are recognized identifiers for this operation. The optional filter argument is the classname of an implementation of the org.dspace.app.itemupdate.BitstreamFilter class to identify files for deletion, or one of the aliases (ORIGINAL, ORIGINAL_AND_DERIVATIVES, TEXT, THUMBNAIL) which reference existing filters based on membership in a bundle of that name. In this case, the delete_contents file is not required for any item. The filter properties file contains properties pertinent to the particular filter used. Multiple filters are not allowed.

-h or --help

Displays brief command line help.

-e or --eperson

Email address of the person or the user's database ID (Required)

-s or --source

Directory archive to process (Required)

-i or --itemidentifier

Specifies an alternate metadata field (not a handle) used to hold an identifier used to match the DSpace item with that in the archive. If omitted, the item handle is expected to be located in the dc.identifier.uri field. (Optional)

-t or --test

Runs the process in test mode with logging but no changes applied to the DSpace instance. (Optional)

-P or --alterprovenance

Prevents any changes to the provenance field to represent changes in the bitstream content resulting from an Add or Delete. No provenance statements are written for thumbnails or text derivative bitstreams, in keeping with the practice of MediaFilterManager. (Optional)

-F or --filterproperties

The filter properties file to be used by the delete bitstreams action (Optional)

...

  • -r indicates this is a file to be registered
  • -s n indicates the asset store number (n)
  • -f filepath indicates the path and name of the content file to be registered (filepath)
  • \t is a tab character
  • bundle:bundlename is an optional bundle name
  • permissions: -[r|w] 'group name' is an optional read or write permission that can be attached to the bitstream
  • description: some text is an optional description field to add to the file
    The bundle, that is everything after the filepath, is optional and is normally not used.

...

Available Command-Line Options:

  • Help : [dspace]/bin/dspace filter-media -h
    • Display help message describing all command-line options.
  • Force mode : [dspace]/bin/dspace filter-media -f
    • Apply filters to ALL bitstreams, even if they've already been filtered. If they've already been filtered, the previously filtered content is overwritten.
  • Identifier mode : [dspace]/bin/dspace filter-media -i 123456789/2
    • Restrict processing to the community, collection, or item named by the identifier - by default, all bitstreams of all items in the repository are processed. The identifier must be a Handle, not a DB key. This option may be combined with any other option.
  • Maximum mode : [dspace]/bin/dspace filter-media -m 1000
    • Suspend operation after the specified maximum number of items have been processed - by default, no limit exists. This option may be combined with any other option.
  • No-Index mode : [dspace]/bin/dspace filter-media -n
    • Suppress index creation - by default, a new search index is created for full-text searching. This option suppresses index creation if you intend to run index-update elsewhere.
  • Plugin mode : [dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
    • Apply ONLY the filter plugin(s) listed (separated by commas). By default all named filters listed in the filter.plugins field of dspace.cfg are applied. This option may be combined with any other option. WARNING: multiple plugin names must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
  • Skip mode : [dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
    • SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB Keys). They may refer to items, collections or communities which should be skipped. This option may be combined with any other option. WARNING: multiple identifiers must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
    • NOTE: If you have a large number of identifiers to skip, you may maintain this comma-separated list within a separate file (e.g. filter-skiplist.txt). Use the following format to call the program. Please note the use of the "grave" or "tick" (`) symbol; do not use the single quotation mark.
      • [dspace]/bin/dspace filter-media -s `less filter-skiplist.txt`
  • Verbose mode : [dspace]/bin/dspace filter-media -v
    • Verbose mode - print all extracted text and other filter details to STDOUT.

Adding your own filters is done by creating a class which implements the org.dspace.app.mediafilter.FormatFilter interface. See the Creating a new Media Filter topic and comments in the source file FormatFilter.java for more information. In theory filters could be implemented in any programming language (C, Perl, etc.). However, they need to be invoked by the Java code in the Media Filter class that you create.
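Returning to the skip list described under Skip mode: because identifiers must be separated by a bare comma with no following space, it can help to generate the list rather than type it. The Python sketch below is one way to do that from a file with one Handle per line; the one-per-line input and the function name handles_to_skip_arg are assumptions of this example, not something DSpace requires.

```python
def handles_to_skip_arg(lines):
    """Join Handles into the comma-separated form -s expects (no space after commas)."""
    handles = [line.strip() for line in lines if line.strip()]
    return ",".join(handles)


# Blank lines and stray whitespace are dropped before joining.
skip_arg = handles_to_skip_arg(["123456789/9\n", " 123456789/100 \n", "\n"])
print(skip_arg)
```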

...

The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be either a 'parent' community, meaning it has at least one sub-community, or a 'child' community, meaning it is a sub-community of another community, or both or neither. In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user-interface, since there is no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan. The second operation, removing a parent/child relationship, will make the child an orphan.
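The two operations can be modelled with a small Python sketch. It is a toy model of the rules stated above, not DSpace code: the class CommunityTree, its method names, and the community names are hypothetical, and the self-parent check is an added assumption. It does not guard against longer cycles.

```python
class CommunityTree:
    """Toy model of parent/child links between communities (not DSpace code)."""

    def __init__(self):
        self.parent_of = {}  # child id -> parent id; orphans have no entry

    def is_orphan(self, community):
        return community not in self.parent_of

    def set_parent(self, parent, child):
        """Establish a relationship; the child must currently be an orphan."""
        if parent == child:
            # Assumption of this sketch: a community cannot be its own parent.
            raise ValueError("a community cannot be its own parent")
        if not self.is_orphan(child):
            raise ValueError(f"{child} already has a parent")
        self.parent_of[child] = parent

    def remove_parent(self, parent, child):
        """Remove a relationship; the child becomes an orphan (top-level)."""
        if self.parent_of.get(child) != parent:
            raise ValueError(f"{child} is not a child of {parent}")
        del self.parent_of[child]


tree = CommunityTree()
tree.set_parent("science", "physics")
print(tree.is_orphan("physics"))
tree.remove_parent("science", "physics")
print(tree.is_orphan("physics"))
```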

Command used:

[dspace]/bin/dspace community-filiator

Java class:

org.dspace.administer.CommunityFiliator

Arguments short and (long) forms:

Description

-s or --set

Set a parent/child relationship

-r or --remove

Remove a parent/child relationship

-c or --child

Child community (Handle or database ID)

-p or --parent

Parent community (Handle or database ID)

-h or --help

Online help.

...

The following table summarizes the basics.

Command used:

[dspace]/bin/dspace metadata-export

Java class:

org.dspace.app.bulkedit.MetadataExport

Arguments short and (long) forms:

Description

-f or --file

Required. The filename of the resulting CSV.

-i or --id

The Item, Collection, or Community handle or Database ID to export. If not specified, all items will be exported.

-a or --all

Include all the metadata fields that are not normally changed (e.g. provenance) or those fields you configured in the dspace.cfg to be ignored on export.

-h or --help

Display the help page.

...

The following table summarizes the basics.

...

Command used:

[dspace]/bin/dspace metadata-import

Java class:

org.dspace.app.bulkedit.MetadataImport

Arguments short and (long) forms:

Description

-f or --file

Required. The filename of the CSV file to load.

-s or --silent

Silent mode. The import function does not prompt you to make sure you wish to make the changes.

-e or --email

The email address of the user. This is only required when adding new items.

-w or --workflow

When adding new items, the program will queue the items up to use the Collection Workflow processes.

-n or --notify

When adding new items using a workflow, send notification emails.

-t or --template

When adding new items, use the Collection template, if it exists.

-h or --help

Display the brief help page.

...

If you wish to upload new metadata without bitstreams, at the command line:

Code Block
[dspace]/bin/dspace metadata-import -f /dImport/new_file.csv -e joe@user.com -w -n -t

In the above example we threw in all the arguments. This would add the metadata and engage the workflow, notification, and templates to all be applied to the items that are being added.

The CSV Files

Info
titleImporting large CSV files

It is not recommended to import CSV files of more than 1,000 lines.  When importing files larger than this, it is hard to accurately verify the changes that the import tool states it will make, and large files may cause 'Out Of Memory' errors part way through the process.

The csv files that this tool can import and export abide by the RFC4180 CSV format http://www.ietf.org/rfc/rfc4180.txt. This means that new lines and embedded commas can be included by wrapping elements in double quotes. Double quotes can be included by using two double quotes. The code does all this for you, and any good csv editor such as Excel or OpenOffice will comply with this convention.
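The quoting rules just described can be seen with Python's standard csv module, which follows the same conventions by default. This is only an illustration of the CSV format; the column names and values are made up for the example and are not taken from a real export.

```python
import csv
import io

rows = [
    ["id", "dc.title", "dc.description.abstract"],
    ["1", "Commas, inside a value", 'A "quoted" word\nand a second line'],
]

# Write the rows; fields containing commas, quotes or new lines are wrapped
# in double quotes, and embedded double quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\r\n").writerows(rows)
encoded = buffer.getvalue()
print(encoded)

# Reading the text back recovers the original values unchanged.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))
```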

File Structure. The first row of the csv must define the metadata values that the rest of the csv represents. The first column must always be "id", which refers to the item's id. All other columns are optional. The other columns contain the Dublin Core metadata fields in which the data is to reside.

...

When importing a csv file, the importer will overlay the data onto what is already in the repository to determine the differences. It only acts on the contents of the csv file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated quite substantially before being re-imported. Rows (items) or Columns (metadata elements) can be removed and will be ignored. For example, if you only want to edit item abstracts, you can remove all of the other columns and just leave the abstract column. (You do need to leave the ID column intact. This is mandatory).
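Trimming an exported file down to the id column plus the columns you intend to edit can be scripted. The following Python sketch shows one way to do it; the function name keep_columns and the sample column headings are assumptions for illustration, and a real export may use different headings.

```python
import csv
import io


def keep_columns(csv_text: str, wanted: list[str]) -> str:
    """Return csv_text reduced to the 'id' column plus the wanted columns."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    present = reader.fieldnames or []
    # The id column is always kept; it is mandatory for re-import.
    columns = ["id"] + [c for c in wanted if c in present and c != "id"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\r\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


exported = (
    "id,dc.title,dc.description.abstract\r\n"
    '7,"First, title",Old abstract\r\n'
    "8,Second title,Another abstract\r\n"
)
trimmed = keep_columns(exported, ["dc.description.abstract"])
print(trimmed)
```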

...

Checksum Checker is a program that can be run to verify the checksum of every item within DSpace. Checksum Checker was designed with the idea that most System Administrators will run it from cron. Depending on the size of the repository, choose the options wisely.

...

Command used:

[dspace]/bin/dspace checker

Java class:

org.dspace.app.checker.ChecksumChecker

Arguments short and (long) forms:

Description

-L or --continuous

Loop continuously through the bitstreams

-a or --handle

Specify a handle to check

-b <bitstream-ids>

Space separated list of bitstream IDs

-c or --count

Check count

-d or --duration

Checking duration

-h or --help

Calls online help

-l or --looping

Loop once through bitstreams

-p <prune>

Prune old results (optionally using specified properties file for configuration)

-v or --verbose

Report all processing

...

Available command line options

  • Limited-count mode: [dspace]/bin/dspace checker -c To check a specific number of bitstreams. The -c option is followed by an integer, the number of bitstreams to check. Example: [dspace]/bin/dspace checker -c 10 This is particularly useful for checking that the checker is executing properly. The Checksum Checker's default execution mode is to check a single bitstream, as if the option was -c 1.
  • Duration mode: [dspace]/bin/dspace checker -d To run the Check for a specific period of time with a time argument. You may use any of the time arguments below. Example: [dspace]/bin/dspace checker -d 2h (Checker will run for 2 hours)

    s

    Seconds

    m

    Minutes

    h

    Hours

    d

    Days

    w

    Weeks

    y

    Years

    The checker will keep starting new bitstream checks for the specified duration, so actual execution duration will be slightly longer than the specified duration. Bear this in mind when scheduling checks.
  • Specific Bitstream mode: [dspace]/bin/dspace checker -b Checker will only look at the internal bitstream IDs. Example: [dspace]/bin/dspace checker -b 112 113 4567 Checker will only check bitstream IDs 112, 113 and 4567.
  • Specific Handle mode: [dspace]/bin/dspace checker -a Checker will only check bitstreams within the Community, Collection or the item itself. Example: [dspace]/bin/dspace checker -a 123456/999 Checker will only check this handle. If it is a Collection or Community, it will run through the entire Collection or Community.
  • Looping mode: [dspace]/bin/dspace checker -l or [dspace]/bin/dspace checker -L There are two modes. The lowercase 'el' (-l) specifies to check every bitstream in the repository once. This is recommended for smaller repositories that are able to loop through all their content in just a few hours maximum. An uppercase 'L' (-L) specifies to continuously loop through the repository. This is not recommended for most repository systems. Cron Jobs. For large repositories that cannot be completely checked in a couple of hours, we recommend the -d option in cron.
  • Pruning mode: [dspace]/bin/dspace checker -p The Checksum Checker will store the result of every check in the checksum_history table. By default, successful checksum matches that are eight weeks old or older will be deleted when the -p option is used. (Unsuccessful ones will be retained indefinitely). Without this option, the retention settings are ignored and the database table may grow rather large!
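When planning a cron schedule around Duration mode, it can be handy to convert a time argument such as 2h into seconds. The Python sketch below does this for the units listed above. It is not how the checker itself parses the argument; the function name duration_to_seconds is hypothetical, and treating a year as 365 days is an assumption of this example.

```python
import re

# Seconds per unit; counting a year as 365 days is an assumption of this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 365 * 86400}


def duration_to_seconds(text: str) -> int:
    """Convert a value such as '2h' or '8w' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(duration_to_seconds("2h"))
print(duration_to_seconds("8w"))
```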

Checker Results Pruning

As stated above in "Pruning mode", the checksum_history table can get rather large; running the checker with the -p option helps keep its size manageable. The amount of time for which results are retained in the checksum_history table can be modified by one of two methods:

  1. Editing the retention policies in [dspace]/config/dspace.cfg. See Chapter 5 Configuration for the property keys. OR
  2. Passing in a properties file containing retention policies when using the -p option. To do this, create a file with the following two property keys:
    Code Block
    checker.retention.default = 10y
    checker.retention.CHECKSUM_MATCH = 8w
    You can use the table above for your time units. At the command line: [dspace]/bin/dspace checker -p retention_file_name

Checker Reporting

Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log, and it will report only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use the -v (verbose) command line option:

[dspace]/bin/dspace checker -l -v (This will loop through the repository once and report in detail about every bitstream checked.)

To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties file and run [dspace]/bin/install_configs.

Cron or Automatic Execution of Checksum Checker

...

Optionally, you may choose to receive automated emails listing the Checksum Checker's results. Schedule it to run after the Checksum Checker has completed its processing (otherwise the email may not contain all the results).

Command used:

[dspace]/bin/dspace checker-emailer

Java class:

org.dspace.checker.DailyReportEmailer

Arguments short and (long) forms:

Description

-a or --All

Send all the results (everything specified below)

-d or --Deleted

Send E-mail report for all bitstreams set as deleted for today.

-m or --Missing

Send E-mail report for all bitstreams not found in assetstore for today.

-c or --Changed

Send E-mail report for all bitstreams where checksum has been changed for today.

-u or --Unchanged

Send the Unchecked bitstream report.

-n or --Not Processed

Send E-mail report for all bitstreams set to no longer be processed for today.

-h or --help

Help

...

If you have implemented the Embargo feature, you will need to run it periodically to check for Items with expired embargoes and lift them.

Command used:

[dspace]/bin/dspace embargo-lifter

Java class:

org.dspace.embargo.EmbargoManager

Arguments short and (long) forms:

Description

-c or --check

ONLY check the state of embargoed Items, do NOT lift any embargoes

-i or --identifier

Process ONLY the given handle identifier(s), each of which must be an Item. Can be repeated.

-l or --lift

Only lift embargoes, do NOT check the state of any embargoed items.

-n or --dryrun

Do not change anything in the data model; print a message instead.

-v or --verbose

Print a line describing the action taken for each embargoed item found.

-q or --quiet

No output except upon error.

-h or --help

Display brief help screen.

...

To create all the various browse indexes that you define in the Configuration Section (Chapter 5) there are a variety of options available to you. You can see these options below in the command table.

Command used:

[dspace]/bin/dspace index-init

Java class:

org.dspace.browse.IndexBrowse

Arguments short and (long) forms:

Description

-r or -rebuild

Rebuild all the indexes, which removes old tables and creates new ones. For use with -f. Mutually exclusive with -d.

-s or -start

[-s <int>] start from this index number and work upwards (mostly only useful for debugging). For use with -t and -f

-x or -execute

Execute all the remove and create SQL against the database. For use with -t and -f

-i or -index

Actually do the indexing. Mutually exclusive with -t and -f.

-o or -out

[-o <filename>] Write the remove and create SQL to the given file. For use with -t and -f.

-p or -print

Write the remove and create SQL to the stdout. For use with -t and -f.

-t or -tables

Create the tables only; do not attempt to index. Mutually exclusive with -f and -i.

-f or -full

Make the tables, and do the indexing. This forces -x. Mutually exclusive with -t and -i.

-v or -verbose

Print extra information to the stdout. If used in conjunction with -p, you cannot use the stdout to generate your database structure.

-d or -delete

Delete all the indexes, but do not create new ones. For use with -f. This is mutually exclusive with -r.

-h or -help

Show this help documentation. Overrides all other arguments.
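The mutual-exclusion rules in the table above can be sketched as a small pre-flight check. This is only an illustration: `check_index_flags` is a hypothetical helper, not something shipped with DSpace, and it looks only at the flags named in the table (-t, -f and -i as competing modes, -r versus -d).

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): rejects flag combinations that
# the option table describes as mutually exclusive.
check_index_flags() {
    modes=0     # how many of -t, -f, -i were given
    rebuild=0   # set to 1 when -r is seen
    delete=0    # set to 1 when -d is seen
    for flag in "$@"; do
        case "$flag" in
            -t|-f|-i) modes=$((modes + 1)) ;;
            -r) rebuild=1 ;;
            -d) delete=1 ;;
        esac
    done
    if [ "$modes" -gt 1 ]; then
        echo "error: -t, -f and -i are mutually exclusive" >&2
        return 1
    fi
    if [ "$rebuild" -eq 1 ] && [ "$delete" -eq 1 ]; then
        echo "error: -r and -d are mutually exclusive" >&2
        return 1
    fi
    return 0
}
```

Calling `check_index_flags -t -f` returns a non-zero status, so a wrapper script could stop before invoking the indexer with a conflicting set of options.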

Running the Indexing Programs

...

Complete Index Regeneration. By running [dspace]/bin/dspace index-init you will completely regenerate your indexes, tearing down all old tables and reconstructing them with the new configuration.

Code Block
[dspace]/bin/dspace index-init

...

Updating the Indexes. By running [dspace]/bin/dspace index-update you will reindex your full browse without modifying the table structure. (This should be your default approach if indexing periodically, for example via a cron job.)

Code Block
[dspace]/bin/dspace index-update
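For periodic reindexing, a small wrapper script is one way to make the command easy to schedule. The following is a minimal sketch, not a script supplied by DSpace: `DSPACE_DIR` (standing in for [dspace]) and `DRY_RUN` are assumptions of this example, and the dry-run default only prints the command so the sketch is safe to try.

```shell
#!/bin/sh
# Sketch of a wrapper that a scheduler such as cron could call.
# DSPACE_DIR and DRY_RUN are assumptions of this example, not DSpace settings.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"   # stands in for [dspace]
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the command

update_browse_index() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $DSPACE_DIR/bin/dspace index-update"
    else
        # Reindex the browse data without touching the table structure.
        "$DSPACE_DIR/bin/dspace" index-update
    fi
}
```

To use it from a scheduled job, add a call to `update_browse_index` at the end of the script and set `DRY_RUN=0` in the job's environment.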

Destroy and rebuild. You can destroy and rebuild the database tables without doing the indexing. The command below writes the SQL for this to the screen and to a file, executes it against the database, and produces verbose output. At the CLI screen:

Code Block
[dspace]/bin/dspace index -r -t -p -v -x -o myfile.sql

...

Indexing Customization

DSpace provides robust browse indexing. It is possible to expand upon the default indexes delivered at the time of installation. The System Administrator should review "Defining the Indexes" in Chapter 5, Configuration, to become familiar with the property keys and the definitions used there before attempting heavy customizations.

...

Remember to run index-init after adding any new definitions in dspace.cfg so that the indexes are created and the data is indexed.

...

The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.

...

Command used:

_[dspace]_/bin/dspace stats-log-converter

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.ClassicDSpaceLogConverter

Arguments (short and long forms):

Description

-i or -in

Input file

-o or -out

Output file

-m or -multiple

Adds a wildcard to the end of the input and output file names, so that every file matching dspace.log* is converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)

-n or -newformat

Use this option if the log files were created with DSpace 1.6.

-v or -verbose

Display verbose output (helpful for debugging)

-h or -help

Help

This command loads the intermediate log files created by the Log Converter described above into SOLR.

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="15a83049-af0d-4bad-9e29-ce6538a1940b"><ac:plain-text-body><![CDATA[

Command used:

_[dspace]_/bin/dspace stats-log-importer]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsImporter

Arguments (short and long forms):

Description

-i or --

Input file

-m or --

Adds a wildcard to the end of the input file name, so that every file matching dspace.log* is imported.

-s or --

Skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds information about the host from its IP address, such as geographical location. This can be slow, and does not work on a server that is not connected to the internet.)

-v or --

Display verbose output (helpful for debugging)

-l or --

For developers: allows you to import a log file from another system. Because the handles in that log will not exist locally, the importer looks up random items in your local system and adds the hits to those instead.

-h or --

Help
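The converter and the importer are typically run one after the other. The sketch below chains them for a single log file using only the short options from the two tables above. `DSPACE_DIR`, `DRY_RUN`, the `run` helper and the file names are assumptions of this example. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: convert one classic log file, then load the result into SOLR.
# DSPACE_DIR, DRY_RUN, run() and the file arguments are assumptions of
# this example, not DSpace settings.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"   # stands in for [dspace]
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

convert_and_import() {
    in_log="$1"         # classic log file, for example a dspace.log file
    intermediate="$2"   # intermediate file written by the converter
    # The import only runs if the conversion succeeded.
    # -s skips the reverse DNS lookups; -v prints verbose output.
    run "$DSPACE_DIR/bin/dspace" stats-log-converter -i "$in_log" -o "$intermediate" -v &&
    run "$DSPACE_DIR/bin/dspace" stats-log-importer -i "$intermediate" -s -v
}
```

Drop `-s` from the second command if the geographical lookup is wanted and the server can reach the internet.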

Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from complete. After converting your old logs, please refer to Statistics Client (8.15) for spider removal operations.

Client Statistics

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="cf246032-b701-49e3-a04e-988b8611ead1"><ac:plain-text-body><![CDATA[

Command used:

_[dspace]_/bin/dspace stats-util

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.statistics.util.StatisticsClient

Arguments (short and long forms):

Description

-u or -update-spider-files

Update Spider IP Files from the internet into /dspace/config/spiders. Downloads Spider files identified in dspace.cfg under property

-f or -delete-spiders-by-flag

Delete Spiders in Solr By isBot Flag. Will prune out all records that have isBot:true

-i or -delete-spiders-by-ip

Delete Spiders in Solr By IP Address. Will prune out all records that have IPs that match spider IPs.

-m or -mark-spiders

Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files.

-o or -optimize

Run maintenance on the SOLR index. Recommended to run daily, to prevent your servlet container from running out of memory.

-h or -help

Calls up this brief help table at CLI.

...

Which of these options to use is up to the user. To keep spider entries in the repository, mark them using "-m"; they will then be excluded from statistics queries when "solr.statistics.query.filter.isBot = true" is set in dspace.cfg.

To keep spiders out of the Solr repository altogether, use the "-i" option and they will be removed immediately.
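The two approaches (marking with "-m" or deleting with "-i") can be wrapped in one small helper so the choice is explicit. This is a sketch only: `handle_spiders`, `run`, `DSPACE_DIR` and `DRY_RUN` are names invented for this example, and the default dry-run mode prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch: pick one of the two spider strategies described above.
# handle_spiders, run(), DSPACE_DIR and DRY_RUN are assumptions of this
# example, not part of DSpace.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"   # stands in for [dspace]
DRY_RUN="${DRY_RUN:-1}"               # 1 = only print the command

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

handle_spiders() {
    case "$1" in
        # Keep spider records but flag them.
        mark)   run "$DSPACE_DIR/bin/dspace" stats-util -m ;;
        # Remove records whose IP matches a spider IP.
        delete) run "$DSPACE_DIR/bin/dspace" stats-util -i ;;
        *)
            echo "usage: handle_spiders mark|delete" >&2
            return 2
            ;;
    esac
}
```

`handle_spiders mark` corresponds to keeping the entries and filtering them at query time; `handle_spiders delete` removes them from the statistics core.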

There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (123.123.123), and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255]. If not, loading that row will cause exceptions in the DSpace logs and exclude that IP entry.
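The guard described above can be approximated with a quick check before an entry is added to a spider file. This is an illustrative sketch and not DSpace's own parser: `is_valid_spider_entry` is a hypothetical function, the "start - end" range syntax is assumed from the example in the text, and octet values are not checked against the 0 to 255 limit.

```shell
#!/bin/sh
# Illustrative check only; this is NOT the parser DSpace uses.
# Accepts a prefix of three or four octets (for example 123.123.123), or a
# range "a.b.c.x - a.b.c.y" whose first three octets are identical.
is_valid_spider_entry() {
    entry="$1"
    octet='[0-9]{1,3}'
    three="${octet}[.]${octet}[.]${octet}"
    prefix_re="^${three}([.]${octet})?\$"
    range_re="^${three}[.]${octet} *- *${three}[.]${octet}\$"

    # Prefix form: at least three subnet sections.
    if printf '%s\n' "$entry" | grep -Eq "$prefix_re"; then
        return 0
    fi
    # Range form: both ends must sit in the same smallest subnet.
    if printf '%s\n' "$entry" | grep -Eq "$range_re"; then
        start=$(printf '%s' "${entry%%-*}" | tr -d ' ')
        end=$(printf '%s' "${entry##*-}" | tr -d ' ')
        # Strip the last octet from each end and compare what is left.
        [ "${start%.*}" = "${end%.*}" ] && return 0
    fi
    return 1
}
```

With this sketch, `123.123.123` and `123.123.123.0 - 123.123.123.255` pass, while `123.123` and `123.123.0.0 - 123.123.255.255` are rejected.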

Test Database

This command can be used at any time to test for database connectivity. It will assist in troubleshooting PostgreSQL and Oracle connection issues with the database.

<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="7acaa0ba-b636-4232-a3c1-366dec5e3683"><ac:plain-text-body><![CDATA[

Command used:

_[dspace]_/bin/dspace test-database

]]></ac:plain-text-body></ac:structured-macro>

Java class:

org.dspace.storage.rdbms.DatabaseManager

Arguments (short and long forms):

Description

- or --

There are no arguments used at this time.

Moving items

It is possible for administrators to move items one at a time using either the JSPUI or the XMLUI.  When editing an item, on the 'Edit item' screen select the 'Move Item' option.  To move the item, select the new collection for the item to appear in.  When the item is moved, it will take its authorizations (who can READ / WRITE it) with it.

If you wish for the item to take on the default authorizations of the destination collection, tick the 'Inherit default policies of destination collection' checkbox.  This is useful if you are moving an item from a private collection to a public collection, or from a public collection to a private collection.

  • Note: When selecting the 'Inherit default policies of destination collection' option, ensure that this will not override system-managed authorizations such as those imposed by the embargo system.

Items may also be moved in bulk by using the CSV batch metadata editor (see above).