This CLI tool gives you the ability to import a community and collection hierarchy from a source XML file.
Command used: | [dspace]/bin/dspace structure-builder |
Java class: | org.dspace.administer.StructBuilder |
Argument: short and long (if available) forms: | Description of the argument |
-f | Source XML file. |
-o | Output XML file. |
-e | Email of DSpace Administrator. |
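Assuming the tool is exposed through the DSpace command launcher as structure-builder, a run might look like the following sketch (the file names and email address are illustrative):

```shell
# Import the community/collection hierarchy described in source.xml;
# the resulting structure, including assigned handles, is written to output.xml
[dspace]/bin/dspace structure-builder -f source.xml -o output.xml -e admin@myu.edu
```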
...
Code Block |
---|
[dspace]/bin/dspace packager -e [user-email] -p [parent-handle] -t [packager-name] /full/path/to/package |
Where _[user-email]_ is the e-mail address of the E-Person under whose authority this runs; _[parent-handle]_ is the Handle of the Parent Object into which the package is ingested; _[packager-name]_ is the plugin name of the package ingester to use; and _/full/path/to/package_ is the path to the file to ingest (or _"-"_ to read from the standard input).
Here is an example that loads a PDF file with internal metadata as a package:
...
Code Block |
---|
[dspace]/bin/dspace packager -d -e [user-email] -i [handle] -t [packager-name] [file-path] |
...
Where _[user-email]_ is the e-mail address of the E-Person under whose authority this runs; _[handle]_ is the Handle of the Object to disseminate; _[packager-name]_ is the plugin name of the package disseminator to use; and _[file-path]_ is the path to the file to create (or _"-"_ to write to the standard output). For example:
Code Block |
---|
[dspace]/bin/dspace packager -d -t METS -e admin@myu.edu -i 4321/4567 4567.zip |
...
Code Block |
---|
archive_directory/
    item_000/
        dublin_core.xml       -- qualified Dublin Core metadata for metadata fields belonging to the dc schema
        metadata_[prefix].xml -- metadata in another schema; the prefix is the name of the schema as registered with the metadata registry
        contents              -- text file containing one line per filename
        file_1.doc            -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
    ... |
...
The _dublin_core.xml_ or _metadata_[prefix].xml_ file has the following format, where each metadata element has its own entry within a _<dcvalue>_ tagset. There are currently three tag attributes available in the _<dcvalue>_ tagset:
Code Block |
---|
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
    <dcvalue element="title" qualifier="alternate" language="fr">J'aime les Printemps</dcvalue>
</dublin_core> |
...
'BUNDLENAME' is the name of the bundle to which the bitstream should be added. If no bundle is specified, bitstreams will go into the default bundle, ORIGINAL.
'PERMISSIONS' is text with the following format: -[r|w] 'group name'
'DESCRIPTION' is the text of the file's description.
Primary is used to specify the primary bitstream.
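Putting the pieces above together, a contents file might look like the following sketch (file names, bundle, group name and description are illustrative; the fields on each line are tab-separated):

```
file_1.doc
file_2.pdf	bundle:ORIGINAL	permissions:-r 'Anonymous'	description:Main article	primary:true
license.txt	bundle:LICENSE
```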
...
It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.
Code Block |
---|
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="etd">
    <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
    <dcvalue element="degree" qualifier="level">Masters</dcvalue>
    <dcvalue element="degree" qualifier="grantor">Texas A &amp; M</dcvalue>
</dublin_core> |
...
Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances.
Command used: | [dspace]/bin/dspace import |
Java class: | org.dspace.app.itemimport.ItemImport |
Arguments short and (long) forms: | Description |
-a or --add | Add items to DSpace ‡ |
-r or --replace | Replace items listed in mapfile ‡ |
-d or --delete | Delete items listed in mapfile ‡ |
-s or --source | Source of the items (directory) |
-c or --collection | Destination collection by its Handle or database ID |
-m or --mapfile | Where the mapfile for items can be found (name and directory) |
-e or --eperson | Email of the eperson doing the importing |
-w or --workflow | Send submission through the collection's workflow |
-n or --notify | Kicks off email alerting that the item(s) have been imported |
-t or --test | Test run; do not actually import items |
-p or --template | Apply the collection template |
-R or --resume | Resume a failed import (used on Add only) |
-h or --help | Command help |
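Assuming the importer is exposed through the command launcher as import, with flags -a (add), -e (eperson), -c (collection), -s (source) and -m (mapfile), a hypothetical add run might look like this (the handle, paths and email are illustrative):

```shell
# Add all items found in archive_directory to collection 4321/10,
# recording the new item handles in the named mapfile
[dspace]/bin/dspace import -a -e admin@myu.edu -c 4321/10 \
  -s /path/to/archive_directory -m /path/to/mapfile
```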
...
The item exporter can export a single item or a collection of items, and creates a DSpace simple archive for each item to be exported.
Command used: | [dspace]/bin/dspace export |
Java class: | org.dspace.app.itemexport.ItemExport |
Arguments short and (long) forms: | Description |
-t or --type | Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You will actually key in the keywords in all caps. See examples below.) |
-i or --id | The ID or Handle of the Collection or Item to export. |
-d or --dest | The destination where you want the exported items to be placed. Include the path if necessary. |
-n or --number | Sequence number to begin exporting the items with. Whatever number you give will be the name of the first directory created for your export. The layout of the export is the same layout you would use for an import. |
-m or --migrate | Export the item/collection for migration. This will remove the handle and metadata that will be re-created in the new instance of DSpace. |
-h or --help | Brief help. |
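Assuming the exporter is exposed through the command launcher as export, with flags -t (type), -i (id), -d (dest) and -n (number), the two export types might be invoked like this (the handles and paths are illustrative):

```shell
# Export a whole collection, numbering item directories from 100
[dspace]/bin/dspace export -t COLLECTION -i 4321/10 -d /path/to/destination -n 100

# Export a single item
[dspace]/bin/dspace export -t ITEM -i 4321/4567 -d /path/to/destination -n 1
```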
...
One probable scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or downstream system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to synchronize the changes.
A note on terminology: *item* refers to a DSpace item. *metadata element* refers generally to a qualified or unqualified element in a schema in the form _[schema].[element].[qualifier]_ or _[schema].[element]_, and occasionally in a more specific way to the second part of that form. *metadata field* refers to a specific instance pairing a metadata element to a value.
...
The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate.
Command used: | [dspace]/bin/dspace itemupdate |
Java class: | org.dspace.app.itemupdate.ItemUpdate |
Arguments short and (long) forms: | Description |
-a or --addmetadata [metadata element] | Repeatable for multiple elements. The metadata element should be in the form dc.x or dc.x.y. The mandatory argument indicates the metadata fields in the dublin_core.xml file to be added unless they are already present. Duplicate fields will not be added to the item metadata; no warning or error is given. |
-d or --deletemetadata [metadata element] | Repeatable for multiple elements. All metadata fields matching the element will be deleted. |
-A or --addbitstreams | Adds bitstreams listed in the contents file with the bitstream metadata cited there. |
-D or --deletebitstreams [filter] | Not repeatable. With no argument, this operation deletes bitstreams listed in the deletes_contents file. Only bitstream IDs are recognized identifiers for this operation. The optional filter argument is the classname of an implementation of the org.dspace.app.itemupdate.BitstreamFilter class to identify files for deletion, or one of the aliases (ORIGINAL, ORIGINAL_AND_DERIVATIVES, TEXT, THUMBNAIL) which reference existing filters based on membership in a bundle of that name. In this case, the delete_contents file is not required for any item. The filter properties file contains properties pertinent to the particular filter used. Multiple filters are not allowed. |
-h or --help | Displays brief command line help. |
-e or --eperson | Email address of the person or the user's database ID (Required) |
-s or --source | Directory archive to process (Required) |
-i or --itemfield | Specifies an alternate metadata field (not a handle) used to hold an identifier used to match the DSpace item with that in the archive. If omitted, the item handle is expected to be located in the dc.identifier.uri field. (Optional) |
-t or --test | Runs the process in test mode with logging but no changes applied to the DSpace instance. (Optional) |
-P or --provenance | Prevents any changes to the provenance field to represent changes in the bitstream content resulting from an Add or Delete. No provenance statements are written for thumbnails or text derivative bitstreams, in keeping with the practice of MediaFilterManager. (Optional) |
-F or --filter-properties | The filter properties file to be used by the delete bitstreams action (Optional) |
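Assuming a launcher name of itemupdate and flags -t (test), -e (eperson), -s (source) and -a (add metadata), a cautious first run in test mode might look like this (the paths, email and metadata element are illustrative):

```shell
# Test mode: log what would happen, but change nothing in the repository
[dspace]/bin/dspace itemupdate -t -e admin@myu.edu -s /path/to/archive -a dc.description.abstract
```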
...
...
Available Command-Line Options:
...
The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be a 'parent' community (meaning it has at least one sub-community), a 'child' community (meaning it is a sub-community of another community), both, or neither. In these terms, an 'orphan' is a community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user interface, since there is no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan. The second operation, removing a parent/child relationship, will make the child an orphan.
Command used: | [dspace]/bin/dspace community-filiator |
Java class: | org.dspace.administer.CommunityFiliator | ||
Arguments short and (long) forms: | Description |
-s or --set | Set a parent/child relationship |
-r or --remove | Remove a parent/child relationship |
-c or --child | Child community (Handle or database ID) |
-p or --parent | Parent community (Handle or database ID) |
-h or --help | Online help. |
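For example, assuming the launcher name community-filiator and flags -s (set), -r (remove), -p (parent) and -c (child), the two operations might look like this (the handles are illustrative):

```shell
# Make community 4321/200 a sub-community of community 4321/100
[dspace]/bin/dspace community-filiator -s -p 4321/100 -c 4321/200

# Undo the relationship, making 4321/200 a top-level (orphan) community again
[dspace]/bin/dspace community-filiator -r -p 4321/100 -c 4321/200
```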
...
The following table summarizes the basics.
Command used: | [dspace]/bin/dspace metadata-export |
Java class: | org.dspace.app.bulkedit.MetadataExport |
Arguments short and (long) forms: | Description |
-f or --file | Required. The filename of the resulting CSV. |
-i or --id | The Item, Collection, or Community handle or database ID to export. If not specified, all items will be exported. |
-a or --all | Include all the metadata fields that are not normally changed (e.g. provenance), or those fields you have configured to be ignored in the DSpace configuration. |
-h or --help | Display the help page. |
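Assuming the launcher name metadata-export and flags -f (file) and -i (id), exporting one collection might look like this (the handle and filename are illustrative):

```shell
# Dump the metadata of collection 4321/10 to a CSV file for bulk editing
[dspace]/bin/dspace metadata-export -f collection.csv -i 4321/10
```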
...
The following table summarizes the basics.
Command used: | [dspace]/bin/dspace metadata-import |
Java class: | org.dspace.app.bulkedit.MetadataImport | ||
Arguments short and (long) forms: | Description |
-f or --file | Required. The filename of the CSV file to load. |
-s or --silent | Silent mode. The import function does not prompt you to make sure you wish to make the changes. |
-e or --email | The email address of the user. This is only required when adding new items. |
-w or --workflow | When adding new items, the program will queue the items up to use the collection's workflow processes. |
-n or --notify | When adding new items using a workflow, send notification emails. |
-t or --template | When adding new items, use the collection template, if it exists. |
-h or --help | Display the brief help page. |
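Assuming the launcher name metadata-import and flags -f (file) and -e (email), applying an edited CSV might look like this (the filename and email are illustrative):

```shell
# Load the edited CSV; the program prompts for confirmation before applying changes
[dspace]/bin/dspace metadata-import -f collection.csv -e admin@myu.edu
```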
...
The Checksum Checker is a program that can be run to verify the checksum of every item within DSpace. It was designed with the idea that most system administrators will run it from cron. Depending on the size of the repository, choose the options wisely.
Command used: | [dspace]/bin/dspace checker |
Java class: | org.dspace.app.checker.ChecksumChecker |
Arguments short and (long) forms: | Description |
-L or --continuous | Loop continuously through the bitstreams |
-a or --handle | Specify a handle to check |
-b <bitstream-ids> | Space-separated list of bitstream IDs |
-c or --count | Check count |
-d or --duration | Checking duration |
-h or --help | Calls online help |
-l or --looping | Loop once through the bitstreams |
-p or --prune | Prune old results (optionally using the specified properties file for configuration) |
-v or --verbose | Report all processing |
...
Available command line options

*Limited-count mode*: To check a specific number of bitstreams, follow the _-c_ option with an integer, the number of bitstreams to check. Example:

Code Block |
---|
[dspace]/bin/dspace checker -c 10 |

This is particularly useful for checking that the checker is executing properly. The Checksum Checker's default execution mode is to check a single bitstream, as if the option was _-c 1_.

*Duration mode*: To run the checker for a specific period of time, use the _-d_ option with a time argument. Example:

Code Block |
---|
[dspace]/bin/dspace checker -d 2h |

(The checker will run for 2 hours.) You may use any of the time arguments below:

s | Seconds |
m | Minutes |
h | Hours |
d | Days |
w | Weeks |
y | Years |

*Specific-bitstream mode*: With the _-b_ option, the checker will only look at the internal bitstream IDs. Example:

Code Block |
---|
[dspace]/bin/dspace checker -b 112 113 4567 |

The checker will only check bitstream IDs 112, 113 and 4567.

*Handle mode*: With the _-a_ option, the checker will only check bitstreams within the specified Community, Collection or the Item itself. Example:

Code Block |
---|
[dspace]/bin/dspace checker -a 123456/999 |

The checker will only check this handle. If it is a Collection or Community, it will run through the entire Collection or Community.

*Looping modes*: There are two looping modes, _-l_ and _-L_. The lowercase 'el' (-l) specifies to check every bitstream in the repository once. This is recommended for smaller repositories that are able to loop through all their content in just a few hours maximum. An uppercase 'L' (-L) specifies to loop continuously through the repository. This is not recommended for most repository systems. *Cron Jobs*. For large repositories that cannot be completely checked in a couple of hours, we recommend the -d option in cron.

*Pruning mode*: The Checksum Checker will store the result of every check in the checksum_history table. By default, successful checksum matches that are eight weeks old or older will be deleted when the _-p_ option is used. (Unsuccessful ones will be retained indefinitely.) Without this option, the retention settings are ignored and the database table may grow rather large! Running the checker with -p keeps the size of the checksum_history table manageable. The amount of time for which results are retained in the checksum_history table can be modified by one of two methods. Either edit the property keys in [dspace]/config/dspace.cfg (see Chapter 5 Configuration), for example:

Code Block |
---|
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w |

or supply the name of a retention properties file on the command line:

Code Block |
---|
[dspace]/bin/dspace checker -p retention_file_name |

*Checker results*: Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log, and it will report only on bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use the _-v_ (verbose) command line option:

Code Block |
---|
[dspace]/bin/dspace checker -l -v |

(This will loop through the repository once and report in detail about every bitstream checked.) To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties file and run [dspace]/bin/install_configs.
...
Optionally, you may choose to receive automated emails listing the Checksum Checker's results. Schedule it to run after the Checksum Checker has completed its processing (otherwise the email may not contain all the results).
Command used: | [dspace]/bin/dspace checker-emailer |
Java class: | org.dspace.checker.DailyReportEmailer | |
Arguments short and (long) forms: | Description |
-a or --All | Send all the results (everything specified below) |
-d or --Deleted | Send an email report for all bitstreams set as deleted for today. |
-m or --Missing | Send an email report for all bitstreams not found in the assetstore for today. |
-c or --Changed | Send an email report for all bitstreams where the checksum has changed for today. |
-u or --Unchecked | Send the unchecked bitstream report. |
-n or --Not Processed | Send an email report for all bitstreams set to no longer be processed for today. |
-h or --help | Help |
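In cron, the report emailer is typically scheduled after the checker itself. A sketch might look like this (the times, and the checker-emailer launcher name and -a flag, are assumptions):

```shell
# Run the checker for four hours starting at 01:00, pruning old results,
# then mail the full report at 06:00
0 1 * * * [dspace]/bin/dspace checker -d 4h -p
0 6 * * * [dspace]/bin/dspace checker-emailer -a
```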
...
If you have implemented the Embargo feature, you will need to run it periodically to check for Items with expired embargoes and lift them.
Command used: | [dspace]/bin/dspace embargo-lifter |
Java class: | org.dspace.embargo.EmbargoManager | ||||
Arguments short and (long) forms: | Description |
-c or --check | ONLY check the state of embargoed Items, do NOT lift any embargoes. |
-i or --identifier | Process ONLY the specified handle identifier(s), which must be Items. Can be repeated. |
-l or --lift | ONLY lift embargoes, do NOT check the state of any embargoed Items. |
-n or --dryrun | Do not change anything in the data model; print a message instead. |
-v or --verbose | Print a line describing the action taken for each embargoed Item found. |
-q or --quiet | No output except upon error. |
-h or --help | Display brief help screen. |
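Assuming the launcher name embargo-lifter and flags -n (dry run) and -v (verbose), a safe way to preview the lifter's work is a verbose dry run:

```shell
# Report which embargoes would be lifted, without changing the data model
[dspace]/bin/dspace embargo-lifter -n -v
```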
...
To create all the various browse indexes that you define in the Configuration Section (Chapter 5), there are a variety of options available to you. You can see these options below in the command table.
...
Command used: | [dspace]/bin/dspace index-init |
Java class: | org.dspace.browse.IndexBrowse |
Arguments short and (long) forms: | Description |
-r or --rebuild | Should we rebuild all the indexes, which removes old tables and creates new ones. For use with -f. |
-x or --execute | Execute all the remove and create SQL against the database. For use with -t and -f. |
-i or --index | Actually do the indexing. Mutually exclusive with -t and -f. |
-p or --print | Write the remove and create SQL to the stdout. For use with -t and -f. |
-t or --tables | Create the tables only; do not attempt to index. Mutually exclusive with -f and -i. |
-f or --full | Make the tables, and do the indexing. This forces -x. Mutually exclusive with -i and -t. |
-v or --verbose | Print extra information to the stdout. If used in conjunction with -p, you cannot use the SQL directly. |
-d or --delete | Delete all the indexes, but do not create new ones. For use with -f. |
-h or --help | Show this help documentation. Overrides all other arguments. |
...
*Complete Index Regeneration*. Running [dspace]/bin/dspace index-init will completely regenerate your indexes, tearing down all old tables and reconstructing with the new configuration.
Code Block |
---|
[dspace]/bin/dspace index-init |
...
*Updating the Indexes*. Running [dspace]/bin/dspace index-update will reindex your full browse without modifying the table structure. (This should be your default approach if indexing, for example, via a cron job periodically.)
Code Block |
---|
[dspace]/bin/dspace index-update |
...
With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of SOLR for statistics makes it possible to have a database of statistics. With this in mind, there is the issue of the older log files and how a site can use them. The following command process is able to convert the existing log files and then import them for SOLR use. The user will need to perform this only once.
The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.
...
Command used: | [dspace]/bin/dspace stats-log-converter |
Java class: | org.dspace.statistics.util.ClassicDSpaceLogConverter | |
Arguments (short and long forms): | Description |
-i or --in | Input file |
-o or --out | Output file |
-m or --multiple | Adds a wildcard at the end of input and output, so dspace.log* would be converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.) |
-n or --newformat | If the log files have been created with DSpace 1.6 |
-v or --verbose | Display verbose output (helpful for debugging) |
-h or --help | Help |
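Assuming the launcher name stats-log-converter and flags -i (in), -o (out) and -m (multiple), a conversion run might look like this (the output path is illustrative):

```shell
# Convert dspace.log, dspace.log.1, dspace.log.2, ... into intermediate import files
[dspace]/bin/dspace stats-log-converter -i [dspace]/log/dspace.log -o [dspace]/log/statistics-import -m
```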
The command loads the intermediate log files that have been created by the aforementioned script into SOLR.
...
The command loads the intermediate log files that have been created by the aforementioned script into SOLR.
Command used: | [dspace]/bin/dspace stats-log-importer |
Java class: | org.dspace.statistics.util.StatisticsImporter | |
Arguments (short and long forms): | Description |
-i or --in | Input file |
-m or --multiple | Adds a wildcard at the end of the input, so dspace.log* would be imported |
-s or --skipdns | To skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds information about the host from its IP address, such as geographical location. This can be slow, and wouldn't work on a server not connected to the internet.) |
-v or --verbose | Display verbose output (helpful for debugging) |
-l or --local | For developers: allows you to import a log file from another system. Because the handles won't exist, it looks up random items in your local system to add hits to instead. |
-h or --help | Help |
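Assuming the launcher name stats-log-importer and flags -i (in), -m (multiple) and -s (skip DNS), the matching import might be:

```shell
# Load the converted files into SOLR, skipping slow reverse-DNS lookups
[dspace]/bin/dspace stats-log-importer -i [dspace]/log/statistics-import -m -s
```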
Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from complete. Please refer to Statistics Client (8.15) for spider removal operations after converting your old logs.
Command used: | [dspace]/bin/dspace stats-util |
Java class: | org.dspace.statistics.util.StatisticsClient | |
Arguments (short and long forms): | Description |
-u or --update-spiders | Update spider IP files from the internet into [dspace]/config/spiders |
-f or --delete-spiders-by-flag | Delete spiders in Solr by isBot flag. Will prune out all records flagged as bots. |
-i or --delete-spiders-by-ip | Delete spiders in Solr by IP address. Will prune out all records that have IPs that match spider IPs. |
-m or --mark-spiders | Update isBot flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in the spider files |
-o or --optimize | Run maintenance on the SOLR index. Recommended to run daily, to prevent your applet container from running out of memory |
-h or --help | Calls up this brief help table at CLI. |
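Assuming the launcher name stats-util and flags -m (mark spiders), -f (delete by flag) and -o (optimize), a nightly maintenance sequence might look like:

```shell
# Mark spider hits, prune them, then run SOLR index maintenance
[dspace]/bin/dspace stats-util -m
[dspace]/bin/dspace stats-util -f
[dspace]/bin/dspace stats-util -o
```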
...
If you want to keep the spiders out of the Solr repository, you can use just the "-i" option and they will be removed immediately.
There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (e.g. 123.123.123), and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255]. If not, loading that row will cause exceptions in the dspace logs and exclude that IP entry.
This command can be used at any time to test for Database connectivity. It will assist in troubleshooting PostgreSQL and Oracle connection issues with the database.
Command used: | [dspace]/bin/dspace test-database |
Java class: | org.dspace.storage.rdbms.DatabaseManager | |
Arguments (short and long forms): | Description | |
| There are no arguments used at this time. |
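Assuming the launcher name test-database, the check takes no arguments:

```shell
# Verify that DSpace can reach its configured PostgreSQL or Oracle database
[dspace]/bin/dspace test-database
```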
...