
...

Code Block
archive_directory/
    item_000/
        dublin_core.xml         -- qualified Dublin Core metadata for metadata fields belonging to the 'dc' schema.
        metadata_[prefix].xml   -- metadata in another schema. The prefix is the name of the schema as registered with the metadata registry.
        contents                -- text file containing one line per filename.
        collections             -- (optional) text file containing the handles of the collections the item will belong to, one handle per line.
                                -- The collection on the first line will be the owning collection.
        file_1.doc              -- files to be added as bitstreams to the item.
        file_2.pdf
    item_001/
        dublin_core.xml
        contents
        file_1.png
        ...

The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:

  • element - the Dublin Core element
  • qualifier - the element's qualifier
  • language - (optional) ISO language code for the element

    Code Block
    <dublin_core>
        <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
        <dcvalue element="date" qualifier="issued">1990</dcvalue>
        <dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
    </dublin_core>
    

    (Note the optional language tag attribute, which notifies the system that the alternative title is in French.)

Every metadata field used must first be registered in the metadata registry of the DSpace instance. See Metadata and Bitstream Format Registries.
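The layout described above can be sketched as a short shell script. This is a minimal illustration, not an official tool: the temporary working directory, the file name file_1.txt, and the metadata values are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch: build one item directory in the Simple Archive Format.
# All paths and values below are illustrative assumptions.
set -eu

work=$(mktemp -d)                              # hypothetical scratch location
item_dir="$work/archive_directory/item_000"
mkdir -p "$item_dir"

# Metadata for the 'dc' schema: one <dcvalue> entry per metadata element.
cat > "$item_dir/dublin_core.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
</dublin_core>
EOF

# A placeholder file that would become a bitstream of the item.
printf 'placeholder content\n' > "$item_dir/file_1.txt"

# contents: one line per bitstream filename.
printf '%s\n' file_1.txt > "$item_dir/contents"

# Show the files that make up the item directory.
ls "$item_dir"
```

A real archive would repeat this for item_001, item_002, and so on, with one directory per item.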

...

Primary is used to specify the primary bitstream.

Configuring metadata_[prefix].xml for a Different Schema

It is possible to use other schemas such as EAD, VRA Core, etc. Make sure you have defined the new schema in the DSpace Metadata Schema Registry.

  1. Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is replaced with the schema's prefix.
  2. Inside the XML file, use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema=[prefix].
  3. Here is an example for ETD metadata, which would be in the file metadata_etd.xml:

    Code Block
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core schema="etd">
         <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
         <dcvalue element="degree" qualifier="level">Masters</dcvalue>
         <dcvalue element="degree" qualifier="grantor">Michigan Institute of Technology</dcvalue>
    </dublin_core>


...

Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace Instances.

Command used:

[dspace]/bin/dspace import

Java class:

org.dspace.app.itemimport.ItemImport

Arguments (short and long forms)    Description
-a or --add                         Add items to DSpace ‡
-r or --replace                     Replace items listed in mapfile ‡
-d or --delete                      Delete items listed in mapfile ‡
-s or --source                      Source of the items (directory)
-c or --collection                  Destination collection, given by its Handle or database ID
-m or --mapfile                     Where the mapfile for items can be found (name and directory)
-e or --eperson                     Email of the eperson doing the importing
-w or --workflow                    Send the submission through the collection's workflow
-n or --notify                      Send the email alert that the item(s) have been imported
-t or --test                        Test run; do not actually import items
-p or --template                    Apply the collection template
-R or --resume                      Resume a failed import (used on add only)
-h or --help                        Command help
-z or --zip                         Name of zipfile

‡ These are mutually exclusive.

The item importer can batch import any number of items into a particular collection using a single CLI command and its arguments.

Adding Items to a Collection from a directory

...

The above command would cycle through the archive directory's items, import them, and then generate a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You can use it for replacing or deleting (unimporting) the mapped items.

Testing. You can add --test (or -t) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.

...

The above command would unpack the zipfile, cycle through the archive directory's items, import them, and then generate a map file which stores the mapping of item directories to item handles. SAVE THIS MAP FILE. You can use it for replacing or deleting (unimporting) the mapped items.

Testing. You can add --test (or -t) to the command to simulate the entire import process without actually doing the import. This is extremely useful for verifying your import files before doing the actual import.
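Because the map file is what ties each item directory to its handle, it can help to look entries up in it before a replace or delete run. The sketch below assumes each line of the map file holds an item directory name and a handle separated by a space; that layout, the sample handles, and the helper name handle_for are assumptions made for illustration, so check them against a map file produced by your own import.

```shell
#!/bin/sh
# Sketch: look up the handle recorded for an item directory in a map file.
# Assumed line layout: "<item directory> <handle>". Sample data is made up.
set -eu

work=$(mktemp -d)
map_file="$work/mapfile"                       # hypothetical path

cat > "$map_file" <<'EOF'
item_000 123456789/101
item_001 123456789/102
EOF

# handle_for is a hypothetical helper: print the handle mapped to a directory.
handle_for() {
    awk -v dir="$1" '$1 == dir { print $2 }' "$map_file"
}

handle_for item_001
```

An item directory that is not listed in the map file produces no output, which makes missing mappings easy to spot.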

Replacing Items in a Collection

Replacing existing items is relatively easy. Remember that mapfile you saved above? Now you will use it. The command (in short form):

...

  • Resume. If an error aborts the import, fix the error and then use the --resume (-R) flag to try to resume the import where it left off.

  • Specifying the owning collection on a per-item basis from the command line administration tool

    If you omit the -c flag, which is otherwise mandatory, the ItemImporter searches for a file named "collections" in each item directory. This file should contain a list of collections, one per line, specified either by their handle, or by their internal db id. The ItemImporter then will put the item in each of the specified collections. The owning collection is the collection specified in the first line of the collections file.

    If the -c flag is specified and the collections file also exists in the item directory, the ItemImporter will ignore the collections file and will put the item in the collection specified on the command line.

    Since the collections file can differ between item directories, this gives you more fine-grained control of the process of batch adding items to collections.

  • Importing with BTE

    The DSpaceOutputGenerator, which writes the metadata into the DSpace Simple Archive Format, has been updated to produce the collections file, if a metadata field named "collections" (reserved word) exists in the original metadata. This is mainly applicable to the CSV input format which is more flexible, but could also be implemented with a Modifier that adds the "collections" field to each Record in the BTE pipeline.

    Important note: an entry with the "collections" key should be in the output map that is used by the DSpaceOutputGenerator.
    More info in Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, CSV) and online services (OAI, arXiv, PubMed, CrossRef, CiNii).
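The per-item collections file described above can be sketched as follows. The handles and the working directory are made-up placeholders; the only rules taken from the text are one collection per line, with the owning collection on the first line.

```shell
#!/bin/sh
# Sketch: write a per-item "collections" file and read back the owning collection.
# Handles and paths are illustrative assumptions.
set -eu

work=$(mktemp -d)
item_dir="$work/archive_directory/item_000"
mkdir -p "$item_dir"

# First line: owning collection. Remaining lines: other collections for the item.
printf '%s\n' 123456789/10 123456789/11 123456789/12 > "$item_dir/collections"

# The owning collection is whatever appears on the first line.
owning=$(head -n 1 "$item_dir/collections")
echo "owning collection: $owning"
```

Since each item directory carries its own file, a different first line in item_001 would give that item a different owning collection.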

...

  1. Items, i.e. the metadata and their bitstreams, must be in the Simple Archive Format described earlier in this chapter. Thus, for each item there must be a separate directory that contains the corresponding files of the specific item.
  2. Moreover, in each item directory, there can be another file that describes the collection or the collections that this item will be added to. The name of this file must be "collections" and it is optional. It has the following format:


    Each line contains the handle of a collection. The collection in the first line is the owning collection, while the rest are the other collections that the item should belong to.
  3. Compress the item directories into a zip file. Please note that you need to zip the actual item directories and not just the directory that contains the item directories. Thus, the final zip file must directly contain the item directories.
  4. Place the zip file at a publicly accessible URL, for example on Dropbox or Google Drive. Because such a zip file can be very large, the batch import UI downloads it from that public location rather than accepting a direct upload, which could end in a timeout exception.

...

  1. Log in as an administrator.
  2. Find the menu on the top right of the page, and select the "Administer" option.



  3. Select the "Batch Import" option from the "Content" drop-down menu at the top of the page.



  4. Fill in the form that appears as follows:

  • Field #1: Select the type of the input data that you want to batch import. Be sure to select "Simple Archive Format" in this drop-down menu.
  • Field #2: Copy and paste the public URL where the zip file mentioned earlier is located.
  • Field #3: Select the owning collection of the items you are importing. This field is optional: if you leave it empty, you must include per-item collection information (via the "collections" file mentioned before) in the Simple Archive Format.
  • Field #4: Select the other collections the item will belong to. You can select more than one collection by holding down the Ctrl key on your keyboard. If you select the owning collection in this multiselect input control, it will be ignored at the very end.

...

C. View past batch imports (that have been done via the UI)

  1. Log in.
  2. Visit the "My DSpace" page.

  3. On the next page, you can see the history of batch imports. For each import, the following information is available:

    The status of the batch import (success or failure)
    The number of items that the user tried to import
    The number of items that were actually imported

     

    Moreover, the user can take the following actions:

    Download the map file that was produced during the import. This file contains a list of items that were imported with the corresponding handle assigned to them by DSpace.

    Delete the imported items. Everything that was imported will be deleted (including the history directory in the "[dspace]/import" directory)

    In case of failure, the user can "Resume" the import. The user is taken to the upload form again, but the system recognizes the initial import (and the map file) in order to resume the old import. There is a red label in the form that informs the user about the "Resume" form.

...

The item exporter can export a single item or a collection of items, and creates a DSpace simple archive in the aforementioned format for each exported item. The items are exported in the order in which they are retrieved from the database. As a consequence, the sequence numbers of the item subdirectories (item_000, item_001) are not related to DSpace handles or item IDs.

Command used:

[dspace]/bin/dspace export

Java class:

org.dspace.app.itemexport.ItemExport

Arguments (short and long forms)    Description
-t or --type                        Type of export. COLLECTION exports the whole collection; ITEM exports only the specific item. (Type the keywords in all caps. See the examples below.)
-i or --id                          The ID or Handle of the Collection or Item to export.
-d or --dest                        The destination path where you want the exported items to be placed.
-n or --number                      Sequence number to begin exporting the items with. Whatever number you give, this will be the name of the first directory created for your export. The layout of the export directory is the same as the layout used for import.
-m or --migrate                     Export the item/collection for migration. This will remove the handle and any other metadata that will be re-created in the new instance of DSpace.
-x or --exclude-bitstreams          Do not export bitstreams. See the usage scenario below.
-h or --help                        Brief help.

Exporting a Collection

The CLI command to export the items of a collection:

...

Code Block
[dspace]/bin/dspace export -t COLLECTION -i collectionID_or_handle -d /path/to/destination -n seq_num

...

The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply.

Exporting a Single Item

To export a single item use the keyword ITEM and give the item ID as an argument:

...