...

DSpace has a set of command line tools for importing and exporting items in batches, using the DSpace Simple Archive Format. Apart from the offered functionality, these tools serve as a prime example for users who aim to implement their own item importer.

...

The basic concept behind DSpace's Simple Archive Format is to create an archive, which is a directory containing one subdirectory per item. Each item directory contains a file for the item's descriptive metadata, and the files that make up the item.
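As a rough illustration of this layout, the following Python sketch builds a small archive with two item directories. The file names dublin_core.xml (descriptive metadata) and contents (list of the item's files) are assumptions based on the names the Simple Archive Format conventionally uses; the item names, titles, and file data are invented for the example.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape


def write_item(archive_dir, name, title, files):
    """Create one item subdirectory holding metadata, a contents list, and files.

    `files` maps a file name to its bytes. All names here are illustrative.
    """
    item_dir = Path(archive_dir) / name
    item_dir.mkdir(parents=True)
    # Descriptive metadata file (assumed name: dublin_core.xml).
    (item_dir / "dublin_core.xml").write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<dublin_core>\n"
        f'  <dcvalue element="title" qualifier="none">{escape(title)}</dcvalue>\n'
        "</dublin_core>\n",
        encoding="utf-8",
    )
    # The files that make up the item.
    for filename, data in files.items():
        (item_dir / filename).write_bytes(data)
    # Assumed "contents" file: one file name per line.
    (item_dir / "contents").write_text(
        "".join(f"{filename}\n" for filename in files), encoding="utf-8"
    )
    return item_dir


archive = Path(tempfile.mkdtemp()) / "archive"
write_item(archive, "item_000", "First & only report", {"report.pdf": b"%PDF-1.4"})
write_item(archive, "item_001", "Second report", {"data.csv": b"a,b\n1,2\n"})

# Relative paths of everything in the archive, for inspection.
layout = sorted(p.relative_to(archive).as_posix() for p in archive.rglob("*"))
```

Printing `layout` shows one subdirectory per item, each with its metadata file, its contents list, and its files.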

...

  1. Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is replaced with the schema's prefix.
  2. Inside the xml file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema=[prefix].
  3. Here is an example for ETD metadata, which would be in the file metadata_etd.xml:

    Code Block
    <?xml version="1.0" encoding="UTF-8"?>
    <dublin_core schema="etd">
         <dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
         <dcvalue element="degree" qualifier="level">Masters</dcvalue>
         <dcvalue element="degree" qualifier="grantor">Michigan Institute of Technology</dcvalue>
    </dublin_core>
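A file like this could also be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name build_schema_metadata is hypothetical and only follows the naming rule and attribute described in the steps above.

```python
import xml.etree.ElementTree as ET


def build_schema_metadata(prefix, values):
    """Return (filename, xml_text) for metadata in a non-Dublin-Core schema.

    `values` is a list of (element, qualifier, text) tuples.
    """
    root = ET.Element("dublin_core", {"schema": prefix})
    for element, qualifier, text in values:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        # ElementTree escapes &, < and > in text when serializing.
        dcvalue.text = text
    body = ET.tostring(root, encoding="unicode")
    return f"metadata_{prefix}.xml", '<?xml version="1.0" encoding="UTF-8"?>\n' + body


filename, xml_text = build_schema_metadata(
    "etd",
    [
        ("degree", "department", "Computer Science"),
        ("degree", "level", "Masters"),
        ("degree", "grantor", "Michigan Institute of Technology"),
    ],
)
```

Writing `xml_text` to `filename` inside an item directory gives a file equivalent to the example above, apart from whitespace.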

Importing Items

...

  • Resume. If, during importing, you have an error and the import is aborted, you can use the --resume (-R) flag to try to resume the import where you left off after you fix the error.

  • Importing (per item) into multiple collections from the command line administration tool

    If you omit the -c flag, which normally is mandatory, the ItemImporter searches for a file named "collections" in each item directory. This file should contain a list of collections, one per line, specified either by their handle, or by their internal db id. The ItemImporter then will put the item in each of the specified collections. The owning collection is the collection specified in the first line of the collections file.

    If the -c flag is specified and the collections file also exists in the item directory, the ItemImporter will ignore the collections file and will put the item in the collection specified on the command line.

    Since the collections file can differ between item directories, you have finer-grained control over the process of batch adding items to collections.

  • Importing with BTE

    The DSpaceOutputGenerator, which writes the metadata in DSpace Simple Archive Format, has been updated to produce the collections file if a metadata field named collections (reserved word) exists in the original metadata. This is mainly applicable to the CSV input format, which is more flexible, but could also be implemented with a Modifier that adds the collections field to each Record item in the BTE pipeline.

    Important note: an entry with key "collections" should be in the output map that is used by the DSpaceOutputGenerator.
    More info here
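The rules described above for choosing collections during a command-line import can be sketched in Python as follows. This is not the ItemImporter's actual code, only an illustration of the documented behaviour; the function name and the example handles are hypothetical, and raising an error when neither source is available is an assumption.

```python
import tempfile
from pathlib import Path


def resolve_collections(item_dir, cli_collection=None):
    """Return (owning_collection, other_collections) for one item directory."""
    if cli_collection is not None:
        # -c was given: any collections file in the item directory is ignored.
        return cli_collection, []
    collections_file = Path(item_dir) / "collections"
    if not collections_file.is_file():
        # Assumption: with no -c value and no collections file we cannot proceed.
        raise FileNotFoundError(f"no -c value and no collections file in {item_dir}")
    lines = collections_file.read_text(encoding="utf-8").splitlines()
    entries = [line.strip() for line in lines if line.strip()]
    if not entries:
        raise ValueError(f"collections file in {item_dir} is empty")
    # The first line is the owning collection; the rest are additional ones.
    return entries[0], entries[1:]


item_dir = Path(tempfile.mkdtemp())
(item_dir / "collections").write_text("123456789/10\n123456789/11\n", encoding="utf-8")

from_file = resolve_collections(item_dir)
from_cli = resolve_collections(item_dir, cli_collection="123456789/99")
```

Here `from_file` takes both collections from the file, while `from_cli` contains only the collection passed on the command line.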

...

Batch import can also take place via the Administrator’s UI. The steps to follow are:

A. Prepare the data

  1. Items, i.e. the metadata and their bitstreams, must be in the Simple Archive Format described earlier in this chapter. Thus, for each item there must be a separate directory that contains the corresponding files of the specific item.
  2. Moreover, in each item directory, there can be another file that describes the collection or the collections that this item will be added to. The name of this file must be "collections" and it is optional. It has the following format:


    Each line contains the handle of a collection. The collection in the first line is the owning collection, while the rest are the other collections the item should belong to.
  3. Compress the item directories into a zip file. Please note that you need to zip the actual item directories and not just the directory that contains the item directories. Thus, the final zip file must directly contain the item directories.
  4. Place the zip file at a publicly accessible URL, such as Dropbox or Google Drive, or wherever you have access to do so. Since such a zip file can be very large, the batch import UI needs a URL to download it from a public location, rather than uploading it directly and risking a timeout exception.
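The zipping step is easy to get wrong, so here is a minimal Python sketch of it using the standard zipfile module. The helper name zip_item_directories, the directory names, and the collection handles are hypothetical; the point is only that entries are stored relative to the enclosing directory, so the item directories end up at the top level of the zip file.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_item_directories(archive_dir, zip_path):
    """Zip the item directories so they sit at the top level of the zip file."""
    archive_dir = Path(archive_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(archive_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to archive_dir, so the enclosing
                # directory itself does not appear inside the zip.
                zf.write(path, path.relative_to(archive_dir).as_posix())
    return zip_path


work = Path(tempfile.mkdtemp())
item = work / "archive" / "item_000"
item.mkdir(parents=True)
(item / "dublin_core.xml").write_text("<dublin_core/>\n", encoding="utf-8")
# Optional per-item collections file: owning collection first, then the others.
(item / "collections").write_text("123456789/10\n123456789/11\n", encoding="utf-8")

zip_path = zip_item_directories(work / "archive", work / "batch.zip")
with zipfile.ZipFile(zip_path) as zf:
    names = zf.namelist()
```

The entries in `names` start with `item_000/` rather than `archive/item_000/`, which is the layout the steps above ask for.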

B. Import the items via the UI

  1. Login as an administrator
  2. Find the menu on the top right of the page, and select the "Administer" option



  3. Select the "Batch Import" option from the "Content" drop down menu on the top of the page



  4. Fill in the form that appears as follows:

  • Field #1: Select the type of the input data that you want to batch import. Be sure to select "Simple Archive Format" in this drop down menu
  • Field #2: Copy/Paste the public URL where the zip file mentioned earlier is located
  • Field #3: Select the owning collection of the items you are importing. This field is optional, meaning that if you leave it empty, you are supposed to include per item collection information (via the "collections" file mentioned before) in the Simple Archive Format
  • Field #4: Select the other collections the item will belong to. You can select more than one collection by holding down the Ctrl key on your keyboard. If you select the owning collection in this multiselect input control, it will be ignored at the very end.

...

1) If you select an owning collection from this form, then the "collections" file that may be included in the item will be ignored.

2) If you do not specify an owning collection, and for some items no "collections" file exists in the item directory, then the item will not be imported in DSpace

...

C. View past batch imports (that have been done via the UI)

  1. Login
  2. Visit the "My DSpace" page

  3. On the next page, you can see the history of batch imports that have been done in the past. For each import, the following information is available:

    The status of the batch import (success or failure)
    The number of items that the user tried to import
    The number of items that were actually imported

    Moreover, the user can take the following actions:

    Download the map file that was produced during the import. This file contains a list of items that were imported with the corresponding handle that was assigned to them by DSpace.

    Delete the imported items. Everything that was imported will be deleted (including the history directory in the "[dspace]/import" directory)

    In case of failure, the user can "Resume" the import. The user is taken to the upload form again, but the system recognizes the initial import (and the map file) in order to resume the old import. There is a red label in the form that informs the user about the "Resume" form.
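A downloaded map file can be inspected with a few lines of Python. The sketch below assumes that each line holds an item directory name followed by whitespace and the assigned handle; this page does not spell out the file format, so check a real map file before relying on it. The function name and the sample handles are hypothetical.

```python
def parse_map_file(text):
    """Parse map file text into {item_directory: handle}.

    Assumed line format: "<item directory> <handle>".
    """
    mapping = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 'item_dir handle', got {line!r}")
        item_dir, handle = parts
        mapping[item_dir] = handle
    return mapping


sample = "item_000 123456789/101\nitem_001 123456789/102\n"
imported = parse_map_file(sample)
```

With the sample text, `imported` maps each item directory to the handle it received.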

Exporting Items

The item exporter can export a single item or a collection of items, and creates a DSpace simple archive in the aforementioned format for each item to be exported. The items are exported in the sequential order in which they are retrieved from the database. As a consequence, the sequence numbers of the item subdirectories (item_000, item_001) are not related to DSpace handles or item ids.

Command used:

[dspace]/bin/dspace export

Java class:

org.dspace.app.itemexport.ItemExport

Arguments short and (long) forms:

Description

-t or --type

Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You will actually key in the keywords in all caps. See examples below.)

-i or --id

The ID or Handle of the Collection or Item to export.

-d or --dest

The destination path where you want the exported items to be placed. Include the full path if necessary.

-n or --number

Sequence number to begin exporting the items with. Whatever number you give, this will be the name of the first directory created for your export. The layout of the export directory is the same as the layout used for an import.

-m or --migrate

Export the item/collection for migration. This will remove the handle and metadata that will be re-created in the new instance of DSpace.

-h or --help

Brief Help.

Exporting a Collection

The CLI command to export the items of a collection:

Code Block
[dspace]/bin/dspace export --type=COLLECTION --id=collectionID_or_handle --dest=/path/to/destination --number=seq_num

Short form:

Code Block
[dspace]/bin/dspace export -t COLLECTION -i collectionID_or_handle -d /path/to/destination -n seq_num

Exporting a Single Item

The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will begin numbering the simple archives with the sequence number that you supply. To export a single item use the keyword ITEM and give the item ID as an argument:

Code Block
[dspace]/bin/dspace export --type=ITEM --id=itemID_or_handle --dest=/path/to/destination --number=seq_num

Short form:

Code Block
[dspace]/bin/dspace export -t ITEM -i itemID_or_handle -d /path/to/destination -n seq_num
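When exports are scripted, it can help to assemble the command from its parts. The following Python sketch builds the argument list using only the short flags listed in the table above; the helper name, the /dspace installation path, the handle, and the destination are hypothetical, and the sketch builds the command without running it.

```python
import shlex


def build_export_command(dspace_dir, export_type, identifier, dest, seq_num, migrate=False):
    """Return the `dspace export` command as a list of arguments."""
    if export_type not in ("COLLECTION", "ITEM"):
        # The keywords are typed in all caps, as noted in the arguments table.
        raise ValueError("export_type must be 'COLLECTION' or 'ITEM'")
    command = [
        f"{dspace_dir}/bin/dspace", "export",
        "-t", export_type,
        "-i", str(identifier),
        "-d", str(dest),
        "-n", str(seq_num),
    ]
    if migrate:
        command.append("-m")  # export for migration to another instance
    return command


cmd = build_export_command("/dspace", "ITEM", "123456789/42", "/tmp/export", 0)
printable = shlex.join(cmd)  # shell-quoted form, useful for logging
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where DSpace is installed.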

Each exported item will have an additional file in its directory, named "handle". This will contain the handle that was assigned to the item, and this file will be read by the importer so that items exported and then imported to another machine will retain the item's original handle.
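To check which handles an export carries before moving it to another machine, the "handle" files can be collected with a short Python sketch like the one below. It assumes the handle file contains just the handle as text; the function name, directory names, and handles are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def read_exported_handles(export_dir):
    """Return {item_directory_name: handle} for every exported item directory."""
    handles = {}
    for item_dir in sorted(Path(export_dir).iterdir()):
        handle_file = item_dir / "handle"
        if item_dir.is_dir() and handle_file.is_file():
            # Assumption: the file holds the handle as plain text.
            handles[item_dir.name] = handle_file.read_text(encoding="utf-8").strip()
    return handles


# Build a tiny stand-in for an export directory to demonstrate the function.
export_dir = Path(tempfile.mkdtemp())
for name, handle in [("1", "123456789/42"), ("2", "123456789/43")]:
    (export_dir / name).mkdir()
    (export_dir / name / "handle").write_text(handle + "\n", encoding="utf-8")

exported = read_exported_handles(export_dir)
```

Item directories without a handle file are simply left out of the result.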

...