This functionality was introduced in DSpace-CRIS 5.6.0 and is considered experimental. It has not been tested extensively, so feedback from the community is welcome.

DSpace-CRIS provides many ways to import, update and manipulate both native DSpace objects and CRIS objects in bulk. In addition to the tools offered by a basic DSpace, it is possible to use (also from the UI) Excel files (for CRIS objects) or ad hoc simplified database tables (currently only for DSpace items) to perform operations on the data.

Future plans

We hope to extend the framework to perform operations directly on the CRIS entities as well. Note that, right now, operations on DSpace items can automatically result in the creation or update of related CRIS entities through the Filler functionality.

The following database tables have been introduced:

  • imp_record: contains information about the operations to perform. Each row represents a specific operation on a single item
  • imp_metadatavalue: contains all the metadata associated with an item that need to be created or updated (optional)
  • imp_bitstream: contains all the information about the bitstreams to attach to, or replace in, the item (optional)
  • imp_record_to_item: populated by the framework to track the result of creation actions, so that subsequent operations on the same origin record result in an update instead of duplicate entries

To process the imp_* tables, run the following script:

org.dspace.app.cris.batch.ItemImportMainOA

-p Send the email for the "in archive" event to the authors, co-authors, etc. Workflow emails are ALWAYS disabled
-E Email of the batch job user
-x Disable indexing (improves performance)
-n Disable the summary email (improves performance)
-b Delete the bitstreams related to the item in the update phase (you need to provide details about the new bitstreams, or the bitstreams to keep, in the imp_bitstream table)
-m Metadata to clean up before performing the operation. By default all metadata are deleted; specifying only dc.title results in an append for the other metadata. Repeat this option once per metadata field, e.g. -m dc.title -m dc.contributor.*
-s Invert the logic of the -m option: with -s, only the metadata listed with -m are kept (e.g. -m dc.description.provenance) and all the others are deleted
-S Mute the logs
-t Number of threads (default 0; if omitted, the value is read from the configuration). Very experimental.
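
As an illustration of how the options above combine, the following is a minimal Python sketch that assembles the command line for one run. The class name and the option letters come from this page; the launcher path `/dspace/bin/dspace`, the use of `dsrun` to start the class, and the helper name `build_import_command` are assumptions that must be adapted to the local installation.

```python
import shlex

# Class name as given above. The launcher path and the "dsrun" subcommand are
# assumptions about the local installation, not taken from this page.
IMPORT_CLASS = "org.dspace.app.cris.batch.ItemImportMainOA"


def build_import_command(eperson_email, clean_metadata=(), keep_listed_only=False,
                         delete_bitstreams=False, disable_indexing=False,
                         disable_summary_email=False,
                         launcher=("/dspace/bin/dspace", "dsrun")):
    """Return the argument list for one run over the imp_* tables (hypothetical helper)."""
    args = [*launcher, IMPORT_CLASS, "-E", eperson_email]
    for field in clean_metadata:
        args += ["-m", field]      # -m is repeated once per metadata field
    if keep_listed_only:
        args.append("-s")          # keep only the fields listed with -m
    if delete_bitstreams:
        args.append("-b")          # drop existing bitstreams during updates
    if disable_indexing:
        args.append("-x")
    if disable_summary_email:
        args.append("-n")
    return args


cmd = build_import_command("batch@example.org",
                           clean_metadata=["dc.title", "dc.contributor.*"],
                           disable_indexing=True)
# shlex.join shell-quotes each argument, so the * in dc.contributor.* is not expanded
print(shlex.join(cmd))
```

Building the arguments as a list keeps the repeated -m options explicit and avoids shell expansion problems with wildcard fields such as dc.contributor.*.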

imp_record

  • imp_id: the unique ID used to link the operation with the additional data in the other imp_* tables
  • imp_record_id: a unique ID for the record in the external source system. It is used together with imp_sourceref to guarantee that subsequent operations on the "same" source record are always performed on the same DSpace object, without forcing the external system to know about DSpace-CRIS
  • imp_sourceref: a unique acronym for the system that provided the data
  • imp_eperson_id: the ID of the eperson used to perform the action
  • imp_collection_id: the collection in which to create the item, if relevant
  • status: can be one of the following values:
    • p = workspace 
    • w = workflow step 1
    • y = workflow step 2
    • x = workflow step 3
    • z = in archive
    • g = withdrawn
  • operation: either update or delete. Update is also used for record creation
  • integra: not used, to be revisited to manage versioning
  • last_modified: must be empty. It will be populated when the record is used
  • handle: when creating a new item, it is possible to force a specific handle; otherwise the system assigns a new one in the usual way
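
The columns above can be exercised with a small Python sketch that queues one operation. It uses an in-memory SQLite database purely as a stand-in: the column types, the helper names `create_imp_record_table` and `queue_operation`, and the sample values (`pub-0001`, `HR-SYSTEM`, the eperson and collection IDs) are assumptions for illustration. The real table definition ships with DSpace-CRIS and lives in the DSpace database.

```python
import sqlite3

# Allowed values as listed above
VALID_STATUS = {"p", "w", "y", "x", "z", "g"}
VALID_OPERATIONS = {"update", "delete"}


def create_imp_record_table(conn):
    # Column types are assumptions; only the column names come from this page.
    conn.execute("""
        CREATE TABLE imp_record (
            imp_id            INTEGER PRIMARY KEY,
            imp_record_id     TEXT NOT NULL,
            imp_sourceref     TEXT NOT NULL,
            imp_eperson_id    INTEGER NOT NULL,
            imp_collection_id INTEGER,
            status            TEXT,
            operation         TEXT,
            integra           INTEGER,
            last_modified     TIMESTAMP,
            handle            TEXT
        )""")


def queue_operation(conn, imp_id, record_id, sourceref, eperson_id,
                    collection_id=None, status="z", operation="update", handle=None):
    """Insert one row describing a single operation on a single item."""
    if status not in VALID_STATUS:
        raise ValueError(f"unknown status: {status!r}")
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    # last_modified is deliberately not set: it must stay empty until the row is used
    conn.execute(
        "INSERT INTO imp_record (imp_id, imp_record_id, imp_sourceref, imp_eperson_id,"
        " imp_collection_id, status, operation, handle) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (imp_id, record_id, sourceref, eperson_id, collection_id, status, operation, handle))


conn = sqlite3.connect(":memory:")
create_imp_record_table(conn)
queue_operation(conn, 1, "pub-0001", "HR-SYSTEM", eperson_id=1, collection_id=2)
row = conn.execute("SELECT operation, status, last_modified FROM imp_record").fetchone()
print(row)
```

Validating status and operation before the insert catches typos in the external system early, instead of leaving them to surface when the batch script runs.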

imp_metadatavalue

  • imp_metadatavalue_id: a unique ID generated by a sequence
  • imp_id: link to the imp_record main table
  • imp_schema: the shortname of the schema (dc, dcterms, etc.)
  • imp_element: the element
  • imp_qualifier: the qualifier
  • imp_value: the textual value of the metadata
  • imp_authority: the authority key for this value, if any. Since commit 40eeb989c4354731c0ee3fce6e80d6df64b80c94 the authority and confidence values are used as is by default, so metadata creation skips the getBestMatch method of the authority framework. To let the framework guess a potential match, use the value [GUESS] (case insensitive), which forces the use of the authority framework's getBestMatch method
  • imp_confidence: the confidence of the authority, if any (600 means accepted match)
  • imp_share: not used, for future use
  • metadata_order: used to sort the metadata values to insert or update within the same schema.element.qualifier
  • text_lang: the language of the metadata value (en, it, etc.)
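
To show how a field name maps onto these columns, here is a minimal Python sketch that turns a list of values into imp_metadatavalue rows. The helper name `metadata_rows`, the sample authority key `rp00001`, and the choice to start metadata_order at 1 are assumptions; imp_metadatavalue_id is left out because it is generated by a sequence.

```python
from collections import defaultdict

# Special authority value described above (matched case-insensitively by the framework)
GUESS = "[GUESS]"


def metadata_rows(imp_id, values):
    """Build imp_metadatavalue rows from (field, value, authority, confidence, lang) tuples.

    Hypothetical helper: 'field' is schema.element or schema.element.qualifier.
    """
    counters = defaultdict(int)
    rows = []
    for field, value, authority, confidence, lang in values:
        schema, element, *rest = field.split(".")
        qualifier = rest[0] if rest else None
        counters[field] += 1  # order restarts for each schema.element.qualifier (starting at 1 is an assumption)
        rows.append({
            "imp_id": imp_id,
            "imp_schema": schema,
            "imp_element": element,
            "imp_qualifier": qualifier,
            "imp_value": value,
            "imp_authority": authority,
            "imp_confidence": confidence,
            "metadata_order": counters[field],
            "text_lang": lang,
        })
    return rows


rows = metadata_rows(1, [
    ("dc.title", "A sample title", None, None, "en"),
    # "rp00001" is a made-up authority key; 600 means accepted match
    ("dc.contributor.author", "Rossi, Mario", "rp00001", 600, None),
    # ask the authority framework to look for a match itself
    ("dc.contributor.author", "Bianchi, Anna", GUESS, None, None),
])
for r in rows:
    print(r["imp_schema"], r["imp_element"], r["imp_qualifier"], r["metadata_order"])
```

The second author row relies on [GUESS] instead of an explicit key, so its authority is resolved by getBestMatch when the batch script runs.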

imp_bitstream

  • imp_bitstream_id: a unique ID generated by a sequence
  • imp_id: link to the imp_record main table
  • filepath
  • description
  • bundle: the name of the bundle in which to put the bitstream (ORIGINAL, TEXT, etc.)
  • bitstream_order: to sort the processing of the rows
  • primary_bitstream: flag to mark the bitstream as primary
  • assetstore
  • name
  • imp_blob: the content of the bitstream (alternative to filepath)
  • embargo_policy: can be one of:
    • 0 --> open access
    • 1 --> embargo (the embargo_start_date column must also be set)
    • 2 --> assign a READ policy to the epersongroup with ID 2 (you need to create an epersongroup with that ID for "authorized users")
    • 3 --> assign a READ policy only to the administrators group
  • embargo_start_date: used as the start date of the Anonymous READ policy when embargo_policy = 1
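
The embargo_policy codes and their dependency on embargo_start_date can be sketched in Python as follows. The enum member names, the helper `bitstream_row`, the sample file path, and the Python types used for primary_bitstream and embargo_start_date are assumptions for illustration; only the column names and the numeric codes come from the list above.

```python
from datetime import date
from enum import IntEnum


class EmbargoPolicy(IntEnum):
    # Numeric codes as listed above; the member names are made up for readability
    OPEN_ACCESS = 0
    EMBARGO = 1
    AUTHORIZED_USERS = 2   # READ policy for the epersongroup with ID 2
    ADMINISTRATORS = 3     # READ policy for the administrators group only


def bitstream_row(imp_id, filepath, bundle="ORIGINAL", bitstream_order=0,
                  primary=False, policy=EmbargoPolicy.OPEN_ACCESS,
                  embargo_start_date=None):
    """Build one imp_bitstream row as a dict (hypothetical helper)."""
    policy = EmbargoPolicy(policy)  # raises ValueError for codes outside 0..3
    if policy is EmbargoPolicy.EMBARGO and embargo_start_date is None:
        raise ValueError("embargo_policy = 1 requires embargo_start_date")
    return {
        "imp_id": imp_id,
        "filepath": filepath,
        "bundle": bundle,
        "bitstream_order": bitstream_order,
        "primary_bitstream": primary,
        "embargo_policy": int(policy),
        "embargo_start_date": embargo_start_date,
    }


row = bitstream_row(1, "/tmp/import/article.pdf", primary=True,
                    policy=EmbargoPolicy.EMBARGO,
                    embargo_start_date=date(2030, 1, 1))
print(row["embargo_policy"], row["embargo_start_date"])
```

Checking the date at row-building time makes the one documented dependency between the two columns explicit before anything is written to the table.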


