Warning: DSpace 7.0 does not yet support this

OAI harvesting is not available in DSpace 7.0. It was scheduled to be restored in a later 7.x release and has been added in DSpace 7.1; see DSpace Release 7.0 Status.

Harvesting from another DSpace

...

Setting up a harvest to import content into a collection

There are two ways to set up a collection for harvesting: by using the DSpace "harvest" script, or by configuring the content source of the collection through the UI.

Using the "harvest" script

The harvest script can be called from both the CLI and the REST API under the name "harvest". It accepts the parameters defined in the following table.

Short option | Long option | Argument | Explanation
-p | --purge | NO | Delete all the items in the collection provided with the -c parameter.
-r | --run | NO | Run the standard harvesting procedure for the collection provided with the -c parameter.
-g | --ping | NO | Verify that the server provided through the -a parameter and the set provided through the -i parameter can be resolved and work.
-s | --setup | NO | Set the collection provided with the -c parameter up for harvesting. The server needs to be provided through the -a parameter, and the OAI set id needs to be provided through the -i parameter.
-S | --start | NO | Start the harvest loop for all collections.
-R | --reset | NO | Reset the harvest status on all collections.
-P | --purgeCollections | NO | Purge all harvestable collections.
-o | --reimport | NO | Reimport all the items in the collection provided with the -c parameter. This is the equivalent of running both the -p and the -r command for the provided collection.
-c | --collection | YES | The harvesting collection (handle or id).
-t | --type | YES | The type of harvesting: 0 for no harvesting, 1 for metadata only, 2 for metadata and bitstream references (requires ORE support), 3 for metadata and bitstreams (requires ORE support).
-a | --address | YES | The address of the OAI-PMH server to be harvested.
-i | --oai_set_id | YES | The id of the PMH set representing the harvested collection. If all sets need to be harvested, the value "all" should be provided.
-m | --metadata_format | YES | The name of the desired metadata format for harvesting, resolved to namespace and crosswalk in the dspace.cfg.
-h | --help | NO | Print the help message.
-e | --eperson | YES | (CLI ONLY) The eperson that performs the harvest. When the command is used from the REST API, the currently logged in user will be used.
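
The table describes -o as equivalent to running -p followed by -r for the same collection. As a minimal sketch of that equivalence, the script below only prints the two forms rather than executing them. The function names purge_then_run and reimport are hypothetical, the collection and eperson values are placeholders, and passing -e to the purge and reimport commands is an assumption that is not confirmed by the table.

```shell
#!/usr/bin/env bash
# Sketch only: prints commands instead of running them.
# COLLECTION and EPERSON are placeholder values; replace them with your own.
COLLECTION="${COLLECTION:-123456789/123}"
EPERSON="${EPERSON:-harvest-user@dspace.org}"

# Two-step form: purge the collection (-p), then run the standard harvest (-r).
# Assumption: -e is passed to both commands; the table does not say it is required.
purge_then_run() {
  echo "dspace/bin/dspace harvest -p -c $COLLECTION -e $EPERSON"
  echo "dspace/bin/dspace harvest -r -c $COLLECTION -e $EPERSON"
}

# One-step form: -o (reimport) is described as equivalent to -p followed by -r.
reimport() {
  echo "dspace/bin/dspace harvest -o -c $COLLECTION -e $EPERSON"
}

purge_then_run
reimport
```

Printing the commands first makes it possible to review them before a destructive purge is actually run.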

...

Examples of harvesting a collection through CLI commands

1. Verify whether the harvester source can be reached

dspace/bin/dspace harvest -g -a https://harvest.source.org -i harvest-set

Replace https://harvest.source.org with the source you want to use, and harvest-set with the set or sets you want to harvest, or with all if you want to harvest all sets.

2. Set up a collection for harvesting

dspace/bin/dspace harvest -s -c 123456789/123 -a https://harvest.source.org -i harvest-set -m dc -t 1

Replace 123456789/123 with your collection, https://harvest.source.org with the source you want to use, and harvest-set with the set or sets you want to harvest, or with all if you want to harvest all sets. The -m parameter indicates the metadata format to be used and the -t parameter indicates the harvest type to be used. When the value 0 is used for -t, harvesting will be disabled.

3. Run the harvest for the set up collection

dspace/bin/dspace harvest -r -c 123456789/123 -e harvest-user@dspace.org 

Replace 123456789/123 with your collection, and harvest-user@dspace.org with an existing user in DSpace that has sufficient rights to perform the ingestion.
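
The three steps above can be chained in a small wrapper script. The following is a sketch under stated assumptions, not part of DSpace: DSPACE_BIN, DRY_RUN, run_step and harvest_collection are hypothetical names introduced for illustration, and the script defaults to a dry run that only prints the commands. It assumes the harvest script returns a non-zero exit code on failure, so that a failed step stops the chain.

```shell
#!/usr/bin/env bash
# Sketch: chain ping, setup and run for one collection.
# DSPACE_BIN and DRY_RUN are hypothetical variables introduced for this example.
DSPACE_BIN="${DSPACE_BIN:-dspace/bin/dspace}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

# Print the command line, and execute it unless DRY_RUN is 1.
run_step() {
  printf '%s' "$DSPACE_BIN"
  printf ' %s' "$@"
  printf '\n'
  if [ "$DRY_RUN" != "1" ]; then
    "$DSPACE_BIN" "$@"
  fi
}

# Usage: harvest_collection COLLECTION SOURCE_URL SET_ID EPERSON
harvest_collection() {
  local collection="$1" source_url="$2" set_id="$3" eperson="$4"
  # 1. Verify that the source and the set can be reached.
  run_step harvest -g -a "$source_url" -i "$set_id" || return 1
  # 2. Set up the collection (dc metadata format, type 1 = metadata only).
  run_step harvest -s -c "$collection" -a "$source_url" -i "$set_id" -m dc -t 1 || return 1
  # 3. Run the harvest as the given eperson.
  run_step harvest -r -c "$collection" -e "$eperson"
}

harvest_collection 123456789/123 https://harvest.source.org harvest-set harvest-user@dspace.org
```

With the default DRY_RUN=1 the script prints the three commands in order; setting DRY_RUN=0 would execute them against the local DSpace installation.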

Setting up a harvest content source from the UI

A collection can be configured to retrieve its content from an external source. This can be done from the "Edit Collection" UI by using the following steps.

1. Configure the collection to harvest its content from an external source

Navigate to the "Edit collection" > "Content Source" tab. Tick the checkbox "This collection harvests its content from an external source".


2. Configure the harvest source

Once the checkbox has been ticked, the OAI provider, set id and metadata format can be configured. An example of the configuration can be found in the image below.

[Image: example configuration of the harvest source]

When all sets need to be harvested, the set id field can be left empty.

The server configuration will be tested upon clicking the "Save" button.

3. Start the harvest

Click the "Import Now" button to start the import. Once the import has started, the button will indicate that it is in progress. There is no need to remain on this page, as the harvest will continue to run after you leave it.


If the current server configuration needs to be retested at a later point, the "Test configuration" button can be used. To fully reset the collection by purging all items and starting a reimport, click the "Reset and reimport" button.