...

Input    | Command
BibTeX   | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-bibtex-file -i bibtex
CSV      | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-csv-file -i csv
TSV      | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-tsv-file -i tsv
RIS      | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-ris-file -i ris
EndNote  | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-endnote-file -i endnote
OAI-PMH  | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-ris-file -i ris
arXiv    | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-arxiv-file -i arxivXML
PubMed   | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-pubmed-file -i pubmedXML
CrossRef | [dspace]/bin/dspace import -b -m mapFile -e example@email.com -c 123456789/1 -s path-to-my-crossref-file -i crossrefXML

Keep in mind that the value of the "-e" option must be a valid email address of a DSpace user, and the value of the "-c" option must be the handle of the target collection. Attached, you can find a .zip file that includes examples of all the file formats mentioned above.

BTE Configuration

The basic idea behind BTE is that the system holds the metadata in an internal format, using a specific key for each metadata field. DataLoaders load the records using these keys, while the output generator maps them to DSpace metadata fields.

The BTE configuration file is located at [dspace]/config/spring/api/bte.xml. It is a Spring XML configuration file that consists of Java beans. (If these terms are unfamiliar to you, please refer to the Spring Dependency Injection website for more information.)

 
Explanation of beans:

Code Block
languagehtml/xml
<bean id="grorg.ektdspace.bteapp.coreitemimport.TransformationEngineBTEBatchImportService" />

This is the top level bean that describes the service of the batch import from the various external metadata formats. It accepts three properties:

a) dataLoaders: a list of all the data loaders that are supported. Keep in mind that for each data loader we specify a key that can be used as the value of the "-i" option in the import script mentioned earlier. This is also the place to add a new custom DataLoader in case the default ones don't match your needs.

b) outputMap: a Map between the internal keys that the BTE service uses to hold metadata and the DSpace metadata fields. (See later on how data loaders specify the keys that BTE uses to hold the metadata.)

c) transformationEngine: the BTE transformation engine, which consists of the processing steps that will be applied to the metadata during their import into DSpace.
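
To make the shape of this bean concrete, here is a condensed sketch (not the full shipped configuration); the outputMap entry and the exact dataLoaders entries are illustrative:

Code Block
languagehtml/xml
<!-- Condensed sketch of the service bean; consult the shipped bte.xml for
     the complete, authoritative version. -->
<bean id="org.dspace.app.itemimport.BTEBatchImportService"
      class="org.dspace.app.itemimport.BTEBatchImportService">
    <property name="dataLoaders">
        <map>
            <!-- each key is a legal value for the "-i" option -->
            <entry key="bibtex" value-ref="bibTeXDataLoader" />
            <entry key="csv" value-ref="csvDataLoader" />
            <!-- ...the rest of the loaders... -->
        </map>
    </property>
    <property name="outputMap">
        <map>
            <!-- internal BTE key -> DSpace metadata field; follow the
                 direction used in the shipped file -->
            <entry key="title" value="dc.title" />
            <!-- ...the rest of the mappings... -->
        </map>
    </property>
    <property name="transformationEngine" ref="batchImportTransformationEngine" />
</bean>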

 

Code Block
languagehtml/xml
<bean id="batchImportTransformationEngine" />

This bean is instantiated when the batch import takes place. It deploys a new BTE transformation engine that performs the transformation from one format to the other. It needs one input argument: the workflow (the processing step mentioned before) that will run when the transformation takes place. Normally, you don't need to modify this bean.
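
Its definition is small; the following sketch shows the idea (the class name comes from the BTE library, while the "workflow" property name is an assumption to verify against the shipped bte.xml):

Code Block
languagehtml/xml
<!-- Sketch: the engine is handed the workflow (processing steps) to run.
     The "workflow" property name is an assumption; check bte.xml. -->
<bean id="batchImportTransformationEngine" class="gr.ekt.bte.core.TransformationEngine">
    <property name="workflow" ref="batchImportLinearWorkflow" />
</bean>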

 

Code Block
languagehtml/xml
<bean id="batchImportLinearWorkflow"linearWorkflow" />

This bean describes the processing steps. Currently, there are no processing steps, meaning that all records loaded by the data loader pass to the output generator unfiltered and unmodified. (See the next section, "Case Studies", for info on how to add a filter or a modifier.)
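
An empty workflow of this kind would look roughly as follows (the LinearWorkflow class and the "process" property name are assumptions to verify against the shipped bte.xml):

Code Block
languagehtml/xml
<!-- Sketch: no processing steps yet; filter/modifier beans would be listed
     inside the <list> element. Names are assumptions; check bte.xml. -->
<bean id="batchImportLinearWorkflow" class="gr.ekt.bte.core.LinearWorkflow">
    <property name="process">
        <list>
            <!-- add filters and modifiers here -->
        </list>
    </property>
</bean>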

 

Code Block
languagehtml/xml
<bean id="bibTeXDataLoader" />
<bean id="csvDataLoader" />
<bean id="tsvDataLoader" />
<bean id="gr.ekt.bteio.loaders.BibTeXDataLoaderrisDataLoader" />
<bean id="gr.ekt.bteio.loaders.CSVDataLoaderendnoteDataLoader" />
<bean id="gr.ekt.bteio.loaders.TSVDataLoaderpubmedFileDataLoader" />
<bean id="gr.ekt.bteio.loaders.RISDataLoaderarXivFileDataLoader" />
<bean id="gr.ekt.bteio.loaders.EndnoteDataLoadercrossRefFileDataLoader" />
<bean id="gr.ekt.bteio.loaders.OAIPMHDataLoaderoaipmhDataLoader" />

These data loaders are of two types: "file" data loaders and "online" data loaders. The first 8 of them are file data loaders, while the last one (the OAI data loader) is an online one.

The file data loaders have the following properties:

...

c) quoteChar: This property specifies the quote character used in the CSV file. The default value is the double quote character (").
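
Putting these properties together, a file data loader configuration might look like the sketch below; the fieldMap entry is an example mapping, and any properties elided above are simply omitted:

Code Block
languagehtml/xml
<!-- Illustrative sketch of a file data loader. The fieldMap entry is an
     example (input field -> internal BTE key); other properties are omitted. -->
<bean id="csvDataLoader" class="gr.ekt.bteio.loaders.CSVDataLoader">
    <property name="fieldMap">
        <map>
            <entry key="Title" value="title" />
        </map>
    </property>
    <property name="quoteChar" value="&quot;" />
</bean>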

 

The OAIPMHDataLoader has the following properties:

a) fieldMap: Same as above, the mapping between the input keys holding the metadata and the internal keys that BTE uses to hold them.

...

c) prefix: The metadata prefix to be used in OAI requests.
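
An illustrative configuration for this loader follows; the fieldMap entry and the "oai_dc" prefix are example values, and the property elided above is omitted:

Code Block
languagehtml/xml
<!-- Illustrative sketch of the online OAI-PMH data loader; example values
     only, with the elided property omitted. -->
<bean id="oaipmhDataLoader" class="gr.ekt.bteio.loaders.OAIPMHDataLoader">
    <property name="fieldMap">
        <map>
            <entry key="dc.title" value="title" />
        </map>
    </property>
    <property name="prefix" value="oai_dc" />
</bean>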

 

Since DSpace administrators may have incorporated their own metadata schema within DSpace (apart from the default Dublin Core schema), they may need to configure BTE to match their custom schemas.

So, in case you need to process more metadata fields than those specified by default, you need to change the data loader configuration and the output map (see the sketch below).
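
For instance, to carry one extra field end-to-end you would add it to the relevant data loader's fieldMap and to the outputMap; "note" and "myschema.note" below are placeholder names:

Code Block
languagehtml/xml
<!-- Placeholder example: expose an extra internal key ("note") and map it
     to a field of a custom schema ("myschema.note"). -->
<property name="outputMap">
    <map>
        <entry key="title" value="dc.title" />
        <entry key="note" value="myschema.note" />
    </map>
</property>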

Case Studies


1) I have my data in a format different from the ones that are supported by this functionality. What can I do?

Either you transform your data to one of the supported formats, or you create a new data loader. To do this, create a new Java class that implements the following Java interface from BTE:

...

in which you have to create records - most probably you will need to create your own Record class (by implementing the gr.ekt.bte.core.Record interface) - and fill a RecordSet. Feel free to add whatever code you like in this method, even reading data from multiple sources. All you need is to return a RecordSet of Records.
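
A minimal sketch of such a loader is shown below. The signatures (getRecords(), DataLoadingSpec, MalformedSourceException) are written from memory of the BTE API and should be verified against the BTE javadoc; MyRecord stands for your own Record implementation:

Code Block
languagejava
import gr.ekt.bte.core.DataLoader;
import gr.ekt.bte.core.DataLoadingSpec;
import gr.ekt.bte.core.RecordSet;
import gr.ekt.bte.exceptions.MalformedSourceException;

// Sketch only: verify the interface methods against the BTE javadoc.
public class MyFormatDataLoader implements DataLoader {

    @Override
    public RecordSet getRecords() throws MalformedSourceException {
        RecordSet records = new RecordSet();
        // Read your source (file, database, web service...) and add one
        // Record per entry, e.g. records.addRecord(new MyRecord(...)),
        // where MyRecord is your own implementation of Record.
        return records;
    }

    @Override
    public RecordSet getRecords(DataLoadingSpec spec) throws MalformedSourceException {
        // This sketch ignores the loading spec and returns everything.
        return getRecords();
    }
}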

You may also extend the abstract class

Code Block
gr.ekt.bte.core.dataloader.FileDataLoader

if you want to create a "file" data loader, to which you need to pass the path of the file that the loader will read the data from. Normally, a simple data loader is enough for the system to work, but file data loaders are also utilized in the administration UI discussed later in this documentation.

After that, you will need to declare the new DataLoader in the Spring XML configuration file (in the bean with id="org.dspace.app.itemimport.BTEBatchImportService") using your own unique key. Use this key as the value of the "-i" option in the batch import script in order to specify that your data loader must run.
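
For example, registering a hypothetical loader under the key "myformat" would look like the following ("myformat", "myFormatDataLoader" and the class name are all placeholders):

Code Block
languagehtml/xml
<!-- Placeholder names throughout; the <entry> goes inside the dataLoaders
     map of the BTEBatchImportService bean. -->
<bean id="myFormatDataLoader" class="com.example.MyFormatDataLoader" />

<property name="dataLoaders">
    <map>
        <!-- ...default loaders... -->
        <entry key="myformat" value-ref="myFormatDataLoader" />
    </map>
</property>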

 
2) I need to filter some of the input records or modify some values of the records before outputting them
 
In this case you will need to create your own filters and modifiers.
 
To create a new filter, you need to extend the following BTE abstract class:

...