The repository home for this project is: https://github.com/peterdietz/SAFBuilder

DSpace's command-line batch ingest takes its input in a well-documented layout called the "Simple Archive Format". What has been missing is a tool that makes it easy to create a Simple Archive Format package. The Simple Archive Format Packager covers the case where someone has a spreadsheet of metadata, plus the content files it describes, all destined for repository ingest.

Thus the input to the Simple Archive Format Packager is a spreadsheet (.csv) that has the following columns:

  • filename for the bitstream/file
  • metadata columns named namespace.element(.qualifier), for example dc.description or dc.contributor.author
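
To make the column-naming rule concrete, here is a minimal sketch of how a header such as dc.contributor.author could be split into its parts. The class MetadataField and its parse method are hypothetical names chosen for illustration; they are not taken from the SAFBuilder source, which may handle headers differently.

```java
// Hypothetical helper (not SAFBuilder code): splits a metadata column header
// of the form namespace.element or namespace.element.qualifier.
final class MetadataField {
    final String namespace;
    final String element;
    final String qualifier; // null when the header has no qualifier

    private MetadataField(String namespace, String element, String qualifier) {
        this.namespace = namespace;
        this.element = element;
        this.qualifier = qualifier;
    }

    static MetadataField parse(String header) {
        String[] parts = header.trim().split("\\.");
        // A metadata header needs two or three dot-separated parts.
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Not a metadata header: " + header);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty part in header: " + header);
            }
        }
        return new MetadataField(parts[0], parts[1], parts.length == 3 ? parts[2] : null);
    }

    public static void main(String[] args) {
        for (String header : new String[] {"dc.description", "dc.contributor.author"}) {
            MetadataField f = parse(header);
            System.out.println(header + " -> " + f.namespace + " / " + f.element + " / " + f.qualifier);
        }
    }
}
```

A header with no dot, such as the filename column, is rejected by this sketch, so a caller would need to handle that column separately.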

Java Compiling and Running Instructions

The commands below check out the code from Git, download the external Java libraries the tool depends on, compile the source code, and run it.

git clone https://github.com/peterdietz/SAFBuilder.git
cd SAFBuilder
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/net/sourceforge/javacsv/javacsv/2.0/javacsv-2.0.jar
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/xmlwriter/xmlwriter/2.2/xmlwriter-2.2.jar
wget http://mirrors.ibiblio.org/pub/mirrors/maven2/commons-io/commons-io/1.4/commons-io-1.4.jar
mkdir classes
javac -classpath javacsv-2.0.jar:commons-io-1.4.jar:xmlwriter-2.2.jar src/edu/osu/kb/batch/*.java -d classes
java -cp classes edu.osu.kb.batch.BatchProcess

The final command runs the program without arguments, so it prints its usage message:

USAGE: BatchProcess /path/to/directory metadatafilename.csv
Hint -- directory: Use absolute path and no trailing slashes
Hint -- metadatafilename: needs to be in the directory, as do the content files

Sample data is included with the tool to show how it is used.

To run the tool over the sample data:

java -cp classes:javacsv-2.0.jar:commons-io-1.4.jar:xmlwriter-2.2.jar edu.osu.kb.batch.BatchProcess /home/peter/NetBeansProjects/SAFBuilder/src/edu/osu/kb/sample_data AAA_batch-metadata.csv

This creates a SimpleArchiveFormat directory inside the specified directory, containing the subdirectories, content files, and metadata files that are ready to import into DSpace.
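
For orientation, each item directory in a Simple Archive Format package holds a dublin_core.xml file, a contents file listing the bitstream filenames, and the content files themselves. The sketch below shows the general shape of dublin_core.xml as the DSpace documentation describes it, with qualifier="none" for unqualified elements. The class DublinCoreSketch is a hypothetical illustration rather than SAFBuilder's own code; it builds the XML with plain strings to stay self-contained, whereas the tool itself depends on the xmlwriter library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not SAFBuilder code): renders metadata values
// in the dublin_core.xml layout used by the Simple Archive Format.
final class DublinCoreSketch {
    private final List<String[]> values = new ArrayList<String[]>();

    // A null qualifier is written as "none", as in unqualified Dublin Core.
    DublinCoreSketch add(String element, String qualifier, String value) {
        values.add(new String[] {element, qualifier == null ? "none" : qualifier, value});
        return this;
    }

    // Escape the characters that are special in XML text and attributes.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    String toXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<dublin_core>\n");
        for (String[] v : values) {
            xml.append("  <dcvalue element=\"").append(escape(v[0]))
               .append("\" qualifier=\"").append(escape(v[1])).append("\">")
               .append(escape(v[2])).append("</dcvalue>\n");
        }
        xml.append("</dublin_core>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for the demonstration.
        System.out.print(new DublinCoreSketch()
                .add("title", null, "Annual Report 2010")
                .add("contributor", "author", "Smith, Jane")
                .toXml());
    }
}
```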

Further Work

This packager works as a stand-alone tool and requires some knowledge of Java to run. It satisfies the initial need: packaging many items so that they can be batch loaded into DSpace with the DSpace launcher's item-import command. The remaining goal of this project is to streamline the process of batch loading materials into DSpace.

Possibilities include:

  • refactoring the tool into a Packager Plugin. Packager plugins let DSpace accept an input package (containing content files, a manifest, and metadata) and create DSpace items from it.
  • creating a client GUI for the desktop.