Title: An Add-On to facilitate the existing DSpace Batch Import Procedure

Student: Blooma Mohan John

Mentor: Jayan C Kurian

Co-Mentors: Stuart Lewis, Richard Jones, ???

Abstract:

Efficient content acquisition strategies make it easier to import scholarly information into repositories. DSpace supports batch content acquisition through the ItemImport procedure, which requires digital resources to be represented as a Submission Information Package (SIP). The lead time required to prepare this format can be reduced by encoding document metadata and digital resource locations in a spreadsheet. This approach has been implemented at Nanyang Technological University (Singapore), the Institute of Scientific and Technical Information of CNRS (INIST-CNRS, France), the University of Calgary Library, the National Informatics Centre (India), and the Lanzhou Branch of the Chinese Academy of Sciences (China). Recent requests have come from the University of Waikato Library, the University of Sydney Library, and NITLE (U.S.A.). Although the current implementation for the Windows environment looks promising for the user community, there have been considerable requests (from New York University Library and Raman Research Institute Library (India), among others) to make it compatible with the UNIX environment. It is anticipated that this add-on will facilitate content acquisition in DSpace installations.

Project Plan:

  1. Contact DSpace administrators from a pool of geographically distributed DSpace instances and gather information about the most widely used modes of importing items into their repositories.
  2. Initially, implement a standalone application in Java.
  3. Implement and test the developed prototype on Windows and Linux platforms.
  4. In the next stage, integrate the program as a standalone web application using JSP technology. The user interface would retrieve detailed metadata descriptions and item resource locations and automatically generate Submission Information Packages.
  5. Finally, explore implementing this development as an Eclipse RCP application (suggested by Mark R. Diggory, DSpace Systems Manager) to ensure OS portability.

Development Progress:

Project Deliverables:

Algorithm

  1. Start
  2. Input the Main Submission Information Package (SIP) folder name
  3. Create SIP folder with the name mentioned in Step 2
  4. Store digital object metadata and location details in a Resultset
  5. For each record in Resultset do Step 6 to Step 13
  6. Create individual SIP folder
  7. Create an XML file named dublin_core inside the SIP folder
  8. Create a file named contents inside the SIP folder
  9. Check the type of digital object
  10. Add comments about the digital object to the dublin_core file
  11. Add the digital object file name to the contents file
  12. Copy the digital object from its external location to the individual SIP folder
  13. Add metadata details to the dublin_core file
  14. Stop
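
The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the add-on's actual code: the names SipBuilder and ItemRecord, the item_N folder naming, and the extension-based type check are assumptions made for the example, and the Resultset of Step 4 is modelled as a plain list of records. The dublin_core.xml file name and the dcvalue layout are assumed to follow the DSpace Simple Archive Format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of DSpace or the add-on. */
final class SipBuilder {

    /** One spreadsheet row: file location plus metadata keyed "element" or "element.qualifier". */
    static final class ItemRecord {
        final Path source;
        final Map<String, String> metadata;

        ItemRecord(Path source, Map<String, String> metadata) {
            this.source = source;
            this.metadata = metadata;
        }
    }

    /** Steps 3 to 13: creates sipRoot and one sub-folder per record. */
    static void build(Path sipRoot, List<ItemRecord> records) throws IOException {
        Files.createDirectories(sipRoot);                                    // Step 3
        int index = 1;
        for (ItemRecord rec : records) {                                     // Step 5
            Path itemDir = Files.createDirectories(
                    sipRoot.resolve("item_" + index++));                     // Step 6
            String fileName = rec.source.getFileName().toString();
            String type = extensionOf(fileName);                             // Step 9

            StringBuilder xml = new StringBuilder();                         // Step 7
            xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            xml.append("<!-- ")                                              // Step 10
               .append(commentSafe(fileName + " (" + type + ")"))
               .append(" -->\n<dublin_core>\n");
            for (Map.Entry<String, String> e : rec.metadata.entrySet()) {    // Step 13
                String[] key = e.getKey().split("\\.", 2);
                String qualifier = key.length > 1 ? key[1] : "none";
                xml.append("  <dcvalue element=\"").append(escape(key[0]))
                   .append("\" qualifier=\"").append(escape(qualifier)).append("\">")
                   .append(escape(e.getValue())).append("</dcvalue>\n");
            }
            xml.append("</dublin_core>\n");
            write(itemDir.resolve("dublin_core.xml"), xml.toString());
            write(itemDir.resolve("contents"), fileName + "\n");             // Steps 8 and 11
            Files.copy(rec.source, itemDir.resolve(fileName),
                    StandardCopyOption.REPLACE_EXISTING);                    // Step 12
        }
    }

    /** Assumed type check: the lower-cased file extension, or "unknown". */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "unknown";
        }
        return fileName.substring(dot + 1).toLowerCase();
    }

    /** XML comments must not contain "--", so runs of hyphens are collapsed. */
    static String commentSafe(String text) {
        return text.replaceAll("-{2,}", "-");
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    private static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling SipBuilder.build with a target folder and one ItemRecord per spreadsheet row would produce one sub-folder per item, each holding dublin_core.xml, contents, and a copy of the digital object. Writing the metadata in one pass after the comment keeps the XML file well-formed without reopening it, which is why Steps 10 and 13 share a single buffer in this sketch.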

Future Work:

In the long run, this project could be extended to extract metadata descriptions automatically from digital resources (e.g. university theses that follow a standard format) using template-based extraction techniques, populating an intermediate collection template from which SIPs are generated for automatic batch import.

My University, School, and Me:

Nanyang Technological University

Wee Kim Wee School of Communication and Information

Blooma Mohan John

References: