What is DSpace?
A groundbreaking digital repository system, DSpace captures, stores, indexes, preserves and redistributes an organization's research material in digital formats. Research institutions worldwide use DSpace for a variety of digital archiving needs - from institutional repositories (IRs) to learning object repositories or electronic records management, and more. DSpace is freely available as open source software you can customize and extend. An active community of developers, researchers and users worldwide contributes its expertise to the DSpace Community.
Who can join the DSpace Community?
Anyone who uses DSpace can get involved in a number of ways: programming, defining feature requirements, writing documentation, testing new features, or sharing design or marketing expertise. Get involved by joining the DSpace mailing lists, adding your projects, experiences, and comments to the DSpace Wiki, and collaborating with other DSpace community members.
Who built DSpace?
The MIT Libraries and Hewlett-Packard (HP) originally developed DSpace, but the software is now supported by DuraSpace. The system is freely available to research institutions worldwide as an open source system that can be customized and extended.
Who manages DSpace?
DSpace is freely available as open source software. The DSpace Community manages the code base and releases new versions of the software. An active community of developers, researchers and users worldwide contributes its expertise to the DSpace Community.
How is DSpace different from other digital repositories?
DSpace is the first digital repository to address the myriad issues inherent in a multi-disciplinary archive, including:
- Differing policies, practices and cultures established by individual disciplines
- The variety of digital formats produced in today's multi-media research environments
- The complexity of metadata standards needed to accommodate and maintain access to the digital formats supported by the system.
DSpace is designed with a flexible storage and retrieval architecture adaptable to a multitude of data formats and distinct research disciplines, known as "communities." Each community has its own customized user portal that can use the community's own practices and terminology.
Does the DSpace project have a supporting organization?
The success of any open-source project lies with the community contributing its collective energy, knowledge, enthusiasm, and effort. DSpace is developed and supported by the user community, with the help and guidance of DuraSpace. DuraSpace is a not-for-profit organization formed in July 2009. The organizations which previously supported the DSpace project, the DSpace Foundation (2007-2009) and the DSpace Federation (2003-2004), have ceased operation. To learn more about DuraSpace, please visit www.DuraSpace.org. For technical questions about the DSpace software platform, please refer to "Who provides technical support for the DSpace software platform?" below.
Is DSpace free?
Yes. The DSpace system is freely available as open-source software (see locations below), under the terms of the BSD distribution license. We have also tried to find good open-source tools to package with the DSpace application, all freely available under an open-source license (although not all the same license as the one for DSpace itself), so that you get a complete system along with the part that we created.
Who can download the software?
Where can I download the DSpace open-source software?
Can I change the DSpace system?
Yes, you can customize and extend the system to suit your organization's needs. DSpace was designed to make adapting it for individual organizations as easy as possible. See the section on how to contribute on the DSpace Wiki for information on submitting code changes to DSpace. Each application is different, but most organizations need to customize the authentication system, for example, to work with existing systems. Some organizations may want to replace the open-source tools supplied with DSpace with different ones (for example, replacing PostgreSQL with MySQL or Oracle).
Where can I learn more about Open Source?
O'Reilly & Associates has a very helpful web site devoted to open source: http://opensource.oreilly.com/.
What kind of content does DSpace support?
DSpace accepts all manner of digital formats. Some examples of items that DSpace can accommodate are:
- Documents, such as articles, preprints, working papers, technical reports, conference papers
- Data sets
- Computer programs
- Visualizations, simulations, and other models
- Multimedia publications
- Administrative records
- Published books
- Overlay journals
- Bibliographic datasets
- Audio files
- Video files
- e-formatted digital library collections
- Learning objects
- Web pages
What are DSpace Communities and Collections?
Each DSpace service is made up of Communities – groups that contribute content to DSpace – and each Community in turn has Collections, which contain the content items, or files. In a university environment, for example, Communities might be departments, labs, research centers, schools, or some other administrative unit within an institution. Communities determine their own content guidelines and decide who has access to the community's contributions. An administrator on the DSpace team, usually the DSpace User Support Manager, works with the head of a community to set up workflows for content to be approved, edited, tagged with metadata, etc. Collections belong to one community or to multiple communities (for example, research collaborations between two communities may result in a shared collection) and house the individual content items and files.
Can I export my digital material out of DSpace?
Yes. Currently DSpace has an Item Exporter which supports exporting digital content, along with its metadata, in a simple XML-encoded file format (where each item is exported into a separate directory). See the "Item Importer and Exporter" section of the DSpace Documentation's System Administration chapter for more details.
There are also basic packagers which allow exporting using the METS standard. For more information on these, see the "Package Importer and Exporter" section of the DSpace Documentation's System Administration chapter.
Can I import content into DSpace in batch mode?
Yes. Currently DSpace supports importing content in batch using a variety of options:
- DSpace Item Importer - Can import content in batch if it matches the "DSpace Simple Archive Format", which is the format generated by the DSpace Item Exporter.
  - See the "Importing and Exporting Items via Simple Archive Format" ("Item Importer and Exporter") section of the DSpace Documentation's System Administration chapter for more details.
  - Also see DSpace Batch Importer Overview (blog post by Dorothea Salo).
- Batch Editing Tool - Allows you to import new metadata-only items (not bitstreams) (DSpace 1.6.0 and above).
  - See the "Batch Metadata Editing" section of the DSpace Documentation's System Administration chapter for more details.
- Community and Collection Structure Importer - Allows you to import an entire Community/Collection hierarchy given an XML file.
  - See the "Importing Community and Collection Hierarchy" section of the DSpace Documentation's System Administration chapter for more details.
- DSpace Packagers - Packagers exist which support importing METS documents, provided they are in the DSpace METS SIP (Submission Information Package) format.
  - See the "Importing and Exporting Content via Packages" section of the DSpace Documentation's System Administration chapter for more details.
  - A prominent package format in DSpace is AIP (Archival Information Package), which can hold and restore any part of the content, including communities and collections, items, rights and epersons. See AIP Backup and Restore.
- DSpace SWORD Interface - DSpace comes with its own SWORD Server (the 'sword' webapp), which allows any SWORD client to submit documents electronically to DSpace.
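As an illustration of the first option, the following is a minimal Python sketch that writes one item in a Simple-Archive-Format-style layout. It assumes the commonly documented layout in which each item is a directory containing a `dublin_core.xml` file and a `contents` file listing one bitstream file name per line. The function name `write_saf_item`, the item directory name, and the sample metadata are hypothetical; check the DSpace Documentation for the authoritative format before using anything like this for a real import.

```python
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(archive_dir, item_name, metadata, files):
    """Write one item directory in a Simple-Archive-Format-style layout.

    metadata: list of (element, qualifier, value) Dublin Core tuples;
              a qualifier of None is written as "none".
    files:    dict mapping bitstream file name -> bytes content.
    Returns the Path of the item directory that was written.
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Descriptive metadata goes into dublin_core.xml as <dcvalue> elements.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Each bitstream is stored next to the metadata file ...
    for name, data in files.items():
        (item_dir / name).write_bytes(data)

    # ... and listed, one file name per line, in the "contents" file.
    listing = "".join(f"{name}\n" for name in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir
```

Calling `write_saf_item("archive", "item_000", [("title", None, "Sample report")], {"report.pdf": pdf_bytes})` once per item would produce a directory tree with one subdirectory per item, which is the general shape the Item Importer works with.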
Will DSpace interoperate with other systems running at my organization?
Yes, DSpace has documented Java APIs you can customize to allow interoperation with other systems an institution might be running (for example, a department's web document system auto-depositing in DSpace, or a campus data warehouse).
What sort of persistent identifiers does DSpace use?
DSpace requires that a persistent identifier is assigned to each digital object (Item, Collection, Community). Because the developers wanted a solution which will work for a very long time, the identifier system had to be independent of any underlying network protocols, such as HTTP.
DSpace uses the Handle System from CNRI (Corporation for National Research Initiatives) as the persistent identifier for each digital object. Handles are resolved to actual URLs via a resolution service. The Handle resolver is an open-source system. Handles in DSpace (and elsewhere) are currently implemented as HTTP URIs, but can also be modified to work with future protocols. The Handle system is also able to support existing bibliographic identifiers such as ISBN or ISSN.
In the current version of DSpace, Handles are used as internal identifiers. By default, DSpace utilizes a 'dummy' (non-external) Handle prefix of '123456789' when assigning Handles to new objects. If an organization wishes to obtain a valid Handle prefix (which can be resolved from external locations), one can be purchased from the Handle System site for a small annual service fee.
It should be noted that if an organization has a policy requiring the use of another persistent identifier system, it is possible to use it as the public or external persistent identifier to the resource. In such a case, the public identifier - for instance, a URN - could resolve to a DSpace-generated page which contains metadata about the resource - including the Handle-based persistent link to the resource itself.
Future versions of DSpace may support multiple internal persistent identifiers. However, this work is still under investigation, and we are looking for developers and institutions willing to volunteer to help with this project.
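The structure of a Handle described above (a prefix, a slash, and a local suffix, exposed as an HTTP URI) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the resolver address `http://hdl.handle.net` is an assumption about the public Handle proxy rather than something DSpace requires.

```python
# The default, non-resolvable prefix mentioned in the text.
PLACEHOLDER_PREFIX = "123456789"


def make_handle(prefix, suffix):
    """Join a Handle prefix and a local suffix into 'prefix/suffix'."""
    return f"{prefix}/{suffix}"


def handle_url(handle, resolver="http://hdl.handle.net"):
    """Build the HTTP URI form of a Handle for a given resolver (assumed)."""
    return f"{resolver.rstrip('/')}/{handle}"


def is_externally_resolvable(handle):
    """A Handle under the placeholder prefix cannot be resolved externally."""
    prefix = handle.split("/", 1)[0]
    return prefix != PLACEHOLDER_PREFIX
```

With the default prefix, `make_handle(PLACEHOLDER_PREFIX, 42)` gives `123456789/42`, which works inside the repository but is reported as not externally resolvable until a registered prefix is configured.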
How does DSpace preserve digital material?
DSpace itself does not guarantee the preservation of your digital materials. However, DSpace software is suited to play a central role in your overall digital preservation strategy. Keep in mind that your local digital preservation strategy should likely include a backup/restore plan, along with virus checking, etc.
DSpace allows you to identify two levels of digital preservation: bit preservation, and functional preservation. Bit preservation ensures that a file remains exactly the same over time - not a single bit is changed - while the physical media evolve around it. Functional preservation goes further: the file does change over time so that the material continues to be immediately usable in the same way it was originally while the digital formats (and physical media) evolve over time. Some file formats can be functionally preserved using straightforward format migration, such as TIFF images or XML documents. Other formats are proprietary, or for other reasons are much harder to preserve functionally. No one can predict the formats all users will choose for their research material. They use the best tools for their purposes, and research institutions will get whatever formats those tools produce. For this reason, DSpace allows you to choose three levels of preservation for a given format: supported, known, or unsupported.
- Supported formats are those you feel you can functionally preserve using either format migration or emulation techniques. Examples include TIFF, SGML, XML, AIFF, and PDF.
- Known formats are those that you can't promise to preserve, such as proprietary or binary formats, but which are so popular that third-party migration tools will likely emerge to help with format migration. Examples include Microsoft Word and PowerPoint, Lotus 1-2-3, and WordPerfect.
- Unsupported formats are those that you don't know enough about to do any sort of functional preservation. This would include some proprietary formats or a one-of-a-kind software program.
It is important to note that, although DSpace provides some default values for Supported, Known and Unsupported formats, your institution should determine the appropriate values based on your local preservation strategy.
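The three support levels above amount to a local policy table that maps formats to a preservation commitment. The following Python sketch shows one way to write such a policy down. It does not reproduce how DSpace itself identifies formats; the file extensions and the `LOCAL_POLICY` table are illustrative assumptions based on the examples in the list, and anything not listed falls back to "unsupported".

```python
from enum import Enum
from pathlib import Path


class SupportLevel(Enum):
    SUPPORTED = "supported"      # functional preservation is promised
    KNOWN = "known"              # recognised, but no promise is made
    UNSUPPORTED = "unsupported"  # not enough is known about the format


# Hypothetical local policy keyed by file extension (an assumption; a real
# policy would be decided by the institution).
LOCAL_POLICY = {
    ".tif": SupportLevel.SUPPORTED,
    ".tiff": SupportLevel.SUPPORTED,
    ".sgml": SupportLevel.SUPPORTED,
    ".xml": SupportLevel.SUPPORTED,
    ".aiff": SupportLevel.SUPPORTED,
    ".pdf": SupportLevel.SUPPORTED,
    ".doc": SupportLevel.KNOWN,
    ".ppt": SupportLevel.KNOWN,
    ".wpd": SupportLevel.KNOWN,
}


def support_level(filename, policy=LOCAL_POLICY):
    """Look up the support level for a file, defaulting to UNSUPPORTED."""
    extension = Path(filename).suffix.lower()
    return policy.get(extension, SupportLevel.UNSUPPORTED)
```

Keeping the policy in one table makes it easy to review and change as the local preservation strategy evolves.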
DSpace also provides other tools to help you to meet your preservation goals:
- Checksum Checker - This tool can be scheduled to perform a full fixity (checksum) check of all (or some) content files stored in your DSpace instance.
  - For more information on the Checksum Checker, see the "Checksum Checker" section of the DSpace Documentation's System Administration chapter.
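To make the idea of a fixity check concrete, here is a minimal Python sketch of the general technique: recompute each file's checksum and compare it with the value recorded earlier. This is not the DSpace Checksum Checker itself. The function names are hypothetical, and SHA-256 is chosen here only as an assumption for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_fixity_failures(recorded, algorithm="sha256"):
    """Return the paths whose current checksum no longer matches.

    recorded: dict mapping file path -> checksum stored at deposit time.
    A file that is missing or unreadable is also reported as a failure.
    """
    failures = []
    for path, expected in recorded.items():
        try:
            actual = file_checksum(path, algorithm)
        except OSError:
            failures.append(path)
            continue
        if actual != expected:
            failures.append(path)
    return failures
```

An empty result means every file is bit-for-bit identical to what was recorded; any path in the result needs to be restored from a backup copy.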
Where can I find DSpace technical documentation?
I've installed DSpace and I have questions/problems/comments. What should I do?
The DSpace community of developers supports one another and exchanges ideas and solutions on the DSpace mailing lists. Before you post a question or problem, check to see if your question has been answered already.
- Start by searching the DSpace-Tech mailing list archives.
- Also check the Technical FAQ and check the technical documentation.
- If you still haven't found an answer or solution, post your questions to dspace-tech (tech support list), where members of the DSpace community will offer their assistance.
More hints/tips are available on the How-To Troubleshoot an Error page.
I've found a bug in the software. How do I report it?
You can report bugs and suggest enhancements through the Software Bug/Feature Tracking System (JIRA).
Bugs will be fixed as soon as possible, within the limits of the DSpace team's technical support resources. The team considers all enhancements, and if an enhancement is accepted, adds it to the enhancement list for development as time and resources allow. Of course, any users working with the open-source code are welcome to fix a bug or make an improvement to the system. See our DSpace Contribution Guidelines to learn how.
Who provides technical support for the DSpace software platform?
DSpace has a very active community of developers who contribute expertise and support through the DSpace-Tech mailing list and the DSpace wiki. To take full advantage of the DSpace system you'll need local technical resources (hardware, technical experts, and so on). The DSpace web site offers technical documentation, and you can join the DSpace-Tech mailing list to ask questions or post solutions.
In case you require professional assistance, consult one of the Registered DSpace Service Providers.
What sort of hardware does DSpace require? What about sizing the server? How much disk space do I need?
Because DSpace is written in Java, it will run on any operating system (Linux, Windows, Mac OS X). DSpace is built on top of free, open-source tools, such as the Apache Web server, the Tomcat Servlet engine, and the PostgreSQL relational database system. For your convenience, we package the necessary JDBC and other drivers and libraries together with DSpace. This set of tools should run on any UNIX-type OS, such as Linux, HP/UX, or Solaris, and you can substitute other libraries if you need to run on another platform. The system runs on anything from a laptop to a $500K server, but there are a few general recommendations for hardware architectures. In a production setting where DSpace is actively used in public, DSpace requires a reasonably good server (see below) and a decent amount of memory and disk storage. For such production usage, the following requirements are meant as a guideline:
Minimal DSpace Production system requirements
- 2-3 GB of Random Access Memory (RAM)
  - 1GB for Tomcat (e.g. "TOMCAT_OPTS=-server -Xms1024M -Xmx1024M -XX:MaxPermSize=128M -Dfile.encoding=UTF-8")
  - 1GB for Database (PostgreSQL or Oracle).
  - Keep in mind your Operating System also needs some memory to function. So, while DSpace may only need ~2GB of memory, you should ensure the computer itself has at least 3-4GB of RAM available overall.
- 20 GB of Storage (or roughly enough storage for all the files you wish to store in DSpace)
This minimal system should be able to support DSpace sites of roughly 20,000 items or fewer, though the exact number of items will depend on the amount of activity (searches, accesses, downloads, etc.) within the DSpace site.
An empty installation of DSpace will effectively require less than 1GB of storage. The storage estimates are very rough. The actual amount of storage you will need depends on the size of the files you plan to store in DSpace. Files are not compressed in any way, so at a bare minimum you need enough space to store all of your files, plus some extra space for database storage and logfiles. You also will need to be prepared to add additional storage space as you add more content to DSpace.
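The storage rule of thumb above (enough space for all of your uncompressed files, plus some extra for the database and log files) can be turned into a rough calculation. The Python sketch below sums the sizes of the files you plan to deposit and adds a margin. The 20% overhead is purely an assumption for illustration, not a DSpace recommendation, and the function name is hypothetical.

```python
import os


def estimate_storage_bytes(content_dir, overhead_percent=20):
    """Estimate storage needs for a directory of files to be deposited.

    Returns (content_bytes, estimated_bytes), where estimated_bytes adds
    overhead_percent on top of the raw file sizes for database storage
    and log files. The default overhead is an assumption.
    """
    content_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(content_dir):
        for name in filenames:
            content_bytes += os.path.getsize(os.path.join(dirpath, name))
    estimated_bytes = content_bytes * (100 + overhead_percent) // 100
    return content_bytes, estimated_bytes
```

Because files are stored uncompressed, the first number is a hard lower bound; the second is only a planning figure and should be revisited as content is added.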
Approximate cost: around $599 (roughly verified 12/05/2012 through Dell - basic R210II rack server).
Mid-range DSpace Production system
- 4 GB of Random Access Memory (RAM)
  - ~2GB for Tomcat (e.g. "TOMCAT_OPTS=-server -Xms2048M -Xmx2048M -XX:MaxPermSize=128M -Dfile.encoding=UTF-8")
  - ~2GB for Database (PostgreSQL or Oracle).
  - Keep in mind your Operating System also needs some memory to function. So, while a mid-range DSpace may only need ~4GB of memory, you should ensure the computer itself has at least 5-6GB of RAM available overall.
- 200 GB of Storage (or roughly enough storage for all the files you wish to store in DSpace)
This mid-range system may be necessary for DSpace sites which either have a larger number of items (roughly 50,000 or more) or a larger amount of activity (searches, accesses, downloads, etc) within the system.
Again the storage estimates are very rough. The actual amount of storage you will need depends on the size of the files you plan to store in DSpace. Files are not compressed in any way, so at a bare minimum you need enough space to store all of your files, plus some extra space for database storage and logfiles. You also will need to be prepared to add additional storage space as you add more content to DSpace.
High End DSpace Production system requirements:
- Any modern processor / CPU. (During normal function, DSpace is not very CPU heavy. However, some backend tasks which are scheduled via "cron" do require CPU. As your amount of content increases, you may need a higher end CPU.)
- 8GB of Random Access Memory (RAM)
  - ~4-6GB for Tomcat
  - ~2-4GB for Database (PostgreSQL or Oracle)
  - Keep in mind your Operating System also needs some memory to function. So, while a high-end DSpace may only need ~8GB of memory, you should ensure the computer itself has at least 9-10GB of RAM available overall.
- 1TB of Storage (or roughly enough storage for all the files you wish to store in DSpace)
  - Storage examples:
    - 73 GB 15,000 rpm network disks in RAID accessible over a gigabit connection for storing the database and indexes
    - 7,400 rpm network disks in RAID accessible over a gigabit connection for storing the data whose size can be easily expanded.
The high-end system should only be necessary for extremely large (500,000 or more items) or extremely active DSpace sites. The majority of DSpace sites should not require this high end system until they experience a larger amount of growth or activity.
Approximate cost: around $2500 (roughly verified 12/04/2012 through Dell).
Cloud hosting recommendations
You can use the above hardware recommendations to analyse whether a virtualized cloud hosting platform will meet your needs. For example, when evaluating Amazon's Elastic Compute Cloud (EC2) services, you can compare the instance types with the above hardware recommendations. Currently, a Small EC2 instance roughly corresponds to the above Minimal Production system requirements (though it has slightly less memory at 1.7GB). It may be possible to run a production DSpace site on Amazon's Small instance to begin with, but you may need to upgrade to a Medium instance as activity ramps up in your DSpace site.
If you are considering other online hosting services, always keep in mind that you need a service that offers you SSH remote access in order to follow the DSpace installation procedures. Many online hosting platforms offer only very basic PHP or MySQL support, which does not suffice for installing and operating DSpace.
At all times, your own bandwidth, storage and processor requirements (and associated costs) will vary depending on what you plan to do with the system.
Once you are running DSpace in a production environment, it is highly recommended to run a separate, second instance of DSpace on a test or staging server. Any DSpace upgrades, customizations or other modifications can first be evaluated on this staging server before you move to production. If the actual testing is carried out by only a few people, your staging server will not experience the same levels of load as your production server. Therefore, you can bring down the system requirements for your staging server, even below the above minimum requirements.
To make sure that your staging environment is a realistic simulation of your production server, it is recommended to mirror as many settings and as much configuration as possible. Needless to say, a staging server on Windows will not provide reliable testing outcomes if your actual production machine is running Linux, and vice versa.
To simulate higher levels of load on your staging server, you can use free tools like JMeter.
How much time does it take to set up a DSpace installation?
A person familiar with installing Java- and database-based open source applications should be able to complete a prototype DSpace installation in a few hours to a day. After this experience, and an additional day to a week of exploring the software, a production installation should also take about a day for the basic software in a standalone configuration. If customizations or local integrations are required (for example, for user authentication), then additional time should be allocated according to the complexity and quantity of the changes required.
How much maintenance does a DSpace instance require?
While the DSpace software itself requires very little maintenance, the usual IT overhead for data backup, etc. applies. It is also typical for expectations and requirements to evolve once the instance is operational. Initial system usage often exposes areas for reorganization, metadata correction, and the inevitable requests to remove prematurely or incorrectly deposited items. An ongoing maintenance estimate should incorporate IT time based on the expected size of the repository to back up, ongoing local development based on the amount of customization, and repository content management based on the anticipated rate of deposit.
Can anyone help me to set up or install DSpace?
When you run into any trouble using or installing DSpace, a large community of users lies at your fingertips in the mailing lists. The Technical mailing list is the place to be for technical troubleshooting, while more general questions about the software can best be posted to the General Mailing list.
In case you require professional assistance, consult one of the Registered DSpace Service Providers.
What is Metadata?
Metadata is literally "data about data." It is descriptive information used for querying. Some metadata can be generated mechanically, such as file sizes, checksums, and full-text indexes. Other metadata is a higher order of human-made description such as titles, authors, unique identifiers, and abstracts. DSpace uses a qualified version of Dublin Core metadata across all content. Some communities or collections may also have tailored metadata available (such as MARC records for book collections, or FGDC records for geographic datasets). But even where that's available for some items, we crosswalk more detailed metadata records into our Dublin Core vocabulary to ensure a common layer of descriptive specificity for browsing and searching across everything.
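The crosswalk idea described above can be sketched as a simple lookup from a richer schema into qualified Dublin Core. The Python example below maps a handful of MARC field tags to Dublin Core element/qualifier pairs. The mapping table is a small illustrative assumption, not the crosswalk DSpace or any particular institution uses, and the function name is hypothetical.

```python
# Illustrative MARC tag -> (Dublin Core element, qualifier) mapping.
# This table is an assumption for the example only.
MARC_TO_DC = {
    "245": ("title", None),               # title statement
    "100": ("contributor", "author"),     # main entry, personal name
    "520": ("description", "abstract"),   # summary
    "020": ("identifier", "isbn"),        # ISBN
}


def crosswalk(marc_fields, mapping=MARC_TO_DC):
    """Convert (tag, value) pairs into qualified Dublin Core fields.

    Returns (dc_fields, unmapped_tags): dc_fields is a list of
    ("dc.element[.qualifier]", value) pairs; unmapped_tags lists the
    tags for which the mapping has no entry, so nothing is lost silently.
    """
    dc_fields = []
    unmapped_tags = []
    for tag, value in marc_fields:
        if tag not in mapping:
            unmapped_tags.append(tag)
            continue
        element, qualifier = mapping[tag]
        name = f"dc.{element}.{qualifier}" if qualifier else f"dc.{element}"
        dc_fields.append((name, value))
    return dc_fields, unmapped_tags
```

Reporting unmapped tags separately reflects the trade-off in any crosswalk: the common Dublin Core layer is less detailed than the source record, so the original record is worth keeping alongside it.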
What metadata standards does DSpace support? Can I create metadata using the [SCORM or VRA or FGDC or MARC or myOwnSchema]?
In this context support for a given metadata schema means that metadata can be entered into DSpace, stored in the database, indexed appropriately, and made searchable through the public user interface. This currently applies mainly to descriptive metadata, although as standards emerge it could also include technical, rights, preservation, structural, and behavioral metadata.
Currently DSpace supports only the Dublin Core metadata element set with a few qualifications conforming to the library application profile. HP and MIT also have a research project called SIMILE, which is investigating how to support arbitrary metadata schemas using RDF as applied by the Haystack research project in the Lab for Computer Science and some of the Semantic Web technologies being developed by the W3C.
Does DSpace support OAI?
DSpace supports the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) v2.0 as a data provider. OAI support was implemented using OCLC's OAICat open-source software to make DSpace item records available for harvesting. Many institutions running DSpace choose to register as a data provider with the Open Archives Initiative.
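For readers who want to see what harvesting looks like from the client side, the following Python sketch builds an OAI-PMH v2.0 `ListRecords` request URL and extracts record identifiers from a response. The verb, the `metadataPrefix` parameter, the `oai_dc` prefix, and the XML namespace come from the OAI-PMH specification; the base URL in the usage note is a hypothetical placeholder, since the actual address depends on how a given DSpace site is deployed.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# XML namespace defined by the OAI-PMH v2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    found = root.findall(".//oai:header/oai:identifier", OAI_NS)
    return [element.text for element in found]
```

For example, `list_records_url("http://repository.example.org/oai/request")` yields a URL ending in `?verb=ListRecords&metadataPrefix=oai_dc`; fetching it and passing the body to `record_identifiers` would list the harvested items.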
Building a DSpace Service
Where can I find information on how to build a DSpace service?
For technical information, see the technical documentation and join the DSpace-Tech mailing list. For non-technical questions and concerns, see the DSpace.org web site and join the DSpace-Community mailing list for DSpace-related announcements and general discussion.
Other helpful resources include:
- QuickStart Guide on the DSpace.org website.
Is there a list of live DSpace services?
See the list of Who's Using DSpace? on DSpace.org. If your institution is running DSpace and isn't listed, please send us your information via the form on that page.
Is there a list of DSpace Service Providers?
Yes, see the Service Providers page on DSpace.org for a list of consultants and organizations who can help you build and run your DSpace service.
Do I have to name my service "DSpace"?
No. We suggest you create a unique name for your repository. All the language in the user interface resides in one file, to make it easier to modify and translate. You just need to replace "DSpace" with the name of your repository in that file.
What kinds of DSpace services are other institutions building?
Research institutions worldwide use DSpace to meet a variety of digital archiving needs:
- Institutional Repositories (IRs)
- Learning Object Repositories (LORs)
- Electronic Records Management (ERM)
- Digital Preservation
- and more
There are many DSpace Use Case Examples on the DSpace.org website.
Where can I find information on Digital Preservation?
There are several good resources available. Start by reading Paul Wheatley's article "A way forward for developments in the digital preservation functions of DSpace : options, issues and recommendations".
- http://www.digitalpreservation.gov/ has some good specifics about formats.