This page addresses the most commonly asked questions about DSpace. See also the TechnicalFAQ page for answers to technical questions about DSpace.
A groundbreaking digital repository system, DSpace captures, stores, indexes, preserves and redistributes an organization's research material in digital formats. Research institutions worldwide use DSpace for a variety of digital archiving needs - from institutional repositories (IRs) to learning object repositories or electronic records management, and more. DSpace is freely available as open source software you can customize and extend. An active community of developers, researchers and users worldwide contribute their expertise to the DSpace Community.
Anyone who uses DSpace can get involved in a number of different ways: programming, defining feature requirements, writing documentation, testing new features, or sharing your design or marketing expertise. Get involved by joining the DSpace mailing lists, adding your projects, experiences, and comments to the DSpace Wiki, and collaborating with other DSpace community members.
The MIT Libraries and Hewlett-Packard (HP) originally developed DSpace, but the software is now supported by DuraSpace. The system is now freely available to research institutions world-wide as an open source system that can be customized and extended.
DSpace is freely available as open source software. The DSpace Community manages the code base and releases new versions of the software. An active community of developers, researchers and users worldwide contribute their expertise to the DSpace Community.
See also Does the DSpace project have a supporting organization?
DSpace is the first digital repository to address the myriad issues inherent in a multi-disciplinary archive, including:
DSpace is designed with a flexible storage and retrieval architecture adaptable to a multitude of data formats and distinct research disciplines, known as "communities." Each community has its own customized user portal that can use the community's own practices and terminology.
The success of any open-source project lies with the community contributing its collective energy, knowledge, enthusiasm, and effort. DSpace is developed and supported by the user community, with the help and guidance of DuraSpace. DuraSpace is a not-for-profit organization formed in July 2009. The organizations which supported the DSpace project previously, the DSpace Foundation (2007-2009) and the DSpace Federation (2003-2004) have ceased operation. To learn more about DuraSpace, please visit www.DuraSpace.org. For technical questions about the DSpace software platform, please refer to Who provides technical support for the DSpace platform? listed below.
Yes. The DSpace system is freely available as open-source software (see locations below), under the terms of the BSD distribution license. We have also tried to find good open-source tools to package with the DSpace application, all freely available under an open-source license (although not all the same license as the one for DSpace itself), so that you get a complete system along with the part that we created.
DSpace is freely available as open-source software from GitHub. For more information on the most recent release of the software see the Current Release Notes.
DSpace is freely available as open-source software from GitHub. If you are familiar with Git, you can also download the latest code via our GitHub Code Repository.
Yes, you can customize and extend the system to suit your organization's needs. DSpace was designed to make adapting it for individual organizations as easy as possible. See the section on how to contribute on the DSpace Wiki for information on submitting code changes to DSpace. Each application is different, but most organizations need to customize the authentication system, for example, to work with existing systems. Some organizations may want to replace the open-source tools supplied with DSpace with different ones (for example, replacing PostgreSQL with MySQL or Oracle).
O'Reilly & Associates has a very helpful web site devoted to open source: http://opensource.oreilly.com/.
DSpace accepts all manner of digital formats. Some examples of items that DSpace can accommodate are:
Each DSpace service is comprised of Communities – groups that contribute content to DSpace – and Communities in turn each have Collections, which contain the content items, or files. In a university environment, for example, Communities might be departments, labs, research centers, schools, or some other administrative unit within an institution. Communities determine their own content guidelines and decide who has access to the community's contributions. An administrator on the DSpace team, usually the DSpace User Support Manager, works with the head of a community to set up workflows for content to be approved, edited, tagged with metadata, etc. Collections belong to a community or multiple communities (for example, research collaborations between two communities may result in a shared collection) and house the individual content items and files.
Yes. Currently DSpace has an Item Exporter which supports exporting digital content, along with its metadata, in a simple XML-encoded file format (where each item is exported into a separate directory). See the "Item Importer and Exporter" section of the DSpace Documentation's System Administration chapter for more details.
There are also basic packagers which allow exporting using the METS standard. For more information on these, see the "Package Importer and Exporter" section of the DSpace Documentation's System Administration chapter (see above link).
Yes. Currently DSpace supports importing content in batch using a variety of options:
Yes, DSpace has documented Java APIs you can customize to allow interoperation with other systems an institution might be running (for example, a department's web document system auto-depositing in DSpace, or a campus data warehouse).
DSpace requires that a persistent identifier is assigned to each digital object (Item, Collection, Community). Because the developers wanted a solution which will work for a very long time, the identifier system had to be independent of any underlying network protocols, such as HTTP.
DSpace uses the Handle System from CNRI (Corporation for National Research Initiatives) as the persistent identifier for each digital object. Handles are resolved to actual URLs via a resolution service. The Handle resolver is an open-source system. Handles in DSpace (and elsewhere) are currently implemented as HTTP URIs, but can also be modified to work with future protocols. The Handle system is also able to support existing bibliographic identifiers such as ISBN or ISSN.
In the current version of DSpace, Handles are used as internal identifiers. By default, DSpace utilizes a 'dummy' (non-external) Handle prefix of '123456789' when assigning Handles to new objects. If an organization wishes to obtain a valid Handle prefix (which can be resolved from external locations), one can be purchased from the Handle System site for a small annual service fee.
It should be noted that if an organization has a policy requiring the use of another persistent identifier system, it is possible to use it as the public or external persistent identifier to the resource. In such a case, the public identifier - for instance, a URN - could resolve to a DSpace-generated page which contains metadata about the resource - including the Handle-based persistent link to the resource itself.
Future versions of DSpace may support multiple internal persistent identifiers. However, this work is still under investigation, and we are looking for developers and institutions willing to volunteer to help with this project.
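The relationship between a Handle and its HTTP form described above can be sketched in a few lines of Python. This is only an illustration, not DSpace code: the function names and the suffix `42` are hypothetical, and `https://hdl.handle.net` is assumed here as the public proxy resolver for the Handle System. A Handle minted under the default `123456789` prefix will not resolve externally.

```python
DEFAULT_PREFIX = "123456789"  # DSpace's default, non-external placeholder prefix


def make_handle(suffix: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Join a Handle prefix and a local suffix into 'prefix/suffix' form."""
    if not prefix or not suffix:
        raise ValueError("prefix and suffix must both be non-empty")
    if "/" in prefix:
        raise ValueError("a Handle prefix may not contain '/'")
    return f"{prefix}/{suffix}"


def handle_to_url(handle: str, resolver: str = "https://hdl.handle.net") -> str:
    """Express a Handle as an HTTP URI by prepending a resolver address.

    The resolver default is an assumption; a local resolver could be
    substituted without changing the Handle itself.
    """
    return f"{resolver.rstrip('/')}/{handle}"


if __name__ == "__main__":
    handle = make_handle("42")  # hypothetical item suffix
    print(handle_to_url(handle))
```

Because the identifier is the `prefix/suffix` pair rather than the full URL, the resolver address can change without invalidating the Handle.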
DSpace itself does not guarantee the preservation of your digital materials. However, DSpace software is suited to play a central role in your overall digital preservation strategy. Keep in mind that your local digital preservation strategy should likely include a backup/restore plan, along with virus checking, etc.
DSpace allows you to identify two levels of digital preservation: bit preservation, and functional preservation. Bit preservation ensures that a file remains exactly the same over time - not a single bit is changed - while the physical media evolve around it. Functional preservation goes further: the file does change over time so that the material continues to be immediately usable in the same way it was originally while the digital formats (and physical media) evolve over time. Some file formats can be functionally preserved using straightforward format migration, such as TIFF images or XML documents. Other formats are proprietary, or for other reasons are much harder to preserve functionally. No one can predict the formats all users will choose for their research material. They use the best tools for their purposes, and research institutions will get whatever formats those tools produce. For this reason, DSpace allows you to choose three levels of preservation for a given format: supported, known, or unsupported.
It is important to note that, although DSpace provides some default values for Supported, Known and Unsupported formats, your institution should determine the appropriate values based on your local preservation strategy.
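Bit preservation, as described above, comes down to being able to show that a file is byte-for-byte unchanged. A minimal sketch of that check, assuming a checksum was recorded when the file was deposited, might look like the following. It is not DSpace's own implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(path: Path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current checksum with the one recorded at deposit time."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

A check like this only detects change; restoring a damaged file still depends on the backup/restore plan mentioned earlier.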
DSpace also provides other tools to help you to meet your preservation goals:
You can find DSpace system documentation on the DSpace.org website or on the DSpace Resources wiki page.
The DSpace Community of developers support one another and exchange ideas and solutions on the DSpace mailing lists. Before you post a question or problem, check to see if your question has been answered already.
More hints/tips are available on the How-To Troubleshoot an Error page.
You can report bugs and suggest enhancements through the Software Bug/Feature Tracking System (JIRA).
Bugs will be fixed as soon as possible, within the limits of the DSpace team's technical support resources. The team considers all enhancements, and if an enhancement is accepted, adds it to the enhancement list for development as time and resources allow. Of course, any users working with the open-source code are welcome to fix a bug or make an improvement to the system. See our DSpace Contribution Guidelines to learn how.
DSpace has a very active community of developers who contribute expertise and support through the DSpace-Tech mailing list and the DSpace wiki. To really take advantage of the DSpace system, you'll need local technical resources (hardware, technical experts, and so on). The DSpace web site offers technical documentation, and you can join the DSpace-Tech mailing list to ask questions or post solutions.
In case you require professional assistance, consult one of the Registered DSpace Service Providers.
DSpace is written in Java, so it will run on any operating system (Linux, Windows, Mac OS X). DSpace is built on top of free, open-source tools, such as the Apache Web server, the Tomcat Servlet engine, and the PostgreSQL relational database system. For your convenience, we package the necessary JDBC and other drivers and libraries together with DSpace. This set of tools should run on any UNIX-type OS, such as Linux, HP/UX, or Solaris, and you can substitute other libraries if you need to run on another platform. The system runs on anything from a laptop to a $500K server, but there are a few general recommendations for hardware architectures. In a production setting where DSpace is actively used in public, DSpace requires a reasonably good server (see below) and a decent amount of memory and disk storage. For such production usage, the following requirements are meant as a guideline:
Minimal DSpace Production system requirements
This minimal system should be able to support DSpace sites of roughly 20,000 items or fewer, though the exact number of items will depend on the amount of activity (searches, accesses, downloads, etc.) within the DSpace site.
An empty installation of DSpace will effectively require less than 1GB of storage. The storage estimates are very rough. The actual amount of storage you will need depends on the size of the files you plan to store in DSpace. Files are not compressed in any way, so at a bare minimum you need enough space to store all of your files, plus some extra space for database storage and logfiles. You also will need to be prepared to add additional storage space as you add more content to DSpace.
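The reasoning above (uncompressed files, plus room for the database and logfiles, plus the base installation) can be turned into a rough back-of-the-envelope calculation. The sketch below is only that: the function name is hypothetical, and the 20% overhead fraction is an assumed placeholder rather than a DSpace recommendation, so substitute figures from your own environment.

```python
from typing import Iterable

BYTES_PER_GB = 10**9


def estimate_storage_gb(
    file_sizes_bytes: Iterable[int],
    overhead_fraction: float = 0.2,  # assumed allowance for database and logfiles
    base_install_gb: float = 1.0,    # an empty installation needs less than 1GB
) -> float:
    """Estimate disk space in GB for an uncompressed set of files.

    Files are counted at full size because DSpace does not compress them.
    """
    if overhead_fraction < 0:
        raise ValueError("overhead_fraction must not be negative")
    content_gb = sum(file_sizes_bytes) / BYTES_PER_GB
    return base_install_gb + content_gb * (1 + overhead_fraction)


if __name__ == "__main__":
    # Hypothetical collection: 20,000 items averaging 5 MB each.
    sizes = [5 * 10**6] * 20_000
    print(f"Estimated storage: {estimate_storage_gb(sizes):.1f} GB")
```

The result is a starting figure only; plan for additional space as content is added.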
Approximate cost: around $599. (roughly verified 12/05/2012 through Dell - basic R210II rack server).
Mid-range DSpace Production system
This mid-range system may be necessary for DSpace sites which either have a larger number of items (roughly 50,000 or more) or a larger amount of activity (searches, accesses, downloads, etc) within the system.
Again the storage estimates are very rough. The actual amount of storage you will need depends on the size of the files you plan to store in DSpace. Files are not compressed in any way, so at a bare minimum you need enough space to store all of your files, plus some extra space for database storage and logfiles. You also will need to be prepared to add additional storage space as you add more content to DSpace.
High End DSpace Production system requirements:
The high-end system should only be necessary for extremely large (500,000 or more items) or extremely active DSpace sites. The majority of DSpace sites should not require this high end system until they experience a larger amount of growth or activity.
Approximate cost: around $2500. (roughly verified 12/04/2012 through Dell)
Cloud hosting recommendations
You can use the above hardware recommendations to analyse whether a virtualized cloud hosting platform will meet your needs. For example, when evaluating Amazon's Elastic Compute Cloud (EC2) services, you can compare the instance types with the above hardware recommendations. Currently, a Small EC2 instance roughly corresponds to the above Minimal Production system requirements (though it has slightly less memory, at 1.7GB). It may be possible to run a production DSpace site on Amazon's Small instance to begin with, but you may need to upgrade to a Medium instance as activity ramps up in your DSpace site.
If you are considering other online hosting services, always keep in mind that you need a service that offers you SSH remote access in order for you to follow the DSpace installation procedures. Many online hosting platforms offer only very basic PHP or MySQL support, which does not suffice for installing and operating DSpace.
At all times, your own bandwidth, storage and processor requirements (and associated costs) will vary depending on what you plan to do with the system.
Once you are running DSpace in a production environment, it is highly recommended to run a separate, second instance of DSpace on a test or staging server. Any DSpace upgrades, customizations or other modifications can first be evaluated on this staging server before you move to production. If the actual testing is carried out by only a few people, your staging server will not experience the same levels of load as your production server. Therefore, you can bring down the system requirements for your staging server, even below the above minimum requirements.
To make sure that your staging environment is a realistic simulation of your production server, it is recommended to mirror as many settings and as much configuration as possible. Needless to say, a staging server on Windows will not provide reliable testing outcomes if your actual production machine is running Linux, and vice versa.
To simulate higher levels of load on your staging server, you can use free tools like JMeter.
A person familiar with installing Java and database based open source applications should be able to complete a prototype DSpace installation in a few hours to a day. After this experience, and an additional day to a week of exploring the software, a production installation should also take about a day for the basic software in a standalone configuration. If customizations or local integrations are required (for example, for user authentication), then additional time should be allocated according to the complexity and quantity of the changes required.
While the DSpace software itself requires very little maintenance, the usual IT overhead for data backup, etc. applies. It is also typical for expectations and requirements to evolve once the instance is operational. Initial system usage often exposes areas for reorganization, metadata correction, and the inevitable requests to remove prematurely or incorrectly deposited items. An ongoing maintenance estimate should incorporate IT time based on the expected size of the repository to back up, ongoing local development based on the amount of customization, and repository content management based on the anticipated rate of deposit.
When you run into any trouble using or installing DSpace, a large community of users lies at your fingertips in the mailing lists. The Technical mailing list is the place to be for technical troubleshooting, while more general questions about the software can best be posted to the General Mailing list.
In case you require professional assistance, consult one of the Registered DSpace Service Providers.
Metadata is literally "data about data." It is descriptive information used for querying. Some metadata can be generated mechanically, such as file sizes, checksums, and full-text indexes. Other metadata is a higher order of human-made description, such as titles, authors, unique identifiers, and abstracts. DSpace uses a qualified version of Dublin Core metadata across all content. Some communities or collections may also have tailored metadata available (such as MARC records for book collections, or FGDC records for geographic datasets). But even where that is available for some items, we crosswalk the more detailed metadata records into our Dublin Core vocabulary to ensure a common layer of descriptive specificity for browsing and searching across everything.
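To make the idea of a crosswalk concrete, here is a minimal sketch that maps a few MARC tags onto qualified Dublin Core field names. The mapping table is deliberately tiny and hypothetical; it is not the crosswalk shipped with DSpace, and a real one would handle subfields, repeatable fields, and many more tags.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical, illustrative mapping from MARC tags to qualified Dublin Core.
MARC_TO_DC = {
    "245": "dc.title",
    "100": "dc.contributor.author",
    "520": "dc.description.abstract",
    "020": "dc.identifier.isbn",
}


def crosswalk(marc_fields: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Convert (MARC tag, value) pairs into Dublin Core fields.

    Tags without a mapping are dropped, which is why the richer source
    record is usually kept alongside the Dublin Core version.
    """
    dublin_core: Dict[str, List[str]] = defaultdict(list)
    for tag, value in marc_fields:
        field = MARC_TO_DC.get(tag)
        if field is not None:
            dublin_core[field].append(value)
    return dict(dublin_core)


if __name__ == "__main__":
    record = [
        ("245", "An Example Title"),
        ("100", "Doe, Jane"),
        ("999", "local note with no Dublin Core equivalent"),
    ]
    print(crosswalk(record))
```

The unmapped `999` field in the usage example shows the trade-off: the crosswalk gives a common search layer at the cost of some detail.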
In this context support for a given metadata schema means that metadata can be entered into DSpace, stored in the database, indexed appropriately, and made searchable through the public user interface. This currently applies mainly to descriptive metadata, although as standards emerge it could also include technical, rights, preservation, structural, and behavioral metadata.
Currently DSpace supports only the Dublin Core metadata element set with a few qualifications conforming to the library application profile. HP and MIT also have a research project called SIMILE, which is investigating how to support arbitrary metadata schemas using RDF as applied by the Haystack research project in the Lab for Computer Science and some of the Semantic Web technologies being developed by the W3C.
DSpace supports the Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) v2.0 as a data provider. OAI support was implemented using OCLC's OAICat open-source software to make DSpace item records available for harvesting. Many institutions running DSpace choose to register as a data provider with the Open Archives Initiative.
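From a harvester's point of view, an OAI-PMH request is an HTTP request with a `verb` parameter and a few arguments. The sketch below builds a `ListRecords` request URL using the `oai_dc` metadata prefix that every OAI-PMH v2.0 data provider must support. The function name and the base URL in the usage example are hypothetical; check your own installation for the actual OAI endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH v2.0 ListRecords request URL.

    `set_spec` and `from_date` are the protocol's optional 'set' and 'from'
    arguments for selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your repository's OAI base URL.
    print(list_records_url("https://repository.example.edu/oai/request",
                           from_date="2012-12-05"))
```

The response is an XML document; a full harvester would also need to follow the protocol's resumption tokens for large result sets, which this sketch leaves out.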
For technical information, see the technical documentation and join the DSpace-Tech mailing list. For non-technical questions and concerns, see the DSpace.org web site and join the DSpace-Community mailing list for DSpace-related announcements and general discussion.
Other helpful resources include:
See the list of Who's Using DSpace? on DSpace.org. If your institution is running DSpace and isn't listed, please send us your information via the form on that page.
Yes, see the Service Providers page on DSpace.org for a list of consultants and organizations who can help you build and run your DSpace service.
No. We suggest you create a unique name for your repository. All the language in the user interface resides in one file, to make it easier to modify and translate. You just need to replace "DSpace" with the name of your repository in that file.
Research institutions worldwide use DSpace to meet a variety of digital archiving needs:
There are many DSpace Use Case Examples on the DSpace.org website.
There are several good resources available. Start by reading Paul Wheatley's article "A way forward for developments in the digital preservation functions of DSpace : options, issues and recommendations".
Still have questions? For general questions about DSpace and DuraSpace, you can search the DSpace-Community mailing list archives or post a question to the DSpace-Community mailing list.
For technical or software questions, see the TechnicalFAQ and the DSpace system documentation. You can also search the DSpace-tech archives or post a question to the DSpace-tech mailing list.