Problem Summary

Configuring and managing DSpace test and development environments requires significant effort.

When developing and testing against multiple versions of DSpace, managing system software becomes significantly more complex, especially when software versions must be downgraded.

DSpace 4, 5, and 6 require Postgres and Tomcat as prerequisite software.

A full DSpace 7 instance will require Postgres, Tomcat, Node (Angular), and Solr as prerequisite software.  A developer may wish to focus on either the full set of components or a subset of these components.
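
The prerequisite lists above can be written down as a small lookup table. The following is only a sketch: the lowercase component labels and the `components_to_run` helper are hypothetical names chosen for illustration, not part of DSpace or of any published tooling.

```python
# Prerequisite software per DSpace major version, as listed in the text above.
# The lowercase labels are illustrative only (assumption, not DSpace identifiers).
PREREQUISITES = {
    4: {"postgres", "tomcat"},
    5: {"postgres", "tomcat"},
    6: {"postgres", "tomcat"},
    7: {"postgres", "tomcat", "node", "solr"},
}


def components_to_run(version, focus=None):
    """Return a sorted list of components to start for a DSpace version.

    focus is an optional collection naming the subset of components a
    developer wants to work on. Naming a component that the version does
    not use raises ValueError.
    """
    required = PREREQUISITES[version]
    if focus is None:
        return sorted(required)
    unknown = set(focus) - required
    if unknown:
        raise ValueError(
            f"DSpace {version} does not use: {', '.join(sorted(unknown))}"
        )
    return sorted(set(focus))
```

For example, `components_to_run(6)` lists the full DSpace 6 set, while `components_to_run(7, {"node"})` covers a developer who only wants to work on the Angular front end.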

Assumption

The complexity of managing these development and test environments has likely prevented some institutions from contributing to the platform.

The project might be able to engage a larger audience of developers if the prerequisites for creating a development environment could be simplified.

Many DSpace stakeholders are not developers.  If it were easier to deploy a snapshot instance of DSpace to the cloud, these stakeholders could play a more active role in system testing.

Goal 1: Simplify and standardize the process for creating a development environment for all supported versions of DSpace

Docker provides a consistent, predictable environment

Docker provides users with a consistent and predictable runtime environment.  With such an environment, new DSpace developers will be able to more easily isolate local environment inconsistencies when seeking help from the DSpace community.

Docker allows a user to manage multiple (and incompatible) environments

Docker containers allow a developer to manage and run multiple system configurations from a single desktop or test server.

Significant work has already been done in this area.  https://github.com/DSpace-Labs/DSpace-Docker-Images

Each major branch of DSpace and each recent release of DSpace has been published as a Docker image.

Goal 2: Publish a standard set of AIP files (Archival Information Packages) to facilitate system testing

In order to expedite and simplify DSpace testing, it would be useful to provide developers with content suitable for testing a majority of DSpace use cases.

Fortunately, the DSpace AIP ingest process provides a mechanism for constructing a DSpace repository that is DSpace version agnostic.

What test resources exist?

We have generated one simple set of re-usable AIP files posted on GitHub.  https://github.com/DSpace-Labs/AIP-Files

These files are too simplistic for testing real DSpace use cases.

Demo.dspace.org is populated with a set of AIP files that are stored on Amazon S3.  These files are approximately 1.5 GB in size.

A regular GitHub repository is not appropriate for sharing large files.  See https://help.github.com/articles/working-with-large-files/.

What test resources should exist?

  • A community/collection hierarchy containing items of various sample document types (PDF, image) that illustrate DSpace features, including some access-restricted material
    • These items should be created from contrived examples or from open-access resources.
  • A community/collection hierarchy containing multilingual metadata
  • A community/collection hierarchy containing some digital collection material

How should these test resources be distributed?

The collections of material that are likely to be shared will exceed the space and bandwidth limitations of a GitHub code repository, so there are likely to be some storage and bandwidth costs associated with sharing these resources.

A test collection would need to be downloadable as a single zip file or as a collection of individual zip files.

  • Explore GitHub LFS (large file storage) as an option
  • Share resources through a cloud service such as AWS S3.
  • Share resources using a collaboration platform such as Box or Google Drive
  • Upload these assets to a data repository and share them from that platform
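
Whichever distribution option is chosen, a developer would end up with one or more zip files to unpack locally before ingest. The following is a minimal sketch of that unpacking step using only the Python standard library. The function name `unpack_test_collection` and the directory layout are assumptions for illustration; downloading the archives and running the DSpace ingest itself are out of scope here.

```python
import zipfile
from pathlib import Path


def unpack_test_collection(archives, target_dir):
    """Extract one or more zip files into target_dir.

    Handles both cases described above: a single zip file (pass a
    one-element list) or a collection of individual zip files.
    Returns the sorted names of the extracted files.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            # Skip directory entries, which end with a slash.
            extracted.extend(
                name for name in zf.namelist() if not name.endswith("/")
            )
    return sorted(extracted)
```

The archives should come from a trusted source, since the sketch extracts every entry it finds without inspecting the contents.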

Goal 3: Create a simple workflow to deploy a DSpace branch or snapshot to the cloud

Each of the major cloud providers offers a mechanism for running Docker containers.

If we could streamline the process for deploying published Docker images to the cloud, we could support a unique test environment for each major branch of DSpace.  This would allow users to compare and contrast functionality on each version of DSpace.  It would also allow an institution to evaluate the functionality on a specific DSpace branch.

Here is a write up of my experimentation with Docker Images running on specific cloud providers.

Goal 4: Manage hosted instances of DSpace for each supported branch of the system

If the prior goals are achieved, it would be possible to manage multiple hosted instances of DSpace for each major branch of the system in addition to the production reference version at demo.dspace.org.

Goal 5: Publish instructions for a replicable development environment for on-boarding, tutorials, and troubleshooting

Frequent contributors to DSpace are likely to manage and maintain a robust development environment that includes best-in-class development tools such as IDEs.

There is also a need to assist new contributors and potential contributors with the creation of a simple and basic development environment for DSpace. 

While Docker can standardize the runtime and test environments, it may also be possible to document a replicable development environment to help with on-boarding.

See Using Published Images on AWS Cloud9 as a possible example.

Implementation Costs

  • Hosting AIP resources (storage and bandwidth)
  • Hosting multiple DSpace reference instances (dspace-4x, dspace-5x, dspace-6x, dspace-7x) in addition to demo.dspace.org (compute, storage, bandwidth)




3 Comments

  1. Not sure how realistic my idea is, but in terms of very large AIP distribution – we could encourage the community to actively seed it on P2P, so it's distributed by torrent rather than having to pay for long-term hosting costs?

  2. I don't see a direct relation between goal 2 (AIP) and Docker. I know that we would like to use the AIPs within Docker and that it makes sense for Docker to provide AIPs, but I consider this more a side effect than a goal of our Docker work. At least it should not be the second goal (expecting at least some people to just quickly read the headlines).

    Regarding Docker I see more goals. Some may be covered indirectly already, but I think they are worth emphasizing:

    • Making our own development work faster and easier
      Changing between DSpace versions is crucial for DSpace committers while testing and fixing bugs and releasing DSpace versions
    • Make it easier to test DSpace for people looking for a new repository software
      With Docker we could provide complex setups that people can use easily to see all of the features DSpace offers. This is a very good option to test DSpace without the necessity to understand the complex setup and installation process.
    • Enabling the Community to easily provide previews of upcoming DSpace versions
      For example we could publish a DSpace 7 preview that includes example content, the REST API and Angular running in multiple containers, started just by docker run. We can easily automate all the steps that would be necessary to create such an environment even if the deployment/installation process is not fully developed and even if we still lack a strategy to package DSpace 7 (or any other version while working on it).
    • Provide comparable setups for DSpace testathons
      For every new version of DSpace we run a testathon. People are asked to go through a test plan provided by DCAT and/or to test their own use cases for an upcoming version of DSpace. Docker enables us to provide comparable environments for everyone who is taking part in such a testathon, making it easier to reproduce reported problems. Besides the use cases, the installation and update of DSpace have to be tested as well. This is outside the scope of Docker, but it is only a small part of the whole testathon.
  3. Yes, I suppose I just see a lot of those goals like "comparable setup for testathons", "easy testing for new users", "easy, stable previews" etc, as needing both well-maintained docker / docker-compose resources and a decent set of AIPs to make it a more realistic user experience?

    For development, agree they'd mainly be around for just testing and quite often the nature of the fix/improvement being worked on means it needs some special test conditions anyway.

    So perhaps the test AIPs are a dependency of a goal like "shared testing / preview / review" experience? (of which docker is another dependency?)