
Technology Goals

Over the last few years, the Steering Group, along with various strategic working groups, has validated the following vision statements, which describe the goals of the DSpace open source product:

  1. DSpace will focus on the fundamentals of the modern "Institutional Repository" use case. We are striving to meet the IR needs of the next 5-10 years.
  2. DSpace will be "lean", with agility and flexibility as primary goals.
  3. DSpace will include a "core" set of functionality that can be "extended" (think plugins) or have "hooks" (integration points) to complementary services/tools
  4. DSpace will be designed in such a way that it can be easily/quickly configured to integrate with new & future tools/services in the larger digital scholarship "ecosystem"
  5. DSpace will support low-cost, hosted solutions and deployments (by featuring an easy, "just works" setup)

Assumptions

Several assumptions were made in drafting this strategic plan for technology:

  • We do NOT plan to rewrite DSpace from scratch, for the following reasons:
    • We have a highly active (and global) development community on the existing platform. We are averaging 50+ contributors in recent major releases. We also have a very active and healthy set of service providers.
    • A complete rewrite would require significant funding and centralized resources, neither of which is currently available. There also seem to be few (if any) grant opportunities to rebuild existing, established platforms.
    • A complete rewrite is very risky in the open source world. While it can succeed in some cases, it also risks fragmenting or fracturing a user community or developer community.
  • We ARE aiming for a potentially substantial leap forward in user experience / web user interface.
    • We've heard the feedback that neither of the two UIs (JSPUI or XMLUI) provides an optimal user or administrative experience. So, a User Interface rewrite or major refactoring would be "on the table".
  • The actions and goals below are ambitious, but we believe they are achievable provided that:
    1. We can also achieve our Sustainability goal of increasing project revenue to hire a Product Manager, and
    2. Institutional stakeholders are willing to commit developers to spend time working on organized development sprints under the direction of the DSpace Technical Lead to achieve the deliverables. 

Based on this proposed value proposition and assumptions, the Steering Group recommends the following actions corresponding to each goal:

Goal 1: DSpace will focus on the fundamentals of the modern "Institutional Repository" use case.

In November 2002, DSpace was initially announced as an out-of-the-box "institutional repository software platform" (see DSpace 1.0 release announcement). While that basic goal has not changed, the common needs and use cases of an "institutional repository" have changed significantly in the last decade or so.  Therefore, this goal is oriented towards striving to retain DSpace's niche while revitalizing it to meet current and future use cases associated with the modern repository platform.

  • Action 1A: Verify and validate the needs of a "modern institutional repository". This is instrumental in formalizing the value proposition of DSpace.
  • Action 1B: Survey the community on Technology Roadmap / drafted feature rankings
    • Ranking of features or validation of our Technology Roadmap
    • Also an opportunity to possibly gather volunteers for specific feature projects
  • Action 1C: Identify the minimum set of functionality/features for 'IR-core' and refactor the codebase to provide this. This core may not be functional as-is, since it may require plugins that aren't extensions (e.g. authN).

Goal 2: DSpace will be "lean", with agility and flexibility as primary goals

Since its initial release in 2002, numerous features, configurations and options have been added to the DSpace codebase in an ongoing effort to keep up with the changing needs of its user base. While many of these changes have helped us to achieve new use cases, in some instances they have also complicated the codebase and made setup and upgrades more complex. Therefore, this goal is oriented towards cleaning up (and simplifying) the codebase and its configuration options, while also working towards avoiding duplication (of code and development efforts). We feel DSpace can be a "leaner" platform, which will allow the codebase to better adapt to the needs of the future and simplify its maintenance, setup and upgrade processes.

To be "lean", the DSpace technology platform should avoid duplicative functionality except where necessary to meet use cases or achieve "flexibility" goals.  Where unnecessary duplicative functionality already exists, the technology team should choose a "best option" solution, or propose building a new solution when a "best option" does not exist.

  • Action 2A: Converge on a single, out-of-the-box user interface (UI). DSpace will no longer be released with multiple User Interfaces (JSPUI vs XMLUI).  A single user interface should be developed as DSpace's out-of-the-box UI. Early discussions on the requirements of this single UI (and some brainstormed candidates) are at Brainstorms on a Future UI.
  • Action 2B: Converge on a single, out-of-the-box search/browse system.  DSpace will only support Apache Solr for search/browse, and the older, deprecated Lucene and DB search/browse systems should be removed.
  • Action 2C: Converge on a single, built-in statistical engine. DSpace will only support a single, built-in statistical engine (based on Apache Solr), and support for Elastic Search statistics should be removed or migrated to an optional module. Support for Google Analytics will be retained, as it's an optional integration with an external statistics engine.
  • Action 2D: Develop a basic User Interface "style / layout guide". In order to ensure a consistent user experience across all pages/features within the User Interface, we should provide basic guidelines for layout and styling of common page elements, etc. Examples may include basic guidelines for how errors/warnings/notices should be displayed, what class(es) to use for types of buttons, etc.

Goal 3: DSpace can be "extended" (think plugins) or have "hooks" (integration points) to complementary services/tools

There will obviously be limitations to what DSpace can and should do, so we need to have ways to support plugins/addons/extensions to that core functionality. Not all users of DSpace will need to achieve the same set of Use Cases, so we will need to define which are "core" and which would be better implemented as plugins/extensions (either centrally supported or third-party supported).

  • Action 3A: Define a family of specifications/interfaces/tooling for 'implementation' plugins (like authN, storage, persistent ID services, etc).
  • Action 3B: Define a family of specifications/interfaces for functional extensions to 'core' DSpace (working title: 'modules'), and refactor existing bundled code to conform to the new model (if appropriate/cost-effective).
  • Action 3C: Provide infrastructure/tools for a module registry, where users can discover and install modular extensions. This would likely include both modules maintained by committers and community contributions.
  • Action 3D: Devise a flexible but rigorous system of versioning all components (core, module, etc) where compatibility requirements can be checked/enforced by the build/deploy tools.
  • Action 3E: Define and expose new interfaces (in 'core' DSpace and possibly modules) to allow local customized code to run: 'integration points'.
  • Action 3F: (Highly dependent on UI architectural work) Provide a user-discoverable registry/library of user interface templates (working title: 'themes'), that can be installed and adapted for local use. 
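
As an illustration of Action 3D, the kind of compatibility check that build/deploy tools might enforce can be sketched as follows. This is a minimal sketch in Python rather than DSpace's own Java, and everything in it is an assumption made for illustration: the `ModuleDescriptor` fields, the half-open `[min_core, max_core)` range convention, and the function names are hypothetical and do not describe an existing DSpace format or API.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn a dotted version such as '6.1' into a comparable tuple (6, 1, 0)."""
    parts = [int(part) for part in text.split(".")]
    # Pad to three components so that '6' and '6.0' compare as equal.
    return tuple(parts + [0] * (3 - len(parts)))


@dataclass(frozen=True)
class ModuleDescriptor:
    # Hypothetical metadata a module might declare (not an existing DSpace format).
    name: str
    version: str
    min_core: str  # lowest core version the module supports (inclusive)
    max_core: str  # first core version the module does NOT support (exclusive)


def is_compatible(module, core_version):
    """Return True when core_version falls within [min_core, max_core)."""
    core = parse_version(core_version)
    return parse_version(module.min_core) <= core < parse_version(module.max_core)


def incompatible_modules(modules, core_version):
    """List the names of modules a build/deploy tool should refuse to install."""
    return [m.name for m in modules if not is_compatible(m, core_version)]
```

With a core at version 6.1, a module declaring support for 6.0 up to (but not including) 7.0 would pass, while a module declaring 5.0 up to 6.0 would be reported by `incompatible_modules`.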

Goal 4: DSpace will be designed in such a way that it can be easily/quickly configured to integrate with new & future tools/services in the larger digital scholarship "ecosystem"

In order to continue to play a key role in the larger digital scholarship "ecosystem", DSpace must provide ways to both share and consume data/content from external services. We should strive to make all information in DSpace "shareable", and also ease the process of adding information to DSpace by providing Administrators with tools to consume data from other locations.

DSpace should provide easy and out of the box integration with external services in the following areas:

  • Action 4A: Support ingest of complete metadata records (items) from external services, with or without files. External services may include: CrossRef, DataCite, PubMed, ORCID works, SHARE, etc.

  • Action 4B: Provide the ability to consume external authority control sources to enrich specific metadata fields.  External authority control services may include: ORCID or VIVO for author/contributor data, Fundref for funder metadata, Sherpa Romeo for Publisher OA policies information, etc.

  • Action 4C: Expose DSpace metadata and content to external services. This means allowing metadata and content to be pushed to external APIs, or allowing external services to harvest (pull) information from a DSpace repository. For example, DSpace content should be made available to Europeana, OpenAIRE, RIOXX compliant harvesters (UK), and SHARE.

  • Action 4D: Expose DSpace usage data to external services.

  • Action 4E: Integrate with external Authentication and Single Sign-On services. Examples may include: UK Federation, OpenAthens, OpenID, Google/Facebook/LinkedIn/ORCID authentication.

  • Action 4F: Integrate with external services providing identifiers (Handle, DOI, DataCite, ...)
  • Action 4G: Integrate with external storage and backup services (DuraCloud, Amazon Glacier/S3, Arkivum, Archivematica, ...)

To integrate with parallel projects and initiatives (Fedora, Hydra, Islandora), we first need to pin down the use cases: what those integrations would bring to DSpace, or what they would bring to the other platforms. They currently do not fit neatly into any of the areas above.

Goal 5: DSpace will support low-cost, hosted solutions and deployments (by featuring an easy, "just works" setup)

DSpace should be easy to install without requiring Java development expertise, to configure without requiring server access, and to monitor from within the application. Basic configuration options, including look and feel and theme selection, should be accessible from within the DSpace online administration area.

  • Action 5A: Improve and simplify the installation experience for DSpace. This may include, but is not limited to...
    • Investigate download, packaging and installation tools for Java web applications to make it easier to build a working system. (What do similar systems use?)
    • Examine options for lightweight installation, with most configuration taking place from the web interface upon first use (see for example WordPress, etc)
  • Action 5B: Improve and assist with the upgrade experience for DSpace, especially in terms of simplifying the management of local customizations (branding, custom themes, etc). This may include, but is not limited to...
    • Investigate options to assist with upgrades (for example highlighting changes from core code or configurations)
  • Action 5C: Make configuration and basic theming an easier experience for hosted or low-cost deployments by migrating most options to the administrative interface. Some examples include...
    • Move basic theme configuration options (colours, logo) into the administrative interface;
    • Allow most configurations to be edited (and refreshed) from the administrative interface
  • Action 5D: Ensure data is never solely stored in "transient" technologies (e.g. Solr indexes or other such indexes) where it could be accidentally corrupted or lost. All DSpace data should be stored in a stable, persistent data storage system (e.g. database, filesystem), and then indexed from that location into tools like Solr, etc.
  • Action 5E: Provide recommendations around scaling and load-balancing large DSpace instances.
  • Action 5F: Provide administrators with additional system reporting features within the UI. Example use cases may include...
    • Alert administrators when new upgrades are available
    • Alert administrators when common system issues or misconfigurations are encountered (e.g. "Solr is not accessible / working", "assetstore is unavailable / unattached", "space in assetstore is low");
    • See Admin UI - System Alerts via Admin UI for more examples.
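
The system alerts described under Action 5F could be collected along the following lines. This is only a sketch in Python (not DSpace's Java), written under stated assumptions: the `Alert` type, the `ping` callable standing in for a real Solr connectivity test, and the free-space threshold are all hypothetical, and only the alert messages are taken from the examples above. The one real library call is `shutil.disk_usage`, which reports total, used and free bytes for a path.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    level: str    # "warning" or "error"
    message: str


def check_assetstore(path, min_free_bytes, disk_usage=shutil.disk_usage):
    """Report a missing assetstore, or low free space on the volume holding it."""
    try:
        usage = disk_usage(path)
    except OSError:
        # Raised, for example, when the path does not exist or is not mounted.
        return [Alert("error", "assetstore is unavailable / unattached")]
    if usage.free < min_free_bytes:
        return [Alert("warning", "space in assetstore is low")]
    return []


def check_solr(ping):
    """`ping` is any callable returning True when Solr answers (hypothetical hook)."""
    try:
        ok = ping()
    except Exception:
        ok = False
    return [] if ok else [Alert("error", "Solr is not accessible / working")]


def collect_alerts(checks):
    """Run each zero-argument check and gather its alerts for display in the admin UI."""
    alerts = []
    for check in checks:
        alerts.extend(check())
    return alerts
```

Passing `disk_usage` and `ping` in as parameters keeps each check independent of the real filesystem and search server, so the checks can be exercised without a running installation.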

 


17 Comments

  1. My take on Goal 4 & its sub areas. 

    The following areas of integration should be well tested, documented and as out of the box as possible. DSpace will strive to provide the easiest experience for setting up at least basic integrations in these areas.

    Metadata ingest (complete records/items) from external services

    Following sources have been recognized as important for our community for this type of integration:

    • Crossref
    • Datacite
    • PubMed
    • ORCID works
    • SHARE
    • ...

    As a minimum, there will be UI support to look up records in these sources in order to import metadata, and potentially also files, from them.

    Exposing metadata and content to external services

    Allowing pushing metadata and content to external APIs, or allowing external services to harvest (pull) information from a DSpace repository. As a key component, UI functionality to modify specific crosswalks for specific services is vital. We should at least ensure that content in DSpace can be easily made available to:

    • Europeana
    • OpenAIRE
    • RIOXX compliant harvesters (UK)
    • SHARE

    Exposing usage statistics to external services

    A future trend is that governments and funders will require automated collection of usage information. This is already happening in the UK with IRUS. It should be easy for authorized services to access the usage statistics in a machine readable way.

    Consuming external authority control sources for enriching DSpace item metadata

    If good and trustworthy metadata can be found elsewhere, lookup is always better than manual entry by submitters. In this context we should make integration with external authority sources as easy as possible, which will also reduce the time needed to submit new content. 

    Sources for which metadata lookup should be enabled:

    • ORCID, VIVO, ... for author/contributor authority data
    • Fundref for funder metadata
    • Sherpa Romeo for Publisher OA policy information
    • ???? for subjects/keywords 

    By acquiring identifiers for these metadata rather than just the string values, we will store more things and fewer strings.

    Integration with external Authentication and Single Sign on services

    Automating account creation and integration with external authentication services should be easier and more out of the box. Examples include:

    Integrate with external storage and backup services

    The filesystem of the DSpace server can address certain use cases of DSpace. In the context of preservation, there are valid reasons for wanting to hand off the storage of the actual assets to different external services. One reason could be that the asset store simply becomes too large to be kept on fast, operational storage. Examples of services to integrate with include:

    • Amazon services including Glacier or S3
    • Duracloud
    • Arkivum

    This integration goal could be seen as an overlap with goal 5, because it's more of a systems aspect than addressing functional goals.

     

  2. Stuart Lewis What do you think about moving 5I (handles) into 4? Integrating with external services that manage identifiers (datacite, handle, ...) seems like an "integration with external services" that should be easier. 

  3. Hi Bram Luyten (Atmire) - thanks for the suggestion.  Yes, perhaps it might be better in goal 4?  Also, maybe 4E might be better in 5, as this is really local setup, rather than part of the wider open scholarship infrastructure?

  4. For a local LDAP or Shibboleth I would agree, but I also see a lot of demand for integration with things like OAuth2/OpenID/OpenAthens/UK Federation login. So it's like: providing the base infrastructure, but shipping it together with out of the box support for some of these external providers. So I can put "external" in 4E, and the local aspect could be a point in 5.

  5. For Goal #3, I am curious to understand what is defined as "core" functionality.  I presume the collection/item/bitstream hierarchy, dublin core metadata, and the handle system would be considered "core".

    1. Hi Terry,
      The "core" functionality is still to be defined, and actually relates to actions under Goal #1 (e.g. "verify and validate the needs of a modern institutional repository").
      I think the Use Cases and the Use Case Analysis will help us get closer to a definition of what is "core" versus "non-core" (as I expect there will be some use cases we eventually decide are not "core" to the role that DSpace plays in the larger "ecosystem").

      But, all that said, I think your presumption is correct to some extent. A simple hierarchy would be "core", as would a simple metadata scheme, and (one or more) permanent identifier scheme. Other "core" functionality would be obvious things like the ability to search, browse, deposit new content, export in useful formats, etc.

      All in all, the Goal #3 is more about stating that "there will obviously be limitations to what DSpace can and should do, so we need to have ways to support plugins/addons/extensions to that core functionality".

      I hope that helps. I think your observation here is a good one though, we need to be able to define what is truly "core" and what is "non-core" (likely as part of the Use Case Analysis and upcoming Technical Roadmap)

  6. Stuart Lewis : After reading through Goal #5, I noticed a bit of overlap between some of the "actions", and some also seem a bit more like feature requests or even Use Cases (rather than strategic actions/goals).  So, I've taken the liberty of trying to do a re-draft below...let me know what you think:

    • Action 5A: Enhance and improve the installation experience for DSpace. Some examples may include...
      • Investigate download, packaging, and installation tools for Java web applications to make it easier to build a working system. (What do comparator systems use?)

      • Examine options for lightweight installation with setup tasks such as database population undertaken via the web interface upon first use, and creation of initial admin account

    • Action 5B: Make configuration and basic theming an easier experience for hosted or low-cost deployments. Some examples include..
      • Move configuration of basic theme configuration options (colours, logo) into administrative interface;
      • Move most configuration into the database, so that it can be updated via the administration screens
    • Action 5C: Enhance and improve the upgrade experience for DSpace. Some examples may include...
      • Investigate options to assist with upgrades (for example highlighting changes from core code or configurations)
      • Create alerting tool for new upgrades to alert administrators within the admin user interface
    • Action 5D: Ensure data is never solely stored in "transient" technologies (e.g. Solr indexes or other such indexes) where it could be accidentally corrupted or lost. All DSpace data should be stored in a stable, persistent data storage system (e.g. database, filesystem), and then indexed from that location into tools like Solr, etc.
    • Action 5E: Document options for scaling and load-balancing DSpace
      • (Note on this one: Great idea, but we may need to rely somewhat on larger institutions/hosting providers to help us with these docs. I think they know much more about scaling/load balancing than even most Committers may know. Maybe a part of it is really about opening up discussion about how folks are scaling DSpace, to develop better best practices)

    The only ones I have not included above are your existing 5F (Simple Asset Store) and 5G (Health Check Tool).  Those both seem slightly more like feature requests or Use Cases which need more explanation/detail, and may need broad discussion about implications. To be clear though, I think both are good ideas and worth consideration. But, we'd need to investigate whether a "Simple Asset Store" is really that high of a priority, and whether there would be complications.  Additionally, a "health check tool" is a nice idea, but it needs use cases. (What sort of "checks" would it perform? Is it really about DSpace "hooking" into external tools, like virus checkers or other third party health check tools?)

    I'll gladly paste this above if others agree with this reorg. I really just wanted to ensure though that I wasn't missing or overlooking something in the existing actions in this re-draft.

    1. In reply to the health check:

      • most basic thing to offer is an in-browser log viewer. The Clarin/LINDAT guys already made it for DSpace cfr:

      • When it comes to performance monitoring etc, I would rather see this OUTSIDE of the scope of DSpace. Tools like new relic are really fantastic and there's no way we can beat that kind of experience.
      • One other thing that could use health checking is the Solr core for statistics: are IPs being registered correctly, is the Geo data correct, do you have any big gaps in stats or big spikes ...

       

      1. We weren't really thinking of server monitoring tools (zenoss / newrelic, etc), rather DSpace being aware of its own requirements, for example a functioning solr app, and being able to say when it doesn't have the things it needs available to it.

      2. I would urge us to sternly clean up the set of things that are dumped into dspace.log and cocoon.log.  This needs some study – I don't have any specific recommendations, only that these logs are hard to use because they are stuffed with so many records of so many very different things, having very different useful lifetimes.

    2. Hi Tim Donohue - thanks, you're right about the subtle difference with feature requests.  Thanks for editing the language here - it looks good.

      Perhaps one thing that would be helpful is to standardise what the goals can look like, for example must start with either "DSpace must/will xxx xxx" or "Investigate how to make DSpace xxx xxx"?  This will help ensure consistency of approach across the document.

      I need to send apologies for this afternoon - I'll either be absent or late.

      Thanks, Stuart

  7. Hi all: I will also be absent (local holiday today), but put some language in Goal 3 for discussion.

    Thanks, Richard

    1. Not sure when we're tackling the core vs non-core question, but it would be interesting to compare our ideas with the older diagram at

      http://www.dspace.org/sites/dspace.org/files/media/DSpace%20Diagram_0.pdf

      Some questions pop up:

      • Do we consider usage stats "core" ?
      • Do we still consider the preservation aspect core, knowing that right now our main tools are the curation tasks, checksums and bitstream format registry?
      • The diagram only shows human end users. Do we consider machine-access core?

       

  8. Based on today's call I'm adding Action 4G on integration with external storage and archiving services (think duracloud, arkivum, glacier, s3, ...)

  9. I suggest a closer alignment between Goal 4 and the COAR Interoperability roadmap https://www.coar-repositories.org/activities/repository-interoperability/coar-interoperability-project/coar-interoperability-roadmap/ because a number of relevant people have already tackled these issues.

    COAR is already working to advance interoperability in several of the priority areas, including author identification systems, publication lists, persistent identifiers, usage statistics and bibliometric formats. In the fall of 2014, COAR launched an international working group with the major regional repository networks, as well as CASRAI and EuroCRIS, to develop a blueprint for interoperability, with the aim of developing a formal mechanism whereby these interoperability issues can be discussed and addressed.

    1. Hi Joao,

      Sounds reasonable.  Do you happen to know if the "COAR interoperability roadmap" itself has a draft anywhere we can refer to?  I notice the webpage itself just states that it's in process, and will be based on information gathered in earlier phases (the last one seemingly completed in 2012 with the "Current State of Open Access Repository Interoperability" Report). 

      I do agree that, at a minimum, COAR should be included in the DSpace 2015-18 Strategic Plan - Community document as a group we should be tracking / keeping in touch with.

      But, it might be hard to align completely with the COAR roadmap, until we know what it may look like (even in a draft form). We definitely need to track it, and find a way to help DSpace achieve the COAR interoperability guidelines/best practices, as soon as they are drafted or documented. 

      If the COAR interoperability roadmap is still in progress, we might also want to see if someone from our DSpace Community is involved and willing to report back to our Steering Group on the early status, so that we can better align DSpace interoperability goals.

      Thanks for reminding us all about COAR!

      1. The Roadmap is done. It was published in February this year.

        https://www.coar-repositories.org/files/Roadmap_final_formatted_20150203.pdf 

        One of the nice things is that priorities are already defined.

         

        Regards,

        João