Discussions have begun between DSpace and Fedora developers to evaluate a common storage layer abstraction.  Richard Rodgers and Brad McLean are leading this effort for the DSpace community, and Chris Wilper is leading it for the Fedora community.  A meeting will take place the week of October 7th, during the DSpace 2.0 developer meeting, to outline possible proposals on which both communities can give feedback.

Some Motivations and Expectations

  • Since storage systems underpin all repository platforms, there is an obvious advantage to sharing and leveraging work in this area regardless of other differences in data model, architecture, etc.; such work would otherwise be duplicated. Expressed another way, it multiplies the resources that can be brought to bear on the rapidly evolving world of storage options: cloud-based, grid, enterprise CMS, etc.
  • A common abstraction could enable and encourage a common storage fabric (meaning a large collection of content addressable by a variety of repository systems), which can form the foundation for inter-repository services. Examples: replication for data security, distributed mirroring for optimized access, custody transfers, etc. Such services are possible - but far more difficult - in a heterogeneous storage environment.
  • This effort can also function as an invaluable forum for understanding requirements and use-cases around repository content management encompassing the DSpace and Fedora constituencies. Example: what additional requirements - if any - are placed on a storage back-end by holding video or other streamable content?
  • A common storage layer invites us to re-imagine, or poke holes in, our conception of a repository. Is storage an 'out-sourceable' service of a repository? Can a Fedora and a DSpace instance, or a consortium of them, share a single store? Such questions can help us answer the critical questions about the distinctive nature of the systems we are inventing and how they fit into the larger digital ecosystem.

2 Comments

  1. I want to point out that we are currently reviewing DSpace, Sakai, Fedora, JCR and CMIS for commonalities within the DSpace 2.0 development group; you can find details of this effort here on the DSpace Wiki. We need to centralize the work around DSpace 2.0 so that the whole team and the community at large are aware of the expectations and initiatives that make up the DSpace 2.0 rearchitecture and implementation.

    It's our viewpoint within the current funded developers group that the scope of DSpace 2.0 work this fall is to establish:

    1. A data model generic enough to express all entities and their relationships in DSpace, and flexible enough for users to describe resources in rich detail without the limitations of the current hardcoded DSpace domain model.
    2. An explicit Content Repository Service API that brings together Content and Metadata in a convenient and consistent representation for developers to work with. In this regard, we are reviewing and drawing on the existing commonalities we see in storage solutions like Fedora, JCR implementations, and standards efforts such as CMIS. 
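
    As an illustration only, the second goal (one API that returns content and metadata together) might be sketched as below. Every name here (ContentEntity, ContentRepositoryService, InMemoryContentRepository) is hypothetical and does not come from DSpace, Fedora, JCR, or CMIS; the in-memory store merely stands in for a real back end.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in DSpace, Fedora, JCR, or CMIS.

/** One entity that carries both its binary content and its descriptive metadata. */
final class ContentEntity {
    private final String id;
    private final byte[] content;
    private final Map<String, String> metadata;

    ContentEntity(String id, byte[] content, Map<String, String> metadata) {
        this.id = id;
        this.content = content.clone();                  // defensive copy
        this.metadata = new LinkedHashMap<>(metadata);   // defensive copy
    }

    String id() { return id; }
    byte[] content() { return content.clone(); }
    Map<String, String> metadata() { return Collections.unmodifiableMap(metadata); }
}

/** The service contract callers program against, independent of the storage back end. */
interface ContentRepositoryService {
    void save(ContentEntity entity);
    Optional<ContentEntity> find(String id);
}

/** Stand-in back end that keeps everything in a map. */
final class InMemoryContentRepository implements ContentRepositoryService {
    private final Map<String, ContentEntity> store = new LinkedHashMap<>();

    @Override
    public void save(ContentEntity entity) { store.put(entity.id(), entity); }

    @Override
    public Optional<ContentEntity> find(String id) { return Optional.ofNullable(store.get(id)); }
}

class ContentRepositoryDemo {
    public static void main(String[] args) {
        ContentRepositoryService repo = new InMemoryContentRepository();
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Sample report");
        repo.save(new ContentEntity("item-1", new byte[] {1, 2, 3}, metadata));

        // Content and metadata come back together from a single lookup.
        repo.find("item-1").ifPresent(e ->
            System.out.println(e.metadata().get("title") + ", " + e.content().length + " bytes"));
    }
}
```

    A Fedora-, JCR-, or CMIS-backed class could in principle implement the same interface, which is the kind of commonality the review above is looking for.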

    With the participation of Aaron Zeckoski, we are currently establishing a new application framework and suite of Service APIs for DSpace, modeled on the Service Locator strategy found in Sakai. This may include modeling the above Content Repository Service API on the Sakai JCR Content Hosting Service API.
    I have a concern that a separate "common storage layer abstraction" effort may start to duplicate the work we are already engaged in within the DSpace 2.0 group. I feel it's extremely important that we centralize all the effort in this area, so that we are not duplicating work by separately discussing how tools such as DSpace and Fedora relate to one another.  I also want to let the community know that we have been building consensus around opening the dspace-architecture listserv to public membership.  We will be inviting the community to critique and participate in the DSpace 2.0 funded work via the listserv in the very near future, and currently via the DSpace 2.0 section of the DSpace wiki. We certainly request and welcome the viewpoints and participation of the Fedora community in this effort as well.

    The success of the DSpace 2.0 initiative is heavily dependent upon getting this Content Repository Service API and common abstraction correct and properly aligned with not only Fedora but with other common content storage solutions as well.

    Sincerely,
    Mark Diggory

  2. Extending beyond this theme now...

    I think that at this point, with the introduction of a "Domain Model" for DSpace, we are looking beyond simple asset storage for mapping between DSpace and Fedora.  The appropriate path forward has to do with abstracting where the persistence of DSpace Objects happens in general.  For those not familiar with the DSpace Object architecture: a DSpace Object is a Community, Collection, Item, Bitstream, etc.  These objects are persisted into one or more database tables, and only in the case of Bitstreams is a binary assetstore or "content" involved. Thus mapping just asset storage is but a small part of the equation.

    The DSpace Domain Model expresses the existing DSpaceObject model as Java interfaces rather than concrete classes. It is meant to formalize the DSpace Data Model and give applications such as JSPUI, XMLUI, etc. a Data Model contract which they can rely upon.  This proposal and its associated prototype allow new persistence implementations other than databases to be developed, while applications rely on the interfaces as a contract that implementations adhere to.  It is basically the missing piece that was speculated about and proposed in the original DSpace 2.0 funded initiative, but not completed in that development cycle.  The areas of development that I currently see leading us to a "DSpace with Fedora Inside" integration are: migration of DSpace applications to the DSpace Domain Model; implementation of storage persistence mappings (DAO); and use of dspace-storage as a means of persistence interaction with Fedora, JCR, Semantic Storage, and Legacy Database Storage.
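
    To make the interfaces-plus-DAO idea concrete, here is a minimal sketch. It is not the actual Domain Model prototype: the interface names (DomainObject, Item, Bitstream, ItemDao), the in-memory DAO, and the ItemReport class are all assumptions chosen for illustration. The point it shows is that application code depends only on the interfaces, so the persistence implementation behind the DAO can be swapped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: illustrative names only, not the DSpace Domain Model prototype.

/** Contract shared by all domain objects (Community, Collection, Item, Bitstream, ...). */
interface DomainObject {
    String getId();
    String getName();
}

/** Only Bitstreams carry binary content. */
interface Bitstream extends DomainObject {
    byte[] getContent();
}

interface Item extends DomainObject {
    List<Bitstream> getBitstreams();
}

/** Persistence contract; a database, Fedora, or JCR back end would each supply an implementation. */
interface ItemDao {
    Item create(String name);
    Item find(String id);
    Bitstream addBitstream(String itemId, String name, byte[] content);
}

/** Stand-in persistence implementation that keeps objects in memory. */
final class InMemoryItemDao implements ItemDao {
    private final Map<String, SimpleItem> items = new HashMap<>();
    private int nextId = 1;

    @Override
    public Item create(String name) {
        SimpleItem item = new SimpleItem("item-" + nextId++, name);
        items.put(item.getId(), item);
        return item;
    }

    @Override
    public Item find(String id) { return items.get(id); }

    @Override
    public Bitstream addBitstream(String itemId, String name, byte[] content) {
        SimpleItem item = items.get(itemId);
        if (item == null) throw new IllegalArgumentException("Unknown item: " + itemId);
        SimpleBitstream bitstream = new SimpleBitstream("bitstream-" + nextId++, name, content);
        item.bitstreams.add(bitstream);
        return bitstream;
    }

    private static final class SimpleItem implements Item {
        private final String id;
        private final String name;
        private final List<Bitstream> bitstreams = new ArrayList<>();

        SimpleItem(String id, String name) { this.id = id; this.name = name; }
        @Override public String getId() { return id; }
        @Override public String getName() { return name; }
        @Override public List<Bitstream> getBitstreams() { return new ArrayList<>(bitstreams); }
    }

    private static final class SimpleBitstream implements Bitstream {
        private final String id;
        private final String name;
        private final byte[] content;

        SimpleBitstream(String id, String name, byte[] content) {
            this.id = id; this.name = name; this.content = content.clone();
        }
        @Override public String getId() { return id; }
        @Override public String getName() { return name; }
        @Override public byte[] getContent() { return content.clone(); }
    }
}

/** "Application" code: it sees only the interfaces, never the storage back end. */
final class ItemReport {
    static String describe(ItemDao dao, String itemId) {
        Item item = dao.find(itemId);
        return item.getName() + " (" + item.getBitstreams().size() + " bitstream(s))";
    }
}

class DomainModelDemo {
    public static void main(String[] args) {
        ItemDao dao = new InMemoryItemDao();
        Item item = dao.create("Thesis");
        dao.addBitstream(item.getId(), "thesis.pdf", new byte[] {1, 2});
        System.out.println(ItemReport.describe(dao, item.getId()));
    }
}
```

    Under these assumptions, replacing InMemoryItemDao with a Fedora- or JCR-backed class would leave ItemReport unchanged.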