Backport of DSpace 2 Storage Services API for DSpace 1.x

Student: Andrius Blažinskas
Mentor: Mark Diggory

Abstract

The DSpace 2.0 storage mechanism provides a convenient way to store DSpace content in various storage solutions. It is based on a set of interfaces for which various implementations are possible, and some beta releases already exist (Jackrabbit, Fedora, etc.). DSpace 2.0 is in the early stages of development, and DSpace 1.x releases cannot yet take advantage of this new mechanism. To fix this, it is necessary to port the DSpace 2.0 storage interfaces to 1.x. I propose implementing this backport. – Andrius Blažinskas

Relevant modules/classes

| Module/class name | Description/Comments | Source code |
|---|---|---|
| dspace-api | DSpace API | http://scm.dspace.org/svn/repo/dspace/trunk/dspace-api |
| dspace-xmlui | XMLUI (Manakin) | http://scm.dspace.org/svn/repo/dspace/trunk/dspace-xmlui |
| storage-api | Consists of the DSpace 2 storage interfaces. Will be referenced from dspace-xmlui and other modules that will use the new storage mechanism. Subject to change. | http://scm.dspace.org/svn/repo/modules/dspace-storage/trunk/api/ |
| storage-legacy | Module that does not exist yet. It will implement the storage-api interfaces. Essentially it will be the shim that allows modules to access DSpaceObjects (in dspace-api) through the new storage-api. | - |
| dspace-services | DSpace services module. The DSpace services framework will be used to manage and gain access to storage-api implementations. | http://scm.dspace.org/svn/repo/modules/dspace-services/ |
| ProvidedStorageService | Class that acts as a mediator between the caller and the storage service implementations. However, its usage is questionable. | http://scm.dspace.org/svn/repo/modules/dspace-storage/trunk/impl/src/main/java/org/dspace/services/storage/ProvidedStorageService.java |

Development plan

  • Analysis part:
    • Analysis of dspace-api module
    • Analysis of dspace-services module
    • Deeper review of Spring usage in DSpace
    • Analysis of dspace-database module
    • Analysis of dspace-storage-db-2.0.x module
    • Analysis of AIP prototype
  • Better dspace-api adaptation to changing needs:
  • Implementation of storage-legacy module
  • dspace-xmlui relation to storage-api
  • Creation of Java documentation
    ...

Evolution of storage-api

Recommended changes to the "existing" DSpace 2 storage-api:

  • "StorageProperty[] parameters should be dropped from the StorageEntity object all together." [2]

  • "StorageProperty service methods for performing CRUD operations on Storage properties be maintained on a separate mixin interface." [2]

  • "StorageRelation be removed from the object model and relations be captured only by attaching StorageEntities as "values" of StorageProperties." [2]

  • "... remove methods like getEnititesAtLocation("/community/collection") and would recommend the use of the Search API instead for the retrieval..."
  • "Mapping a prefix to the provider should warrant needing a separate interface to be implemented. That could just be part of assigning the StorageService to the map it is cached in the ProvidedStorageService."
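The first three recommendations can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration and not the actual storage-api: StorageEntity and StorageProperty are names taken from the text, while PropertyOperations, InMemoryPropertyStore, isRelation() and all method signatures are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point (hypothetical): relates an item to a collection via a property.
class RelationAsPropertyDemo {
    public static void main(String[] args) {
        InMemoryPropertyStore store = new InMemoryPropertyStore();
        StorageEntity item = new StorageEntity("item-1");
        StorageEntity collection = new StorageEntity("collection-1");

        store.addProperty(item, new StorageProperty("dc.title", "Sample item"));
        // No StorageRelation type: the relation is a property whose value is an entity.
        store.addProperty(item, new StorageProperty("isMemberOf", collection));

        for (StorageProperty p : store.getProperties(item)) {
            System.out.println(p.getName() + " = " + p.getValue());
        }
    }
}

// The entity no longer carries a StorageProperty[] array, only its identifier.
final class StorageEntity {
    private final String id;
    StorageEntity(String id) { this.id = id; }
    String getId() { return id; }
    @Override public String toString() { return "entity:" + id; }
}

// A property value is either a literal or another StorageEntity (a relation).
final class StorageProperty {
    private final String name;
    private final Object value;
    StorageProperty(String name, Object value) { this.name = name; this.value = value; }
    String getName() { return name; }
    Object getValue() { return value; }
    boolean isRelation() { return value instanceof StorageEntity; }
}

// Assumed mixin interface: property CRUD lives here rather than on the entity.
interface PropertyOperations {
    void addProperty(StorageEntity entity, StorageProperty property);
    List<StorageProperty> getProperties(StorageEntity entity);
    boolean removeProperty(StorageEntity entity, String name);
}

// Toy in-memory implementation, for illustration only.
class InMemoryPropertyStore implements PropertyOperations {
    private final Map<String, List<StorageProperty>> byEntityId = new HashMap<>();

    @Override
    public void addProperty(StorageEntity entity, StorageProperty property) {
        byEntityId.computeIfAbsent(entity.getId(), k -> new ArrayList<>()).add(property);
    }

    @Override
    public List<StorageProperty> getProperties(StorageEntity entity) {
        // Return a copy so callers cannot modify the stored list.
        return new ArrayList<>(byEntityId.getOrDefault(entity.getId(), new ArrayList<>()));
    }

    @Override
    public boolean removeProperty(StorageEntity entity, String name) {
        List<StorageProperty> props = byEntityId.get(entity.getId());
        return props != null && props.removeIf(p -> p.getName().equals(name));
    }
}
```

In this sketch a storage service would opt into property management by implementing PropertyOperations, and a caller can tell a relation from a plain value with isRelation().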

Proposed storage-api

The diagram below shows the proposed "core" subset of storage-api classes and interfaces, taking the changes described above into account. Interface names are prefixed with "I" in the diagram; the implemented interfaces will not necessarily use this prefix.

 
Observations:

  • The diagram is "symmetric" up to a point, since similar APIs can be used for both StorageProperty and StorageBinary management.
  • Like the "current" DSpace 2 storage-api, the proposed solution also separates read-only and writable storage interfaces.
  • After the changes to storage-api, the StorageEntity class essentially contains only entityId, so it is proposed that it either disappear from the model or be used only as a structure (container) for properties, binaries and id(s). This also resolves the question of entity versioning, since it is the properties, not the entities themselves, that can have versions. In this case, IEntityStorage becomes a management service for entity identifiers (simple strings).
  • IMetadataStorage (and its variants) is a service for managing both metadata and relations.
  • The IBinaryWritableStorage method saveBinary(...), like saveMetaProperties(...), creates or updates a property/binary depending on whether it already exists. These can be split into separate create and update methods if needed.
  • Note that the final property/binary storage implementation will have to store not only the property/binary value and its name, but also the entityId. The name and entityId could be combined into one name; this is left up to the implementation.
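The read-only/writable split and the create-or-update behaviour of saveBinary(...) might look roughly like the sketch below. Only IBinaryWritableStorage, saveBinary and entityId come from the text; the IBinaryStorage name, the method signatures, the boolean return value and the "entityId/name" key format are assumptions chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Demo entry point (hypothetical).
class BinaryStorageDemo {
    public static void main(String[] args) {
        InMemoryBinaryStorage storage = new InMemoryBinaryStorage();
        boolean first = storage.saveBinary("item-1", "license.txt", "v1".getBytes(StandardCharsets.UTF_8));
        boolean second = storage.saveBinary("item-1", "license.txt", "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println("first save created: " + first);
        System.out.println("second save created: " + second);

        // Callers that only read can be handed the narrower interface.
        IBinaryStorage readOnly = storage;
        System.out.println(new String(readOnly.getBinary("item-1", "license.txt"), StandardCharsets.UTF_8));
    }
}

// Read-only view (interface name and signature are assumptions).
interface IBinaryStorage {
    /** Returns the stored content, or null if no such binary exists. */
    byte[] getBinary(String entityId, String name);
}

// Writable view: extends the read-only one with a single create-or-update method.
interface IBinaryWritableStorage extends IBinaryStorage {
    /** Returns true if a new binary was created, false if an existing one was updated. */
    boolean saveBinary(String entityId, String name, byte[] content);
}

// Toy in-memory implementation, for illustration only.
class InMemoryBinaryStorage implements IBinaryWritableStorage {
    private final Map<String, byte[]> binaries = new HashMap<>();

    // One possible way to combine entityId and name into a single storage key.
    private static String key(String entityId, String name) {
        return entityId + "/" + name;
    }

    @Override
    public byte[] getBinary(String entityId, String name) {
        byte[] content = binaries.get(key(entityId, name));
        return content == null ? null : content.clone();
    }

    @Override
    public boolean saveBinary(String entityId, String name, byte[] content) {
        // put() returns the previous value, so null means the binary was newly created.
        return binaries.put(key(entityId, name), content.clone()) == null;
    }
}
```

Splitting saveBinary into separate create and update methods, as mentioned above, would only change the writable interface; the read-only view would stay the same.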

Backporting strategies

There are different ways to backport dspace-storage into DSpace 1.x; some of these are described here.

Since DSpace 1.x model data is mainly accessed through particular DSpace 1.x entities (Community, Collection, Item, Bundle, Bitstream, BitstreamFormat), the new storage mechanism will have to interact with them in some way. There were discussions (during IRC meetings) on whether DSpaceObjects should be backed by dspace-storage or whether they should be "covered over" by dspace-storage.

  • Backing DSpaceObjects by dspace-storage has an immediate effect, since all current modules use these entities. However, this approach also involves changing the internals of these entities, which risks introducing bugs that affect everything. A storage-legacy module created this way would probably have to take over most of the DSpaceObjects internals, which are also coupled back to dspace-api (authorization etc.).
  • Covering DSpaceObjects over with dspace-storage is, if correctly implemented, a cleaner choice, since changes in dspace-api can be avoided. In this case the storage-legacy module would act only as a shim, providing access to dspace-api through storage-api. Conceptually, such a solution is probably bad (storage logic should reside in storage-legacy); however, it is a good "temporary" measure that helps move DSpace 1.x towards using the new storage API.

Proposed backport strategy

The shim, or "cover over", solution is chosen as the backporting strategy. The diagram below describes it in more detail.

Elements in red are yet to be implemented.
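To make the shim idea concrete, here is a minimal sketch of how a storage-legacy class could expose existing objects through a storage-api-style interface without changing them. LegacyItem and LegacyRepository are hypothetical stand-ins for dspace-api classes and do not reflect the real dspace-api; IMetadataStorage is a name from the text, but its getMetaProperty signature and the LegacyMetadataShim class are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point (hypothetical).
class ShimDemo {
    public static void main(String[] args) {
        LegacyItem item = new LegacyItem("123456789/1");
        item.setMetadata("dc.title", "Sample item");

        LegacyRepository repository = new LegacyRepository();
        repository.add(item);

        // New-style callers see only the storage interface, never LegacyItem.
        IMetadataStorage storage = new LegacyMetadataShim(repository);
        System.out.println(storage.getMetaProperty("123456789/1", "dc.title"));
    }
}

// Hypothetical stand-in for a DSpace 1.x object such as Item; NOT the real dspace-api class.
class LegacyItem {
    private final String handle;
    private final Map<String, String> metadata = new HashMap<>();
    LegacyItem(String handle) { this.handle = handle; }
    String getHandle() { return handle; }
    void setMetadata(String field, String value) { metadata.put(field, value); }
    String getMetadata(String field) { return metadata.get(field); }
}

// Hypothetical stand-in for the dspace-api lookup layer.
class LegacyRepository {
    private final Map<String, LegacyItem> items = new HashMap<>();
    void add(LegacyItem item) { items.put(item.getHandle(), item); }
    LegacyItem find(String handle) { return items.get(handle); }
}

// Read-only slice of the storage-api (signature is an assumption).
interface IMetadataStorage {
    /** Returns the property value, or null if the entity or property is unknown. */
    String getMetaProperty(String entityId, String name);
}

// The shim holds no data of its own; it only translates storage calls into legacy calls.
class LegacyMetadataShim implements IMetadataStorage {
    private final LegacyRepository repository;

    LegacyMetadataShim(LegacyRepository repository) {
        this.repository = repository;
    }

    @Override
    public String getMetaProperty(String entityId, String name) {
        LegacyItem item = repository.find(entityId);
        return item == null ? null : item.getMetadata(name);
    }
}
```

Because the shim delegates every call, the legacy classes stay untouched, which is the main argument for the "cover over" strategy; the cost is that storage logic continues to live outside storage-legacy.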

References

1. GSoC 2010 proposal: Backport of DSpace 2 Storage Services API for DSpace 1.x, http://ab.labt.lt/gsoc/2010/dspace/proposal1.html
2. GSoC Collaboration Scratchpad, https://wiki.duraspace.org/display/DSPACE/GSoC+Collaboration+Scratchpad
