This Confluence wiki site, maintained by DuraSpace prior to the recent merger with LYRASIS, will transition from the domain to the domain on Saturday, Nov 16 beginning at approximately 7pm ET. A period of downtime of 2-3 hours is expected. After the transition, this wiki will be available at All links to wiki pages will be redirected to the correct URL. If you have questions prior to or following the transition please contact:

Page tree
Skip to end of metadata
Go to start of metadata

DSpace Architectural Review

Notes from Tuesday, 24 Oct 2006 (JSE)

I. Review of Data Model (Rob)

See also: DSpace Data Model

  • Communities
  • Collections
  • Items
    • all items have a Submitter who is an ePerson
    • MDRecord (flat: name, value pairs)
    • fields
    • metadata schema
  • ePersons
    • permissions for items
  • bundles
    • name
  • bitstreams
    • size
    • checksum
    • name
    • description
    • format
    • sequence number
  • bitstream format
    • name
    • long name
    • mime/type
    • file extension

II. Problems with the Data Model

1. Versioning (Rob)

  • Rob's versioning idea (item-level down)
    • "snapshot" of an item at a instant of time
    • coarser-grained notion of "transaction"
      • "logical item change"
      • makes dealing with "events" easier
      • sort-of wiki "locking" model (i.e. very short-term)
    • concept: items are immutable
      • previous "versions" never go away (maybe policy driven)
      • encapsulating higher-level item object
    • What about e.g. Pre-/Post-print versioning?
    • discussion about identification syntax for versions...
      • also, Handles and versioning
    • (ms) issues of complexity presented to users
      • policies concerning displaying major and minor changes
      • keeping around all copies, etc

2. Identifiers

  • Q: should DSpace dictate the identifier system?
    • if so, should that be HS?
  • Rationale: concern over long-term interoperability
    • avoiding "Tower of Babel"
  • separate issue: if the org has a HDL prefix, it needs to be a HDL
  • concern over long-term, exit strategies, etc...
  • (hj) HS is an RFC and CNRI patent protects implementations thereof
  • (jmo) (service, namespace, resource_in_namespace)
  • (Rob) if we treat everything as uri or opaque string...

*RECOMMENDATION: Each CONTENT COMPONENT should have some sort of PERSISTENT IDENTIFIER associated with it

3. Metadata Flexibility Options

  • Structural (e.g. METS)
    • today, no way to specify structure and relationships
  • Descriptive
    • today, item-level descriptive MD
  • Representational
  • Binding metadata to structure
  • Use cases (high level):
    • Versions (alternatives)
    • Versions (versioning)
    • Complex file structures
  • (MS) from a libary perspective, whether something is a unique work or not
  • Lengthy discussion of metadata as bitstreams
    • and TYPEs of bundles
  • (hj) should DSpace allow item-specific md models?
    • reaction in room: wow, huge implications
    • (Rob) crosswalks required
    • problems with user interface
    • e.g. have subsystem whose job it is to deal with MD in specific way
  • (Rob) all of this is possible with the current architecture
    • media filter to convert whatever the scheme is into DC (e.g. for OAI)
  • (MS) ultimate model would be RDF, someday

RECOMMENDATION: Always must be able to CROSSWALK to DC

  • mechanism should be the default (DC)
  • but MD typing mechanism/bundling needs to be extensible
  • currently there are examples (oai, mit 'dwell') that allow asking for specific
  • JD: Right now, flat metadata structure is the bottleneck
  • (RJ) Perhaps we need to think about what it means to be an item in DSpace
    • a specification we put out
    • (MS) Larry Stone's "IP" proposal
      • canonical components of a "DSpace Item"
      • i.e. a manifest of the AIP, with structure map
      • then, arbitrary complex objects
      • MIT's purpose is for interop with SRB

RECOMMENDATION: Put "whether to keep bundles or not" on Half-Baked list

4. Relational Metadata

  • inter-item relationships
  • intra-item relationships
  • between bitstreams, bundles, etc
    • sets of bitstreams to sets of bitstreams
  • between objects
  • Q (Rob) Bundles and bitstreams?

5. Content Format Support

6. Aggregation

7. JSR-170

  • versioning?

8. Terminology

  • bitstream vs datastream vs...

III. Interfaces and Modularity

1. Review of the APIs (Rob)

2. Pain Points: Enumerating the reoccuring types of mods that break things

  • JSPs
  • Servlets
  • Ingest workflow
  • metadata extensions
    • esp. adding new fields
    • downstream indexing
    • browsing
  • Authentication
    • issue: synching with ePerson database
  • Authorization
  • Code Protection on content classes
    • i.e. for extensions on content classes
  • Persistent data store for extensions

3. Are we going to decide to stay with servlets and JSPs

  • or discard and move toward e.g. Manakin?
  • To be discussed (below)

4. Much discussion of the current layering ("Application"/"Business Logic"/"Storage")

  • (MD) We need to understand why certain code keeps getting replicated

5. Overview of the AddOnMechanism (RJ)

  • See [ AddOnMechanism Wiki|]
  • See [ AddOnMechanism presentation|]

6. Summary: What should we be able to do without changing code? (Rob)

  • Add persistent storage for customizations
  • Add new UI pages, link to new pages from existing pages
  • Modify existing UI pages
  • Modify workflow

7. DSpace Manakin Overview (SP)
See: [ DSpace Manakin Wiki|]

  • Pain points
    • Upgradability
    • Modularity
    • Uniformity
  • Aspects and Themes
    • Aspects contain Java source code, static resources, Cocoon's sitemap
  • Manakin solves these pain-points:
    • JSPs
    • Servlets
    • MD extensions (certain cases)
    • workflow (UI aspects)
  • Draft Recommendation (SP)
    • first, embrace the AddOnMechanism
    • proposed road map:
      • 1.5: JSPUI full support, initial version with XMLUI
      • 1.6: JSPUI full support, XMLUI full support & rec'd
      • 1.7: JSPUI depreciated, XMLUI full support & rec'd
      • 2.x: XMLUI only

*What are the alternatives to Manakin?

    • Are there potential incompatibilities with other of these frameworks?
    • (sp) Could put Manakin into the same source tree as JSPs
  • (Rob) Is there a really-really dumbed-down version of the AddOnMechanism that could be put in to the*Main tree
      • (sp) "A day's worth of work..."

8. OSGi Overview (RR)

*What are the alternatives to OSGi?

  • RJ's AddOnMechanism,*but it isn't complete
    • (ms) Strategy: Choose framework, get resources, do analysis to identify APIs, implement...

9. What about Maven? (gt)

  • "Maven is about the application of patterns in order to achieve an infrastructure which displays the characteristics of visibility, reusability, maintainability, and comprehensibility..."
  • "Maven uses a declarative approach, where the project structure and contents are described, rather then the task-based approach used in Ant or in traditional make files ... This helps enforce ... development standards and reduces the time needed to write and maintain build scripts..."
  • Another option to consider for making the add-on build process a "little easier to do"

*What are the alternatives to Maven?

    • Ant
    • (jd) Maven does what it wants to do
      • (rr) It enforces "patterns" across an org (see above)

10. What is our recommendation?

  • where to focus resources?
  • should the focus be on major refactoring?
  • should the focus instead be on key pain points (e.g. persistent storage)?
  • (rob) multiple trajectories
  • (md) need some reorg of code base
    • keep dependencies separate and isolated
    • once isolated, then refactor/define the interfaces
    • fix what breaks
    • use the tools (e.g. Eclipse)
    • need to experiment!
  • (rj) But if we refactor the information model, we'll need to refactor the core anyways
  • (jd) we're not starting from scratch!
    • need to define the goal and go there

11. Break-time discussion of what level of difficulty refactoring should take on, and how it might be managed...

12. (MS) Attempt at summary

  • Manakin with an AddOnMechanism addresses a lot of the pain points
  • See above: Short-term solution putting simplified AddOnMechanism in*main tree
  • A refactoring will be required with the refactored information model
  • There will be a 2.0 with an AddOnMechanism more like OSGi (see JSE questions above)
  • (MS) RJ's approach is a short-term but not long-term
  • NEED: "Plug-in framework" (e.g. OSGi) plus "build framework" (e.g. Maven)