The presentation on DSpace+2.0 architecture at the conference put me
in a tizzy of random musings on "where we (could) go from here".
Here's a spot to capture thoughts, lest they disappear
forever. "NSpace" should be read as "DSpace v.N.x".
Quick links to the sections of this ever-lengthening ramble:

DIDL, LANL, Hierarchy, and Identifiers || #didl
Modular Architecture Strategizing || #modstr
Sakai and Development Environments || #sakai
SOAP Services and the Axis Toolkit || #soap


Anchor( didl ) DIDL, LANL, Hierarchy, and Identifiers

Here are three good URLs, recommended to me by my supervisor after I
mentioned the acronym "DIDL" in reference to one of the presentations
at the conference. Thoughts on the articles follow the URLs.
http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
http://www.dlib.org/dlib/february04/bekaert/02bekaert.html
http://www.cni.org/tfms/2003b.fall/abstracts/P-digital-sompel.html
The word for metadata in DIDL is "descriptors". This is a great piece
of vocabulary, since "metadata" sounds awfully partitive ("some milk")
and the definitive "metadata record" ("a glass of milk") is sort of
clunky. "DSpace works by putting bitstreams into containers, and
attaching descriptors to them," flows rather nicely. There are also
the ideas that (a) a given descriptor might have a specific type (DC,
LOM, BobsYourUncle), and (b) descriptors can be attached to anything
in DSpace (e.g. attach a descriptor of type "community" to a
container, instead of having a separate set of non-generalizable
columns in your non-generalizable community table in your metadata,
excuse me, your descriptor store).

Looking at the hierarchy decisions that LANL made, I'm wondering (just
wondering, mind you) if it wouldn't be opportune to destroy all
artificial hierarchy in NSpace. Which is to say, just have pools of
each level of component (bitstream pool, item pool, collection pool,
container pool) and then maintain a hierarchical view into this truly
flat structure merely as a kind of linking metadata ("hierarchy
descriptors"?). This would make multiple inclusion very easy at all
levels, although it would mean taking some serious time to get the
AuthN/Z stuff rock solid.
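
To make that concrete to myself, here's a completely hypothetical
sketch (invented names, nothing like real DSpace code) of what "pools
plus hierarchy descriptors" might look like:

    import java.util.*;

    // Hypothetical sketch only. Every object, at every level, lives in a
    // flat pool keyed by an internal id; "containment" is nothing but a
    // descriptor that links two ids.
    class Descriptor {
        final String type;   // "DC", "LOM", "community", "hierarchy", ...
        final Map<String, String> values = new HashMap<>();
        Descriptor(String type) { this.type = type; }
    }

    class DigitalObject {
        final String id;     // internal identifier, always present
        final List<Descriptor> descriptors = new ArrayList<>();
        DigitalObject(String id) { this.id = id; }
    }

    class FlatStore {
        private final Map<String, DigitalObject> pool = new HashMap<>();

        void add(DigitalObject o) { pool.put(o.id, o); }
        Collection<DigitalObject> all() { return pool.values(); }

        // "Put item X in collection Y" is just attaching a hierarchy
        // descriptor, so multiple inclusion falls out for free.
        void link(String childId, String parentId) {
            Descriptor d = new Descriptor("hierarchy");
            d.values.put("parent", parentId);
            pool.get(childId).descriptors.add(d);
        }

        // The hierarchical "view" is derived by walking descriptors.
        List<DigitalObject> childrenOf(String parentId) {
            List<DigitalObject> kids = new ArrayList<>();
            for (DigitalObject o : pool.values())
                for (Descriptor d : o.descriptors)
                    if ("hierarchy".equals(d.type)
                            && parentId.equals(d.values.get("parent")))
                        kids.add(o);
            return kids;
        }
    }
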
Reading about LANL's "identifier-centric approach" makes me think that
everything in NSpace, every single digital object at every level in
the hierarchy, needs to have a distinct (internal?) identifier. This
probably follows from my previous point in any case, and it doesn't
mean "we must assign handles to everything!" We can in fact choose
to expose greater or lesser amounts of our identifier data through
modules that implement interfaces to handles, ARKs, etc. We can do it
purely algorithmically, or we could choose to implement identifiers as
just another kind of descriptor, i.e. "Hang a descriptor of type
'handle' on that collection. Now go to the handle interface module and
show me everything in this instance for which descriptors of type
handle have been defined." As opposed to "export_handle_identifiers =
yes", restart, wave hands, walk away.


Anchor( modstr ) Modular Architecture Strategizing

In MikeSimpsonThoughtsDuringConference, I waxed rhapsodic on how it
would be nice to have NSpace implement a "stackable modular"
architecture, where multiple modules can "register" their ability to
provide a given API (AuthN/Z, Asset Store, etc.) and some kind of
dispatching framework takes care of turning an API call into a series
of calls through the stack of registered modules, where each module
can opt to handle the call or decline and pass it to the next module.

I've now realized that I have absolutely zero idea how to implement
something like that in Java, let alone in C/C++ or something similar,
never having done anything like it before. In C, I could guess that
you'd be dealing with arrays of function pointers, but other than
that, I'm sort of at a loss.
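
In Java, my naive guess is that it comes down to an interface plus a
list you walk; something vaguely like this (invented names, and surely
an oversimplification):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.*;

    // Hypothetical sketch of "stackable" modules: each registered module
    // may handle a call or decline, and the dispatcher walks the stack
    // in order.
    interface AssetStoreModule {
        // Return null to mean "not mine, try the next module in the stack".
        InputStream retrieve(String bitstreamId) throws IOException;
    }

    class AssetStoreDispatcher {
        private final List<AssetStoreModule> stack = new ArrayList<>();

        void register(AssetStoreModule m) { stack.add(m); }

        InputStream retrieve(String bitstreamId) throws IOException {
            for (AssetStoreModule m : stack) {
                InputStream result = m.retrieve(bitstreamId);
                if (result != null) return result;  // this module handled it
            }
            throw new IOException("no registered module could handle " + bitstreamId);
        }
    }
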
I'd like to find some citations to literature describing strategies
for implementing this style of architecture. Fifteen minutes of
googling on words like "dispatch" and "modular" produced very little
of value, so I ask the wiki at large: if you've got citations, go
ahead and edit them into place here, and I'll check back and read
them when I can. If you're not into wiki, just email them to me
instead. This is the spot where I should say: "Thanks In Advance."

  • A lightweight method I've used in the past is to define the services
    your front end (web app) will use as interfaces. Then you have a
    service factory class (see Java Design Patterns || http://www.amazon.com/exec/obidos/tg/detail/-/0201485397/qid=1079619962/sr=1-3/ref=sr_1_3/104-8703109-2892715?v=glance&s=books
    if you're not factory savvy...) that uses a config file to decide
    which implementations of those interfaces it's going to use. By
    making sure that all your webapp classes have access to a service
    factory instance (some people use the singleton pattern here, but I
    prefer to keep a reference to the object in ServletContext, or bind
    one through JNDI), and that they always use the interface returned
    (and don't cast it to the implementing class), you can change
    service implementations through configuration without changing a
    line of code, and add implementations as easily (see the rough
    sketch after this thread).
    I've been wondering about the potential value of going as far as a
    component based architecture (e.g. avalon || http://avalon.apache.org/ ).
    Whichever approach we use, the most important aspect of a modular
    architecture is having a good abstraction of what functions exist in
    the application and how they fit together.
    JimDowning – 2004-03-18
  • I agree. Are "what functions exist" equivalent to the set of APIs
    that individual modules choose to implement, as Robert Tansley
    talked about in his presentation? Is there a difference (in
    vocabulary and/or in implementation) between the "internal" APIs
    (modules calling modules) and the "external (interface?)" APIs
    (services calling modules)? Or is the interface code just a set of
    XML (SOAP?) services offered by the DSpace server?
    MikeSimpson – 2004-03-19
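
Here's a rough, made-up sketch of the config-driven factory Jim
describes above (the interface, the property name, and the class names
are all invented, just to show the shape of it):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    // The webapp only ever sees the interface; a properties file decides
    // which implementation gets instantiated behind it.
    interface SearchService {
        String[] search(String query);
    }

    class ServiceFactory {
        private final Properties config = new Properties();

        ServiceFactory(InputStream configStream) throws IOException {
            // e.g. a line like: search.impl=org.example.LuceneSearchService
            config.load(configStream);
        }

        SearchService getSearchService() throws Exception {
            String impl = config.getProperty("search.impl");
            // Reflection keeps the caller ignorant of the concrete class, so
            // swapping implementations is a config change, not a code change.
            return (SearchService) Class.forName(impl)
                    .getDeclaredConstructor().newInstance();
        }
    }

The caller would stash one ServiceFactory instance somewhere shared
(ServletContext, JNDI) and always program against SearchService, never
against the implementing class.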

Anchor( sakai ) Sakai and Development Environments

I got emailed a copy of "Report and Commentary" from the Sakai Developers Workshop.
The content hasn't shown up at The Sakai Project || http://www.sakaiproject.org/
yet, but the most interesting part is the technology choices being made: JDK 1.4
as the base runtime environment, Eclipse as the IDE of choice, Maven for deployment
and release control, and JSF ("JavaServer Faces") for the display technology (the
notes are a little unclear, but it sounds like the JSF gets embedded in JSP, which
is then run under Tomcat; I know zero about JSF, but it's compared with Struts, which
I've used and liked as an MVC framework several times).
Dare I ask it, even as a strawman: how wedded are we to Java? And now I shall duck.
  • We should seriously consider Maven for the build system - it sits on
    top of Ant, and does a whole load of stuff for free that would be
    laborious to achieve in Ant.
    Do they mention source control? If you've ever been frustrated with
    cvs you should check out subversion || http://subversion.tigris.org/. Miles better, especially for
    developing Java (e.g. you can actually rename files (gasp)).
    I've been using Struts for a few years now, and I've found that it
    doesn't scale fantastically well (in complexity terms) for large
    projects. I've also got really frustrated with its inability to
    define proper pipelines, which makes it difficult to write reusable
    components and also manage resources efficiently. Much as I dislike
    XSLT, I suspect Cocoon is a good fit for many DSpace end user
    applications.
    Java alternatives... I guess it makes sense to at least take stock
    of what skills we have in the developer community. What do you have
    against Java, incidentally? For me the sheer weight of support and
    choice of open source frameworks and tools in Java has always made
    up for its failings as a language.
    JimDowning – 2004-03-18
  • I've heard a lot of good things about Maven. It looks like they're
    close to a 1.0 release, which would be convenient for us if it's
    available if/when we start developing this beast. And while
    refreshing my memory of Maven, I took a look at Gump, the continuous
    integration tool – that also looks really interesting. No, I
    haven't fully bought into the Extreme Programming paradigm, but I
    read an article in Dr. Dobb's a while back that made continuous
    integration sound like a Really Good Thing.
    I haven't used subversion – we're pretty standardized on CVS, but
    I'd be happy to have a better alternative. From the Maven reference
    guide page for the changelog plugin || http://maven.apache.org/reference/plugins/changelog/
    it looks like subversion is supported.
    I agree about Struts. Great for small projects, but has scalability
    issues. I've looked at the Turbine and Tapestry frameworks (not too
    recently), but they looked like they hadn't really settled down yet,
    and I had trouble finding adequate documentation. That may have
    changed. Turbine was also in the middle of a huge decoupling rewrite,
    modularizing various services out of the core code.
    I don't have anything in particular against Java, except that I've
    been working with reams of it recently, so its warts are in full
    view at the top of my annoyance stack, and there's so much hype
    around it as a language that I can't help but wonder what we're
    missing in the noise. I just didn't want to make any unwarranted
    up-front assumptions. Python is a great language for rapid
    prototyping, although I don't know if there's the wealth of
    framework/support software available for it. I'd love to say we
    should do the whole thing in autotool'd C/C++, for pure speed and
    efficiency, but I don't know that I have the skillset to pull that
    off, frankly.
    MikeSimpson – 2004-03-19

Anchor( soap ) SOAP Services and the Axis Toolkit

I started picking through the SOAP 1.2 protocol documentation, and the
Axis user guide:
http://www.w3.org/TR/soap12-part0/
http://ws.apache.org/axis/java/user-guide.html
I really like the idea of using SOAP as the "public face" of NSpace.
That is, there are internal APIs that are used for module-to-module
communication, and external interfaces that each manifest as a SOAP
service. That makes building the application layer on top of NSpace
extremely standardized, easy, and extensible.
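
As a strawman, a "check the status of this item" service could be as
thin as a plain Java class (this one is entirely invented) that Axis
1.x can expose directly: drop it into the webapp as ItemStatus.jws, or
deploy it with a WSDD, and Axis takes care of the WSDL and the
envelope plumbing.

    // Hypothetical strawman service; each public method becomes a SOAP
    // operation once Axis deploys the class.
    public class ItemStatus {

        public String getStatus(String itemId) {
            // In a real NSpace this would call through the internal APIs;
            // here it just fakes an answer.
            return "item " + itemId + ": in workflow step 2";
        }
    }
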
Notes to self on SOAP:

A SOAP "message" is composed of a SOAP "envelope". The envelope is
composed of an optional "header" and a mandatory "body".

The header can contain zero or more "header blocks". As a message
moves through a network of SOAP nodes, header blocks may be added,
altered, or deleted (they are meta to the application payload).
Header blocks can be marked in ways that indicate processing
requirements (i.e. "must be processed by the next node, or throw a
fault") for the SOAP nodes that encounter the message.

The body is for information that must be maintained end-to-end,
between the "initial SOAP sender" and the "ultimate SOAP receiver".

SOAP messages can be embedded into many different transmission
protocols, e.g. HTTP, SMTP, etc.

How about that? NSpace could be wrapped in service layers that
respond to "check the status of this item" via HTTP/HTML, or via
SMTP/ascii. Sweet.
SOAP leaves it up to the application to determine the form the
"conversation" will take, e.g. request/response, fire-and-forget, etc.

SOAP header blocks can be marked with "env:role",
"env:mustUnderstand", and "env:relay" attributes.

SOAP nodes can assume one or more roles. Standardized roles
(specified for header blocks) include "none" (no node should process
this header block, although it may be examined), "next" (the next node
in the message chain must process it), or the default
"ultimateReceiver" (the ultimate receiver node is responsible for
processing this header block).

If "env:mustUnderstand" is "true", a node that has assumed the
specified role for that block must process that block (whatever that
means) or generate a fault and stop forwarding the message.

If "env:relay" is "true", the header block must be forwarded if it is
not processed.
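
To get those attributes straight in my head, here's what stamping a
header block looks like with the SAAJ API (javax.xml.soap). Caveat:
the SAAJ that ships today is SOAP 1.1 flavoured, so it says "actor"
where SOAP 1.2 says "role" and has no relay attribute, and the element
names below are made up; the shape is what matters.

    import javax.xml.soap.*;

    public class HeaderBlockDemo {
        public static void main(String[] args) throws SOAPException {
            SOAPMessage msg = MessageFactory.newInstance().createMessage();
            SOAPEnvelope env = msg.getSOAPPart().getEnvelope();

            // A made-up header block, addressed to the "next" node along
            // and marked mustUnderstand: that node must process it or fault.
            Name auth = env.createName("authToken", "ns", "http://example.org/nspace");
            SOAPHeaderElement block = env.getHeader().addHeaderElement(auth);
            block.setActor("http://schemas.xmlsoap.org/soap/actor/next");
            block.setMustUnderstand(true);
            block.addTextNode("s3cr3t");

            // The body carries the end-to-end payload, left alone by
            // intermediaries.
            Name req = env.createName("getItemStatus", "ns", "http://example.org/nspace");
            env.getBody().addBodyElement(req).addTextNode("hdl:1234/5678");
        }
    }
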
The specification of how SOAP messages are to be transmitted using a
specific protocol is called a "SOAP binding". A given binding will
provide a set of "features", like "request/response correlation", or
"encrypted channel", or "reliable delivery channel", etc. The feature
may be implemented by the underlying protocol, or it may be
implemented within the message itself, using header blocks: a
specification of a feature implemented using header blocks is called a
"SOAP module".

SOAP 1.2 describes an HTTP binding implementing the SOAP Web Method
feature. This uses an HTTP GET to implement the SOAP response message
exchange pattern, and an HTTP POST to implement the SOAP
request-response message exchange pattern.

SOAP intermediaries can be "forwarding intermediaries", which route
messages based on header blocks, or "active intermediaries", which
process messages as they pass through (e.g. encryption, timestamping)
and may change or add header blocks.
... more content as it occurs ...