The presentation on DSpace+2.0 architecture at the conference put me
in a tizzy of random musings on "where we (could) go from here".
Here's a spot to capture thoughts, lest they disappear
forever. "NSpace" should be read as "DSpace v.N.x".
Quick links to the sections of this ever-lengthening ramble:
DIDL, LANL, Hierarchy, and Identifiers || #didl
Modular Architecture Strategizing || #modstr
Sakai and Development Environments || #sakai
SOAP Services and the Axis Toolkit || #soap
Anchor( didl ) DIDL, LANL, Hierarchy, and Identifiers
Here are three good URLs, recommended to me by my supervisor after I
mentioned the acronym "DIDL" in reference to one of the presentations
at the conference. Thoughts on the articles follow the URLs.
http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
http://www.dlib.org/dlib/february04/bekaert/02bekaert.html
http://www.cni.org/tfms/2003b.fall/abstracts/P-digital-sompel.html
The word for metadata in DIDL is "descriptors". This is a great piece
of vocabulary, since "metadata" sounds awfully partitive ("some milk")
and the definitive "metadata record" ("a glass of milk") is sort of
clunky. "DSpace works by putting bitstreams into containers, and
attaching descriptors to them," flows rather nicely. There are also
the ideas that (a) a given descriptor might have a specific type (DC,
LOM, obsYourUncle), and (b) descriptors can be attached to anything
in DSpace (i.e. attach a descriptor of type "community" to a
container, instead of having a separate set of non-generalizable
columns in your non-generalizable community table in your metadata,
excuse me, your descriptor store).
Looking at the hierarchy decisions that LANL made, I'm wondering (just
wondering, mind you) if it wouldn't be opportune to destroy all
artificial hierarchy in NSpace. Which is to say, just have pools of
each level of component (bitstream pool, item pool, collection pool,
container pool) and then maintain a hierarchical view into this truly
flat structure merely as a kind of linking metadata ("hierarchy
descriptors"?). This would make multiple inclusion very easy at all
levels, although it would mean taking some serious time to get the
AuthN/Z stuff rock solid.
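To make the flat-pool idea concrete, here's a minimal Java sketch (everything here is hypothetical, invented for illustration, not from any real DSpace code): flat pools keyed by internal id, with hierarchy kept purely as linking metadata, so multiple inclusion falls out for free.

```java
import java.util.*;

// Hypothetical sketch: every object lives in a flat pool keyed by
// internal id, and hierarchy is just linking metadata ("hierarchy
// descriptors"), so one item can appear in many collections
// (multiple inclusion) without copying anything.
public class FlatPools {
    // one flat pool per component level, keyed by internal id
    static final Map<String, String> items = new HashMap<>();
    static final Map<String, String> collections = new HashMap<>();

    // hierarchy descriptors: child id -> set of parent ids
    static final Map<String, Set<String>> parents = new HashMap<>();

    static void link(String child, String parent) {
        parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    public static void main(String[] args) {
        collections.put("coll:1", "Theses");
        collections.put("coll:2", "Preprints");
        items.put("item:42", "My Thesis");
        // the same item included in two collections: no copying,
        // just two hierarchy descriptors
        link("item:42", "coll:1");
        link("item:42", "coll:2");
        System.out.println(parents.get("item:42").size()); // 2
    }
}
```

The AuthN/Z worry above shows up here too: a permission check can no longer walk a single parent chain, since an item may have several.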
Reading about LANL's "identifier-centric approach" makes me think that
everything in NSpace, every single digital object at every level in
the hierarchy, needs to have a distinct (internal?) identifier. This
probably follows from my previous point in any case, and it doesn't
mean "we must assign handles to everything!" We can in fact choose
to expose greater or lesser amounts of our identifier data through
modules that implement interfaces to handles, ARKs, etc. We can do it
purely algorithmically, or we could choose to implement identifiers as
just another kind of descriptor, i.e. "Hang a descriptor of type
"handle" on that collection. Now go to the handle interface module and
show me everything in this instance for which descriptors of type
handle have been defined." As opposed to "export_handle_identifiers =
yes", restart, wave hands, walk away.
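Here's a toy Java sketch of that "identifiers as descriptors" query, purely illustrative (the Descriptor type and the store are made up for this example):

```java
import java.util.*;

// Hypothetical sketch of "identifiers as just another descriptor
// type": a handle is a descriptor of type "handle" hung on an object,
// and the handle interface module simply queries for objects that
// carry one.
public class DescriptorStore {
    record Descriptor(String type, String value) {}

    static final Map<String, List<Descriptor>> store = new HashMap<>();

    static void attach(String objectId, Descriptor d) {
        store.computeIfAbsent(objectId, k -> new ArrayList<>()).add(d);
    }

    // "show me everything in this instance for which descriptors of
    // type handle have been defined"
    static List<String> withDescriptorType(String type) {
        List<String> hits = new ArrayList<>();
        for (var e : store.entrySet())
            if (e.getValue().stream().anyMatch(d -> d.type().equals(type)))
                hits.add(e.getKey());
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        attach("coll:1", new Descriptor("dc", "Title=Theses"));
        attach("coll:1", new Descriptor("handle", "1721.1/123"));
        attach("item:9", new Descriptor("dc", "Title=Foo"));
        System.out.println(withDescriptorType("handle")); // [coll:1]
    }
}
```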
Anchor( modstr ) Modular Architecture Strategizing
In MikeSimpsonThoughtsDuringConference, I waxed rhapsodic on how it
would be nice to have NSpace implement a "stackable modular"
architecture, where multiple modules can "register" their ability to
provide a given API (AuthN/Z, Asset Store, etc.) and some kind of
dispatching framework takes care of turning an API call into a series
of calls through the stack of registered modules, where each module
can opt to handle the call or decline and pass it to the next module.
I've now realized that I have absolutely zero idea how to implement
something like that in Java, let alone in C/C++ or something similar,
never having done anything like it before. In C, I could guess that
you'd be dealing with arrays of function pointers, but other than
that, I'm sort of at a loss.
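For what it's worth, the Java analogue of an array of function pointers is just a list of interface implementations. Here's a hedged sketch of the register-and-dispatch idea (all names invented; this is roughly the chain-of-responsibility pattern, not a real framework):

```java
import java.util.*;

// Hypothetical sketch of the "stackable modular" dispatch idea:
// modules register for an API; a call walks the stack until some
// module opts to handle it, and each module can decline by
// returning empty.
public class Dispatcher {
    interface AuthModule {
        // present = this module handled the call; empty = pass it on
        Optional<Boolean> authorize(String user, String action);
    }

    static final List<AuthModule> stack = new ArrayList<>();

    static boolean authorize(String user, String action) {
        for (AuthModule m : stack) {
            Optional<Boolean> r = m.authorize(user, action);
            if (r.isPresent()) return r.get();   // handled: stop here
        }
        return false;                            // nobody handled it: deny
    }

    public static void main(String[] args) {
        // a module that only recognizes the admin, declining all else
        stack.add((u, a) ->
            u.equals("admin") ? Optional.of(true) : Optional.empty());
        // a fallback module that allows reads and denies the rest
        stack.add((u, a) ->
            a.equals("read") ? Optional.of(true) : Optional.of(false));

        System.out.println(authorize("admin", "write")); // true
        System.out.println(authorize("guest", "read"));  // true
        System.out.println(authorize("guest", "write")); // false
    }
}
```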
I'd like to find some citations to literature describing strategies
for implementing this style of architecture ... fifteen minutes of
googling on words like "dispatch" and "modular" produced very little
of value, so I ask the wiki at large: if you've got citations, go ahead
and edit them into place here, and I'll check back and read them when
I can. If you're not into wiki, just email them to me instead.
This is the spot where I should say: "Thanks In Advance."
- A lightweight method I've used in the past is to define the services
your front end (web app) will use as interfaces. Then you have a
service factory class (see Java Design Patterns || http://www.amazon.com/exec/obidos/tg/detail/-/0201485397/qid=1079619962/sr=1-3/ref=sr_1_3/104-8703109-2892715?v=glance&s=books
if you're not factory savvy...) that uses a config file to decide
which implementations of those interfaces it's going to use. y
making sure that all your webapp classes have access to a service
factory instance (some people use the singleton pattern here, but I
prefer to keep a reference to the object in ServletContext, or bind
one through JNDI), and that they always use the interface returned
(and don't cast it to the implementing class), you can change
service implementations through configuration without changing a
line of code, and add implementations as easily.
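A minimal sketch of that factory approach, assuming a properties-style config file. The interface and class names here are invented, and the reflective lookup assumes the classes live in the default package:

```java
import java.util.*;

// Hypothetical sketch of a config-driven service factory: callers ask
// for an interface, and the config decides which implementing class
// gets instantiated, so swapping implementations is a config change,
// not a code change.
public class ServiceFactory {
    interface AssetStore { String describe(); }

    public static class LocalStore implements AssetStore {
        public String describe() { return "local filesystem store"; }
    }
    public static class SrbStore implements AssetStore {
        public String describe() { return "SRB-backed store"; }
    }

    private final Properties config;

    public ServiceFactory(Properties config) { this.config = config; }

    // look up the configured implementation, instantiate it
    // reflectively, and hand back only the interface type
    public AssetStore assetStore() throws Exception {
        String cls = config.getProperty("assetstore.impl");
        return (AssetStore) Class.forName(cls)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        // in real life this line lives in a config file
        p.setProperty("assetstore.impl", "ServiceFactory$LocalStore");
        // the caller never names the implementing class
        System.out.println(new ServiceFactory(p).assetStore().describe());
    }
}
```

As the comment above notes, the caller only ever sees the interface; as long as nothing casts to the implementing class, the config line is the single point of change.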
I've been wondering about the potential value of going as far as a
component based architecture (e.g. avalon || http://avalon.apache.org/ ).
Whichever approach we use, the most important aspect of a modular
architecture is having a good abstraction of what functions exist in
the application and how they fit together.
JimDowning – 2004-03-18 - I agree. Is "what functions exist" equivalent to the set of APIs
that individual modules choose to implement, as Robert Tansley
talked about in his presentation? Is there a difference (in
vocabulary and/or in implementation) between the "internal" APIs
(modules calling modules) and the "external (interface?)" APIs
(services calling modules)? Or is the interface code just a set of
XML (SOAP?) services offered by the DSpace server?
MikeSimpson – 2004-03-19
Anchor( sakai ) Sakai and Development Environments
I got emailed a copy of "Report and Commentary" from the Sakai Developers Workshop.
The content hasn't shown up at The Sakai Project || http://www.sakaiproject.org/
yet, but the most interesting part is the technology choices being made: JDK 1.4
as the base runtime environment, Eclipse as the IDE of choice, Maven for deployment
and release control, and JSF ("Java Server Faces") for the display technology (the
notes are a little unclear, but it sounds like the JSF gets embedded in JSP, which
is then run under Tomcat; I know zero about JSF but it's compared with Struts, which
I've used and liked as an MVC framework several times).
Dare I ask it, even as a strawman: how wedded are we to Java? And now I shall duck.
- We should seriously consider maven for the build system - it sits on
top of ant, and does a whole load of stuff for free that would be
laborious to achieve in ant.
Do they mention source control? If you've ever been frustrated with
cvs you should check out subversion || http://subversion.tigris.org/. Miles better, especially for
developing Java (e.g. you can actually rename files (gasp)).
I've been using Struts for a few years now, and I've found that it
doesn't scale fantastically well (in complexity terms) for large
projects. I've also got really frustrated with its inability to
define proper pipelines - makes it difficult to write reusable
components and also manage resources efficiently. Much as I dislike
XSLT I suspect cocoon is a good fit for many dspace end user
applications.
Java alternatives... I guess it makes sense to at least take stock
of what skills we have in the developer community. What do you have
against Java, incidentally? For me the sheer weight of support and
choice of open source frameworks and tools in Java has always made
up for its failings as a language.
JimDowning – 2004-03-18 - I've heard a lot of good things about Maven. It looks like they're
close to a 1.0 release, which would be convenient for us if it's
available if/when we start developing this beast. And while
refreshing my memory of Maven, I took a look at Gump, the continuous
integration tool – that also looks really interesting. No, I
haven't fully bought into the Extreme Programming paradigm, but I
read an article in Dr. Dobbs awhile back that made continuous
integration sound like a Really Good Thing.
I haven't used subversion – we're pretty standardized on CVS, but
I'd be happy to have a better alternative. From the Maven reference
guide page for the changelog plugin || http://maven.apache.org/reference/plugins/changelog/
it looks like subversion is supported.
I agree about Struts. Great for small projects, but has scalability
issues. I've looked at the Turbine and Tapestry frameworks (not too
recently), but they looked like they hadn't really settled down yet,
and I had trouble finding adequate documentation. That may have
changed. Turbine was also in the middle of a huge decoupling rewrite,
modularizing various services out of the core code.
I don't have anything in particular against Java, except that I've
been working with reams of it recently, so its warts are in full
view at the top of my annoyance stack, and there's so much hype
around it as a language that I can't help but wonder what we're
missing in the noise. I just didn't want to make any unwarranted
up-front assumptions. Python is a great language for rapid
prototyping, although I don't know if there's the wealth of
framework/support software available for it. I'd love to say we
should do the whole thing in autotool'd C/C++, for pure speed and
efficiency, but I don't know that I have the skillset to pull that
off, frankly.
MikeSimpson – 2004-03-19
Anchor( soap ) SOAP Services and the Axis Toolkit
I started picking through the SOAP 1.2 protocol documentation, and the
Axis user guide: http://www.w3.org/TR/soap12-part0/ http://ws.apache.org/axis/java/user-guide.html
I really like the idea of using SOAP as the "public face" of NSpace.
That is, there are internal APIs that are used for module-to-module
communication, and external interfaces that each manifest as a SOAP
service. That makes building the application layer on top of NSpace
extremely standardized, easy, and extensible.
Notes to self on SOAP:
A SOAP "message" is composed of a SOAP "envelope". The envelope is
composed of an optional "header" and a mandatory "body".
The header can contain zero or more "header blocks". As a message
moves through a network of SOAP nodes, header blocks may be added,
altered, or deleted (they are meta to the application payload).
Header blocks can be marked in ways that indicate processing
requirements (i.e. "must be processed by the next node, or throw a
fault") for the SOAP nodes that encounter the message.
The body is for information that must be maintained end-to-end,
between the "initial SOAP sender" and the "ultimate SOAP receiver".
SOAP messages can be embedded into many different transmission
protocols, i.e. HTTP, SMTP, etc. How about that? NSpace could be
wrapped in service layers that respond to "check the status of this
item" via HTTP/HTML, or via SMTP/ascii. Sweet.
SOAP leaves it up to the application to determine the form the
"conversation" will take, i.e. request/response, fire-and-forget, etc.
SOAP header blocks can carry "env:role",
"env:mustUnderstand", and "env:relay" attributes.
SOAP nodes can assume one or more roles. Standardized roles
(specified for header blocks) include "none" (no node should process
this header block, although it may be examined), "next" (the next node
in the message chain must process it), or the default
"ultimateReceiver" (the ultimate receiver node is responsible for
processing this header block).
If "env:mustUnderstand" is "true", a node that has assumed the
specified role for that block must process that block (whatever that
means) or generate a fault and stop forwarding the message.
If "env:relay" is "true", the header block must be forwarded if it is
not processed.
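Those processing rules can be written down as a small decision function. This is just my own reading of the notes above, encoded in Java for checking; the names are illustrative, not from the SOAP spec's schema:

```java
// Hypothetical sketch encoding the header-block rules above: given
// whether a node acts in the block's targeted role, whether it
// understands the block, and the mustUnderstand/relay flags, decide
// what the node does with the block.
public class HeaderRules {
    enum Outcome { PROCESS, RELAY, DROP, FAULT }

    static Outcome handle(boolean nodeActsInRole, boolean understands,
                          boolean mustUnderstand, boolean relay) {
        if (!nodeActsInRole) return Outcome.RELAY;   // block not targeted at us
        if (understands) return Outcome.PROCESS;     // process the block
        if (mustUnderstand) return Outcome.FAULT;    // must process or fault
        return relay ? Outcome.RELAY : Outcome.DROP; // env:relay keeps it alive
    }

    public static void main(String[] args) {
        // targeted, not understood, mustUnderstand=true -> fault
        System.out.println(handle(true, false, true, false));  // FAULT
        // targeted, not understood, relay=true -> forward it on
        System.out.println(handle(true, false, false, true));  // RELAY
    }
}
```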
The specification of how SOAP messages are to be transmitted using a
specific protocol is called a "SOAP binding". A given binding will
provide a set of "features", like "request/response correlation", or
"encrypted channel", or "reliable delivery channel", etc. The feature
may be implemented by the underlying protocol, or it may be
implemented within the message itself, using header blocks: a
specification of a feature implemented using header blocks is called a
"SOAP module".
SOAP 1.2 describes an HTTP binding implementing the SOAP Web Method
feature. This uses an HTTP GET to implement the SOAP response message
exchange pattern, and an HTTP POST to implement the SOAP
request-response message exchange pattern.
SOAP intermediaries can be "forwarding intermediaries" (route messages
based on header blocks) or "active intermediaries" (process messages
as they pass through, i.e. encryption, timestamping, etc., and change
or add header blocks).
... more content as it occurs ...