This page is outdated and kept for archival purposes only. It contains the proposal and discussion of the "Metadata for all DSpace objects" concept. The actual implementation of this concept in DSpace 5 is outlined in Metadata for all DSpace objects.

This is based on Claudia's comments on the committer list and on the topic as discussed elsewhere. The goal should be to expand on this and relate it to other projects and efforts. For instance, the async release proposal will now include establishing a true "API" for DSpace legacy objects and placing this in the "modules" directory where other projects can depend on it.

Enable Metadata on All DSpace Objects

- enable metadata for all DSpace objects (communities, collections,
items, bundles, bitstreams, metadatavalues, epersons, groups).

Revise the Default Metadata Registry

- revise the default metadata registry; see
https://jira.duraspace.org/browse/DS-433

Add Metadata Authority Controls / Vocabularies to the Data Model

- not only manage the metadata schema, but also related vocabularies
(like DCMI Type) and encoding schemata.

Refactor Metadata Schema to Support inheritance from existing Fields

- enable multiple metadata schemas by default (dcterms, an
administrative one and a couple of standard namespaces like those used
for prism)
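
To make the "inheritance from existing fields" idea in this heading concrete, here is a minimal sketch in which a field may refine a parent field, so that a query for the parent also returns values stored under its refinements. The `FieldRegistry` class and the `values_for` helper are hypothetical names invented for this illustration and are not part of DSpace. The example fields rely on the DCMI convention that `created` and `issued` refine `date`.

```python
class FieldRegistry:
    """Hypothetical registry in which a field may refine one parent field."""

    def __init__(self):
        self._parents = {}  # field name -> parent field name, or None

    def register(self, name, refines=None):
        # A parent must be registered first, which also rules out cycles.
        if refines is not None and refines not in self._parents:
            raise KeyError(f"unknown parent field: {refines}")
        self._parents[name] = refines

    def ancestors(self, name):
        # Walk up the refinement chain; unregistered fields have no ancestors.
        chain = []
        parent = self._parents.get(name)
        while parent is not None:
            chain.append(parent)
            parent = self._parents.get(parent)
        return chain

    def matches(self, name, query):
        # A field matches a query for itself or for any field it refines.
        return name == query or query in self.ancestors(name)


def values_for(registry, metadata, query):
    """Return values of `query` and of every field that refines it."""
    return [value for name, value in metadata if registry.matches(name, query)]
```

With `dcterms.created` and `dcterms.issued` registered as refinements of `dcterms.date`, `values_for(registry, metadata, "dcterms.date")` returns the values of all three fields, while a query for `dcterms.created` returns only its own values.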

Standardize the Default Namespaces

- think about whether standard namespaces (supposing the namespace is
complete in the default registry) should be editable at all. An instance
can always use its own namespace.

Support Attaching Rendering Hints to DSpace Items and Metadata Fields.

- manage metadata-field-related configuration such as the hide option,
display, browse, search etc. in the db rather than in dspace.cfg. At the
moment one can delete or move a field via the UI even though it is used
as a configuration parameter
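
As a rough illustration of the point above, the sketch below keeps per-field settings in a database table tied to the field by a foreign key, so that deleting a field also removes its settings instead of leaving a dangling configuration entry. The table and column names (`metadata_field`, `field_config`, `hidden`, `browsable`, `searchable`) and the helper functions are assumptions made for this example and are not DSpace's actual schema or API. SQLite is used only to keep the example self-contained.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables; NOT the actual DSpace schema.
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.execute(
        "CREATE TABLE metadata_field ("
        " field_id INTEGER PRIMARY KEY,"
        " schema_name TEXT NOT NULL,"
        " element TEXT NOT NULL,"
        " qualifier TEXT)"
    )
    conn.execute(
        "CREATE TABLE field_config ("
        " field_id INTEGER PRIMARY KEY"
        "   REFERENCES metadata_field(field_id) ON DELETE CASCADE,"
        " hidden INTEGER NOT NULL DEFAULT 0,"
        " browsable INTEGER NOT NULL DEFAULT 0,"
        " searchable INTEGER NOT NULL DEFAULT 1)"
    )


def add_field(conn, schema_name, element, qualifier=None,
              hidden=False, browsable=False, searchable=True):
    # Insert the field and its settings together.
    cur = conn.execute(
        "INSERT INTO metadata_field (schema_name, element, qualifier)"
        " VALUES (?, ?, ?)",
        (schema_name, element, qualifier),
    )
    field_id = cur.lastrowid
    conn.execute(
        "INSERT INTO field_config (field_id, hidden, browsable, searchable)"
        " VALUES (?, ?, ?, ?)",
        (field_id, int(hidden), int(browsable), int(searchable)),
    )
    return field_id


def hidden_fields(conn):
    # Return dotted names (schema.element[.qualifier]) of hidden fields.
    rows = conn.execute(
        "SELECT f.schema_name, f.element, f.qualifier"
        " FROM metadata_field f"
        " JOIN field_config c ON c.field_id = f.field_id"
        " WHERE c.hidden = 1 ORDER BY f.field_id"
    )
    return [".".join(part for part in row if part) for row in rows]
```

Because the settings row references the field row, removing a field through the UI could no longer leave behind a configuration parameter that points at nothing.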

Improve Support for MetadataValue

- get rid of the deprecated DCValue

6 Comments

  1. I would like to add support for "Metadata Sections" to this proposal.

    1. Metadata Sections allow one or more groups of Metadata to be defined on any DSpace Object.
    2. Metadata Sections may have ResourcePolicies attached to control access to them (protected provenance metadata, administrative metadata, etc)
    3. Metadata Sections would be stored in Bitstreams

    I believe the most appropriate means to make metadata available on all DSpace Objects without bloating the database is to store these "sections" directly in Bitstreams (in the same manner that Hydra stores these sections in Fedora Datastreams).

    This will also align DSpace Objects more directly with METS (and likewise, in the same process, Fedora and Hydra).

    In this approach Metadata is stored in Individual Bitstreams and the Data Model associates these bitstreams with specific Communities, Collections, Items or Bitstreams.  The best example for this will be a dmdSec bitstream that associates a Dublin Core Record with a DSpace Bitstream within an Item.
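
A minimal sketch of how this association could be modelled: each Metadata Section wraps one bitstream, records its encoding, and carries the groups allowed to read it, and any repository object can hold several sections. All class and attribute names here (`RepositoryObject`, `MetadataSection`, `read_groups`, and so on) are hypothetical and chosen for illustration; they do not correspond to existing DSpace classes, and the group check is a deliberately simplified stand-in for ResourcePolicies.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    bitstream_id: int
    content: bytes  # the serialized metadata record, e.g. Dublin Core XML


@dataclass
class MetadataSection:
    name: str             # e.g. "dmdSec"
    fmt: str              # e.g. "dublin_core", "mods", "premis"
    bitstream: Bitstream  # where the metadata is actually stored
    # Simplified stand-in for ResourcePolicies: groups allowed to read.
    read_groups: frozenset = frozenset({"Anonymous"})


@dataclass
class RepositoryObject:
    kind: str        # "Community", "Collection", "Item" or "Bitstream"
    identifier: str
    sections: list = field(default_factory=list)

    def attach(self, section):
        self.sections.append(section)

    def readable_sections(self, user_groups):
        # Keep only sections sharing at least one group with the user.
        groups = set(user_groups)
        return [s for s in self.sections if s.read_groups & groups]
```

Under these assumptions, an Item could carry a publicly readable Dublin Core section next to a provenance section restricted to administrators, and `readable_sections({"Anonymous"})` would return only the former.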

    Benefits:

    1. A generally clear and standard strategy for encoding these metadata "sections" within the AIP METS Manifest, covering all additional metadata for these objects.
    2. Flexibility to support alternate encodings for Metadata without "shoehorning" them into flat fields (i.e., direct support for mods, ead, VRACore, DDI, RDF, etc)
    3. Cocoon is ideal for generating user interfaces on XML sources such as mods, dublin core xml, ead, VRACore and so on.
    4. Opportunity to map DSpace Objects (Communities, Collections and Items) directly to Fedora Objects, storing Metadata Bitstreams 1 to 1 as Fedora Datastreams.
    5. Fits with original Architectural Review recommendations to allow "MetadataFiles" to be attached to DSpaceObjects.

    Caveats:

    1. User interfaces will need to be enhanced to support editing models that may be different than our current relational transaction approach.
    2. Rendering views on DSpace resources will become more file access centric rather than database query centric (will have its own benefits and caveats)
    3. Metadata Registries may themselves become Schema or Ontology centric (again will have its own caveats and benefits).

    Fedora Object Model

    Hydra Content Model

    Common metadata content model

    As noted above, all Hydra objects will subscribe to a common metadata content model which provides for the types of metadata that all objects are likely to need.

    Datastreams as follows:

    • DC (compulsory) – The Fedora built-in minimal descriptive metadata, possibly derived automatically from the DescriptiveMetadata datastream below.
    • RELS-EXT (compulsory)
      • hasContentModel
      • isMemberOf or isMemberOfCollection (as needed to create groups of ETDs by type, source, etc.)
      • etc
    • descMetadata (XML) (compulsory)
      • 'Out of the box' Hydra will expect MODS but we have already shown that it is relatively straightforward to modify Hydra to work with other schemas here. Largely it requires changing the indexing XSLT and possibly the Solr properties file
    • rightsMetadata (XML) (compulsory), may contain
      • PREMIS premisRights, or
      • METS rightsMD, or
      • MODS accessConditions, or
      • locally defined rights metadata structure (see Hydra rights metadata), or
      • etc - but there must be a machine actionable and/or human readable entry here
    • contentMetadata (XML) (optional; however, it should be present in all objects (simple, compound or parent) that can present a splash page containing onward links, in order to provide structural detail for displaying them), may contain
      • METS FileSec
      • METS StructMap
      • ORE map
      • locally defined schema (StanfordContentMetadata.pdf). The schema here was developed by Stanford and is being adopted by the Hydra partners.
      • etc
    • technicalMetadata (XML) (optional), may contain
      • PREMIS premisObject
      • type specific (e.g., MIX for images)
      • etc
    • provenanceMetadata (XML) (optional)
      • e.g. PREMIS premisEvents
    • sourceMetadata (XML) (optional)
      • e.g. METS sourceMD snippet? (only a wrapper to object-specific MD)
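
The compulsory/optional split in the list above lends itself to a simple conformance check. The sketch below takes the datastream IDs found on an object and reports which compulsory ones are missing and which IDs fall outside the common model. The function name `check_datastreams` is hypothetical and is not part of Hydra or Fedora; the ID lists are copied from the list above.

```python
# Datastream IDs taken from the common metadata content model listed above.
COMPULSORY = ("DC", "RELS-EXT", "descMetadata", "rightsMetadata")
OPTIONAL = ("contentMetadata", "technicalMetadata",
            "provenanceMetadata", "sourceMetadata")


def check_datastreams(datastream_ids):
    """Return (missing compulsory IDs, IDs not named in the common model)."""
    present = set(datastream_ids)
    # Preserve the model's order when reporting missing datastreams.
    missing = [ds for ds in COMPULSORY if ds not in present]
    unknown = sorted(present - set(COMPULSORY) - set(OPTIONAL))
    return missing, unknown
```

An unknown ID is not necessarily an error, since objects may carry additional content datastreams; the check only separates what the common model names from what it does not.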

    DSpace AIP METS Representation

    See DSpace AIP Format

    mets/dmdSec element(s)

    I would add dmdSec elements for metadata attached to Bitstreams and Bundles; these would be associated with those files per the METS standard's definitions of dmdSec (and of its other sections).

    See Standard Definition: http://www.loc.gov/standards/mets/METSOverview.v2.html

    Note that all <dmdSec> elements must possess an ID attribute. This attribute provides a unique, internal name for each <dmdSec> element which can be used in the structural map to link a particular division of the document hierarchy to a particular <dmdSec> element. This allows specific sections of descriptive metadata to be linked to specific parts of the digital object.
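
The linking mechanism described in that definition can be sketched with Python's standard `xml.etree.ElementTree` module: each `dmdSec` gets an `ID`, each `div` in the structural map points back to it through `DMDID`, and a small check reports references that do not resolve. The helper names `build_manifest` and `unresolved_dmdids` are hypothetical, and the `dmdSec` elements are left empty for brevity (a real manifest would wrap or reference the metadata record inside them), so this is not the DSpace AIP format itself.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    # Qualify a tag name with the METS namespace.
    return f"{{{METS_NS}}}{tag}"


def build_manifest(sections):
    """sections: list of (dmd_id, label) pairs, one child div per section."""
    mets = ET.Element(q("mets"))
    for dmd_id, _label in sections:
        # Empty here for brevity; a real dmdSec wraps or references a record.
        ET.SubElement(mets, q("dmdSec"), {"ID": dmd_id})
    struct_map = ET.SubElement(mets, q("structMap"))
    root_div = ET.SubElement(struct_map, q("div"))
    for dmd_id, label in sections:
        ET.SubElement(root_div, q("div"), {"LABEL": label, "DMDID": dmd_id})
    return mets


def unresolved_dmdids(mets):
    """Return DMDID references that match no dmdSec ID."""
    known = {sec.get("ID") for sec in mets.iter(q("dmdSec"))}
    missing = []
    for div in mets.iter(q("div")):
        # DMDID may hold several space-separated references.
        for ref in (div.get("DMDID") or "").split():
            if ref not in known:
                missing.append(ref)
    return missing
```

Calling `ET.tostring(build_manifest([("dmd_1", "Item"), ("dmd_2", "Bitstream 1")]), encoding="unicode")` serializes the manifest, and `unresolved_dmdids` returns an empty list as long as every `DMDID` has a matching `dmdSec`.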

    I believe that the above approach would open up possibilities to meet all the suggested requirements outlined above.

    • Default Metadata Registry : Metadata Registry may shift and become UI feature for Submissions and Item Edit Form Validation and less intrinsic to DSpace Domain Model. Will allow storage of more metadata with fewer restrictions.
    • Add Metadata Authority Controls / Vocabularies to the Data Model: Authority Controls again are a UI feature for managing metadata values in form fields etc.  Actual encoding into data model would be specific to the metadata format.
    • Refactor Metadata Schema to Support inheritance from existing Fields: For Dublin Core, this can become RDF/Ontology driven, with a dramatic degree of flexibility.
    • Standardize the Default Namespaces: RDF and XML Schema standard namespaces would be expressed in encoded Bitstreams.
    • Support Attaching Rendering Hints to DSpace Items and Metadata Fields: With RDF or XML, alternate namespaces may allow extension of statements to include rendering hints.
    • Improve Support for MetadataValue: May actually do away with the approach altogether, relying instead on standard Java bindings to XML and/or RDF to support editing.
  2. I'd recommend separating many/all? of these headers out into their own proposal pages (maybe all sub-pages of this "Metadata for All" proposal?).

    It just becomes harder & harder to comment on or brainstorm any one of these ideas, as this has turned into a "one metadata proposal to rule them all" page.

    Don't get me wrong, I think each of these ideas has merit & is worth close consideration. I just think they don't all need to be implemented simultaneously nor are they all interdependent.

    The idea of the "Metadata Sections" is interesting and worth thinking through. Though, it does sound like a rather large change (affecting a lot of code, especially editing/submission code in the UIs). So, it may need its own discussion page where we can start to dig deeper into the benefits/caveats and what they may mean for DSpace.

    1. That is partly why I was hesitant to add it directly to the page. Ideally, it would be good not to break up the page into subsections, but instead to approach the topic along a general phased project planning strategy:

      a.) Business Requirements for Additional Metadata Support (that does not address actual implementation)

      b.) Technical Requirements for Additional Metadata Support (that discusses some of the issues and implementation needs)

      c.) Final proposal for adding the support.

      I would suggest that after "a" and parts of "b" were completed, the topic of how to fund this activity might be possible to explore with various granting agencies or funding entities.

      1. I agree with taking it in stages, but I'd still want to separate these various projects out more. Everything is still lumped together under "Metadata for All" when most topics technically could be only loosely related (or even parallel) projects.

        For example:

        Project #1 - Get DSpace up-to-date in terms of DCMI standards (DS-433) without changing anything else in the system
        Project #2 - Next, standardize default namespaces (perhaps creating a new "DSPACE" namespace and "LOCAL" namespace or similar), and provide sample upgrade scripts for users to migrate to these namespaces
        Project #3 (could be done in parallel) - Investigate options to enable metadata on Collections & Communities
        Project #4 - Enhance metadata with authority control / vocabs

        ... you get the idea.

        Essentially, we shouldn't be treating these all as one big project. Rather, we should work to split them up into smaller, more manageable chunks. That way we can reasonably expect some of this to make it into 3.0, and then make further enhancements in 4.0, etc. It will also make funding easier if we break this up more – it may not be that easy to get a giant all-encompassing grant (which is harder and harder to come by these days). But, we might be able to find volunteers or small funds/teams to help us make small steps in the right direction.