Expectations

This proposal is based on the premise that changes to DSpace metadata characteristics must be backward compatible and retain existing functionality, to ease the transition for all existing users of the platform. So many functional areas of DSpace rely on existing metadata functionality that it is critical that any changes in functionality also come with well-defined, scripted updates across releases.

Primary Objective

The primary objective of this proposal is that the DSpace metadata registry be "naturally" extended to support a richer and more expressive "Metadata Schema". The technical objectives of the proposal are to provide the following features:

  1. Capability to define "Metadata Profiles" for specific DSpace Objects and/or types of Objects.
  2. Capability to define DCMI "subPropertyOf" relationships outside of the legacy ns.element.qualifier approach.
  3. Capability to have "immutable" DC, DCTERMS and other "well established" namespaces.
  4. Capability to validate existing DSpace Item metadata based on a profile that is assigned either via the parent container or directly to the DSpace Item.
  5. Capability to apply these profiles similarly to DSpace Communities, Collections, Items, Bundles and Bitstreams.

Another critical feature of this proposal is that the new Schema model should support the above features without any significant need to transform existing DSpace Item metadata or the registry itself.

Conceptual Definition of "Schema"

The DSpace MetadataSchema registry was designed based on an outdated concept of "Application Profiles" and "Qualified Dublin Core" that predated the current DCMI Abstract Model. As a result, there are a number of significant shortcomings in the current implementation:

  1. Namespaces are not "Schema"
  2. Qualification does not effectively meet the need to use alternative namespaces while still providing clear mappings to DC for exposing metadata in OAI_DC.
  3. The Schema and Fields defined are insufficient to support validation of DSpace metadata fields in relation to Item Submission or other methods of Deposit.

The current "DSpace Schema" does not meet the requirements that a Schema is traditionally used for. A Schema traditionally defines a scaffolding or framework of rules against which actual content can be validated. While the current MetadataSchema/Field does restrict what can be assigned to any item in DSpace, it does not support validation of these assignments, nor does it allow us to define the encoding of the metadata values or whether they are required. At this time, much of the validation, rules and encoding is instead assigned, poorly, at the UI/presentation level in the DSpace Submission input-forms.xml file and is enforced only in the Describe Step of the Submission workflow.

This proposal seeks to extend the definition of the DSpace Metadata Schema to include support for these features, previously found only in the Submission input-forms.xml, formalizing a strategy for metadata validation as a new core feature of DSpace.
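To make the idea of registry-level validation concrete, here is a minimal sketch of checking an item's metadata against a profile of field rules. Everything in it is an assumption for illustration: `ProfileValidationSketch`, `FieldRule` and `validate` are hypothetical names rather than DSpace classes, and the W3CDTF pattern is deliberately simplified to date-only forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these types exist in DSpace.
final class ProfileValidationSketch {

    /** One validation rule for a metadata field (names are illustrative). */
    static final class FieldRule {
        final String field;      // e.g. "dc.date.issued"
        final boolean required;  // true if at least one value must be present
        final Pattern encoding;  // null means any literal value is accepted

        FieldRule(String field, boolean required, String encodingRegex) {
            this.field = field;
            this.required = required;
            this.encoding = encodingRegex == null ? null : Pattern.compile(encodingRegex);
        }
    }

    // Simplified W3CDTF check: YYYY, YYYY-MM or YYYY-MM-DD only (time parts omitted).
    static final String W3CDTF_DATE = "\\d{4}(-\\d{2}(-\\d{2})?)?";

    /** Returns one message per violation; an empty list means the metadata is valid. */
    static List<String> validate(List<FieldRule> profile, Map<String, List<String>> metadata) {
        List<String> errors = new ArrayList<>();
        for (FieldRule rule : profile) {
            List<String> values =
                    metadata.getOrDefault(rule.field, Collections.<String>emptyList());
            if (rule.required && values.isEmpty()) {
                errors.add(rule.field + ": required value is missing");
            }
            for (String value : values) {
                if (rule.encoding != null && !rule.encoding.matcher(value).matches()) {
                    errors.add(rule.field + ": '" + value + "' does not match the expected encoding");
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<FieldRule> profile = Arrays.asList(
                new FieldRule("dc.title", true, null),
                new FieldRule("dc.date.issued", true, W3CDTF_DATE));

        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("dc.date.issued", Arrays.asList("March 2012"));

        for (String error : validate(profile, metadata)) {
            System.out.println(error);
        }
    }
}
```

In this example the item has no title and a free-text date, so two violations are reported. Because the rules live with the profile rather than with a form definition, the same check could in principle run for any deposit method, not only the Describe Step.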

Repurposing of MetadataSchema and MetadataField as Custom Metadata Template

Rather than having MetadataSchema apply to the namespace of the metadata fields allowed by the repository, we recommend that this table be repurposed to embody "templates" of MetadataFields that should be used for specific types of DSpace Objects. Typing would be based on:

These types will be expressed through the addition of properties to the MetadataSchemaRegistry and MetadataFieldRegistry tables, providing the facility to expand on existing Schema and to add new ones. Some hypothetical examples of such Schema would be:

  • Community or Collection Profiles
    • Document Collection Profile
    • Journal Issue Profile
    • Image Gallery Profile
  • Item Profiles
    • Scholarly Item Profile
    • Website Item Profile
    • Thesis Item Profile
    • Technical Report Item Profile
    • Journal Article Profile
    • Learning Object Item Profile
  • Bitstream Profiles
    • Multimedia Profile
      • Streaming Video Profile
      • Image Profile
    • Document Profile
      • Article
      • Spreadsheet
      • Etc
  • Custom
    • Custom Profile for any new type of DSpace content

The above profiles could be applied heterogeneously through metadata attached to any level of the DSpace object hierarchy.
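One way to picture profiles attached at different levels of the hierarchy is a lookup that starts at an object and walks up through its parents. The sketch below assumes a "nearest assignment wins" rule, which the proposal does not state explicitly. `Node` is a hypothetical stand-in, not the real DSpaceObject class, and the object names are made up.

```java
import java.util.Optional;

// Hypothetical sketch: Node is a stand-in for a DSpace object, not a real DSpace class.
final class ProfileResolutionSketch {

    static final class Node {
        final String name;
        final Node parent;     // null for a top-level community
        final String profile;  // null when no profile is assigned directly

        Node(String name, Node parent, String profile) {
            this.name = name;
            this.parent = parent;
            this.profile = profile;
        }
    }

    /** Walks from the object up through its parents and returns the first profile found. */
    static Optional<String> resolveProfile(Node node) {
        for (Node current = node; current != null; current = current.parent) {
            if (current.profile != null) {
                return Optional.of(current.profile);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Node community = new Node("Graduate School", null, null);
        Node collection = new Node("Theses", community, "Thesis Item Profile");
        Node thesis = new Node("Some thesis", collection, null);
        Node report = new Node("Some report", collection, "Technical Report Item Profile");

        // The thesis inherits from its collection; the report overrides it directly.
        System.out.println(resolveProfile(thesis).orElse("(none)"));
        System.out.println(resolveProfile(report).orElse("(none)"));
        System.out.println(resolveProfile(community).orElse("(none)"));
    }
}
```

Returning an empty `Optional` when nothing is assigned leaves the caller free to decide whether an unprofiled object is an error or simply unvalidated.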

Metadata Field Inheritance

Individual Metadata Fields, like DCMI metadata properties, will support subtyping or inheritance. For example, the DCMI website gives the following:

http://dublincore.org/documents/dcmi-terms/#terms-title

Term Name: title
URI: http://purl.org/dc/terms/title
Label: Title
Definition: A name given to the resource.
Type of Term: Property
Refines: http://purl.org/dc/elements/1.1/title
Version: http://dublincore.org/usage/terms/history/#titleT-002
Has Range: http://www.w3.org/2000/01/rdf-schema#Literal

A similar level of refinement can be supported for DSpace Metadata through the addition of new MetadataFieldRegistry properties capable of storing this relationship.
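As a rough illustration of storing and following such a relationship, the sketch below keeps one "refines" link per property and walks the chain to its root. `RefinementSketch`, `declare` and `rootOf` are hypothetical names; the proposal does not specify how the new registry properties would be modelled, and the cycle guard is an added precaution rather than a stated requirement.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a tiny in-memory model of "refines" links between properties.
final class RefinementSketch {

    // property URI -> URI of the property it refines
    private final Map<String, String> refines = new HashMap<>();

    void declare(String property, String refinedProperty) {
        refines.put(property, refinedProperty);
    }

    /** Follows "refines" links until a property with no parent is reached. */
    String rootOf(String property) {
        Set<String> seen = new HashSet<>();
        String current = property;
        while (refines.containsKey(current)) {
            if (!seen.add(current)) {
                throw new IllegalStateException("Refinement cycle at " + current);
            }
            current = refines.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        RefinementSketch registry = new RefinementSketch();
        registry.declare("http://purl.org/dc/terms/alternative", "http://purl.org/dc/terms/title");
        registry.declare("http://purl.org/dc/terms/title", "http://purl.org/dc/elements/1.1/title");

        // Both levels of refinement resolve to the legacy DC element.
        System.out.println(registry.rootOf("http://purl.org/dc/terms/alternative"));
    }
}
```

A property with no declared link is treated as its own root, so unqualified elements pass through unchanged.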


The following are some basic features of the proposal:

  • Metadata fields can include additional properties for
    • Validation rules such as syntax or vocabulary encodings
    • A flag to designate that the field is required
    • Form field types for input forms
    • Type fields to designate Dublin Core or other metadata schema types; types are initially hard coded, but the new schema registry is extensible
    • MetadataField should be extended with methods to derive its "dc" type. In the absence of an assigned type, all fields default to dc.description typing.
  • The MetadataSchema field will be repurposed and extended to support
    • Identification of the types of DSO it may be assigned to
    • DSpaceObjects will be extended to allow them to have a specific "MetadataSchema" assigned. For example, different schema can be created for Publications, Theses, Multimedia, and so on, each having a different set of fields.
    • DSpace will be able to use the new MetadataSchema registry to replace the majority of the input-forms.xml file.
    • Additional table(s) will more than likely be required to designate the schema that can be used in a specific collection, and thus the input forms that may be enabled in that collection.
    • Inheritance may be used in schema to reduce replication of fields. For example, a base schema could hold the required DSpace fields that are generated during the submission, workflow and archive processes (title, issued, accessioned, available).
    • New schema may inherit from it to reduce replication of metadata fields.
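The schema inheritance described in the last two points can be sketched as a child schema that adds its own fields on top of those of a base schema. This is only one possible reading: the `Schema` class and `effectiveFields` method are hypothetical, the base fields follow the examples named above, and the thesis-specific fields are illustrative choices.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Schema is an illustrative model, not the DSpace MetadataSchema class.
final class SchemaInheritanceSketch {

    static final class Schema {
        final String name;
        final Schema parent;           // null for a base schema
        final List<String> ownFields;  // fields declared directly on this schema

        Schema(String name, Schema parent, String... ownFields) {
            this.name = name;
            this.parent = parent;
            this.ownFields = Arrays.asList(ownFields);
        }

        /** Inherited fields first, then this schema's own fields, without duplicates. */
        Set<String> effectiveFields() {
            Set<String> fields =
                    parent == null ? new LinkedHashSet<String>() : parent.effectiveFields();
            fields.addAll(ownFields);
            return fields;
        }
    }

    public static void main(String[] args) {
        Schema base = new Schema("Base", null,
                "dc.title", "dc.date.issued", "dc.date.accessioned", "dc.date.available");
        // The thesis schema repeats dc.title on purpose; it is not duplicated in the result.
        Schema thesis = new Schema("Thesis Item Profile", base,
                "dc.title", "dc.contributor.advisor", "dc.description.abstract");

        for (String field : thesis.effectiveFields()) {
            System.out.println(field);
        }
    }
}
```

Each call builds a fresh set, so computing the fields of a child schema never mutates its parent.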

In the case of DSpace, existing registry fields might be described as follows:

 

| ID | Field | refines | encoding | default | required | Scope Note |
|----|-------|---------|----------|---------|----------|------------|
| 15 | dc.date.issued | dc:date | W3CDTF | ${now} | true | Date of publication or distribution. |
| 10 | dc.date | dc:date | W3CDTF | ${now} | | Use qualified form if possible. |
| 25 | dc.identifier.uri | dc:identifier | URI | | true | Uniform Resource Identifier |
| 17 | dc.identifier | dc:identifier | Literal | | | Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form. |
| 38 | dc.language.iso | dc:language | RFC5646 | en | | Current ISO standard for language of intellectual content, including country codes (e.g. "en_US"). |
| 37 | dc.language | dc:language | RFC5646 | en | | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
| 44 | dc.relation.haspart | dc:relation | URI | | | References physically or logically contained item. |
| 40 | dc.relation | dc:relation | URI | | | Catch-all for references to other related items. |
| 62 | dc.subject.mesh | dc:subject | URI | | | MEdical Subject Headings |
| 63 | dc.subject.other | dc:subject | Literal | | | Local controlled vocabulary; global vocabularies will receive specific qualifier. |
| 57 | dc.subject | dc:subject | Literal | | | Uncontrolled index term. |
| 65 | dc.title.alternative | dc:title | TEXT | | | Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation |
| 64 | dc.title | dc:title | TEXT | | true | Title statement/title proper. |
| 66 | dc.type | dc:type | Class | | | Nature or genre of content. |
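The "refines" column above is what would allow qualified or local fields to be exposed as simple DC, for example in OAI_DC. The sketch below groups an item's values under the element each field refines and falls back to a description element for fields with no entry, following the default rule stated in the feature list. `DcCrosswalkSketch` and `toSimpleDc` are hypothetical names, `local.note` is an invented field, and writing the fallback as `dc:description` (in the table's prefix notation) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a crosswalk driven by the "refines" column, not a DSpace API.
final class DcCrosswalkSketch {

    // Assumed fallback for fields with no "refines" entry.
    static final String DEFAULT_ELEMENT = "dc:description";

    /** Groups item values under the simple DC element that each field refines. */
    static Map<String, List<String>> toSimpleDc(Map<String, String> refines,
                                                Map<String, List<String>> itemMetadata) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : itemMetadata.entrySet()) {
            String element = refines.getOrDefault(entry.getKey(), DEFAULT_ELEMENT);
            result.computeIfAbsent(element, k -> new ArrayList<>()).addAll(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A few rows of the table, reduced to Field -> refines.
        Map<String, String> refines = new HashMap<>();
        refines.put("dc.title", "dc:title");
        refines.put("dc.title.alternative", "dc:title");
        refines.put("dc.date.issued", "dc:date");

        Map<String, List<String>> item = new LinkedHashMap<>();
        item.put("dc.title", Arrays.asList("Main title"));
        item.put("dc.title.alternative", Arrays.asList("Short title"));
        item.put("dc.date.issued", Arrays.asList("2012-03-01"));
        item.put("local.note", Arrays.asList("Not in the registry excerpt"));

        for (Map.Entry<String, List<String>> entry : toSimpleDc(refines, item).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Because the mapping is data rather than naming convention, a field from any namespace can be exposed under a DC element without being renamed to the ns.element.qualifier form.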

 

