

 

Introduction

This page presents recommendations towards “Updating the Qualified Dublin Core registry in DSpace to the latest standards of the DCMI,” a priority identified in the October 2011 community survey on improving metadata support. It also seeks to comply with the proposal to Standardize the Default Namespace.

 

Note: In addition to the child page of mappings linked below, see the grandchild pages "Samples and decision points for mappings" and "Proposed phased schemas."

Glossary of Terms

Main goal of these recommendations

  • The ultimate goal of these recommendations is to implement fully functional DCTERMS as the default metadata schema, thus ensuring compliance with current standards endorsed by DCMI and linked data capabilities, and enabling a metadata infrastructure that supports other hierarchical and relational data structures. 
  • These recommendations also provide for intermediate steps towards this ultimate goal. 
    • By bringing the existing default 'dc' schema into compliance with the Qualified Dublin Core standard, we provide an intermediate migration step, enabling repositories to meet compliance with QDC upon upgrade, which will ease the transition to DCTERMS. (See "Possible Phases of Update," below, for further details about staging.)
    • By locking down schemas (at least at the element level), we ensure compliance with the QDC and DCTERMS standards, while providing tools that allow customizations not compliant with QDC/DCTERMS to persist in a local schema.

Possible Phases of Update

Ultimate goal = 'dcterms' schema as metadata registry default schema

For proposed phased schema changes see: "Proposed phased schemas"


Phase One: Update current default 'dc' schema in metadata registry to latest qualified Dublin Core, add 'dcterms' and 'local' as parallel schemas in registry

  • Update "dc" registry that ships with DSpace to current qualified Dublin Core (QDC) standard. In Phase One, name of default schema remains "dc".
    • Current "dc" elements are migrated to updated "dc" registry (where compliant with that standard) or pushed to "local" or "dspace" schema as appropriate.
  • Develop and implement flat "dcterms" schema.
  • Develop and implement "local" schema.
  • Develop and implement a DSpace admin/internal metadata schema - "dspace"
  • Ship DSpace with "dc," "dcterms," "dspace," and "local" schemas in metadata registry ("dc" remains default schema).
  • Map out relationships between these four schemas.

  • Address all areas of code affected (search/browse, import/export, crosswalks, hard-coded 'dc' elements/qualifiers, etc.). Resolve issues with features that rely on metadata solutions (e.g., CC). Consider plugins, add-ons, interfaces, etc. as well.
  • Lock down the "dc" and "dcterms" schema from the UI (at least at the element level for 'dc').
  • Provide tools for current DSpace repositories to migrate to these schemas (i.e., edit their metadata registry and data) if desired; for example, tools for migrating "dc" elements not compliant with QDC to the "local" registry.
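The Phase One migration rule (compliant elements stay in "dc"; the rest are pushed to "dspace" or "local") could be sketched roughly as follows. This is only an illustration, not DSpace code: the `Field` type, the `plan_phase_one` function, and the contents of `QDC_COMPLIANT` and `ADMIN_FIELDS` are assumptions made for the example. The real lists would come from the mapping work linked from this page.

```python
from typing import NamedTuple, Optional


class Field(NamedTuple):
    """A flat registry entry in schema.element.qualifier form."""
    schema: str
    element: str
    qualifier: Optional[str] = None

    def __str__(self) -> str:
        parts = [self.schema, self.element]
        if self.qualifier:
            parts.append(self.qualifier)
        return ".".join(parts)


# Placeholder lists for illustration only; NOT the real QDC or admin field lists.
QDC_COMPLIANT = {("title", None), ("date", "created")}
ADMIN_FIELDS = {("description", "provenance")}


def plan_phase_one(field: Field) -> Field:
    """Return where a current 'dc' field would live after Phase One."""
    key = (field.element, field.qualifier)
    if key in QDC_COMPLIANT:
        return field  # stays in the updated 'dc' schema
    if key in ADMIN_FIELDS:
        return field._replace(schema="dspace")  # system/admin metadata
    return field._replace(schema="local")  # non-compliant local customization
```

Under these placeholder lists, a hypothetical `Field("dc", "customelement", "note")` would be planned as `local.customelement.note`, while `Field("dc", "title")` would stay in "dc".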

Phase Two: Plan for migration from "dc" to "dcterms" as default schema in DSpace metadata registry.

  • Create "qdc" schema (move the 'dc' that is QDC to 'qdc').
  • Current default "dc" schema is pared down to just "dc" (the fifteen elements).
  • "dcterms" is changed to default schema.
  • Map out relationships between the five schemas.
  • Address all areas of code affected (search/browse, import/export, crosswalks, etc.). Consider features, plugins, add-ons, interfaces, etc. as well.
  • Provide tools for migration from DSpace "dc" (the qualified version) to new schemas.
  • DSpace ships with "dc," "qdc," "dcterms," "dspace," and "local".
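One possible reading of the Phase Two split is sketched below: unqualified uses of the fifteen Dublin Core elements stay in "dc", and every other field of the Phase One "dc" schema moves to "qdc". The function name `phase_two_schema` is hypothetical, and the rule itself is an assumption about how the first two bullets would be applied, not a decided design.

```python
from typing import Optional

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def phase_two_schema(element: str, qualifier: Optional[str] = None) -> str:
    """Pick the Phase Two schema for a field of the Phase One 'dc' schema.

    Assumed rule: only an unqualified use of one of the fifteen elements
    remains in 'dc'; any qualified or additional field moves to 'qdc'.
    """
    if qualifier is None and element in DCMES_ELEMENTS:
        return "dc"
    return "qdc"
```

For example, `phase_two_schema("title")` yields "dc", while `phase_two_schema("date", "created")` yields "qdc".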

Phase Three: Develop "dcterms" as fully functional default registry in DSpace, with range and domain values enabled and formally assigned.

Phase Four: Celebrate.

Outstanding issues for committers and community

  • Is it possible to ultimately implement DCTERMS with full functionality (vocabularies, etc.)? What changes to the data model will be necessary?
  • If both QDC and DCTERMS are included, which will be considered the default? Is this projected to change?
  • How will this proposal integrate with other suggested changes to DSpace metadata, including Proposal for Metadata Enhancement? How might it affect integration with Fedora? How might it affect other desired changes to metadata in DSpace, including implementing functional structured metadata such as MODS, METS, and PREMIS?
  • What challenges will this proposal present—or solve—for harvesting?
  • To enable repositories to migrate existing metadata to QDC and DCTERMS schemas, we will need to develop robust tools for repositories to deploy. One outstanding issue is the design and development of these tools.  
  • Help is needed on how best to map between the QDC and DCTERMS schemas and on how they will work together.
  • Should DSpace admin/internal metadata (not including DIM) have its own schema ("dspace"), or use 'local' schema?

Recommendation background

The original DCAT Discussion forum topic that led to this proposal can be found at "Updating the Qualified Dublin Core registry in DSpace."

  • Update current default 'dc' schema in DSpace metadata registry to current qualified Dublin Core (QDC)

  • Add DCTERMS as new, parallel schema in the default metadata registry

    • Background:
      • DCMI has not updated its Qualified Dublin Core standard since 2005. The community standard has shifted towards DCMI Metadata Terms, which, unlike QDC, is not a flat schema based on the schema.element.qualifier format. DCTERMS include range and domain values. A particular term may link to another term that it refines or is refined by (for example: the dcterm "hasPart" refines "relation"; "created" refines "date").

    • Rationale:
      • DCTERMS is the currently maintained DCMI standard.
        • As Sarah Shreeves recently commented:
          "I want to strongly urge the group to look at conforming with DCMI terms (http://dublincore.org/documents/dcmi-terms/) - even if we can't conform to the vocabulary, etc, this is the most up to date and current form of the namespace. If we use the dc qualifiers document we will be perpetuating the same problem, IMO. I think we can, as Tim suggests, have a graceful path forward. I will admit that a real part of my fear of just moving to DC Qualified is that DSpace--in terms of metadata--will continue to be seen as out of touch with where much of the metadata world is headed."

        • Also, from http://dublincore.org/documents/dces/:
          "Since 1998, when these fifteen elements [dc: namespace] entered into a standardization track, notions of best practice in the Semantic Web have evolved to include the assignment of formal domains and ranges in addition to definitions in natural language. Domains and ranges specify what kind of described resources and value resources are associated with a given property. Domains and ranges express the meanings implicit in natural-language definitions in an explicit form that is usable for the automatic processing of logical inferences. When a given property is encountered, an inferencing application may use information about the domains and ranges assigned to a property in order to make inferences about the resources described thereby. Since January 2008, therefore, DCMI includes formal domains and ranges in the definitions of its properties. So as not to affect the conformance of existing implementations of "simple Dublin Core" in RDF, domains and ranges have not been specified for the fifteen properties of the dc: namespace (http://purl.org/dc/elements/1.1/). Rather, fifteen new properties with "names" identical to those of the Dublin Core Metadata Element Set Version 1.1 have been created in the dcterms: namespace (http://purl.org/dc/terms/). These fifteen new properties have been defined as subproperties of the corresponding properties of DCMES Version 1.1 and assigned domains and ranges as specified in the more comprehensive document "DCMI Metadata Terms" [DCTERMS]. Implementers may freely choose to use these fifteen properties either in their legacy dc: variant (e.g., http://purl.org/dc/elements/1.1/creator) or in the dcterms: variant (e.g., http://purl.org/dc/terms/creator) depending on application requirements. The RDF schemas of the DCMI namespaces describe the subproperty relation of dcterms:creator to dc:creator for use by Semantic Web-aware applications. Over time, however, implementers are encouraged to use the semantically more precise dcterms: properties, as they more fully follow emerging notions of best practice for machine-processable metadata."
    • Examples:

      • The ultimate goal, as described below, is to implement full compliance with DCTERMS, which would involve supporting the standard's range and domain values. This goal, however, is not possible with the current DSpace data model. For now, DCTERMS could be provided as a flat schema. Unlike our proposal for the updated 'dc' (QDC) schema, the DCTERMS schema will not be an update of what currently ships with DSpace but a whole new set of properties. Some of these terms, however, map easily from the existing 'dc' (QDC) schema. For example, dc.date.created maps to dcterms:created, dc.format maps to dcterms:format, and dc.date.updated maps to dcterms:modified.
      • Some of these mappings remain to be decided and finalized. For example, DCTERMS provides a controlled list of syntax and vocabulary encoding schemes. QDC and "DCMI Terms" have often designated vocabulary and syntax encoding specifications as qualifiers (e.g., dc.subject.mesh, dc.identifier.uri). If we flatten DCTERMS, do we similarly extend it with qualifiers (e.g., dcterms:subject.mesh)?
      • A preliminary mapping can be found here: https://docs.google.com/spreadsheet/ccc?key=0AgU-htsSmo31dEtaM1M1Q2E1NlRxNG11ZHFrSkMxNFE#gid=0

  • Lock down the schemas, and offer migration tools that pull out local customizations and push them into a new local schema. Make it possible but not easy to delete or edit elements in the 'dc' (QDC) and DCTERMS schemas. Continue to enable the addition of qualifiers in the 'dc' (QDC) schema.
  • For staging purposes, we recommend that DSpace ship with four schemas in the metadata registry, to support ultimate migration to DCTERMS, allow for the continuing use of QDC, and standardize namespaces by pushing local customizations not compliant with QDC or DCTERMS into a local schema:
    • 1) 'dc' (QDC) - which will be an update of the current default 'dc' schema, and will be set as the default metadata schema
    • 2) 'dcterms' (DCTERMS) - which will be an optional metadata schema, with the ultimate goal of replacing 'dc' (QDC) at some point in the future
    • 3) 'dspace' schema for system/admin metadata
    • 4) Local schema - which would ship with some elements migrated out of 'dc' because they are not compliant with QDC, and enabled for the purpose of local customizations
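The 'dc' to flat 'dcterms' mappings named in the examples above could be expressed as a simple lookup table, as in the sketch below. Only the three mappings and the two refinement relationships stated on this page are included; the names `DC_TO_DCTERMS`, `REFINES`, `broader_term`, and `crosswalk` are hypothetical, and the finalized mapping list is still to be decided.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, List[str]]

# Mappings named in the text; the full list is not yet finalized.
DC_TO_DCTERMS: Dict[str, str] = {
    "dc.date.created": "dcterms:created",
    "dc.format": "dcterms:format",
    "dc.date.updated": "dcterms:modified",
}

# Refinement relationships named in the background section.
REFINES: Dict[str, str] = {"hasPart": "relation", "created": "date"}


def broader_term(term: str) -> Optional[str]:
    """Return the term that `term` refines, if one is recorded here."""
    return REFINES.get(term)


def crosswalk(record: Record) -> Tuple[Record, Record]:
    """Split a record into fields mapped to dcterms and fields left unmapped."""
    mapped: Record = {}
    unmapped: Record = {}
    for field, values in record.items():
        target = DC_TO_DCTERMS.get(field)
        if target is None:
            unmapped[field] = list(values)  # needs a mapping decision
        else:
            mapped.setdefault(target, []).extend(values)
    return mapped, unmapped
```

Returning the unmapped fields separately keeps undecided cases, such as dc.subject.mesh, visible instead of silently dropping them.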

Relevant JIRA tickets

(please add any JIRA tickets that could be affected by this proposal!)

Would be RESOLVED

DS-433: Update DublinCore Registry to Implement latest DC Standards

DS-805: QDC schema registry needs to be brought into conformity with the current DCMI standards

 

RELATED/Would be AFFECTED

DS-125: Date type can't be repeatable in the submission

DS-202: Metadata Generator Plugin

  • Not sure whether this is related, but if DCMI requires some properties to be unique (for example, identifiers), a generator would be needed to ensure that unique identifiers are generated.

DS-716: Add an administrative metadata schema to DSpace

DS-800: Manage visibility of metadata fields as field attribute rather than in dspace.cfg 

DS-815: DCDate throws NullPointerException with mangled dates

DS-1134: Multilingual metadata for communities/collections

DS-1420: Exception handling for deleting a metadata field

Areas/processes that will be affected by registry update

What areas and processes will be affected by these shifts? Is there any documentation of what features in DSpace are making use of certain fields? Where will the code be affected? Where are metadata elements hardcoded?

(pulled from September 4, 2012, DCAT discussion)

  • Any processes that create new metadata in DSpace:
    • submission forms
    • spreadsheet importer
    • command line import
    • SWORD
    • built-in OAI Harvester
  • Any process that displays metadata in the web user interface:
    • item pages
    • search, browse, DSpace discovery
  • Any process that delivers the metadata (potentially via crosswalks) to other applications:
    • OAI server
    • REST API
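As a rough starting point for the question of where metadata elements are hard-coded, a scan of source or configuration text for dotted 'dc' field names might look like the sketch below. The pattern and the `find_hardcoded_fields` helper are assumptions for illustration. The scan only catches the dotted dc.element.qualifier form and will miss fields passed as separate schema, element, and qualifier arguments.

```python
import re
from typing import Iterable, List, Tuple

# Matches dotted references such as dc.title or dc.date.issued.
HARDCODED_DC = re.compile(r"\bdc\.[a-z]+(?:\.[A-Za-z]+)?\b")


def find_hardcoded_fields(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, field) pairs for each dotted 'dc' reference."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for match in HARDCODED_DC.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Made-up sample lines, not taken from any real DSpace file.
sample = [
    "example.index = dc.date.issued",
    "no metadata here",
    'title = "dc.title"',
]
hits = find_hardcoded_fields(sample)
```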

 

 
