This proposal is based on the premise that changes to DSpace metadata characteristics must be backward compatible and retain existing functionality, to ease the transition for all existing users of the platform. So many functional areas of DSpace rely on existing metadata functionality that it is critical that any changes also come with well-defined, scripted updates across releases.
The following are some basic features of the proposal:
- Metadata fields can include additional properties for
  - Validation rules such as syntax or vocabulary encodings
  - A flag to designate that the field is required
  - Form field types for input forms
  - Type fields to designate Dublin Core or other metadata schema types; types are initially hard-coded, but the new schema registry is extensible
- MetadataField should be extended with methods to derive its "dc" type. In the absence of an assigned type, all fields default to dc.description typing.
- The MetadataSchema field will be repurposed and extended to support
  - Identification of the types of DSO it may be assigned to
- DSpaceObjects will be extended to allow them to have a specific "MetadataSchema" assigned. For example, different schemas can be created for publications, theses, multimedia, and so on, each having a different set of fields.
- The new MetadataSchema registry will replace the majority of the input-forms.xml file.
- Additional table(s) will more than likely be required to designate the schemas that can be used in a specific collection, and thus the input forms that may be enabled in that collection.
- Inheritance may be used in schemas to reduce replication of fields. For example, a base schema could hold the required DSpace fields that are generated during the submission, workflow, and archive processes (title, issued, accessioned, available).
  - New schemas may inherit from it to reduce replication of metadata fields.
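
The typing default and the schema inheritance described in the list above can be illustrated with a minimal Java sketch. It is not DSpace code: `FieldDef`, `SchemaDef`, the `base` and `thesis` schemas, and the `degree` field are hypothetical names invented here, and fields are keyed by bare element name for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point; every type in this block is hypothetical, not a DSpace class.
final class SchemaInheritanceDemo {
    public static void main(String[] args) {
        SchemaDef base = new SchemaDef("base", null)
                .add(new FieldDef("title", "dcterms:title", true))
                .add(new FieldDef("date.issued", "dcterms:date", true));
        SchemaDef thesis = new SchemaDef("thesis", base)
                .add(new FieldDef("degree", null, false)); // invented field, no type assigned
        for (FieldDef f : thesis.allFields()) {
            System.out.println(f.name + " -> " + f.derivedType());
        }
    }
}

final class FieldDef {
    final String name;      // element name, e.g. "title"
    final String refines;   // assigned type such as "dcterms:title", or null
    final boolean required;

    FieldDef(String name, String refines, boolean required) {
        this.name = name;
        this.refines = refines;
        this.required = required;
    }

    // The proposal defaults untyped fields to dc.description.
    String derivedType() {
        return refines != null ? refines : "dc.description";
    }
}

final class SchemaDef {
    final String name;
    private final SchemaDef parent; // null when nothing is inherited
    private final Map<String, FieldDef> ownFields = new LinkedHashMap<>();

    SchemaDef(String name, SchemaDef parent) {
        this.name = name;
        this.parent = parent;
    }

    SchemaDef add(FieldDef field) {
        ownFields.put(field.name, field);
        return this;
    }

    // Inherited fields come first, followed by this schema's own fields.
    List<FieldDef> allFields() {
        List<FieldDef> fields = parent == null ? new ArrayList<>() : parent.allFields();
        fields.addAll(ownFields.values());
        return fields;
    }
}
```

In this sketch a child schema that redefines an inherited field name would list the field twice; how overrides should behave is left open by the proposal.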
Primary Objective
The primary objective of this proposal is that the DSpace metadata registry be "naturally" extended to support a richer and more expressive "Metadata Schema". The technical objectives of the proposal are to provide the following features:
...
A similar level of refinement for DSpace metadata can be supported through the addition of new MetadataFieldRegistry properties that are capable of storing this relationship.
Metadata Registry
In the above example, we've changed the default "dc" prefix to explicitly apply to all "Items" in the repository.
The table below shows that new schemas for Items can be created and assigned for use in individual collections or across the entire repository.
ID | Namespace | Name | Applies To | Allowed In |
---|---|---|---|---|
1 | http://mydspace/schema/1 | item | Item | All Collections |
2 | http://mydspace/schema/2 | item2 | Item | Collection A, Collection B |
3 | http://purl.org/dc/terms | dcterms | All | -- |
4 | http://purl.org/dc/elements/1.1/ | dc | All | -- |
Metadata Schema: "item"
The following exemplifies how the view over the DSpace Metadata Field Registry would change after these adjustments:
...
ID | Field | refines | encoding | default | required | Scope Note |
---|---|---|---|---|---|---|
15 | item.date.issued | dcterms:date | W3CDTF | ${now} | true | Date of publication or distribution. |
10 | | dcterms:date | W3CDTF | ${now} | | Use qualified form if possible. |
25 | | dcterms:identifier | URI | | true | Uniform Resource Identifier |
17 | | dcterms:identifier | Literal | | | Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form. |
38 | | dcterms:language | RFC5646 | en | | Current ISO standard for language of intellectual content, including country codes (e.g. "en_US"). |
37 | | dcterms:language | RFC5646 | en | | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
44 | | dcterms:relation | URI | | | References physically or logically contained item. |
40 | | dcterms:relation | URI | | | Catch-all for references to other related items. |
62 | | dcterms:subject | URI | | | Medical Subject Headings |
63 | | dcterms:subject | Literal | | | Local controlled vocabulary; global vocabularies will receive specific qualifier. |
57 | | dcterms:subject | Literal | | | Uncontrolled index term. |
65 | | | Literal | | | Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation |
64 | item.title | dcterms:title | Literal | | true | Title statement/title proper. |
66 | item.type | dcterms:type | Class | | | Nature or genre of content. |
... | ... | ... | ... | ... | ... | ... |
Metadata Schema: "item2"
The second Item schema would be expressed as follows:
ID | Field | refines | encoding | default | required | Scope Note |
---|---|---|---|---|---|---|
15 | item2.date.issued | dcterms:date | W3CDTF | ${now} | true | Date of publication or distribution. |
25 | item2.identifier.uri | dcterms:identifier | URI | | true | Uniform Resource Identifier |
37 | item2.language | dcterms:language | RFC5646 | en | | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
62 | item2.subject.mesh | dcterms:subject | URI | | | Medical Subject Headings |
64 | item2.title | dcterms:title | Literal | | true | Title statement/title proper. |
66 | item2.type | dcterms:type | Class | | | Nature or genre of content. |
... | ... | ... | ... | ... | ... | ... |
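
To make the `encoding`, `default`, and `required` columns concrete, the following Java sketch shows one way such rows could drive validation of a submitted Item. It is an illustration under several assumptions, not DSpace code: `FieldRule` and `ItemValidator` are hypothetical names, each field holds a single value, `${now}` is resolved from a caller-supplied string, the W3CDTF check accepts only the date-only forms (YYYY, YYYY-MM, YYYY-MM-DD), and the Literal, Class, and RFC5646 encodings are not checked.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator driven by registry rows; not a DSpace class.
final class ItemValidator {
    // Simplified W3CDTF: date-only forms.
    private static final Pattern W3CDTF_DATE = Pattern.compile("\\d{4}(-\\d{2}(-\\d{2})?)?");

    static boolean matchesEncoding(String encoding, String value) {
        switch (encoding) {
            case "W3CDTF":
                return W3CDTF_DATE.matcher(value).matches();
            case "URI":
                try {
                    return new URI(value).isAbsolute();
                } catch (URISyntaxException e) {
                    return false;
                }
            default:
                return true; // other encodings are not checked in this sketch
        }
    }

    // Fills in defaults (mutating 'values') and returns a list of problems found.
    static List<String> validate(List<FieldRule> rules, Map<String, String> values, String now) {
        List<String> problems = new ArrayList<>();
        for (FieldRule rule : rules) {
            String value = values.get(rule.field);
            if (value == null && rule.defaultValue != null) {
                value = rule.defaultValue.equals("${now}") ? now : rule.defaultValue;
                values.put(rule.field, value);
            }
            if (value == null) {
                if (rule.required) {
                    problems.add("missing required field: " + rule.field);
                }
            } else if (!matchesEncoding(rule.encoding, value)) {
                problems.add("invalid " + rule.encoding + " value for " + rule.field + ": " + value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<FieldRule> rules = Arrays.asList(
                new FieldRule("item2.date.issued", "W3CDTF", "${now}", true),
                new FieldRule("item2.identifier.uri", "URI", null, true),
                new FieldRule("item2.title", "Literal", null, true));
        Map<String, String> values = new HashMap<>();
        values.put("item2.title", "An example title");
        // The URI is missing, so one problem is reported; the date is defaulted.
        System.out.println(validate(rules, values, "2012-01-15"));
        System.out.println(values.get("item2.date.issued"));
    }
}

final class FieldRule {
    final String field;
    final String encoding;
    final String defaultValue; // null when the row has no default
    final boolean required;

    FieldRule(String field, String encoding, String defaultValue, boolean required) {
        this.field = field;
        this.encoding = encoding;
        this.defaultValue = defaultValue;
        this.required = required;
    }
}
```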
Metadata Schema: "dcterms"
The "dcterms:xxx" refinements point to a new schema in the repository that contains the fields required for the typical dcterms namespace. In the current case, with the "item" and "item2" schemas, this schema is not applied directly to Items, but is inherited into the defined "item" fields through "refinement".
ID | Field | refines | encoding | default | required | Scope Note |
---|---|---|---|---|---|---|
15 | dcterms.date | rdf:Property | W3CDTF | ${now} | true | Date of publication or distribution. |
25 | dcterms.identifier | rdf:Property | URI | | true | Uniform Resource Identifier |
37 | dcterms.language | rdf:Property | RFC5646 | en | | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
40 | dcterms.relation | rdf:Property | URI | | | Catch-all for references to other related items. |
57 | dcterms.subject | rdf:Property | Literal | | | Uncontrolled index term. |
64 | dcterms.title | rdf:Property | Literal | | true | Title statement/title proper. |
66 | dcterms.type | rdf:Property | Class | | | Nature or genre of content. |
... | ... | ... | ... | ... | ... | ... |
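
The refinement relationship across the three schemas can be read as a chain of links from a local field up to its root type. The Java sketch below walks that chain; `RefinementChain` is a hypothetical helper, and treating `dcterms:date` (as written in the refines column) and `dcterms.date` (as written in the Field column) as the same key is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not a DSpace class.
final class RefinementChain {
    // Follows "refines" links from a field up to its root type.
    static List<String> chain(Map<String, String> refines, String field) {
        List<String> path = new ArrayList<>();
        String current = field;
        while (current != null && !path.contains(current)) { // stop on cycles
            path.add(current);
            // Targets are written "dcterms:date", fields "dcterms.date"; normalize for lookup.
            current = refines.get(current.replace(':', '.'));
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> refines = new HashMap<>();
        refines.put("item.date.issued", "dcterms:date");
        refines.put("item2.date.issued", "dcterms:date");
        refines.put("dcterms.date", "rdf:Property");
        // prints [item.date.issued, dcterms:date, rdf:Property]
        System.out.println(chain(refines, "item.date.issued"));
    }
}
```

Because both `item.date.issued` and `item2.date.issued` resolve to `dcterms:date`, code that exports or crosswalks metadata could treat them uniformly without knowing about the local schemas.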
Data Model Changes to Support This Proposal
To support this proposal, only additional fields and relational tables need to be added to the existing DSpace schema.
Schema2dso:
This table will directly map any specified schema as a validation target for any existing DSpace Item. One or more schema assignments will be allowed, so that an Item may be polymorphic and support more than one type.
Schema2container:
This table will identify which schema should be applied to new Items created in any Collection within DSpace. It will be extended when support for metadata at all levels of DSpace is introduced, allowing Collection and Community "Types" to be assigned to Community containers and, likewise, specific Bitstream types to be allowed in Item containers.
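The roles of the two mapping tables can be sketched in memory as follows. This is only an illustration of the intended relationships, not a database design: `SchemaAssignments` and its methods are hypothetical, collections are identified by name, and the rule that an Item may only be assigned a schema allowed in its collection is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the Schema2container and Schema2dso tables.
final class SchemaAssignments {
    // Schema2container: collection name -> schemas offered to new Items there.
    private final Map<String, Set<String>> byContainer = new HashMap<>();
    // Schemas allowed in every collection ("All Collections" in the registry table).
    private final Set<String> everywhere = new LinkedHashSet<>();
    // Schema2dso: item id -> schemas the Item is validated against.
    private final Map<Integer, Set<String>> byItem = new HashMap<>();

    void allowEverywhere(String schema) {
        everywhere.add(schema);
    }

    void allowIn(String collection, String schema) {
        byContainer.computeIfAbsent(collection, k -> new LinkedHashSet<>()).add(schema);
    }

    Set<String> schemasForNewItem(String collection) {
        Set<String> result = new LinkedHashSet<>(everywhere);
        result.addAll(byContainer.getOrDefault(collection, Collections.emptySet()));
        return result;
    }

    // Assumption: a schema can only be assigned if the collection allows it.
    boolean assign(int itemId, String collection, String schema) {
        if (!schemasForNewItem(collection).contains(schema)) {
            return false;
        }
        byItem.computeIfAbsent(itemId, k -> new LinkedHashSet<>()).add(schema);
        return true;
    }

    Set<String> schemasOf(int itemId) {
        return byItem.getOrDefault(itemId, Collections.emptySet());
    }

    public static void main(String[] args) {
        SchemaAssignments assignments = new SchemaAssignments();
        assignments.allowEverywhere("item");
        assignments.allowIn("Collection A", "item2");
        assignments.allowIn("Collection B", "item2");
        System.out.println(assignments.schemasForNewItem("Collection A")); // prints [item, item2]
        // An Item may carry more than one schema, making it polymorphic.
        assignments.assign(42, "Collection A", "item");
        assignments.assign(42, "Collection A", "item2");
        System.out.println(assignments.schemasOf(42)); // prints [item, item2]
    }
}
```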
A tentative list of new fields and tables is shown in the class diagram below.
The above solution can be easily encoded into the database schema, and the existing MetadataSchema, MetadataField and MetadataValue objects should be easily extensible to support new methods and business logic.