
Project Overview

 

Collection Description

UNSWorks

  • The institutional repository, UNSWorks, contains more than 12,000 objects. These include research publications such as digital theses and conference papers. The UNSWorks Live Fedora includes some metadata-only records as well as objects with file attachments. There is also an Interim Fedora that houses publications and metadata (including information about grants) that require review or processing before ingestion into the UNSWorks Live Fedora. The publication metadata is sourced from the Research Outputs System (ROS), and details about UNSW people and grants are obtained from other UNSW enterprise systems via the data warehouse. The Interim Fedora currently contains about 500,000 records.

ResData

  • A research data management system containing over 250 records. The records describe datasets and research data management plans plus related parties (i.e. people) and activities (i.e. grants and projects). Information about people, grants and projects is sourced from other institutional databases via the data warehouse.

Other UNSW disciplinary repositories

  • Approximately 25,000 records are stored across 5 other specialist disciplinary repositories. While most are metadata-only records, there is also some managed content such as video files. 

Fedora 3 Details

Object Models

UNSWorks

Resource

  • DC
    • Type: Inline XML
    • Mime Type: text/xml
    • Versionable
  • MODS = descriptive metadata
    • Type: Inline XML
    • Mime Type: text/xml
    • Versionable
  • RELS-EXT
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
    • Contains additional information about the object, such as the persistent identifier (handle)
  • RELS-INT
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
    • Contains additional information about the datastreams, such as type of resource and relation
  • DP-EVENT = PREMIS preservation metadata
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
  • SOURCE
    • Type: Managed
    • Mime Type: any
    • Versionable
  • PM = preservation metadata about an individual datastream (e.g. SOURCE01 would have PM-SOURCE01)
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable

ResData

Dataset, Activity (grants/projects), and Party (people) objects

  • DC
    • Type: Inline XML
    • Mime Type: text/xml
    • Versionable
  • RELS-EXT
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
    • Contains additional information about the object, such as the persistent identifier (handle/DOI) and resource type
  • RELS-INT
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
    • Contains additional information about the datastreams, such as type of resource, relation, version, and publishing status
  • RDF = descriptive metadata plus links to related parties and activities for a published object
    • Type: Inline XML
    • Mime Type: text/xml
    • Versionable
  • RDFNP = descriptive metadata plus links to related parties and activities for an unpublished object
    • Type: Inline XML
    • Mime Type: text/xml
    • Not Versionable

Research Data Management Plan object

  • DC
    • Type: Inline XML
    • Mime Type: text/xml
    • Versionable
  • RELS-EXT
    • Type: Inline XML
    • Mime Type: application/rdf+xml
    • Versionable
    • Contains additional information about the object, such as the persistent identifier (handle/DOI) and resource type
  • RDFNP = descriptive metadata plus links to related parties and activities for an unpublished object
    • Type: Inline XML
    • Mime Type: text/xml
    • Not Versionable

Notes: Record status can be draft, pending, or published. Only dataset, activity and party objects can be published (research data management plans cannot). Published records are versionable. The PID format differs by object type (e.g. sample activity object PID = resdataa:2222; sample dataset object PID = resdatac:3333).
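
As a rough illustration of the PID convention in the notes above, the following Python sketch splits a PID into its namespace and local identifier and looks up the object type. The lookup table is an assumption built only from the two sample PIDs given here; prefixes for other object types are not stated on this page and are therefore not included.

```python
# Hypothetical lookup built only from the two sample PIDs in the notes above.
# Other prefixes are omitted because this page does not state them.
PID_PREFIX_TO_TYPE = {
    "resdataa": "activity",
    "resdatac": "dataset",
}


def split_pid(pid: str) -> tuple[str, str]:
    """Split a Fedora 3 PID of the form 'namespace:id' into its two parts."""
    namespace, sep, local_id = pid.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a valid PID: {pid!r}")
    return namespace, local_id


def object_type(pid: str) -> str:
    """Return the object type implied by the PID prefix, or 'unknown'."""
    namespace, _ = split_pid(pid)
    return PID_PREFIX_TO_TYPE.get(namespace, "unknown")
```

For example, `object_type("resdataa:2222")` yields `"activity"`, and any prefix not in the table falls back to `"unknown"`.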

Functionality

Storage: Legacy storage (or Akubra)

UNSWorks uses Legacy storage and ResData uses Akubra.

XML metadata : datastreams

See object models above.

XML metadata : inline

See object models above.

Content models

Default Fedora Content Model.

Datastream types (inline, managed, redirect, and external)

Non-metadata datastreams are managed datastreams.

Identifiers

UNSW uses custom namespaces for PIDs. Some repositories use multiple PID prefixes. All UNSW repositories use handles as persistent identifiers for objects. The ResData repository also uses DOIs for some objects. 

Indexing strategies (GSearch, RI-Search vs. F4 approaches)

UNSW uses the Generic Search Service (GSearch) and Resource Index Search (RISearch).

Replication/Journaling

UNSW does not use replication or journaling. 

Security policies: XACML

The default XACML policies are used, with a minor modification for access to rights metadata on UNSWorks.

OAI-PMH

UNSW does not use the Fedora OAI-PMH module. UNSW uses the Fedora 3 API to export XML metadata and jOAI as the OAI-PMH data provider.

Versions

Most datastreams are versionable. The exception is the RDFNP datastream in ResData (see object models above).

Disseminators

UNSW does not use disseminators. 

Audit history

UNSW uses audit history for statistics, preservation, and versioning.

API

Most clients use the Fedora 3 API (REST and SOAP). The methods in use are:

API_A 

  • findObjects

  • getDatastreamDissemination

  • listDatastreams

API_M

  • Datastream Management
    • addDatastream

    • getDatastreams

    • getDatastreamHistory

    • getDatastream

    • modifyDatastreamByValue

    • modifyDatastreamByReference

    • setDatastreamState

    • setDatastreamVersionable

    • purgeDatastream

  • Object Management

    • modifyObject
    • purgeObject
    • getNextPID
    • ingest
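
To show how a few of the API-A methods listed above look when called over REST, here is a minimal Python sketch that only builds the request URLs. The base URL is an assumption (a local Fedora 3 instance at the conventional `/fedora` context path), and the search terms are illustrative.

```python
from urllib.parse import quote, urlencode

# Assumption: a local Fedora 3 instance at the conventional context path.
BASE_URL = "http://localhost:8080/fedora"


def find_objects_url(terms: str, max_results: int = 10) -> str:
    """URL for findObjects, asking for PIDs and labels in XML."""
    params = {
        "terms": terms,
        "maxResults": max_results,
        "resultFormat": "xml",
        "pid": "true",
        "label": "true",
    }
    return f"{BASE_URL}/objects?{urlencode(params)}"


def list_datastreams_url(pid: str) -> str:
    """URL for listDatastreams on one object."""
    return f"{BASE_URL}/objects/{quote(pid, safe=':')}/datastreams?format=xml"


def datastream_content_url(pid: str, dsid: str) -> str:
    """URL for getDatastreamDissemination (the datastream content)."""
    return (
        f"{BASE_URL}/objects/{quote(pid, safe=':')}"
        f"/datastreams/{quote(dsid)}/content"
    )
```

The URLs can then be fetched with any HTTP client; API-M calls additionally need authentication, which is left out of this sketch.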

Fedora 4 Details

Fedora 3 to 4 data model mapping

This section outlines how the Fedora 3 objects associated with the UNSW repositories are conceptually mapped to Fedora 4 nodes.

Mapping Fedora 3 object properties to Fedora 4:

| Property | Fedora 3 | Fedora 4 | Example | Note |
| --- | --- | --- | --- | --- |
| PID | PID | dc:identifier | resdatac:1 | Legacy Fedora 3 identifier |
| State | state | access:objState | Active | Uses the solution described in the related Jira issue. |
| Label | label | dc:title | Record title | |
| Creation Date | CREATED | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable. |
| Last Modified Date | lastModifiedDate | fedora:lastModified | 2014-01-20T05:39:08.601Z | The date of migration is to be treated as a “modification”. |
| Owner Identifier | ownerId | ms21:owner | z2212222 | The creator of the object |
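
The object property mapping above can be sketched as a simple lookup. This is only an illustration of the table, not the migration script itself; the function name is hypothetical, and fedora:lastModified is left out because the server sets it.

```python
# Fedora 3 object property -> Fedora 4 predicate, as listed in the mapping above.
# fedora:lastModified is omitted: it is set by the server, not by the client.
OBJECT_PROPERTY_MAP = {
    "PID": "dc:identifier",
    "state": "access:objState",
    "label": "dc:title",
    "CREATED": "premis:hasDateCreatedByApplication",
    "ownerId": "ms21:owner",
}


def map_object_properties(f3_props: dict[str, str]) -> dict[str, str]:
    """Translate Fedora 3 object properties into Fedora 4 predicate/value pairs.

    Properties without an entry in the mapping are silently skipped.
    """
    return {
        OBJECT_PROPERTY_MAP[name]: value
        for name, value in f3_props.items()
        if name in OBJECT_PROPERTY_MAP
    }
```

Keeping the mapping in one dictionary means a change to the table only needs to be made in one place.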

Mapping Fedora 3 datastream properties to Fedora 4:

| Property | Fedora 3 | Fedora 4 | Example | Note |
| --- | --- | --- | --- | --- |
| DSID | ID | identifier or dc:identifier | MODS | This is the legacy Fedora 3 datastream identifier. |
| State | state | access:objState | Active | Uses the solution described in the related Jira issue. |
| Control Group | CONTROL_GROUP | N/A | X | Migration is deemed unnecessary. |
| Versionable | VERSIONABLE | fedora:hasVersions | true | The “VERSIONABLE” property of Fedora 3 is not semantically equivalent to Fedora 4’s hasVersions data property. The proposed mapping is intended to enable migration of Fedora 3 data but will not be used after migration. |
| Label | LABEL | dc:title | MODS Metadata | |
| Creation Date | CREATED | premis:hasDateCreatedByApplication | 2014-01-20T04:34:26.331Z | Intended to enable migration of Fedora 3 creation dates. premis:hasDateCreatedByApplication is used because fedora:created is not user-modifiable. |
| Last Modified Date | N/A | fedora:lastModified | 2014-01-20T05:39:08.601Z | For datastreams, Fedora 3 uses the creation date as the last modified date. |
| Mime Type | MIMETYPE | fedora:mimeType | text/xml | |
| Size | SIZE | premis:hasSize | 50000 | Automatically handled by Fedora 4 |
| Alternate ID | AltIds | premis:hasOriginalName | sample_file.pdf | Automatically handled by Fedora 4 |
| Checksum Type | checksumType | fedora:digest | MD5, SHA1 | Fedora 4 combines the checksum type and the checksum in a single fedora:digest property. |
| Checksum | checksum | fedora:digest | Fedora 3: b4df41775c142aa18518d6586a8193c8e0b7dc96; Fedora 4: urn:sha1:b4df41775c142aa18518d6586a8193c8e0b7dc96 | Automatically added by Fedora 4 |
| Format URI | formatURI | N/A | N/A | This field is not used. |


Note: data and object properties in the official Fedora 4 namespace cannot be modified via the Fedora 4 REST API.
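
The checksum rows in the datastream mapping show that Fedora 4 folds the checksum type and value into one fedora:digest string. A small Python sketch of that combination is given below, for example to compare a Fedora 3 checksum with the digest reported after migration. Only the SHA1 form appears in the mapping; treating other algorithms the same way is an assumption.

```python
def to_fedora4_digest(checksum_type: str, checksum: str) -> str:
    """Combine a checksum type and hex value into 'urn:<algorithm>:<hex>'.

    The algorithm name is lower-cased and hyphens are dropped, so both
    'SHA1' and 'SHA-1' produce the 'urn:sha1:' prefix shown in the mapping.
    """
    algorithm = checksum_type.lower().replace("-", "")
    return f"urn:{algorithm}:{checksum.lower()}"
```

With the example values from the mapping, `to_fedora4_digest("SHA1", "b4df41775c142aa18518d6586a8193c8e0b7dc96")` gives `urn:sha1:b4df41775c142aa18518d6586a8193c8e0b7dc96`.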

 

Fedora 4 Namespaces

Fedora 4 data model for ResData

Figure 1 below presents a top level view of the Fedora 4 data model for ResData. 

Figure 1: Fedora 4 data model for ResData

Classes

The ResData Fedora 4 data model is an adaptation of the PCDM model, integrated with a customised version of the ANDS VITRO ontology. The resulting ontology consists mainly of the following classes:

 

Activities, Datasets, Parties (pcdm:Collection)

Activities, Datasets, and Parties are Fedora 4 container nodes of pcdm:Collection type, mainly intended to enable grouping of the three main ResData resource types, i.e. Activity, Dataset and Party. Fedora 4 URI structures for these pcdm:Collection containers are listed below:

| Container name | URL |
| --- | --- |
| Activities | /rest/activities |
| Datasets | /rest/datasets |
| Parties | /rest/parties |


Dataset (VITRO-ANDS:ResearchData, pcdm:Object)

The ResearchData class from the ANDS VITRO ontology is used to define the Dataset resource type in ResData. In the Fedora 4 model for ResData, all instances of the ResearchData class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to other related ResData resources, such as Activity (vivo:ResearchActivity), Party (foaf:Person) and other Dataset resources. Figure 2 below illustrates the combined use of the pcdm:Object and VITRO-ANDS:ResearchData classes to represent various ResData resource types.

 

Figure 2: ResData Dataset resource defined as pcdm:Object


Fedora 4 URI structures for ResData Dataset-related nodes are as below:

| Description | URL |
| --- | --- |
| Dataset | /rest/datasets/[dataset pairtree id] |
| Access | /rest/datasets/[dataset pairtree id]/access |
| Licence | /rest/datasets/[dataset pairtree id]/licence |
| Methodology | /rest/datasets/[dataset pairtree id]/methodology |
| Time Period | /rest/datasets/[dataset pairtree id]/timePeriod |
| Retention Period | /rest/datasets/[dataset pairtree id]/retentionPeriod |
| Subject | /rest/datasets/[dataset pairtree id]/subject |
| Publication | /rest/datasets/[dataset pairtree id]/publication |
| GEO | /rest/datasets/[dataset pairtree id]/geo |
| Rights | /rest/datasets/[dataset pairtree id]/rights |
| Storage | /rest/datasets/[dataset pairtree id]/storage |
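
The Dataset URI structure above is regular enough to generate. The Python sketch below builds the full set of node URLs for one dataset; the function name is hypothetical and the pairtree id passed in is a placeholder, not a real identifier.

```python
# Child node names taken from the Dataset URI structure listed above.
DATASET_CHILDREN = [
    "access", "licence", "methodology", "timePeriod", "retentionPeriod",
    "subject", "publication", "geo", "rights", "storage",
]


def dataset_urls(pairtree_id: str, base: str = "/rest/datasets") -> dict[str, str]:
    """Return the URL of a dataset node and of each of its child nodes."""
    dataset = f"{base}/{pairtree_id.strip('/')}"
    urls = {"dataset": dataset}
    urls.update({child: f"{dataset}/{child}" for child in DATASET_CHILDREN})
    return urls
```

A client could use such a helper to address, say, the licence node of a dataset without hard-coding each path.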


ms21:PartyRelation

PartyRelation is a custom class for describing a user-specified relation between a Party and a Dataset. Instances of PartyRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.

Fedora 4 URI structures for the PartyRelation nodes are:

| Description | URL |
| --- | --- |
| Dataset | /rest/datasets/[dataset pairtree id] |
| PartyRelation | /rest/datasets/[dataset pairtree id]/partyRelation1 |
| PartyRelation | /rest/datasets/[dataset pairtree id]/partyRelation2 |
| PartyRelation | /rest/datasets/[dataset pairtree id]/partyRelation3 |


ms21:ResourceRelation

ResourceRelation is a custom class for describing user-defined relationships between Dataset resources. Instances of ResourceRelation in the ResData Fedora 4 model are also defined as pcdm:Object type nodes.

Fedora 4 URI structures for the ResourceRelation nodes are:

| Description | URL |
| --- | --- |
| Dataset | /rest/datasets/[dataset pairtree id] |
| ResourceRelation | /rest/datasets/[dataset pairtree id]/resourceRelation1 |
| ResourceRelation | /rest/datasets/[dataset pairtree id]/resourceRelation2 |
| ResourceRelation | /rest/datasets/[dataset pairtree id]/resourceRelation3 |

 

Activity (vivo:ResearchActivity, pcdm:Object)

The ResearchActivity class from the VIVO ontology is used to define Activity-type resources in ResData. In the Fedora 4 model for ResData, all instances of the ResearchActivity class are also defined as nodes of pcdm:Object type, with a number of data properties containing descriptive metadata and object properties containing references to additional information about a research project, including funding body and affiliation. Figure 3 below illustrates how the pcdm:Object and vivo:ResearchActivity classes are combined to represent Activity-type resources in the ResData Fedora 4 model.

Figure 3: Activity-type resources in Fedora 4 model for ResData

Fedora 4 URI patterns for ResData Activity-type resources are:

| Description | URL |
| --- | --- |
| Activity | /rest/activities/[activity pairtree id] |
| Funding | /rest/activities/[activity pairtree id]/funding |
| Organisation | /rest/activities/[activity pairtree id]/organisation |


Party (foaf:Person, pcdm:Object)

Similar to Dataset and Activity, all Party-type resources are defined as instances of both the Person class from the FOAF ontology and the pcdm:Object class (Figure 4).

 

Figure 4: ResData Party defined as pcdm:Object

Fedora 4 URI patterns for ResData Party-type resources:

| Description | URL |
| --- | --- |
| Party | /rest/parties/[party pairtree id] |
| Organisation | /rest/parties/[party pairtree id]/organisation |

 

Namespaces

 

UNSWorks Data Model

Figure 5: UNSWorks Data Model

 

Note: All classes are derived from existing classes used in the Fedora 3 RELS-INT and RELS-EXT datastreams.

Classes

unsworksp:collection
Collection is a class describing a group of records. In addition to descriptive metadata, it holds administrative metadata with access information for the records belonging to the collection.

| Property | Note |
| --- | --- |
| unsworksp:hasCollection | |

 

unsworksp:record
An individual of the record class represents an intellectual entity such as a thesis, a book, or a moving image. It has descriptive metadata in Dublin Core and administrative metadata. It can link to other individuals such as metadata, rights, and resource.

| Property | Note |
| --- | --- |
| unsworksp:hasMetadata | |
| unsworksp:hasRights | |
| unsworksp:hasResource | |

 

unsworksp:resource
An individual of the resource class represents the electronic resource of a record, such as the PDF file of a thesis. It is stored as binary data and can link to another resource that holds the same content in a different format for preservation purposes. For example, a thesis record may have a binary file in Word format and another binary file in PDF format that was converted from the Word document.

| Property | Note |
| --- | --- |
| unsworksp:migratedFrom | |

 

unsworksp:metadata
Metadata is a class describing the metadata of a record. It represents record metadata that is not in Dublin Core format and is stored as binary data. Like a resource, it can link to another metadata individual of the same type for preservation purposes.

| Property | Note |
| --- | --- |
| unsworksp:migratedFrom | |

 

unsworksp:rights

An individual of the rights class represents a licence or agreement that the author of the electronic resource has signed. Like a resource, it can link to another individual of the same type for preservation purposes.

| Property | Note |
| --- | --- |
| unsworksp:migratedFrom | |

 

Descriptive and Administrative Metadata

Like ResData, UNSWorks uses RELS-INT and RELS-EXT in Fedora 3 to hold additional information about objects and datastreams, such as the DOI and handle. This information is used for administration and for searching.

In Fedora 4, the RELS-INT and RELS-EXT information is mapped to properties of the resource as administrative metadata.

The following RELS-INT and RELS-EXT information will be ported to Fedora 4 as resource properties:

| Property | Note |
| --- | --- |
| unsworksp:resourceType | |
| unsworksp:dunsworkspid | |
| unsworks:embargodate | |
| unsworks:embargoRemoved | |
| owl:SameAs Alternate URL | |

Descriptive metadata for each Fedora 4 resource is in Dublin Core format.

 

Namespace

 

Sample URL structure on Fedora 4

Based on the model above, each resource can be added under the root using the default Fedora 4 ingest, which generates PairTree paths. The binary file of a resource will be added, again using PairTree, with the resource node as its parent.


For example:

 

Functionality

Storage: Legacy storage (or Akubra)

The Fedora 4 REST API will be used to migrate content from Fedora 3 to Fedora 4. The storage type raises no issues for the migration. The only difference is that Fedora 4 stores container nodes in a database, whereas Fedora 3 stores objects and datastreams in a file structure.

XML metadata : datastreams

Where possible, metadata will be stored as properties of the relevant node. Metadata in other formats, such as XML (e.g. MODS), will be stored as a binary file (pcdm:File).

XML metadata : inline

The inline XML metadata is descriptive metadata about the resource. It is mapped to properties of the Fedora 4 container node (pcdm:Object).

See Data Model above for more information.

Content models

The default Fedora content models have not been modified.

Datastream types (inline, managed, redirect, and external)

In Fedora 3, the UNSWorks and ResData repositories use only inline and managed datastreams. Inline datastreams are used for descriptive metadata such as DC, MODS, and MARCXML. DC metadata can be mapped to properties of the Fedora 4 container node; the other formats will be stored as binary files in Fedora 4 binary nodes. Similarly, all managed datastreams will be stored as Fedora 4 binary nodes (pcdm:File). See the UNSWorks and ResData data models for more information.

Identifiers

The PairTree algorithm is the default method for generating identifiers in Fedora 4. It will be used for both migrated and new objects because it limits the number of children under a single resource, which addresses a known performance issue (Performance). The legacy PID will be stored as a property of the node, as mentioned above. For examples, refer to the URL structures in the Data Model section.
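
To make the shape of a PairTree path concrete, here is a minimal Python sketch. Fedora 4 mints these paths on the server, so the code only mimics the result; the segment count and width (four segments of two characters cut from a UUID) reflect the commonly seen default and should be read as an assumption rather than a specification.

```python
import uuid


def pairtree_path(identifier: str, segments: int = 4, width: int = 2) -> str:
    """Prefix an identifier with short segments cut from its leading characters.

    Spreading nodes over intermediate levels keeps the number of children
    under any single parent small.
    """
    compact = identifier.replace("-", "")
    parts = [compact[i * width:(i + 1) * width] for i in range(segments)]
    return "/".join(parts + [identifier])


def mint_path() -> str:
    """Mint a new random identifier and return its PairTree-style path."""
    return pairtree_path(str(uuid.uuid4()))
```

An identifier beginning `8c9cf82a` would therefore sit under `8c/9c/f8/2a/`, which is the part written as `[dataset pairtree id]` in the URL structures.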

Indexing strategies (GSearch, RI-Search vs. F4 approaches)

Fedora 4 will be integrated with an external triplestore, using a JMS message consumer, to support searching with SPARQL.

For installation, refer to:
https://wiki.duraspace.org/display/FEDORA41/External+Triplestore
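
Once node properties are indexed in a triplestore, a legacy Fedora 3 PID can be resolved to its Fedora 4 node with a SPARQL query. The Python sketch below only builds the query text. It assumes that the `dc` prefix is bound to the Dublin Core elements 1.1 namespace and that the legacy PID is stored in dc:identifier as described in the mapping; the triplestore endpoint itself is deployment-specific and is not shown.

```python
# Assumption: dc is bound to the Dublin Core elements 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def legacy_pid_query(pid: str) -> str:
    """Build a SPARQL query that finds nodes whose dc:identifier is the legacy PID."""
    # Escape backslashes and double quotes so the PID is a valid string literal.
    escaped = pid.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX dc: <{DC_NS}>\n"
        "SELECT ?node WHERE {\n"
        f'  ?node dc:identifier "{escaped}" .\n'
        "}"
    )
```

The resulting string can be sent to the SPARQL endpoint of whichever triplestore is chosen.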

Replication/Journaling

N/A

Security policies: XACML

Security policies will be initially handled by the client applications. WebACL and the Fedora 4 Access Roles module will be explored further in future.

OAI-PMH

The Fedora 4 OAI-PMH Provider will be used. Refer to the following link for installation:
https://wiki.duraspace.org/display/FEDORA41/Setup+OAI-PMH+Provider
Further testing will be carried out to check the status of the OAI-PMH provider.

Versions

Fedora 4 versioning will be used to store Fedora 3 versions. This will be added to the migration script at a later stage.

Disseminators

N/A

Audit history

For migration purposes, the legacy Fedora 3 FOXML will be stored as fedora:Binary (pcdm:File) in Fedora 4. The Fedora 4 Audit module will be used to manage the audit history after further testing.

API

The Fedora 4 REST API will replace the Fedora 3 SOAP and REST APIs.
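
As an indication of what a Fedora 4 REST call might look like in a migration client, the Python sketch below builds (but does not send) a PUT request that creates a container with a legacy PID and a title. The base URL, the function names, and the example path are assumptions for illustration only.

```python
from urllib.request import Request

# Assumption: a local Fedora 4 instance; adjust to the real deployment.
FEDORA4_BASE = "http://localhost:8080/rest"


def _turtle_literal(value: str) -> str:
    """Quote a single-line string as a Turtle literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_create_request(path: str, legacy_pid: str, title: str) -> Request:
    """Build a PUT request that creates a container carrying two properties."""
    turtle = (
        "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
        f"<> dc:identifier {_turtle_literal(legacy_pid)} ;\n"
        f"   dc:title {_turtle_literal(title)} .\n"
    )
    return Request(
        url=f"{FEDORA4_BASE}/{path.strip('/')}",
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="PUT",
    )
```

The request can be sent with `urllib.request.urlopen(request)`; authentication and error handling are omitted here.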

 

 

 
