A system for specifying relations in Fedora, based on OWL Lite ontologies

In the following these shorthands will refer to the following namespaces

Fedora is a store for digital resources. The exact way they are stored is not important for this discussion. What is important is that Fedora digital resources have RDF relations to each other, i.e. the Fedora digital object repository can be modelled as an RDF graph. Two kinds of Fedora resources can assert relations: objects and datastreams. When the term resource is used, it refers to both kinds.

There is one critical difference between Fedora digital resources and a normal RDF graph: each Fedora object contains its own local bit of the graph. You cannot change the number or nature of the relations from object A without editing object A. Since datastreams are stored inside digital objects, you can change the relations of a datastream without changing the datastream itself, but not without storing the object that contains it.

An ontology for the Fedora repository should preserve this characteristic. The ontology must be spread out over the entire repository, and you should not be able to break the ontology in one place by changing it somewhere else. In other words, if you have two separate Fedora repositories, each described by its own ontology, and you transfer them to the same repository, the combined ontology should describe the combined set of objects. This leads to the first property of the ontology system:

1. The ontology must not make statements that are global for the entire repository, except for the declaration of the existence of a class or property

Fedora provides a "class" of objects called Content Models. These try to represent the classes of data objects and, if specified, contain the description of the data objects, including the description of the datastreams in the object. They are the natural place for the local ontology bits. But here we reach the first problem. The Content models represent the classes of data objects, but they are also objects themselves. Describing such a duality of existence requires OWL Full, or something with similar expressive power. Since such expressive languages are difficult for automated systems to reason about, we chose a more restricted version, OWL Lite. This imposes the second property on the ontology system.

2. The ontology only describes the digital objects. It uses classes that represent, and are related to, the content model objects, but that are distinct from any Fedora object.

With this property, we separate the content model objects from the classes they represent. The content model identifier denotes the digital object. The class it represents is identified as "info:fedora/{contentmodelPID}#class".

An ontology with implicit rules, properties or classes, could lead to some potential problems. When part of the ontology is derived from the whole ontology, the effects of changes to the ontology can become difficult to predict. Especially the removal or introduction of a class could affect the nature of other classes. In effect, this means that someone wanting to use the ontology must know the entire ontology, in order to extrapolate anything implicit, which is in conflict with property 1. To make this explicit, the third property is introduced:

3. The ontology must be locally complete, so that every local bit provides the complete description of its local area.

Local area is an ambiguous term. In effect, it means that a class should not place restrictions on relations without also declaring those relations.

Fedora relations

Fedora does not allow the full RDF specification to be used in the objects. Basically, each object can have properties relating it to other objects (called relations), and literal properties. There can be no qualifiers on the properties.

There are a number of noteworthy issues about the way Fedora works with RDF. The first is that Fedora objects do not declare an rdf:type property. Instead they use a fedora-model:hasModel property to relate to a Content model. Unfortunately, OWL Lite regards the relations as "owl:ObjectProperty"s, and "rdf:type" as an "rdf:Property". As they are of different classes, you must use OWL Full to define: fedora-model:hasModel rdfs:subPropertyOf rdf:type
So, in OWL Lite, Content models cannot be regarded as classes. But as this is all that prevents us from using OWL Lite for the ontology, there are hackish ways around it, which leads to property 4.

4. In data objects, all "fedora-model:hasModel" relations are to be regarded as "rdf:type" relations to the class represented by the content model

Defining ontologies with OWL Lite

A Fedora object consists of a number of datastreams. One datastream, RELS-EXT, has been reserved for the Fedora RDF statements. We choose to reserve another datastream, ONTOLOGY, to contain the ontology definitions. The ONTOLOGY datastream is optional.

Just like Fedora only allows the "rdf:Description" tag in each object, we have chosen to similarly restrict what OWL tags can reside in a Content model. In fact, there are just two allowed elements inside the rdf:RDF tag; "owl:Class" and "owl:ObjectProperty".

Each Content model must contain one and just one "owl:Class" element, about the Content model itself, with the suffix "#class" appended to the Content model PID (to distinguish the object from the class). In this element the ordinary OWL syntax can be used to place restrictions on the properties. The allowed restrictions are:

  • minCardinality (0-1)
  • maxCardinality (0-1)
  • cardinality (0-1)
  • someValuesFrom
  • allValuesFrom

You are not allowed to use the "rdfs:subClassOf" property to make the class a subclass of another Content model.

For restrictions to be placed on a relation, the relation must be declared as an "owl:ObjectProperty". A relation can be declared in multiple Content models, but if a Content model places restrictions on a relation, it must declare the relation itself. Even though all ObjectProperty declarations are global for the repository, and the relations are thereby allowed for all objects in the repository, the requirement is that each data object should be described by just its local Content models, i.e. those it relates to through the "fedora-model:hasModel" relation.
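
As an illustration of the rules above, the following is a minimal sketch of an ONTOLOGY datastream that uses two of the allowed restrictions, minCardinality and someValuesFrom. The content models demo:CM_C and demo:CM_D and the relation #hasPart are hypothetical, and the namespace declarations are included only to make the fragment self-contained.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <!-- The single owl:Class element: content model PID plus "#class" -->
    <owl:Class rdf:about="info:fedora/demo:CM_C#class">
        <!-- Subscribing objects need at least one #hasPart relation -->
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasPart"/>
                <owl:minCardinality
                        rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
            </owl:Restriction>
        </rdfs:subClassOf>
        <!-- At least one #hasPart target must be of the class of demo:CM_D -->
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasPart"/>
                <owl:someValuesFrom rdf:resource="info:fedora/demo:CM_D#class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <!-- Declared here because this content model restricts the relation -->
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasPart"/>
</rdf:RDF>
```

Note that "rdfs:subClassOf" is used here only to attach anonymous restrictions to the class, not to make it a subclass of another Content model.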

Looking at property 1 in the context of "owl:ObjectProperty", it becomes clear that range and domain cannot be allowed. This is unfortunate, but required. Since neither OWL nor Fedora provides a way to ensure that the same relation is not defined twice, it is entirely possible for two unrelated Content models in the repository to define the same property. Each part of the repository will be valid when viewed locally, but when the repository is regarded as a whole, the two definitions are combined. Having two domains for a property means that the source must be of both types, not either, and likewise for range, so the repository as a whole will be invalid. To prevent such errors, the use of domain and range is disallowed.
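
The conflict can be sketched as follows. Assume two unrelated, hypothetical content models, demo:CM_X and demo:CM_Y, that each declare the same hypothetical relation #hasPart with a domain of their own. The fragment shows the two declarations side by side, as they would be combined when the repository is regarded as a whole. It is shown only to illustrate the problem and is not valid in this system.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <!-- As it might be declared in the ONTOLOGY of demo:CM_X -->
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasPart">
        <rdfs:domain rdf:resource="info:fedora/demo:CM_X#class"/>
    </owl:ObjectProperty>
    <!-- As it might be declared, independently, in the ONTOLOGY of demo:CM_Y -->
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasPart">
        <rdfs:domain rdf:resource="info:fedora/demo:CM_Y#class"/>
    </owl:ObjectProperty>
</rdf:RDF>
```

Taken together, the two domain statements say that every object asserting #hasPart is of both classes, which neither content model intended.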

5. "rdfs:range" and "rdfs:domain" are not allowed on any properties.

Instead of "rdfs:range", one should use the "allValuesFrom" restriction. This restriction defines a range for the property, but only in the context of the given class, so it has no global effect. "rdfs:domain" is simply not necessary. Property 3 implies that the ONTOLOGY in a Content model should describe the local area, i.e. the objects subscribing to that Content model. As a result, the domain, so to speak, of a property will always be the Content model in which it was defined. Again, this has no global effect: the same property defined somewhere else will have some other Content model as its domain.

Ontologies in practice

This is an example of a simple setup, with a very simple ontology.

The content of RELS-EXT from object "demo:Object_A1"

Code Block
<rdf:RDF>
    <rdf:Description rdf:about="info:fedora/demo:Object_A1">
        <fedora-model:hasModel rdf:resource="info:fedora/demo:CM_A"/>
        <demo-relations:hasB rdf:resource="info:fedora/demo:Object_B1"/>
    </rdf:Description>
</rdf:RDF>

Since the object asserts the relation "<fedora-model:hasModel rdf:resource="info:fedora/demo:CM_A"/>", it has the implicit relation <rdf:type rdf:resource="info:fedora/demo:CM_A#class"/>

The content of the ONTOLOGY datastream in content model "demo:CM_A"

Code Block
<rdf:RDF>
    <owl:Class rdf:about="info:fedora/demo:CM_A#class">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasB"/>
                <owl:cardinality
                        rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasB"/>
                <owl:allValuesFrom rdf:resource="info:fedora/demo:CM_B#class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasB"/>
</rdf:RDF>

The content model CM_A is defined. There is one defined relation for objects subscribing to this content model, the #hasB relation. Two restrictions are placed on this relation: there must be one, and just one, such relation in each subscribing object, and it must refer to an object of class info:fedora/demo:CM_B#class (so the "target" objects should subscribe to the content model demo:CM_B).

In fact, Object A1 has one such relation, and it refers to the object B1, which follows below.
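
For contrast, here is a sketch of an object that would not conform to CM_A when the restrictions are read as validation rules, as they are on this page. The objects demo:Object_A2 and demo:Object_B2 are hypothetical and not part of the example setup. The namespace URIs are assumptions added to make the fragment self-contained.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:demo-relations="http://www.statsbiblioteket.dk/demo-relations/#">
    <rdf:Description rdf:about="info:fedora/demo:Object_A2">
        <fedora-model:hasModel rdf:resource="info:fedora/demo:CM_A"/>
        <!-- Two #hasB relations: CM_A allows exactly one -->
        <demo-relations:hasB rdf:resource="info:fedora/demo:Object_B1"/>
        <demo-relations:hasB rdf:resource="info:fedora/demo:Object_B2"/>
    </rdf:Description>
</rdf:RDF>
```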

RELS-EXT from "demo:Object_B1"

Code Block
<rdf:RDF>
    <rdf:Description rdf:about="info:fedora/demo:Object_B1">
        <fedora-model:hasModel rdf:resource="info:fedora/demo:CM_B"/>
    </rdf:Description>
</rdf:RDF>

So, Object B1 has the content model CM_B. That makes the relation from A1 valid, see above. Let's look at the ontology from content model CM_B.

ONTOLOGY from "demo:CM_B"

Code Block
<rdf:RDF>
    <owl:Class rdf:about="info:fedora/demo:CM_B#class">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasA"/>
                <owl:allValuesFrom rdf:resource="info:fedora/demo:CM_A#class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasA"/>
</rdf:RDF>

CM_B defines one relation, the #hasA relation. There is just one restriction on this relation: it must refer to something of class info:fedora/demo:CM_A#class. No cardinality restriction is defined, so B1 does not need to have the relation, and in fact, it does not have it.
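
If B1 did assert the relation, the target would have to subscribe to CM_A. The following sketch shows such a variant of B1's RELS-EXT, pointing at Object A1. It is not part of the example setup, and the namespace URIs are assumptions added to make the fragment self-contained.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:fedora-model="info:fedora/fedora-system:def/model#"
         xmlns:demo-relations="http://www.statsbiblioteket.dk/demo-relations/#">
    <rdf:Description rdf:about="info:fedora/demo:Object_B1">
        <fedora-model:hasModel rdf:resource="info:fedora/demo:CM_B"/>
        <!-- Allowed: Object_A1 has the content model CM_A -->
        <demo-relations:hasA rdf:resource="info:fedora/demo:Object_A1"/>
    </rdf:Description>
</rdf:RDF>
```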

Ontologies for datastreams

A Fedora object consists of a number of datastreams. One datastream, RELS-INT, has been reserved for the RDF statements "about" the datastreams. Datastreams do not have content models themselves. Rather, datastreams are part of Fedora objects, and are thus described by the objects' content models. We expand object B1 from above.

Object B1's RELS-EXT datastream. This contains the relations for the object itself.

Code Block
<rdf:RDF>
    <rdf:Description rdf:about="info:fedora/demo:Object_B1">
        <fedora-model:hasModel rdf:resource="info:fedora/demo:CM_B"/>
    </rdf:Description>
</rdf:RDF>

Object B1's RELS-INT datastream. This contains the relations for the datastreams in the object.

Code Block
<rdf:RDF>
    <rdf:Description rdf:about="info:fedora/demo:Object_B1/DC">
        <demo-relations:dsRelation1 rdf:resource="info:fedora/demo:Object_B1/DC"/>
        <demo-relations:dsRelation2 rdf:resource="info:fedora/demo:Object_B1"/>
    </rdf:Description>
    <rdf:Description rdf:about="info:fedora/demo:Object_B1/RELS-EXT"/>
</rdf:RDF>

Even though the RDF statements in the object are separated into these two datastreams, the content model has just one ONTOLOGY datastream, with multiple class declarations. The class for a datastream is identified using the following syntax: "info:fedora/{cmpid}#datastreams/{dsID}/class"

Code Block
<rdf:RDF>
    <owl:Class rdf:about="info:fedora/demo:CM_B#class">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#hasA"/>
                <owl:allValuesFrom rdf:resource="info:fedora/demo:CM_A#class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:Class rdf:about="info:fedora/demo:CM_B#datastreams/DC/class">
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#dsRelation1"/>
                <owl:allValuesFrom rdf:resource="info:fedora/demo:CM_B#datastreams/DC/class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty
                        rdf:resource="http://www.statsbiblioteket.dk/demo-relations/#dsRelation2"/>
                <owl:allValuesFrom rdf:resource="info:fedora/demo:CM_B#class"/>
            </owl:Restriction>
        </rdfs:subClassOf>
    </owl:Class>
    <owl:Class rdf:about="info:fedora/demo:CM_B#datastreams/RELS-EXT/class"/>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#hasA"/>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#dsRelation1"/>
    <owl:ObjectProperty
            rdf:about="http://www.statsbiblioteket.dk/demo-relations/#dsRelation2"/>
</rdf:RDF>

Three classes are defined: one for the object, one for the DC datastream, and one for the RELS-EXT datastream. Three object relations are also defined. The class for the object contains the restrictions defined above. The class for the RELS-EXT datastream contains no restrictions, and the class for the DC datastream contains the restrictions on #dsRelation1 and #dsRelation2.