This wiki space is now deprecated. The wiki for Samvera (formerly the Hydra Project) is now to be found here: Samvera

Page versions

2014-winter This page is now substantially behind the times but it contains some useful information. Fedora 4 and PCDM negate a lot of this.

2012-fall:  Major revision:  If you need the 2011 version of the page that this revision will replace, go to 'Tools > Page history' and choose version 64.

2011:  This is a heavily revised version of our original page. The changes are two-fold in purpose: first, a long overdue clean-up and second a restructuring to answer explicitly some of the questions the team was asked at Open Repositories 2011. If you need the 'old' version of the page go to 'Tools > Page history' and choose version 59.

Introduction

The Hydra team has said from the word "go" that Hydra will use Fedora's Content Model Architecture (CMA) and to that extent would need a set of defined content models. In the light of production experience, some institutions have managed without explicit content models in Fedora and, rather, have relied on the Active Fedora model expressed in their Ruby code (hereafter, the "Ruby model").  Both these approaches are discussed below.  In either case it was, and remains, our intention that these models be shared with the community.  

The content here represents a major re-write of the existing page (September/October 2012), made in the light of extensive, actual production experience.

Don't call it a content model

Already we have used the term 'content model' twice in this introduction referring here explicitly to Fedora cModels. There are a number of other ways in which the term could be used legitimately in a Hydra/Fedora context referring to quite different concepts and we have tried to disambiguate these. It is a detailed analysis which can be found here on its own page. It is really important that you read this page first if you are not yet fully comfortable with the ideas being discussed here.

Hydra-compliant? What does it mean?

We, the Hydra partners, tend to use the term 'Hydra-compliant' a lot when talking about digital objects. But what does it mean? Lots of people ask us that - so here's an answer right at the top of the page. It's just the headlines and doesn't cover all the possibilities - for the detail you need to keep reading down to the bottom!

Accepting that digital objects in a Fedora repository must have a DC datastream, the bare, bare minimum to even come near compliance has turned out to be (1) a rightsMetadata datastream (or its equivalent in an Admin Policy Object) conforming to our schema, and (2) a RELS-EXT datastream declaring one or more appropriate cModels for your object to subscribe to. The first is needed to enable any form of access to, or delivery of, the object to users; the second provides the hooks for the underlying Ruby models. Beyond that, content-bearing objects require content (!) and hopefully some descriptive metadata.
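As a rough illustration of that bare minimum, the checks can be sketched in plain Ruby. This is not ActiveFedora code: the `ObjectSketch` struct, its field names and the example PIDs are hypothetical stand-ins for however your own code describes an object.

```ruby
# Hypothetical in-memory description of a Fedora object (not an ActiveFedora class).
#   datastreams - array of datastream IDs present in the object
#   models      - cModel URIs declared via hasModel in RELS-EXT
#   governed_by - PID of a governing APO, or nil if there is none
ObjectSketch = Struct.new(:pid, :datastreams, :models, :governed_by)

# Returns a list of problems; an empty list means the object meets the
# bare minimum described above.
def compliance_problems(obj)
  problems = []
  problems << "missing DC datastream" unless obj.datastreams.include?("DC")
  problems << "missing RELS-EXT datastream" unless obj.datastreams.include?("RELS-EXT")

  has_rights = obj.datastreams.include?("rightsMetadata") || !obj.governed_by.nil?
  problems << "no rightsMetadata datastream and no governing APO" unless has_rights

  problems << "RELS-EXT declares no cModel" if obj.models.empty?
  problems
end

# The cModel name after the "hydra-cModel:" prefix is made up for this example.
good = ObjectSketch.new("demo:1", %w[DC RELS-EXT rightsMetadata content],
                        ["info:fedora/hydra-cModel:exampleContent"], nil)
bare = ObjectSketch.new("demo:2", %w[DC], [], nil)

puts compliance_problems(good).inspect
puts compliance_problems(bare).inspect
```

A check like this only covers the headline requirements; it says nothing about whether the rightsMetadata itself conforms to the schema.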

The Ruby models and cModels described below are basic building blocks. It is possible to use them 'as is' but it is perfectly acceptable (nay, encouraged) to write variations on them and/or additional models to better deal with your own use cases. This will, of course, require you to adapt or add to the underlying Ruby code.

Our rightsMetadata schema is deliberately very, very simple yet - we hope - flexible. In this form it is straightforward to index into Hydra's Solr index so that we can provide gated discovery. Work is afoot to make this interoperate with Fedora's FESL security so that the underlying Fedora repository can be accessed directly by non-Hydra systems and yet respect the security settings in rightsMetadata.  (This work saw a proof-of-concept demonstration at the September 2012 Hydra Partners' meeting.)

So why care about compliance? Firstly, by sticking to the 'rules' (or, perhaps more importantly, to our approach) you can be assured of community support to assist your development process. We will all know in general terms what it is you are trying to achieve and may be able to offer advice based on the way we dealt with similar problems. Secondly, we hope that by conforming to standard patterns you will be able to adopt further developments with a minimum of pain - these may be additional Hydra heads, or applications outside our framework which assume 'Hydra-compliant' objects.

General principles for constructing Hydra objects

Hydra generally favours complex (atomistic) objects over compound objects (those with multiple content datastreams). The exceptions are where the content in all datastreams is identical but for, say, MIME type or screen resolution, or where only one content datastream is required (a special case of compound, sometimes referred to as a "simple" object). This has implications for most object classes: for instance, we take the view that because some ETDs (electronic theses and dissertations) may necessarily be complex (more than one datastream, each having different content, for example a pdf + a multimedia file) then all ETD objects should be complex; a single-datastream ETD is just a special case - an aggregation (parent) object with a single child.

As noted above, Hydra-compliant objects held in Fedora will be expected (by Fedora) to have a DC and a RELS-EXT datastream and (by Hydra) an enforceable rights statement, either in a rightsMetadata datastream (currently the most common pattern) and/or an Admin Policy Object (APO) by which it is governed.  In addition there will be more datastreams depending on the content type and purpose of the object.  These structures are expressed and managed within the code using Ruby models.  In our original plans (2008) we had envisaged that these structures would also be expressed in actual Fedora cModel objects in order that Hydra heads could use Fedora disseminators.  However, many implementations have found that their use case(s) have no need for disseminators and so institutions have not created the actual cModel object or its attendant sDef or sDep.  That said, these implementations generally declare the non-existent cModel in an object's RELS-EXT where the declaration statement is a useful hook that the UI can use to provide a view appropriate to the content type.  

Where Fedora dissemination is used, the following applies:

Hydra-provided cModels, service definitions and service deployments will use Hydra's own namespace to distinguish them from other content that users may have, thus Hydra-provided object PIDs will begin:

  • hydra-cModel:
  • hydra-sDef: or
  • hydra-sDep:
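For code that needs to tell Hydra-provided objects apart from local content, the three prefixes above can be recognised with a simple lookup. This is a minimal sketch; the method name and the symbols it returns are hypothetical.

```ruby
# PID namespaces reserved for Hydra-provided objects, as listed above.
HYDRA_PID_NAMESPACES = {
  "hydra-cModel" => :content_model,
  "hydra-sDef"   => :service_definition,
  "hydra-sDep"   => :service_deployment
}.freeze

# Returns the kind of Hydra-provided object a PID denotes, or nil if the PID
# is in some other (local) namespace.
def hydra_object_kind(pid)
  namespace, _local_id = pid.split(":", 2)
  HYDRA_PID_NAMESPACES[namespace]
end

puts hydra_object_kind("hydra-sDef:example").inspect
puts hydra_object_kind("demo:123").inspect
```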

We come back to disseminators at the bottom of the page.

Important note

It is not perhaps widely enough known that the size of Fedora's (FOXML) objects can have a significant effect on server performance. Hydra strongly recommends that all metadata datastreams other than DC, RELS-EXT, and rightsMetadata should be of type 'managed' and not of type 'inline XML'. These three datastreams are singled out for inline use so that if a Fedora object is found somehow disassociated from its normal context it will contain in the core FOXML some essential information about itself.
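Fedora records how each datastream is stored as a control group code: 'X' for inline XML and 'M' for managed content. A hedged sketch of an audit for the recommendation above might look like this; the method name and the hash describing an object's datastreams are hypothetical, not part of any Hydra or Fedora API.

```ruby
# The only datastreams Hydra recommends keeping inline in the FOXML.
INLINE_DATASTREAMS = %w[DC RELS-EXT rightsMetadata].freeze

# Takes a hash of datastream ID => control group code ("X" inline XML,
# "M" managed) and returns the IDs that are stored inline although the
# recommendation says they should be managed.
def inline_but_should_be_managed(datastreams)
  datastreams
    .select { |dsid, group| group == "X" && !INLINE_DATASTREAMS.include?(dsid) }
    .keys
end

example = {
  "DC"              => "X",
  "RELS-EXT"        => "X",
  "rightsMetadata"  => "X",
  "descMetadata"    => "X", # inline, so it inflates the FOXML
  "contentMetadata" => "M",
  "content"         => "M"
}

puts inline_but_should_be_managed(example).inspect
```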

Who stuck to the rules?

You know what they say, "rules are meant to be broken"? In producing production systems the Hydra partners may have interpreted and extended the rules and guidelines that follow. This page is already quite long enough with only scant reference to some of those adaptations. We are encouraging Hydra partners to document their implementations, adaptations and extensions here on the wiki. The parent page can be found here.

Example models for download

Hydra cModel/sDef/sDep objects are available from our github site at https://github.com/projecthydra.  

Example Ruby models can be found on the Active Fedora github pages.

Specific datastreams

Compulsory datastreams and equivalents

Rights metadata

All Hydra-compliant objects must express something about applicable rights.  Who can view their content?  Who can edit them?  ...and so on.  Hydra's original design required that each object should contain a datastream called "rightsMetadata" which expressed these matters in a simple XML structure.  This is still the way that the majority of production systems work.  The Hydra team developed a very simple schema for rightsMetadata which can be found here.  This rights metadata is easily indexed by Hydra and allows us to provide appropriate security around content and to provide gated discovery (users searching content through a Hydra head will only be aware of content that they would ultimately be allowed to download).
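The idea of gated discovery can be illustrated with a small filter. In a real Hydra head the restriction is applied through the Solr index rather than in Ruby after the fact; the `SearchHit` struct and its `read_groups` field below are hypothetical and exist only to show the principle.

```ruby
# Hypothetical search hit carrying the groups that the indexed rights
# metadata allows to read the object.
SearchHit = Struct.new(:pid, :title, :read_groups)

# Keeps only the hits that at least one of the user's groups may read, so
# the user never learns of content they could not go on to download.
def gated_results(hits, user_groups)
  hits.select { |hit| !(hit.read_groups & user_groups).empty? }
end

hits = [
  SearchHit.new("demo:1", "Open report", %w[public staff]),
  SearchHit.new("demo:2", "Embargoed thesis", %w[staff])
]

puts gated_results(hits, %w[public]).map(&:title).inspect
```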

Newer Hydra implementations have developed the idea of an Admin Policy Object (APO) which can govern a set of objects.  (Individual objects express their adherence to an APO using an isGovernedBy statement in their RELS-EXT datastream.)  The APO contains a statement of the rightsMetadata that each adhering object will conform to.  On the one hand, using an APO has the benefit that any change of applicable rights may need only to be done in one place but, on the other, there is no explicit statement of rights in each object.  APOs and object-level rightsMetadata datastreams can be combined: an APO could lay down a set of 'grant' statements which are then extended at the object level by further 'grant' statements.
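Combining APO-level and object-level grants amounts to taking the union of the two. The sketch below assumes grants are held as a hash of access level to group names; the level names ('read', 'edit') and both method names are illustrative assumptions, not the rightsMetadata schema itself.

```ruby
# Merges the grants laid down by an APO with further grants made at the
# object level. Object-level grants extend the APO's; they never remove any.
def effective_grants(apo_grants, object_grants)
  apo_grants.merge(object_grants) do |_level, from_apo, from_object|
    (from_apo + from_object).uniq
  end
end

# True if any of the user's groups has been granted the given access level.
def permitted?(grants, level, user_groups)
  !(grants.fetch(level, []) & user_groups).empty?
end

apo_grants    = { "read" => ["staff"], "edit" => ["repository-admins"] }
object_grants = { "read" => ["students"] }

grants = effective_grants(apo_grants, object_grants)
puts grants.inspect
puts permitted?(grants, "read", ["students"])
puts permitted?(grants, "edit", ["students"])
```

The trade-off noted above still applies: with this pattern the full set of rights for an object is only known once its APO has been consulted.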

APOs are dealt with more fully elsewhere.

Rights metadata schema

The schema can be found here.

DC metadata

Fedora requires that all its objects have a DC metadata datastream.  In the design of Fedora this was intended to be for administrative use only in order to facilitate searching for objects in a repository from the admin interface.  It was not intended to be used for general descriptive metadata (although in the "real world" it is widely used that way).  Hydra uses it in the way Fedora intended, as an admin tool, and generally partners keep a very minimal set of information there, perhaps just

  • dc:title  and
  • dc:identifier

both required by Fedora, and maybe

  • dc:creator  and
  • dc:date

An object's "real" descriptive metadata is kept in a different datastream usually called "descMetadata".

RELS-EXT

The RELS-EXT datastream is provided by Fedora primarily to record external relationships for an object.  Hydra uses it for this purpose and can express a number of things there in addition to the standard Fedora entries.

As noted in "Hydra-compliant" above, we use the relationship "hasModel" to indicate the content type of our objects.

We use the relationship "isGovernedBy" to associate an object with an APO (see "Rights metadata" above).

We use the relationship "isPartOf" to associate child objects with their parent when using atomistic objects.

We use the relationship "isMemberOf" when defining management structures within a repository (not everyone does this).

We use the relationship "isDependentOf" when defining intellectual arrangements within a repository (again, not everyone does this).

For more information about sets see the section "Sets, sets or sets" below.
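To make the relationships above concrete, here is a minimal sketch that builds a RELS-EXT document as RDF/XML using nothing but string handling. The `fedora-model` and `rel` namespace URIs are the standard Fedora 3 ones; the `hydra` namespace URI used for isGovernedBy, the method name and all the PIDs are assumptions made for illustration, and the sketch does no XML escaping.

```ruby
RELS_EXT_NAMESPACES = {
  "rdf"          => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "fedora-model" => "info:fedora/fedora-system:def/model#",
  "rel"          => "info:fedora/fedora-system:def/relations-external#",
  "hydra"        => "http://projecthydra.org/ns/relations#" # assumed URI
}.freeze

# Builds RELS-EXT RDF/XML for the object with the given PID.
# relationships is a list of [prefixed predicate, target PID] pairs.
def rels_ext_xml(pid, relationships)
  ns_attrs = RELS_EXT_NAMESPACES.map { |prefix, uri| %(xmlns:#{prefix}="#{uri}") }.join(" ")
  statements = relationships.map do |predicate, target_pid|
    %(    <#{predicate} rdf:resource="info:fedora/#{target_pid}"/>)
  end

  [
    %(<rdf:RDF #{ns_attrs}>),
    %(  <rdf:Description rdf:about="info:fedora/#{pid}">),
    *statements,
    %(  </rdf:Description>),
    %(</rdf:RDF>)
  ].join("\n")
end

# A child object in an atomistic arrangement; every PID here is made up.
puts rels_ext_xml("demo:child1", [
  ["fedora-model:hasModel", "hydra-cModel:exampleContent"],
  ["hydra:isGovernedBy",    "demo:apo1"],
  ["rel:isPartOf",          "demo:parent1"],
  ["rel:isMemberOf",        "demo:etdSet"],
  ["rel:isDependentOf",     "demo:displaySet"]
])
```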

Optional metadata datastreams

Descriptive metadata

In our original specification this was a compulsory datastream for descriptive metadata; as we moved to production we realised that there were classes of object that did not need it and so now it is optional. Clearly it is necessary in objects that can display a splash page.

The Hydra founding partners have each used a locally chosen subset of MODS here, appropriate to their use cases. There is no reason why other metadata schemas should not be used. Hydra users are known to be working with a range of other metadata formats including Dublin Core (DC), EAD and PBCore. It is possible to convert between formats; thus, for instance, one Hydra partner keeps all descriptive metadata as MODS but can create a DC representation or a UKETD_DC one from it on the fly.

Content metadata

Whilst optional, this datastream might usefully be present in all objects that can display a splash page containing onward links. It is a 'one-stop-shop' for data related to this onward progress. Some of Hydra's founding partners have found it convenient in several Hydra heads.

This "contentMetadata" datastream could contain, say, a METS FileSec, a METS StructMap, an ORE map or a locally defined schema. The contentMetadata schema that the Hydra partners will use is based on one developed at Stanford (copy here for reference) which is being adopted elsewhere in slightly modified form (see for instance the version being used at the University of Hull).

Hydra contentMetadata schemas

And more

There are many more datastreams that could be defined in a Hydra object, for instance:

  • technicalMetadata
  • provenanceMetadata
  • sourceMetadata

and others you may create in response to need.

Datastreams for content

Initially, Hydra was quite prescriptive about the way datastreams should be named.  In practice, implementers may or may not have followed the recommendations and so we are now more laid back about such things.  But may we suggest a couple of guidelines?

We suggested (and if pushed would still recommend) that in a simple object with a single content-bearing datastream it should be called "content".  Further, we'd suggest that every content-bearing object should have a "content" datastream; this so that a new Hydra head (perhaps imported from elsewhere) has somewhere to start.  In a compound object (multiple content-bearing datastreams in the same object), the simple pattern would be content, content02, content03 etc.
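The suggested naming pattern is easy to generate in code. A minimal sketch follows; the method name is hypothetical.

```ruby
# Returns datastream IDs for an object with the given number of
# content-bearing datastreams: "content", "content02", "content03", ...
def content_datastream_ids(count)
  raise ArgumentError, "count must be at least 1" if count < 1

  (1..count).map { |n| n == 1 ? "content" : format("content%02d", n) }
end

puts content_datastream_ids(1).inspect
puts content_datastream_ids(3).inspect
```

Because the first datastream is always plain "content", a simple object and the first part of a compound object look the same to a Hydra head that only knows where to start.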

Beyond that we might suggest that if an object has a thumbnail for display that datastream should be called "thumbnail" (again with a view to interoperability at a basic level).  Beyond even that, the choice is yours.  For reference, and/or general interest, these are some of the datastreams we originally tried to prescribe:

Simple content (pdf, etc)

  • content
  • original (optional, perhaps the docx file from which the pdf is derived)
  • thumbnail (optional, but if you have the cover image...)

General compound content

  • content
  • content02 
  • content03 (optional)
  • content.. (optional)
  • thumbnail (optional)

Simple images:

  • thumbnail
  • screen (say a 1024px version of the original)
  • max (maximum deliverable resolution)
  • original (optional, but useful if the original is a TIFF and you are delivering a jpg, say)
  • content (optional, see above, might deliver any of the first three according to local thinking)

Sets, sets or sets?

It is not necessary to use sets in a Fedora repository, Hydra-based or otherwise. If you need to offer your end users some sort of structure it may well be enough to do this using facets in the discovery interface. That said, some people find sets useful for two purposes: providing a behind-the-scenes structure to aid management and/or for providing context around a collection of objects. 

In the first case one might have a set for ETDs, within which there are sub-sets for individual subjects: this makes it easy to identify, say, all the biology ETDs in your repository for management. In the second case, you might have a collection of datasets all related to the same subject matter: discovered on its own each is of limited use, but grouped together by a set object that explains their context and purpose they are much more useful.

It is possible that these two functions, management and provision of context, can be served by exactly the same set objects in which case the 'isMemberOf' relationship will answer all your needs. Equally, it may be useful to distinguish between the two approaches and talk about 'structural sets' and 'display sets'. In this scenario 'isMemberOf' is used for the structural sets and 'isDependentOf' for display sets and it is likely that your structural sets would be completely hidden from end users.

There are two basic patterns for managing "sets", Hydra's preferred name over "collections" or "folders". 

Implicit set relationships in which the set object has no explicit listing but rather contains some rule(s) for identifying its set members.

Explicit set relationships in which the set object contains an explicit listing of its set members. 

In all cases there must be a single object that represents the set itself in the repository: this object defines and describes the set (in the abstract and/or for specific UI use) and provides a reference point (a pid) for creating object associations to the set.
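The two patterns can be contrasted in a short sketch. Everything here is hypothetical (the class names, the `RepoObject` struct and the PIDs); it simply shows that in both cases a single set object with its own pid answers the question "who are my members?", either from a stored list or from a rule.

```ruby
# Hypothetical stand-in for a repository object and its isMemberOf targets.
RepoObject = Struct.new(:pid, :is_member_of)

# Explicit pattern: the set object holds a listing of its members' PIDs.
class ExplicitSet
  attr_reader :pid

  def initialize(pid, member_pids)
    @pid = pid
    @member_pids = member_pids
  end

  def members(repository)
    repository.select { |obj| @member_pids.include?(obj.pid) }
  end
end

# Implicit pattern: the set object holds a rule that identifies its members.
class ImplicitSet
  attr_reader :pid

  def initialize(pid, &rule)
    @pid = pid
    @rule = rule
  end

  def members(repository)
    repository.select { |obj| @rule.call(obj) }
  end
end

repository = [
  RepoObject.new("demo:1", ["demo:etdSet"]),
  RepoObject.new("demo:2", []),
  RepoObject.new("demo:3", ["demo:etdSet"])
]

explicit = ExplicitSet.new("demo:handPicked", ["demo:2", "demo:3"])
implicit = ImplicitSet.new("demo:etdSet") { |obj| obj.is_member_of.include?("demo:etdSet") }

puts explicit.members(repository).map(&:pid).inspect
puts implicit.members(repository).map(&:pid).inspect
```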

Doubtless people are developing additional ways of dealing with sets.

So what about disseminators?

As we noted near the top of the page, many Hydra implementations have found no need to use Fedora disseminators - but what about those that have?

If you want to make use of Fedora disseminators you first need to read the appropriate parts of the Fedora Content Model Architecture documentation.  The implications for Hydra are that not only must you declare your content type as a cModel in RELS-EXT, but the cModel object must exist in the Fedora repository - as must the corresponding sDef and sDep objects.  Why go to this trouble?  Well, for instance, disseminators provide a way of manipulating content on the fly.  A color image could be delivered in monochrome, say, or a metadata datastream could be transformed to another format.  A concrete example of this last case in use is that Hull keeps all descriptive metadata as MODS but can deliver it to the user expressed as, for example, DC: a disseminator applies an XSLT transform on the fly.

Granted, you could do XSLT transforms from within your Ruby code.  It's all a matter of choice!

