The ARROW project (http://arrow.edu.au) spent a great deal of time thinking about, and documenting, content models.

There are (at least) two dimensions to the thinking around content models.

The first is the difference between a compound object model and an atomistic model. The second is the way in which metadata is used to describe different resource types.

Compound vs. atomistic model

Early in the project, ARROW committed to the compound object model rather than the atomistic model. What we mean by this is that there is a 'knowledge entity' which may have lots of different components. A book may have several chapters. A thesis may have a body, some tables, some images, a movie file, and an appendix. One knowledge entity = one object = many datastreams. (The atomistic model would separate these out into lots of different objects and link them together later.)
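To make the distinction concrete, here is a minimal Python sketch of the two models. The class names, the split_atomistic helper and the demo:thesis1 PID scheme are hypothetical illustrations, not ARROW or Fedora APIs. The compound object keeps all of a thesis's components as datastreams of one object; the atomistic alternative turns each component into its own object that is linked back to a parent.

```python
from dataclasses import dataclass, field

@dataclass
class Datastream:
    ds_id: str
    label: str
    mime_type: str

@dataclass
class CompoundObject:
    """Compound model: one knowledge entity = one object = many datastreams."""
    pid: str
    datastreams: list[Datastream] = field(default_factory=list)

def split_atomistic(obj: CompoundObject) -> tuple[dict, list[dict]]:
    """Atomistic alternative: one object per component, linked to a parent object."""
    parent = {"pid": obj.pid, "members": []}
    children = []
    for i, ds in enumerate(obj.datastreams, start=1):
        child_pid = f"{obj.pid}-part{i}"  # hypothetical PID scheme
        children.append({"pid": child_pid, "datastream": ds, "isPartOf": obj.pid})
        parent["members"].append(child_pid)
    return parent, children

# A thesis modelled as a single compound object with three datastreams.
thesis = CompoundObject("demo:thesis1", [
    Datastream("DS1", "Body", "application/pdf"),
    Datastream("DS2", "Appendix", "application/pdf"),
    Datastream("DS3", "Movie", "video/mp4"),
])
parent, children = split_atomistic(thesis)
```

In the compound model the relationships between components stay inside one object; in the atomistic model they have to be recorded and maintained as links between separate objects.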

We started thinking in terms of contentstreams and metastreams, and how to link each contentstream to its corresponding metastream. (The terminology is my own; I hope it is self-explanatory.)

One object might contain one pdf contentstream, and one image contentstream. Expanding this object to include its metastreams might look like this:

Datastream ID	Descriptive label	Format	Comments
DC	Dublin Core	text/xml	Descriptive metadata applying to entire object
DS1	MARCXML	text/xml	Descriptive metadata applying to entire object
DS2	Chapter 1	application/pdf	A pdf contentstream
DS3	DS2metadata	text/xml	A metastream describing ONLY DS2
DS4	Image	image/jpeg	A jpg image contentstream
DS5	DS4metadata	text/xml	A metastream describing ONLY DS4
DS6	Relationships	text/xml	A datastream describing the relationships between each of these datastreams
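One way to link each contentstream to its metastream, assuming the labelling convention in the table (a metastream labelled "<ID>metadata" describes datastream <ID>), is sketched below in Python. It derives the links and writes them into a relationships datastream like DS6. The XML element and attribute names (relationships, describedBy, contentstream, metastream) are hypothetical; the table does not say what vocabulary ARROW used.

```python
import xml.etree.ElementTree as ET

# (datastream ID, label, format) rows taken from the table above.
DATASTREAMS = [
    ("DC", "Dublin Core", "text/xml"),
    ("DS1", "MARCXML", "text/xml"),
    ("DS2", "Chapter 1", "application/pdf"),
    ("DS3", "DS2metadata", "text/xml"),
    ("DS4", "Image", "image/jpeg"),
    ("DS5", "DS4metadata", "text/xml"),
]

def link_metastreams(datastreams):
    """Map each contentstream ID to the ID of the metastream describing it.

    Relies on the labelling convention '<contentstream ID>metadata'."""
    ids = {ds_id for ds_id, _, _ in datastreams}
    links = {}
    for ds_id, label, _ in datastreams:
        if label.endswith("metadata"):
            target = label[: -len("metadata")]
            if target in ids:
                links[target] = ds_id
    return links

def relationships_xml(links):
    """Serialise the links as a relationships datastream (hypothetical vocabulary)."""
    root = ET.Element("relationships")
    for content_id, meta_id in sorted(links.items()):
        ET.SubElement(root, "describedBy",
                      {"contentstream": content_id, "metastream": meta_id})
    return ET.tostring(root, encoding="unicode")
```

Recording the links explicitly, rather than relying only on the naming convention at display time, means that the relationships survive if a label is later edited.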

Being able to specify metadata for a specific contentstream has implications for controlling access at the datastream level, for displaying the file size and other attributes of individual contentstreams, and for applying keywords or other descriptive metadata to them. It allows you to describe and manage individual components as well as the knowledge entity as a whole.
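As a rough illustration of those uses, the Python sketch below assumes that each metastream records hypothetical access, size_bytes and keywords fields for the contentstream it describes. Access decisions, size display and keyword lookup can then all be made per datastream rather than for the whole object.

```python
from dataclasses import dataclass

@dataclass
class Metastream:
    """Per-contentstream metadata; all field names are hypothetical."""
    describes: str                 # ID of the contentstream, e.g. "DS2"
    access: str                    # "public" or "restricted"
    size_bytes: int
    keywords: tuple[str, ...] = ()

def visible_contentstreams(metastreams: list[Metastream], authenticated: bool) -> list[str]:
    """Return the contentstream IDs a user may see, decided per datastream."""
    return [m.describes for m in metastreams
            if m.access == "public" or authenticated]

def human_size(n: int) -> str:
    """Format a byte count for display next to a contentstream."""
    size = float(n)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"

def with_keyword(metastreams: list[Metastream], keyword: str) -> list[str]:
    """Find contentstreams tagged with a keyword in their own metastream."""
    return [m.describes for m in metastreams if keyword in m.keywords]
```

For example, a public chapter (DS2) and a restricted image (DS4) in the same object can be shown to anonymous users and logged-in users differently, which is not possible if access is set only at the object level.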

Content models for different resource types

ARROW repositories are designed to manage the research outputs of universities in Australia. Each ARROW institution has its own individual and unique repository. (These are brought together in a single interface by the National Library of Australia's Discovery Service, http://search.arrow.edu.au.) The original partners (Monash University, University of NSW, Swinburne University, and the National Library of Australia) formed an Implementation Managers Group. This group identified a series of standard resource types and modelled each of them. We agreed that MARCXML was an appropriate schema for describing the mostly bibliographic objects that were to be ingested. From a MARCXML record, ARROW could then extract standard Dublin Core for harvesting.
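The Python sketch below shows the general idea of deriving Dublin Core from a MARCXML record. The mapping table is a small, illustrative subset in the spirit of the Library of Congress MARC-to-Dublin Core crosswalk, not ARROW's actual mapping, and the sample record is invented. It assumes the standard MARCXML namespace and does no cleanup of trailing MARC punctuation.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# (MARC tag, subfield code) -> Dublin Core element; an illustrative subset only.
CROSSWALK = {
    ("100", "a"): "creator",
    ("700", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}

def marcxml_to_dc(marcxml: str) -> dict[str, list[str]]:
    """Collect Dublin Core values from every mapped MARC subfield."""
    record = ET.fromstring(marcxml)
    dc: dict[str, list[str]] = {}
    for datafield in record.iterfind(".//marc:datafield", MARC_NS):
        tag = datafield.get("tag")
        for subfield in datafield.iterfind("marc:subfield", MARC_NS):
            element = CROSSWALK.get((tag, subfield.get("code")))
            if element and subfield.text:
                dc.setdefault(element, []).append(subfield.text.strip())
    return dc

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Smith, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Repository content models</subfield></datafield>
  <datafield tag="260" ind1=" " ind2=" "><subfield code="c">2006</subfield></datafield>
</record>"""
```

Because Dublin Core is derived rather than entered by hand, the MARCXML record remains the single source of truth and the harvested record cannot drift out of step with it.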

We worked on standard MARCXML for: books, book chapters, conference papers, journal articles, theses, and working papers. Later we added images (although, strictly speaking, images are not a resource type).

We put all the agreed MARCXML metadata elements into one spreadsheet, which we call the Centralregister, and recorded the following information for each element:

ContentModel, Contentmodel Name, Data element, Repeatable, Mandatory, Element Label, MARCXML, DC/OAI-PMH mapping, Comments or Rules, Harvest recommendation by the National Library of Australia (for the Resource Discovery Service), Display Y/N, and finally, whether the element is common to all resource types (Y/N).

With Excel's AutoFilter, this spreadsheet can be used to see which elements are common to all the resource types (eleven are), which are unique to specific models (such as Conference Location), which elements appear in which models, and so forth. It is a handy tool.
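The same queries can be run outside Excel. The Python sketch below assumes the register has been exported to CSV with its ContentModel and Data element column headings; the sample rows are invented and do not reproduce the real Centralregister.

```python
import csv
import io
from collections import defaultdict

def load_register(f) -> dict[str, set[str]]:
    """Group data elements by content model from a CSV export of the register."""
    by_model = defaultdict(set)
    for row in csv.DictReader(f):
        by_model[row["ContentModel"]].add(row["Data element"])
    return dict(by_model)

def common_elements(by_model: dict[str, set[str]]) -> set[str]:
    """Elements that appear in every content model."""
    return set.intersection(*by_model.values())

def unique_elements(by_model: dict[str, set[str]]) -> dict[str, str]:
    """Elements that appear in exactly one model, mapped to that model."""
    models_for = defaultdict(list)
    for model, elements in by_model.items():
        for element in elements:
            models_for[element].append(model)
    return {e: ms[0] for e, ms in models_for.items() if len(ms) == 1}

def models_using(by_model: dict[str, set[str]], element: str) -> list[str]:
    """Content models that include a given element."""
    return sorted(m for m, elements in by_model.items() if element in elements)

# Invented sample rows, for illustration only.
SAMPLE_CSV = """ContentModel,Data element,Mandatory
Journal article,Title,Y
Journal article,Author,Y
Journal article,Journal title,Y
Conference paper,Title,Y
Conference paper,Author,Y
Conference paper,Conference location,N
Thesis,Title,Y
Thesis,Author,Y
"""
```

Running common_elements over the full register would answer the "which elements are common" question directly, without re-applying filters by hand.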

Knowing which metadata elements are common means that you can build a front end that first elicits those shared elements for any resource type, and then customise the remaining fields according to the resource type.
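A minimal Python sketch of that two-stage approach follows. The register contents are hypothetical; the function returns the shared fields, which every form asks for first, and then the fields specific to the chosen resource type.

```python
# Hypothetical subset of the register: content model -> its data elements.
REGISTER = {
    "Journal article": {"Title", "Author", "Journal title", "Volume"},
    "Conference paper": {"Title", "Author", "Conference location"},
    "Thesis": {"Title", "Author", "Degree"},
}

def form_fields(register: dict[str, set[str]], model: str) -> tuple[list[str], list[str]]:
    """Split a model's elements into the common part and the model-specific part."""
    common = set.intersection(*register.values())
    specific = register[model] - common
    return sorted(common), sorted(specific)

common, specific = form_fields(REGISTER, "Conference paper")
```

The common part can be a single shared screen, so only the model-specific part needs to be built separately for each new resource type.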

Once we had the elements pinned down, we built XML templates. Our users can open the Journal Article template in any XML editor and fill in the appropriate fields, load the record into Fedora using the VITAL client from VTLS, and have the Dublin Core extracted automatically; the result is a fairly standard object. It is a start. More resource types will be added as we need them.
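Filling a template can also be scripted. The Python sketch below uses a hypothetical, cut-down journal article template (100$a author, 245$a title, 773$t journal title) rather than ARROW's real one, fills in values, and reports any mandatory subfields still empty before the record is handed to the ingest client. Loading into Fedora through VITAL is not shown.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", NS)  # serialise with a default xmlns, not "ns0:"

# Hypothetical, cut-down journal article template with empty subfields.
TEMPLATE = f"""<record xmlns="{NS}">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a"/></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a"/></datafield>
  <datafield tag="773" ind1="0" ind2=" "><subfield code="t"/></datafield>
</record>"""

MANDATORY = {("100", "a"), ("245", "a"), ("773", "t")}  # hypothetical rules

def _subfields(record):
    """Yield ((tag, code), subfield element) pairs for every subfield."""
    for datafield in record.iter(f"{{{NS}}}datafield"):
        for subfield in datafield.iter(f"{{{NS}}}subfield"):
            yield (datafield.get("tag"), subfield.get("code")), subfield

def fill_template(template: str, values: dict) -> ET.Element:
    """Copy values keyed by (tag, code) into the matching empty subfields."""
    record = ET.fromstring(template)
    for key, subfield in _subfields(record):
        if key in values:
            subfield.text = values[key]
    return record

def missing_mandatory(record: ET.Element, mandatory=MANDATORY) -> list:
    """Return the mandatory (tag, code) pairs that are still empty."""
    filled = {key for key, sf in _subfields(record) if sf.text and sf.text.strip()}
    return sorted(mandatory - filled)

def to_xml(record: ET.Element) -> str:
    return ET.tostring(record, encoding="unicode")
```

Checking mandatory elements before ingest catches incomplete records while the depositor is still at hand, rather than after the object is already in the repository.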

The Centralregister.xls spreadsheet and the seven resource-type XML templates will be attached when I discover how to do so! We welcome your suggestions and comments: please email arrow@arrow.edu.au.