Concepts

We frequently talk about "the data model" in VIVO. This over-simplification is sometimes useful and sometimes misleading. In fact, VIVO contains a matrix of data models and sub-models, named graphs, datasets, and other constructs.

It might be more accurate to talk about the union of these data models as "the knowledge base". However, the terminology of "the data model" is firmly entrenched.

In VIVO release 1.6, we are attempting to simplify this complex collection of models, and to produce a unified access layer. This is a work in progress. And regardless of how clean the design might eventually become, this will remain an area with complex requirements which cannot be satisfied by simplistic solutions.

Divisions in the knowledge base

Depending on what you want to do with the data, it can be useful to sub-divide it by one or more of the following criteria:

Types of statements

An RDF model is often divided into ABox (assertions) and TBox (terminology). In RDF, there is no technical distinction between TBox and ABox data. They are stored separately because they are used for different purposes. The combination of the two is informally called the Full model.

Data type Example data

TBox

"Terminological data"

Defines classes, properties, and relationships in your ontology.

foaf:Person 
 a owl:Class ; 
 rdfs:subClassOf owl:Thing ; 
 rdfs:label "Person"@en . 
ex:preferredName 
 a owl:DatatypeProperty ; 
 rdfs:subPropertyOf skos:prefLabel, foaf:name, rdfs:label ; 
 rdfs:domain foaf:Person ; 
 rdfs:label "preferred name"@en .

ABox

"Assertion data"

Enumerates the individual instances of your classes and describes them.

local:tobyink 
 a foaf:Person ; 
 ex:preferredName "Toby Inkster" .

Full

The TBox and the ABox together, treated as a single model.

For example, when you use the RDF tools to remove statements, you want them removed regardless of whether they are found in the TBox or the ABox.
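
The relationship between the three models can be pictured with a minimal sketch. It is not VIVO or Jena code: the class `FullModelSketch`, its fields, and the use of plain strings as statements are all assumptions made for illustration. It only shows that the Full model is the TBox and ABox read together, and that a removal made through the Full view reaches whichever part holds the statement.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: statements are plain strings, and this class is hypothetical.
class FullModelSketch {
    final Set<String> tbox = new HashSet<>(); // terminology: classes and properties
    final Set<String> abox = new HashSet<>(); // assertions about individuals

    // The Full model is the TBox and the ABox read as one set of statements.
    Set<String> full() {
        Set<String> all = new HashSet<>(tbox);
        all.addAll(abox);
        return all;
    }

    // Removing through the Full view deletes the statement wherever it is stored.
    boolean removeFromFull(String statement) {
        boolean wasInTbox = tbox.remove(statement);
        boolean wasInAbox = abox.remove(statement);
        return wasInTbox || wasInAbox;
    }

    public static void main(String[] args) {
        FullModelSketch models = new FullModelSketch();
        models.tbox.add("foaf:Person a owl:Class .");
        models.abox.add("local:tobyink a foaf:Person .");

        System.out.println("Full model size: " + models.full().size());
        models.removeFromFull("foaf:Person a owl:Class .");
        System.out.println("TBox after removal: " + models.tbox);
    }
}
```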

Source of statements

An RDF model can also be divided into Assertions and Inferences. The combination of the two is informally called the Union.

Statement type

Meaning

Example data

Assertions

Statements that you explicitly add to the model, either through setup, ingest, or editing.

local:tobyink rdf:type core:FacultyMember .

Inferences

Statements that the semantic reasoner adds to the model, by reasoning about the assertions, or about other inferences.

local:tobyink rdf:type foaf:Person .

local:tobyink rdf:type foaf:Agent .

local:tobyink rdf:type owl:Thing .

Union

The combination of Assertions and Inferences.

For most purposes, this is the desired model. You want to know what statements are available, without regard to whether they were asserted or inferred.
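
As a rough illustration of how inferences follow from assertions, the sketch below walks a subclass hierarchy to derive the extra types from the example above. It is a toy, not VIVO's reasoner: the class `TypeInferenceSketch`, its methods, and the hard-coded hierarchy (taken from the example rows) are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy subclass reasoner, not the reasoner that VIVO uses.
class TypeInferenceSketch {

    // Hypothetical TBox fragment: each class mapped to its direct superclasses.
    static final Map<String, List<String>> SUPERCLASSES = Map.of(
            "core:FacultyMember", List.of("foaf:Person"),
            "foaf:Person", List.of("foaf:Agent"),
            "foaf:Agent", List.of("owl:Thing"));

    // Returns the types implied by the asserted types but not asserted themselves.
    static Set<String> inferTypes(Set<String> assertedTypes) {
        Set<String> inferred = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(assertedTypes);
        while (!pending.isEmpty()) {
            String current = pending.removeFirst();
            for (String parent : SUPERCLASSES.getOrDefault(current, List.of())) {
                // Inferences can themselves lead to further inferences.
                if (!assertedTypes.contains(parent) && inferred.add(parent)) {
                    pending.addLast(parent);
                }
            }
        }
        return inferred;
    }

    // The Union is the assertions plus the inferences.
    static Set<String> union(Set<String> asserted, Set<String> inferred) {
        Set<String> all = new LinkedHashSet<>(asserted);
        all.addAll(inferred);
        return all;
    }

    public static void main(String[] args) {
        Set<String> asserted = Set.of("core:FacultyMember");
        Set<String> inferred = inferTypes(asserted);
        System.out.println("Inferences: " + inferred);
        System.out.println("Union:      " + union(asserted, inferred));
    }
}
```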

"Content" vs. "Configuration"

We sometimes distinguish between the data that VIVO is serving (Content) and the data that VIVO itself uses (Configuration). The Content is available for display, for searching, for serving as Linked Open Data. The Configuration controls how the content is displayed, who can access the data, and what VIVO itself looks like.

Model type	Purpose	Examples
Configuration	Data about the VIVO application itself.	Application parameters, User Accounts, Display options
Content	The payload - the data that VIVO is intended to distribute.	People data, Publications data, Grant data, etc.

Model scope

The knowledge base exists for as long as VIVO is running. However, subsets or facets of the knowledge base are often used to satisfy a particular HTTP request, or through the length of a VIVO session for a particular user. These subsets are created dynamically from the full knowledge base, used for as long as they are useful, and then discarded.

Scope	Purpose	Example	Discarded when...
Application (Servlet Context)	Created for the life of VIVO.		Never discarded.
Session	Created for a particular logged-in user.	Data that is filtered by what the user is permitted to view.	When the user logs out, or the session times out.
Request	Created for a single HTTP request.	Data that is organized by the languages that are preferred by the browser.	When the individual request has been satisfied.

At present, the Session lifespan is almost never used. However, potential use cases do exist for it.

The Request lifespan is used extensively, since it provides a convenient way to manage database connections and minimize contention for resources.

Purpose vs. scope

It is tempting to think of the models of the Servlet Context as equivalent to the unfiltered models of the Request. And in fact, they may represent the very same data. However, the unfiltered models in the Request go out of scope when the Request has been satisfied. As such, it is much easier to manage the resources required by these models. By contrast, the models of the Servlet Context never go out of scope until VIVO is shut down. Any resources such as database connections or processor memory are not easily reclaimed from these models.

Initializing the Models

TBD - Mention the criteria, the file paths. Include the filegraph and other directories and files.

Application data (Display model)

Name: http://vitro.mannlib.cornell.edu/default/vitro-kb-displayMetadata

Source: the application Datasource (MySQL database)

If empty, read the files in /WEB-INF/ontologies/app/ (without subdirectories)

Every time, read /WEB-INF/ontologies/app/loadedAtStartup
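
The two loading rules above can be sketched as follows. This is only an illustration of the decision, not VIVO's startup code: the class `DisplayModelInitSketch`, its methods, and the file names `firstTime.n3` and `everyTime.n3` are hypothetical, and the sketch assumes that `loadedAtStartup` is also read without descending into subdirectories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: decides which files would be loaded, without parsing any RDF.
class DisplayModelInitSketch {

    // Lists the regular files directly inside dir, ignoring subdirectories.
    static List<Path> topLevelFiles(Path dir) {
        if (!Files.isDirectory(dir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Chooses the files to read, given whether the stored model is empty.
    static List<Path> filesToLoad(boolean modelIsEmpty, Path appDir) {
        List<Path> toLoad = new ArrayList<>();
        if (modelIsEmpty) {
            toLoad.addAll(topLevelFiles(appDir)); // only when the model is empty
        }
        toLoad.addAll(topLevelFiles(appDir.resolve("loadedAtStartup"))); // on every startup
        return toLoad;
    }

    // Builds a throwaway directory that mimics the layout described in the text.
    static Path demoDirectory() {
        try {
            Path appDir = Files.createTempDirectory("app");
            Files.createFile(appDir.resolve("firstTime.n3"));
            Path startupDir = Files.createDirectory(appDir.resolve("loadedAtStartup"));
            Files.createFile(startupDir.resolve("everyTime.n3"));
            return appDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path appDir = demoDirectory();
        System.out.println("Empty model:     " + filesToLoad(true, appDir));
        System.out.println("Populated model: " + filesToLoad(false, appDir));
    }
}
```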

User Accounts

Name: http://vitro.mannlib.cornell.edu/default/vitro-kb-userAccounts

Source: the application Datasource (MySQL database)

If empty, read the files in /WEB-INF/ontologies/auth (without subdirectories). None expected, except during Selenium tests.

The ModelAccess class

Show how it represents all of these distinctions. Describe the scope searching and masking, wrt set and get.

Transition from previous methods

What are we transitioning from? Check out VIVO-82.

Semantics have changed: this saves code, but may alter some uses.
- A lookup always searches the stack of scopes.
- OMS (OntModelSelector) objects are facades with no internal state.
  - There is no way to set an OMS - set the models instead.
  - This keeps each OMS consistent with the models behind it.
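
One way to picture "searching the stack" and masking is the sketch below. It is not the ModelAccess implementation: the class `ScopedModels` is hypothetical, models are stood in for by strings, and the request, session, context ordering is an assumption based on the scopes described earlier. A get looks in the nearest scope first and then outward; a set changes only the scope it is called on, masking outer values without altering them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a stand-in for a stack of scopes, not the real ModelAccess class.
class ScopedModels {
    private final Map<String, String> models = new HashMap<>();
    private final ScopedModels parent; // null for the outermost (application) scope

    ScopedModels(ScopedModels parent) {
        this.parent = parent;
    }

    // Setting a model affects only this scope; it masks any outer model of the same name.
    void set(String name, String model) {
        models.put(name, model);
    }

    // Getting a model searches this scope first, then each enclosing scope in turn.
    Optional<String> get(String name) {
        for (ScopedModels scope = this; scope != null; scope = scope.parent) {
            String model = scope.models.get(name);
            if (model != null) {
                return Optional.of(model);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ScopedModels context = new ScopedModels(null);
        ScopedModels session = new ScopedModels(context);
        ScopedModels request = new ScopedModels(session);

        context.set("display", "application-wide display model");
        System.out.println(request.get("display")); // found by searching out to the context

        request.set("display", "language-filtered display model");
        System.out.println(request.get("display")); // the request value masks the context value
        System.out.println(context.get("display")); // the context value is unchanged
    }
}
```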

	prior to ModelAccess	using ModelAccess
User Accounts Model	ctx.getAttribute("userAccountsOntModel")	ModelAccess.on(ctx).getUserAccountsModel()
	ctx.setAttribute("userAccountsOntModel", model)	ModelAccess.on(ctx).setUserAccountsModel(model)
DisplayModel	req.getAttribute("displayOntModel")	ModelAccess.on(req).getDisplayModel()
	session.getAttribute("displayOntModel")	ModelAccess.on(session).getDisplayModel()
	ctx.getAttribute("displayOntModel") ModelContext.getDisplayModel(ctx)	ModelAccess.on(ctx).getDisplayModel()
	ctx.setAttribute("displayOntModel", model) ModelContext.setDisplayModel(model, ctx)	ModelAccess.on(ctx).setDisplayModel(model)
	req.setAttribute("displayOntModel", model)	ModelAccess.on(req).setDisplayModel(model)
"jenaOntModel"	ctx.getAttribute("jenaOntModel")	ModelAccess.on(ctx).getJenaOntModel()
	session.getAttribute("jenaOntModel")	ModelAccess.on(session).getJenaOntModel()
	req.getAttribute("jenaOntModel")	ModelAccess.on(req).getJenaOntModel()
	ctx.setAttribute("jenaOntModel", model)	ModelAccess.on(ctx).setOntModel(ModelID.UNION_FULL, model)
	req.setAttribute("jenaOntModel", model)	ModelAccess.on(req).setOntModel(ModelID.UNION_FULL, model) ModelAccess.on(req).setJenaOntModel(model)
"baseOntModel" "assertionsModel" Base Full Model	ModelContext.getBaseOntModel(ctx) ctx.getAttribute("baseOntModel") session.getAttribute("baseOntModel")	ModelAccess.on(ctx).getOntModel(ModelID.BASE_FULL) ModelAccess.on(ctx).getBaseOntModel()
	ModelContext.setBaseOntModel(model, ctx)
"inferenceModel" Inference Full Model	ctx.getAttribute("inferenceOntModel")	ModelAccess.on(ctx).getInferenceOntModel()

Notes:

"jenaOntModel" is a previous term for the Union Full model. The convenience methods getJenaOntModel() and setJenaOntModel(m) support this use.
"baseOntModel" and "assertionsModel" are both previous terms for the Base Full model. The convenience methods getBaseOntModel() and setBaseOntModel(m) support this use.

	prior to ModelAccess	using ModelAccess
ontModelSelector unionOntModelSelector	ModelContext.setOntModelSelector(model, ctx)	no mutator methods
	ModelContext.getUnionOntModelSelector(ctx) ctx.getAttribute("ontModelSelector") ctx.getAttribute("unionOntModelSelector")	ModelAccess.on(ctx).getOntModelSelector() ModelAccess.on(ctx).getUnionOntModelSelector()
baseOntModelSelector	ctx.getAttribute("baseOntModelSelector")	ModelAccess.on(ctx).getBaseOntModelSelector()
inferenceOntModelSelector	ctx.getAttribute("inferenceOntModelSelector")	ModelAccess.on(ctx).getInferenceOntModelSelector()

Notes:

The default WebappDaoFactory is the one backed by the unionOntModelSelector. On the request level, this is also known as the "fullWebappDaoFactory". The convenience methods getWebappDaoFactory() and setWebappDaoFactory(wdf) support this use.
"baseWebappDaoFactory" and "assertionsWebappDaoFactory" are both previous terms for the WebappDaoFactory backed by the baseOntModelSelector. The convenience methods getBaseWebappDaoFactory() and setBaseWebappDaoFactory(wdf) support this use.
Nobody was using the "deductionsWebappDaoFactory", so we got rid of it.

Future directions?

What are we transitioning toward? From VIVO-82
