(If this page is useful, it will be moved into the DSpace documentation space.)

Purpose

The purpose of this document is to provide an overview of the way that the DSpace code base interacts with a database.

The mechanisms for interacting with the database layer changed significantly in DSpace 6.x.  This document will highlight those differences.

This document will also outline additional changes that are anticipated in the development of DSpace 7.x.

DSO - DSpace Objects

A DSO is a DSpace Object, for example a Collection, Item, or Bitstream.  DSOs are saved to the database.  Bitstreams are a special type of DSO: in addition to their data in the database, they have binary storage.

Each type of DSO is represented as a table in the DSpace database.  Some additional tables are present to represent relationships between DSOs.

org.dspace.core.Context - DSpace Context

The DSpace Context object contains information about the user/session interacting with DSpace code.

The context object can be queried to determine the current user and the current user's locale.

The context object can be set to a privileged mode that can bypass all authorization checks.
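
The privileged mode can be pictured with a small, self-contained sketch. SketchContext and SketchAuthorizer below are hypothetical stand-ins written for this page, not DSpace classes; in DSpace itself the corresponding calls on org.dspace.core.Context are turnOffAuthorisationSystem() and restoreAuthSystemState(). The stack of previous states is an assumption made so that nested privileged sections restore correctly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a context that can suspend authorization checks.
class SketchContext {
    private boolean ignoreAuth = false;
    // Earlier states are remembered so nested privileged sections unwind correctly.
    private final Deque<Boolean> history = new ArrayDeque<>();

    void turnOffAuthorization() {
        history.push(ignoreAuth);
        ignoreAuth = true;
    }

    void restoreAuthorization() {
        if (!history.isEmpty()) {
            ignoreAuth = history.pop();
        }
    }

    boolean ignoresAuthorization() {
        return ignoreAuth;
    }
}

// Hypothetical authorization check; real policy evaluation is far richer.
class SketchAuthorizer {
    static boolean canWrite(SketchContext context, boolean userIsAdmin) {
        return context.ignoresAuthorization() || userIsAdmin;
    }
}
```

In calling code, the restore call would normally sit in a finally block so that the privileged state cannot leak if the protected section throws.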

The context object contains a cache of objects (to be confirmed).

Read Only Context

The Context object can be set to a read only mode for enhanced performance when processing many database objects in a read-only fashion.

Curation Context (Curator.curationContext())

A context object built to be shared between Curation tasks.

Database Interaction (before DSpace 6.x)

All database actions require the presence of a Context object.

All DSOs are constructed with a Context object.  The context object provides access to the database connections to create/retrieve/update/delete DSOs.

The context object is used to authorize access to particular actions.

Individual DSOs implement an update() method.  This method calls org.dspace.storage.rdbms.DatabaseManager.update().  DatabaseManager is a helper class that constructs the SQL for a DSO.

Data Access Objects (introduced in DSpace 6.x)

The concept of a Data Access Object (DAO) was introduced in DSpace 6 to provide an optimization layer between the DSpace code and the DSpace database.

In DSpace 6, the DAO layer is implemented with Hibernate.  The DAO concept allows a framework other than Hibernate to be used in the DSpace code base.

Here is the interface for the GenericDAO in DSpace 6: https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/core/GenericDAO.java
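
As a rough illustration of why a DAO layer makes the persistence framework replaceable, here is a minimal sketch. SketchDAO, SketchItem and InMemorySketchItemDAO are hypothetical names invented for this example, and the method names are illustrative only; they are not the signatures of the GenericDAO linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical DAO contract: callers depend on this interface, not on a framework.
interface SketchDAO<T> {
    T create(T entity);
    T findById(UUID id);
    List<T> findAll();
    void delete(UUID id);
}

// Hypothetical entity with an identifier, standing in for a DSO.
class SketchItem {
    final UUID id = UUID.randomUUID();
    String title;

    SketchItem(String title) {
        this.title = title;
    }
}

// One possible backend: an in-memory map. A Hibernate-backed class could
// implement the same interface without any change to calling code.
class InMemorySketchItemDAO implements SketchDAO<SketchItem> {
    private final Map<UUID, SketchItem> rows = new LinkedHashMap<>();

    public SketchItem create(SketchItem entity) {
        rows.put(entity.id, entity);
        return entity;
    }

    public SketchItem findById(UUID id) {
        return rows.get(id);
    }

    public List<SketchItem> findAll() {
        return new ArrayList<>(rows.values());
    }

    public void delete(UUID id) {
        rows.remove(id);
    }
}
```

Code written against SketchDAO never names the backing store, which is the property the DAO concept is meant to provide.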

Hibernate (introduced in DSpace 6.x)

DSpace 6 introduced Hibernate (http://hibernate.org/orm/) as an object-relational mapping layer between the DSpace database and the DSpace code.

Objects accessed by hibernate are registered in the hibernate.cfg.xml file.  DSO properties and relationships can be inferred from the database schema.  

The following class provides a hibernate implementation of the GenericDAO interface.

Because hibernate has a mechanism for automatically updating content that has changed, the save() method is not implemented.  

The save to the database is invoked when Context.commit() is called.
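
The idea of writing changes at commit time without an explicit save() can be modelled in a few lines. The sketch below is a toy written for this page, not Hibernate's or DSpace's implementation: ToySession and TrackedItem are hypothetical, and the snapshot comparison is a simplified picture of automatic change detection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical managed entity.
class TrackedItem {
    final int id;
    String title;

    TrackedItem(int id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Toy unit of work: remembers the state each entity had when it was loaded
// and writes back only the entities whose state differs at commit time.
class ToySession {
    private final Map<Integer, String> table = new HashMap<>();                 // the "database"
    private final Map<TrackedItem, String> snapshots = new IdentityHashMap<>(); // state at load time

    void insertRow(int id, String title) {
        table.put(id, title);
    }

    TrackedItem load(int id) {
        TrackedItem item = new TrackedItem(id, table.get(id));
        snapshots.put(item, item.title);
        return item;
    }

    // Returns the ids that were written so the effect is observable.
    List<Integer> commit() {
        List<Integer> written = new ArrayList<>();
        for (Map.Entry<TrackedItem, String> entry : snapshots.entrySet()) {
            TrackedItem item = entry.getKey();
            if (!Objects.equals(item.title, entry.getValue())) {
                table.put(item.id, item.title);
                entry.setValue(item.title); // refresh the snapshot
                written.add(item.id);
            }
        }
        return written;
    }

    String rowTitle(int id) {
        return table.get(id);
    }
}
```

Calling code only mutates the loaded object and then commits; unchanged objects produce no write.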

The hibernate commit is implemented in the following manner

Hibernate Annotations in DSpace

Additional relationships can be explicitly declared using Hibernate annotations.

The Hibernate Cache and the Context Object

Hibernate caches the objects it has loaded, which avoids repeated reads from the database.  The Hibernate cache is accessible from the DSpace Context object.

Some care is needed to properly utilize the hibernate cache.

Context Configurations

The DSpace Context Object can be constructed as a read-only context or a batch context.  This mode determines how hibernate will flush changes to the database.

See https://github.com/DSpace/DSpace/blob/dspace-6.1/dspace-api/src/main/java/org/dspace/core/HibernateDBConnection.java#L148-L156
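
The relationship between context mode and flush behavior can be summarized as a small lookup. Everything in this sketch is an assumption to be checked against the linked source: SketchMode, SketchFlush and FlushPolicy are hypothetical types, and the mapping (batch flushes always, read-only flushes only on request, the default flushes automatically) reflects one reading of the linked lines rather than a confirmed description.

```java
// Hypothetical enums mirroring the idea; these are not the DSpace or Hibernate types.
enum SketchMode { READ_WRITE, READ_ONLY, BATCH_EDIT }

enum SketchFlush { AUTO, MANUAL, ALWAYS }

class FlushPolicy {
    // Assumed mapping from context mode to flush behavior (verify against the source).
    static SketchFlush flushFor(SketchMode mode) {
        switch (mode) {
            case BATCH_EDIT:
                return SketchFlush.ALWAYS;  // push changes out eagerly
            case READ_ONLY:
                return SketchFlush.MANUAL;  // never flush unless asked explicitly
            default:
                return SketchFlush.AUTO;    // let the framework decide when to flush
        }
    }
}
```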

Read Only Context in DSpace 6

Objects read by a read only context are not intended to be modified.  The context object should not be committed.

The read only context is intended for data security.  Implementations using this context should not be able to accidentally save changes.

Open question: do read only connections also have optimized performance?

Batch Context in DSpace 6

Hibernate provides a mechanism to submit a large batch of changes to a database in a memory-efficient manner.

See https://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/batch.html

The Life-cycle of a DSO with Hibernate

(Explanation needed for the states that an object can be in)

  • Retrieved from hibernate
  • Retrieved from hibernate, modified, unsaved
  • Retrieved from hibernate, modified, saved
  • "Detached" object

Hibernate Cache Management in DSpace Command Line Tools

Some DSpace command line tools process a large number of DSOs from a single Context object.  In such a case, the hibernate cache can become too large and trigger memory exceptions.

When that happens, objects must be explicitly purged from the DSpace cache.

For instance, when re-indexing DSpace, the entire hierarchy is traversed.  DSOs are removed from the cache once they are no longer needed.
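
The effect of evicting objects during a long traversal can be shown with a toy model. ToyCachingContext, CachedRecord and ReindexSketch are hypothetical classes written for this page; the uncache() method only plays the role that Context.uncacheEntity() has in DSpace and does not reproduce its behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record standing in for a DSO.
class CachedRecord {
    final int id;

    CachedRecord(int id) {
        this.id = id;
    }
}

// Toy session cache: every loaded record stays referenced until it is evicted.
class ToyCachingContext {
    private final Map<Integer, CachedRecord> cache = new HashMap<>();

    CachedRecord load(int id) {
        return cache.computeIfAbsent(id, key -> new CachedRecord(key));
    }

    void uncache(CachedRecord record) {
        cache.remove(record.id);
    }

    int cacheSize() {
        return cache.size();
    }
}

class ReindexSketch {
    // Visits ids 1..count and returns the largest cache size observed.
    static int process(ToyCachingContext context, int count, boolean evict) {
        int peak = 0;
        for (int id = 1; id <= count; id++) {
            CachedRecord record = context.load(id);
            // ... the record would be indexed here ...
            peak = Math.max(peak, context.cacheSize());
            if (evict) {
                context.uncache(record); // release it once it is no longer needed
            }
        }
        return peak;
    }
}
```

Without eviction the peak cache size grows with the number of records visited; with eviction it stays constant, which is the property a long-running command line tool needs.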

Hibernate Issues Discovered in DSpace 6.1

Surprisingly, Hibernate Database Connections are shared between DSpace Context objects.  Therefore, database connections used by read only contexts and by editable contexts are shared. 

The proper commit/closure of a database connection differs for read only connections and writable connections.  Since these connections are shared, unexpected behavior has been discovered when an incompatible database connection is used by a DSpace context.

Recommended use of Hibernate and the Context Object (DSpace 6.2 and beyond)

When to Construct a Context Object

When to use a Read Only Context

When to use a Batch Context

What is the Proper Way to Close a Context Object?

What is the Proper Way to Close a Read-Only Context Object?

What is the Proper Way to Close a Batch Context Object?

When to call Context.uncacheEntity()

When to call Context.reloadEntity()

Hibernate Queries

In order to take advantage of the hibernate cache and other hibernate features, all queries for DSOs will be performed through the hibernate framework rather than by generating SQL explicitly.

Hibernate Criteria Queries

This allows the construction of a query in an object-oriented fashion.

Hibernate Query Language (HQL)

HQL is a SQL-like query language that references Hibernate object properties rather than table column names.

Common Hibernate Error Messages

LazyInitializationException

For example: LazyInitializationException: failed to lazily initialize ... could not initialize proxy - no Session
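
This exception typically arises when a lazily loaded association is first touched after the session that loaded its owner has been closed. The sketch below is a toy model of that situation, not Hibernate code: ToyLazySession and LazyList are hypothetical, and an IllegalStateException stands in for the real exception type.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy session: lazy loaders may run only while it is open.
class ToyLazySession {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

// Toy lazy collection: fetches its contents on first access.
class LazyList<T> {
    private final ToyLazySession session;
    private final Supplier<List<T>> loader;
    private List<T> loaded;

    LazyList(ToyLazySession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                // Stand-in for the real exception raised in this situation.
                throw new IllegalStateException("could not initialize proxy - no Session");
            }
            loaded = loader.get();
        }
        return loaded;
    }
}
```

A collection that was accessed while the session was open remains usable afterwards; one that was never touched fails on first access once the session is closed. The usual remedies are to access the data before the session ends or to reload the owning object in a live session.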

StaleStateException

For example: StaleStateException: Batch update returned unexpected row count from update

  • This error means that Hibernate tried to update an object that either no longer exists in the database or has already been updated. In other words, the state of this object was "stale" in the Hibernate cache, and its state in the database was different.
  • In DSpace, this may mean that Context.commit() should have been called previously to save the object in question (and ensure the cache and database are synced).
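
The "unexpected row count" wording can be made concrete with a toy model. ToyTable and ToyFlusher are hypothetical classes written for this page, and an IllegalStateException stands in for the real exception; the sketch only shows the general check of comparing the number of affected rows with the number expected.

```java
import java.util.HashMap;
import java.util.Map;

// Toy table keyed by id.
class ToyTable {
    private final Map<Integer, String> rows = new HashMap<>();

    void insert(int id, String value) {
        rows.put(id, value);
    }

    // Mimics UPDATE ... WHERE id = ? and returns the number of rows affected.
    int update(int id, String value) {
        if (!rows.containsKey(id)) {
            return 0;
        }
        rows.put(id, value);
        return 1;
    }

    // Mimics DELETE ... WHERE id = ? and returns the number of rows affected.
    int delete(int id) {
        return rows.remove(id) == null ? 0 : 1;
    }
}

class ToyFlusher {
    // Writes a cached object's state and fails if its row is no longer there.
    static void flushUpdate(ToyTable table, int id, String value) {
        int affected = table.update(id, value);
        if (affected != 1) {
            throw new IllegalStateException(
                "unexpected row count from update: " + affected + " (expected 1)");
        }
    }
}
```

If the row is deleted behind the cache's back, the next flush of the cached object affects zero rows and the check fails, which mirrors the stale state described in the bullets above.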

Hibernate Resources
