(If this page is useful, it will be moved into the DSpace documentation space.)

Purpose

The purpose of this document is to provide an overview of the way that the DSpace code base interacts with a database.

The mechanisms for interacting with the database layer changed significantly in DSpace 6.x (see also DSpace Service based api).  This document will highlight those differences.

...

DSO - DSpace Objects

A DSO is a DSpace Object (org.dspace.content.DSpaceObject). Almost everything in DSpace is a DSO (e.g. Site, Community, Collection, Item, Bitstream, EPerson, Group).  DSOs are persisted in the database.  Bitstreams are a special type of DSO: in addition to their data in the database, they have binary storage (the file itself).

...

The context object can be set to a privileged mode that can bypass all authorization checks.
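In the DSpace 6 API this privileged mode is switched on with Context.turnOffAuthorisationSystem() and switched back with Context.restoreAuthSystemState(), and the two calls should always be paired. The sketch below is a minimal, runnable illustration of that pairing. AuthSketchContext and PrivilegedModeDemo are hypothetical stand-ins (not DSpace classes), and the stack of previous states is an assumption about how nested calls unwind; check org.dspace.core.Context for your version.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for org.dspace.core.Context, reduced to the
// privileged-mode toggle. This is NOT the real DSpace class.
final class AuthSketchContext {
    private boolean ignoreAuth = false;
    // Assumption: earlier states are kept on a stack so nested calls unwind in order.
    private final Deque<Boolean> history = new ArrayDeque<>();

    void turnOffAuthorisationSystem() {
        history.push(ignoreAuth);
        ignoreAuth = true;
    }

    void restoreAuthSystemState() {
        // If restore is called more often than turnOff, fall back to enforcing authorization.
        ignoreAuth = !history.isEmpty() && history.pop();
    }

    boolean ignoreAuthorization() {
        return ignoreAuth;
    }
}

final class PrivilegedModeDemo {
    // Runs the task with authorization checks bypassed and always restores
    // the previous state, even if the task throws.
    static void runPrivileged(AuthSketchContext context, Runnable task) {
        context.turnOffAuthorisationSystem();
        try {
            task.run();
        } finally {
            context.restoreAuthSystemState();
        }
    }

    public static void main(String[] args) {
        AuthSketchContext context = new AuthSketchContext();
        runPrivileged(context, () -> System.out.println("inside: " + context.ignoreAuthorization()));
        System.out.println("after: " + context.ignoreAuthorization());
    }
}
```

Putting the restore call in a finally block means an exception thrown by the privileged work cannot leave the context with authorization checks disabled.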

The context object contains a cache (question) of objects.

...

The context object manages/maintains a list of "events" to dispatch after a commit() (or when dispatchEvents() is called). These events represent changes to objects in the system, and are responded to by Event Consumers.
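A minimal sketch of the "queue now, dispatch on commit" behaviour described above. EventSketchContext is a hypothetical stand-in, not the DSpace class: here events are plain strings and consumers are java.util.function.Consumer callbacks, whereas real DSpace events and Event Consumers are richer types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the event queue kept by a DSpace Context.
final class EventSketchContext {
    private final List<String> pendingEvents = new ArrayList<>();
    private final List<Consumer<String>> consumers = new ArrayList<>();

    // Stand-in for registering an Event Consumer.
    void registerConsumer(Consumer<String> consumer) {
        consumers.add(consumer);
    }

    // Records a change; nothing is delivered to consumers yet.
    void addEvent(String event) {
        pendingEvents.add(event);
    }

    int pendingCount() {
        return pendingEvents.size();
    }

    // Delivers every queued event to every consumer, then empties the queue.
    void dispatchEvents() {
        List<String> batch = new ArrayList<>(pendingEvents);
        pendingEvents.clear();
        for (String event : batch) {
            for (Consumer<String> consumer : consumers) {
                consumer.accept(event);
            }
        }
    }

    void commit() {
        // A real commit would first commit the database transaction; only then are events dispatched.
        dispatchEvents();
    }

    public static void main(String[] args) {
        EventSketchContext context = new EventSketchContext();
        context.registerConsumer(event -> System.out.println("consumer saw: " + event));
        context.addEvent("MODIFY Item");
        System.out.println("queued before commit: " + context.pendingCount());
        context.commit();
        System.out.println("queued after commit: " + context.pendingCount());
    }
}
```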

The context object interacts with the DBConnection class to manage database commits, connection pooling, transactions, etc.  The default DBConnection used is the HibernateDBConnection, which manages the Hibernate Session, Transaction, etc. (more on that below).

Differences between DSpace 5 and DSpace 6 Context

In DSpace 5, each new Context() established a new DB connection. The Context then committed or completed/aborted the connection when it was done (depending on the outcome of that request).  A single Context could also be shared between methods if a single transaction needed to perform actions across multiple methods.

In DSpace 6, Hibernate manages the DB connection pool.  Each thread is associated with a unique Hibernate Session (which corresponds to a DB connection in the pool). This means two Context objects may use the same DB connection (if they are in the same thread). In other words, code can no longer assume each new Context() is treated as a new/separate database transaction.
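The sketch below illustrates why two Context objects on the same thread can end up sharing a connection: neither context opens its own session; both ask for the session bound to the current thread. It uses a java.lang.ThreadLocal to stand in for Hibernate's thread-bound session handling. ThreadBoundSessions and SessionSketchContext are hypothetical names, and integer ids stand in for real Sessions.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-bound "current session" lookup.
// Each thread lazily receives its own numeric id, standing in for a Hibernate Session.
final class ThreadBoundSessions {
    private static final AtomicInteger NEXT_ID = new AtomicInteger(1);
    private static final ThreadLocal<Integer> CURRENT = ThreadLocal.withInitial(NEXT_ID::getAndIncrement);

    static int currentSession() {
        return CURRENT.get();
    }
}

// Hypothetical stand-in for a DSpace Context: it does not open its own connection,
// it asks for whatever session is bound to the calling thread.
final class SessionSketchContext {
    int session() {
        return ThreadBoundSessions.currentSession();
    }
}

final class ThreadBoundDemo {
    // Session ids seen by two contexts on the calling thread and one context on a worker thread.
    static int[] sessionsSeen() {
        SessionSketchContext first = new SessionSketchContext();
        SessionSketchContext second = new SessionSketchContext();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> onOtherThread = pool.submit(() -> new SessionSketchContext().session());
            return new int[] {first.session(), second.session(), onOtherThread.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] ids = sessionsSeen();
        System.out.println("main thread: " + ids[0] + " and " + ids[1] + "; worker thread: " + ids[2]);
    }
}
```

Both main-thread contexts report the same id, while the worker thread gets a different one.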

Warning

Please don't change the property "hibernate.current_session_context_class" unless you really know what you are doing. It has a huge impact on the software architecture. Changing this configuration without also changing parts of DSpace's source code will probably result in a malfunctioning installation (and could result in data loss).

Curation Context (Curator.curationContext())

A context object built to be shared between Curation tasks.

Context Configurations

The DSpace Context Object can be constructed as a read-only context or a batch context.  This mode determines how Hibernate will flush changes to the database. It is good practice to store the current context mode before changing it, and to set it back to that mode when you're done with your work. This avoids problems when code that needs to update stored content calls parts of DSpace's code that switch to a read-only or batch mode.
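A minimal sketch of the save-and-restore pattern recommended above. It assumes the mode API found in DSpace 6.1 and later (Context.getCurrentMode(), Context.setMode() and a Context.Mode enum with values such as READ_ONLY, READ_WRITE and BATCH_EDIT); please verify these names against your DSpace version. ModeSketchContext and ModeDemo are hypothetical stand-ins so the example runs without DSpace on the classpath.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for org.dspace.core.Context, reduced to its mode switch.
final class ModeSketchContext {
    // Assumed to mirror the values of Context.Mode in DSpace 6.1+.
    enum Mode { READ_ONLY, READ_WRITE, BATCH_EDIT }

    private Mode mode = Mode.READ_WRITE;

    Mode getCurrentMode() {
        return mode;
    }

    void setMode(Mode newMode) {
        // The real Context would also reconfigure how Hibernate flushes at this point.
        mode = newMode;
    }
}

final class ModeDemo {
    // Runs the work in a temporary mode, then restores whatever mode the caller had, even on failure.
    static <T> T withMode(ModeSketchContext context, ModeSketchContext.Mode temporary, Supplier<T> work) {
        ModeSketchContext.Mode previous = context.getCurrentMode();
        context.setMode(temporary);
        try {
            return work.get();
        } finally {
            context.setMode(previous);
        }
    }

    public static void main(String[] args) {
        ModeSketchContext context = new ModeSketchContext();
        String report = withMode(context, ModeSketchContext.Mode.READ_ONLY,
                () -> "reading in " + context.getCurrentMode());
        System.out.println(report + ", restored to " + context.getCurrentMode());
    }
}
```

Against the real API the same shape applies: read context.getCurrentMode() into a local variable, call context.setMode(...), and restore the saved mode in a finally block.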

See https://github.com/DSpace/DSpace/blob/dspace-6.1/dspace-api/src/main/java/org/dspace/core/HibernateDBConnection.java#L148-L156

Read Only Context in DSpace 6

The Context object can be set to a read only mode for enhanced performance when processing many database objects in a read-only fashion.

Objects read by a read only context are not intended to be modified. The context object should not be committed.

The read only context is intended for data security.  Implementations using this context should not be able to accidentally save changes.

Batch Context in DSpace 6

Hibernate provides a mechanism to submit a large batch of changes to a database in a memory-efficient manner.

See https://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/batch.html
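The core of Hibernate's batch-processing advice is to flush and clear the Session at regular intervals, so the first-level cache never holds more than one batch of objects. The sketch below shows that loop against a hypothetical BatchSketchSession (not a Hibernate class); with the real API the calls would be Session.flush() and Session.clear(). In DSpace code you would normally work through the Context (batch mode, plus uncacheEntity() or periodic commits) rather than the Session directly; treat that mapping as an assumption to verify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hibernate Session: a first-level cache of objects
// plus a list standing in for rows already written to the database.
final class BatchSketchSession {
    private final Map<Integer, String> cache = new HashMap<>();
    final List<String> database = new ArrayList<>();
    int peakCacheSize = 0;

    void save(int id, String value) {
        cache.put(id, value);
        peakCacheSize = Math.max(peakCacheSize, cache.size());
    }

    // Simplification: every cached object is treated as a pending insert.
    void flush() {
        database.addAll(cache.values());
    }

    // Detaches everything, so memory use stays bounded by the batch size.
    void clear() {
        cache.clear();
    }
}

final class BatchDemo {
    // Arbitrary batch size for the sketch; Hibernate's guide suggests matching the JDBC batch size.
    static final int BATCH_SIZE = 20;

    static BatchSketchSession importRecords(int count) {
        BatchSketchSession session = new BatchSketchSession();
        for (int i = 1; i <= count; i++) {
            session.save(i, "record-" + i);
            if (i % BATCH_SIZE == 0) {
                session.flush(); // push this batch to the database
                session.clear(); // then forget it, so the cache does not keep growing
            }
        }
        session.flush(); // write the final, partial batch
        session.clear();
        return session;
    }

    public static void main(String[] args) {
        BatchSketchSession session = importRecords(105);
        System.out.println("written: " + session.database.size() + ", peak cache size: " + session.peakCacheSize);
    }
}
```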

Database Interaction (before DSpace 6.x)

All database actions require the presence of a Context object.

...

Additional relationships can be explicitly declared using Hibernate annotations.

The Hibernate Session (Cache) and the Context Object

Hibernate will intelligently cache objects in the current Hibernate Session, allowing for optimized performance.  The Hibernate cache is accessible from the DSpace Context object.

Some care is needed to properly utilize the Hibernate cache.

...

Each Hibernate Session opens a single database connection when it is created, and holds onto it until the session is closed.  A Session may consist of one or more Transactions.

In DSpace, the Hibernate Session (and its Transactions) is managed by the HibernateDBConnection object. (NOTE: This class is perhaps unfortunately named: it manages the process of obtaining a database connection from Hibernate, via a Session. It does not represent a single database connection.)

...

...

The DSpace Context object has methods (like uncacheEntity() and reloadEntity()) which can manage objects cached within this Hibernate Session (via HibernateDBConnection).

Some care is needed to properly utilize the Hibernate cache.  Objects are loaded into the Session cache on access. Objects are not removed from the cache until one of the following occurs:

  • The Hibernate Session's Transaction is committed (e.g. via a call to Context.commit() or Context.complete())
  • The Hibernate Session's Transaction is rolled back (e.g. via a call to Context.abort())
  • The object is specifically "evicted" (i.e. uncached) from the Hibernate Session (e.g. via a call to Context.uncacheEntity())

Be aware that once an object is removed (detached) from the Session cache, it must be reloaded from the database before it can be used again!  This can be done via Context.reloadEntity() or by querying for the object again via its Service.
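The rules above can be illustrated with a minimal, runnable model: an object stays cached (and usable) until it is evicted, after which the stale reference fails and a fresh instance has to be loaded. CacheSketchSession and CachedItem are hypothetical stand-ins; uncache() and reload() only mirror the intent of Context.uncacheEntity() and Context.reloadEntity(), not their implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for an entity that remembers which session it is attached to.
final class CachedItem {
    final UUID id;
    private CacheSketchSession session;

    CachedItem(UUID id, CacheSketchSession session) {
        this.id = id;
        this.session = session;
    }

    void detach() {
        session = null;
    }

    // Lazily loaded data is only reachable while attached (compare LazyInitializationException).
    String title() {
        if (session == null) {
            throw new IllegalStateException("object is detached; reload it first");
        }
        return session.lookupTitle(id);
    }
}

// Hypothetical stand-in for the Hibernate Session cache behind a DSpace Context.
final class CacheSketchSession {
    private final Map<UUID, String> database;
    private final Map<UUID, CachedItem> cache = new HashMap<>();

    CacheSketchSession(Map<UUID, String> database) {
        this.database = database;
    }

    // Objects are loaded into the cache on first access and reused afterwards.
    CachedItem find(UUID id) {
        return cache.computeIfAbsent(id, key -> new CachedItem(key, this));
    }

    String lookupTitle(UUID id) {
        return database.get(id);
    }

    // Similar in intent to Context.uncacheEntity(): evict and detach the object.
    void uncache(CachedItem item) {
        CachedItem removed = cache.remove(item.id);
        if (removed != null) {
            removed.detach();
        }
    }

    // Similar in intent to Context.reloadEntity(): return an attached instance for the same id.
    CachedItem reload(CachedItem item) {
        return find(item.id);
    }

    int cachedCount() {
        return cache.size();
    }
}

final class CacheDemo {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        Map<UUID, String> database = new HashMap<>();
        database.put(id, "Sample thesis");
        CacheSketchSession session = new CacheSketchSession(database);

        CachedItem item = session.find(id);
        session.uncache(item);
        try {
            item.title();
        } catch (IllegalStateException e) {
            System.out.println("stale reference: " + e.getMessage());
        }
        CachedItem fresh = session.reload(item);
        System.out.println("after reload: " + fresh.title());
    }
}
```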

Development tips regarding Hibernate Session

A few tips on working with Hibernate Sessions (all gleaned from https://developer.atlassian.com/confdev/development-resources/confluence-architecture/hibernate-sessions-and-transaction-management-guidelines)

  • Hibernate sessions are not thread-safe
    • Therefore, any new DSpace code must ensure it is not attempting to share objects or Sessions between threads. Instead, pass around UUIDs, so the new thread can load the referenced object in a new Session.
  • The more objects you load during the lifetime of a Session, the less efficient each query will be
    • So, be sure to use Context.commit() or Context.uncacheEntity() when you are done with an object
    • (recommendation: offer very specific/limited guidance on when to call uncacheEntity())
  • Because Hibernate has built-in Session caching, it is not recommended to cache objects elsewhere in your code.  If you must perform other caching, store UUIDs instead.
    • Caching objects elsewhere is likely to result in a LazyInitializationException if the object (cached elsewhere) outlives its Session. See "Common Hibernate Error Messages" below.
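As a sketch of the "pass UUIDs, not objects" tip, the example below hands only UUIDs to worker threads, and each task creates its own session to load the data it needs. UuidHandoffDemo, WorkerSession and the map-backed DATABASE are hypothetical stand-ins; in real DSpace code each worker thread would create its own Context and look the object up through the appropriate Service (e.g. ItemService.find(context, uuid)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class UuidHandoffDemo {
    // Hypothetical shared "database"; ConcurrentHashMap because several threads read it.
    static final Map<UUID, String> DATABASE = new ConcurrentHashMap<>();

    // Hypothetical per-thread session: it refuses to be used from any thread but its creator.
    static final class WorkerSession {
        private final Thread owner = Thread.currentThread();

        String loadTitle(UUID id) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("session shared across threads");
            }
            return DATABASE.get(id);
        }
    }

    // Only UUIDs cross the thread boundary; each task opens its own session and loads the object itself.
    static List<String> loadInParallel(List<UUID> ids) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (UUID id : ids) {
                pending.add(pool.submit(() -> new WorkerSession().loadTitle(id)));
            }
            List<String> titles = new ArrayList<>();
            for (Future<String> future : pending) {
                titles.add(future.get());
            }
            return titles;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        DATABASE.put(first, "First item");
        DATABASE.put(second, "Second item");
        List<UUID> ids = new ArrayList<>();
        ids.add(first);
        ids.add(second);
        System.out.println(loadInParallel(ids));
    }
}
```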

...

...

The Life-cycle of a DSO with Hibernate

...

HQL is a SQL-like query language that references Hibernate object properties rather than table column names.
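For illustration, here is the same (hypothetical) query written in SQL and in HQL. The names are assumptions based on DSpace 6's Item entity, whose inArchive property is mapped to the in_archive column of the item table; the point is only that HQL names the Java class and its properties, while SQL names the table and its columns.

```java
// Illustrative query strings only; entity, property, table and column names are assumptions.
final class HqlVsSql {
    // Plain SQL: refers to the table and its columns.
    static final String SQL = "SELECT * FROM item WHERE in_archive = true AND withdrawn = false";

    // HQL: refers to the mapped Java class and its properties; Hibernate translates it to SQL.
    static final String HQL = "SELECT i FROM Item i WHERE i.inArchive = true AND i.withdrawn = false";

    public static void main(String[] args) {
        System.out.println("SQL: " + SQL);
        System.out.println("HQL: " + HQL);
    }
}
```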

Hibernate Logging

If you wish to see Hibernate queries and their parameters logged in your DSpace log files (dspace.log.*), you can add the following loggers to your log4j.properties, sending their output to the A1 appender:

# Log all Hibernate queries (does not include query params)
log4j.logger.org.hibernate.SQL=DEBUG, A1
# Log all Hibernate query parameters (immediately after query they pertain to)
log4j.logger.org.hibernate.type.descriptor.sql=TRACE, A1


Common Hibernate Error Messages

...