Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Enhance docs on Hibernate & Flyway

...

DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse. 

DSpace 6 database schema (PostgresPostgreSQL). Right-click the image and choose "Save as" to save in full resolution. Instructions on updating this schema diagram are in How to update database schema diagram.

Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. However at this time, DSpace APIS use some features specific to PostgreSQL and Oracle, so some modification to the code would be needed before DSpace would function fully with an alternative database back-end.

The org.dspace.storage.rdbms package provides access to an SQL database in a somewhat simpler form than using JDBC directly. The primary class is DatabaseManager, which executes SQL queries and returns TableRow or TableRowIterator objects.

...


  • DSpace uses Hibernate ORM as the object relational mapping layer between the DSpace database and the DSpace code.
    • The main Hibernate configuration can be found at [dspace]/config/hibernate.cfg.xml
    • Hibernate initialization is triggered via Spring (beans) defined [dspace]/config/spring/api/core-hibernate.xml.  This Spring configuration pulls in some settings from DSpace Configuration, namely all Database (db.*) settings defined there.
    • All DSpace Object Classes provide a DAO (Data Access Object) implementation class that extends a GenericDAO interface defined in org.dspace.core.GenericDAO class.  The default (abstract) implementation is in org.dspace.core.AbstractHibernateDAO class.
    • The DSpace Context object (org.dspace.core.Context) provides access to the configured org.dspace.core.DBConnection (Database Connection), which is HibernateDBConnection by default.  The org.dspace.core.HibernateDBConnection class provides access the the Hibernate Session interface (org.hibernate.Session) and its Transactions. 
      • Each Hibernate Session opens a single database connection when it is created, and holds onto it until the Session is closed.  A Session may consist of one or more Transactions. Sessions are NOT thread-safe (so individual objects cannot be shared between threads).
      • Hibernate will intelligently cache objects in the current Hibernate Session (on object access), allowing for optimized performance.  
      • DSpace provides methods on the Context object to specifically remove (Context.uncacheEntity()) or reload (Context.reloadEntity()) objects within Hibernate's Session cache.
      • DSpace also provides special Context object "modes" to optimize Hibernate performance for read-only access (Mode.READ_ONLY) or batch processing (Mode.BATCH_EDIT).  These modes can be specified when constructing a new Context object.
  • DSpace uses FlywayDB to perform automated database initialization and upgrades
    • The org.dspace.storage.rdbms.DatabaseUtils class manages all Flyway API calls, and executes the SQL migrations under the org.dspace.storage.rdbms.sqlmigration package and the Java migrations under the org.dspace.storage.rdbms.migration package. 
    • Once all database migrations have run, a series of Flyway Callbacks are triggered to initialize the (empty) database with required default content.  For example, callbacks exist for adding default DSpace Groups (GroupServiceInitializer), default Metadata & Format Registries (DatabaseRegistryUpdater), and the default Site object (SiteServiceInitializer). All Callbacks are under the org.dspace.storage.rdbms package.
    • While Flyway is automatically initialized and executed during startup, various Database Utilities are also available on the command line.  These utilities allow you to manually trigger database upgrades or check the status of your database.

Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions.  However, at this time, DSpace only provides Flyway migration scripts for PostgreSQL and Oracle (and has only been tested with those database backends).  Additional database backends should be possible, but would minimally require creating custom Flyway migration scripts for that database backend..sqlmigration package and the Java migrations under the org.dspace.storage.rdbms.migration package.  While Flyway is automatically initialized and executed during the initialization of DatabaseManager, various Database Utilities are also available on the command line..

Maintenance and Backup

When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. By default, PostgreSQL performs automatic vacuuming on your behalf.  However, if you have this feature disabled, then we recommend scheduling the vacuumdb command to run on a regular basis.

...

  • After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do this by executing the SQL in [dspace]/etc/postgres/update-sequences.sql, for example with:

    Code Block
    psql -U dspace -f [dspace]/etc/update-sequences.sql
    


Configuring the

...

Database Component

The database manager is configured with the following properties in dspace.cfg:

db.url

The JDBC URL to use for accessing the database. This should not point to a connection pool, since DSpace already implements a connection pool.

db.driver

JDBC driver class name. Since presently, DSpace uses PostgreSQL-specific features, this should be org.postgresql.Driver.

db.username

Username to use when accessing the database.

db.password

Corresponding password ot use when accessing the database.

Custom RDBMS tables,

...

columns or views

When at all possible, we recommend creating custom database tables or views within a separate schema from the DSpace database tables. Since the DSpace database is initialized and upgraded automatically using Flyway DB, the upgrade process may stumble or throw errors if you've directly modified the DSpace database schema, views or tables.  Flyway itself assumes it has full control over the DSpace database schema, and it is not "smart" enough to know what to do when it encounters a locally customized database.

...