This article is based on the latest chapter that’s been added to the Hibernate User Guide. The Performance Tuning and Best Practices chapter aims to help application developers get the most out of their Hibernate persistence layer.
Every enterprise system is unique. However, having a very efficient data access layer is a common requirement for many enterprise applications. Hibernate comes with a great variety of features that can help you tune the data access layer.
Although Hibernate provides the `update` option for the `hibernate.hbm2ddl.auto` configuration property, this feature is not suitable for a production environment.
An automated schema migration tool (e.g. Flyway, Liquibase) allows you to use any database-specific DDL feature (e.g. Rules, Triggers, Partitioned Tables). Every migration should have an associated script, which is stored in the Version Control System, along with the application source code.
When the application is deployed to a production-like QA environment and the deployment works as expected, pushing it to the production environment should be straightforward since the latest schema migration has already been tested.
You should always use an automatic schema migration tool and have all the migration scripts stored in the Version Control System.
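For example, with Flyway, the migration scripts might be organized like this (the file names below are purely illustrative):

```
src/main/resources/db/migration/
├── V1__create_post_table.sql
├── V2__add_post_status_column.sql
└── V3__create_post_comment_table.sql
```

Flyway scans the `db/migration` classpath location by default and applies pending scripts in version order, recording each one in its schema history table, which is what makes the QA-tested migrations repeatable in production.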
Whenever you’re using a framework that generates SQL statements on your behalf, you have to ensure that the generated statements are the ones that you intended in the first place.
There are several alternatives for logging statements. You can log statements by configuring the underlying logging framework. For Log4j, you can use the following settings:
```
### log just the SQL
log4j.logger.org.hibernate.SQL=debug

### log JDBC bind parameters ###
log4j.logger.org.hibernate.type=trace
log4j.logger.org.hibernate.type.descriptor.sql=trace
```
However, there are some other alternatives like using datasource-proxy or p6spy.
The advantage of using a JDBC `DataSource` proxy is that you can go beyond simple SQL logging:
- statement execution time
- JDBC batching logging
Another advantage of using a `DataSource` proxy is that you can assert the number of executed statements at test time. This way, you can have the integration tests fail when an N+1 query issue is detected automatically.
While simple statement logging is fine, using datasource-proxy or p6spy is even better.
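As a minimal sketch, assuming the datasource-proxy library is on the classpath, wrapping the actual `DataSource` could look like this:

```java
import javax.sql.DataSource;
import net.ttddyy.dsproxy.listener.logging.SLF4JLogLevel;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;

public class DataSourceProxyConfig {

    // Wrap the actual DataSource so that every executed statement is intercepted
    public static DataSource proxy(DataSource actualDataSource) {
        return ProxyDataSourceBuilder
            .create(actualDataSource)
            .name("DATA_SOURCE_PROXY")
            .logQueryBySlf4j(SLF4JLogLevel.INFO) // log SQL along with bind parameters
            .countQuery()                        // collect statement counts for test-time assertions
            .build();
    }
}
```

The collected counts can then be asserted in integration tests (e.g. via `QueryCountHolder`) so that an unexpected number of SELECT statements fails the build.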
JDBC allows us to batch multiple SQL statements and send them to the database server in a single request. This saves database roundtrips, and so it reduces response time significantly.
Not only `INSERT` and `UPDATE` statements, but even `DELETE` statements can be batched as well.

For `INSERT` and `UPDATE` statements, make sure that you have all the right configuration properties in place, like ordering inserts and updates and activating batching for versioned data. Check out this article for more details on this topic.

For `DELETE` statements, there is no option to order parent and child statements, so cascading can interfere with the JDBC batching process.
Unlike other frameworks that don’t automate SQL statement generation, Hibernate makes it very easy to activate JDBC-level batching, as indicated in the Batching chapter of our User Guide.
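For instance, one possible batching configuration might look like this (the batch size value is just an illustration):

```
# activate JDBC statement batching
hibernate.jdbc.batch_size=25

# order statements so that they can be grouped into batches
hibernate.order_inserts=true
hibernate.order_updates=true

# allow batching for optimistically locked (versioned) entities
hibernate.jdbc.batch_versioned_data=true
```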
Choosing the right mappings is very important for a high-performance data access layer. From the identifier generators to associations, there are many options to choose from, yet not all choices are equal from a performance perspective.
When it comes to identifiers, you can either choose a natural id or a synthetic key.
For natural identifiers, the assigned identifier generator is the right choice.
For synthetic keys, the application developer can either choose a randomly generated fixed-size sequence (e.g. UUID) or a numerical identifier. Numerical identifiers are very practical, being more compact than their UUID counterparts, so there are multiple generators to choose from:

- `IDENTITY`
- `SEQUENCE`
- `TABLE`
Although the `TABLE` generator addresses the portability concern, in reality it performs poorly because it requires emulating a database sequence using a separate transaction and row-level locks.
For this reason, the choice is usually between `IDENTITY` and `SEQUENCE`.
If the underlying database supports sequences, you should always use them for your Hibernate entity identifiers.
Only if the relational database does not support sequences (e.g. MySQL 5.7) should you use the `IDENTITY` generator.
If you’re using the `SEQUENCE` generator, then you should be using the enhanced identifier generators that were enabled by default in Hibernate 5.
The pooled and the pooled-lo optimizers are very useful to reduce the number of database roundtrips when writing multiple entities per database transaction.
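As a sketch, assuming the enhanced identifier generators are in use (the default in Hibernate 5), a `SEQUENCE` identifier backed by the pooled optimizer can be mapped like this (the `Post` entity and sequence name are illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "post_sequence")
    // with the enhanced generators, an allocationSize greater than 1 selects
    // the pooled optimizer, so one database roundtrip provides a whole block
    // of identifiers
    @SequenceGenerator(name = "post_sequence", sequenceName = "post_sequence", allocationSize = 50)
    private Long id;

    private String title;

    // getters and setters omitted for brevity
}
```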
JPA offers four entity association types:

- `@ManyToOne`
- `@OneToOne`
- `@OneToMany`
- `@ManyToMany`

And `@ElementCollection` for collections of embeddables.
Because object associations can be bidirectional, there are many possible combinations of associations. However, not every possible association type is efficient from a database perspective.
The closer the association mapping is to the underlying database relationship, the better it will perform.
On the other hand, the more exotic the association mapping, the higher the chance of it being inefficient.
The `@ManyToOne` and the `@OneToOne` child-side associations are the best way to represent a `FOREIGN KEY` relationship.
The parent-side `@OneToOne` association requires bytecode enhancement so that the association can be loaded lazily. Otherwise, the parent-side is always fetched, even if the association is marked with `FetchType.LAZY`.
For this reason, it’s best to map the `@OneToOne` association using `@MapsId` so that the `PRIMARY KEY` is shared between the child and the parent entities. When using `@MapsId`, the parent-side association becomes redundant since the child entity can be easily fetched using the parent entity identifier.
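A minimal sketch of this mapping, using illustrative `Post` and `PostDetails` entities:

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.MapsId;
import javax.persistence.OneToOne;

@Entity
public class PostDetails {

    @Id
    private Long id;

    // @MapsId shares the parent identifier, so the child table
    // PRIMARY KEY is also a FOREIGN KEY to the parent table
    @OneToOne(fetch = FetchType.LAZY)
    @MapsId
    private Post post;

    // getters and setters omitted for brevity
}
```

Because the identifier is shared, the details can be fetched directly via `entityManager.find(PostDetails.class, post.getId())`, without any parent-side association.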
For collections, the association can be either unidirectional or bidirectional.
For unidirectional collections, `Set`(s) are the best choice because they generate the most efficient SQL statements. `List`(s) are less efficient than a `Set`.
Bidirectional associations are usually a better choice because the `@ManyToOne` side controls the association.
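For example, a bidirectional one-to-many association might be mapped as follows (the `Post`/`PostComment` entities are hypothetical), with helper methods keeping both sides in sync:

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    // mappedBy marks the @ManyToOne side as the one controlling the association
    @OneToMany(mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<PostComment> comments = new ArrayList<>();

    public void addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
    }

    public void removeComment(PostComment comment) {
        comments.remove(comment);
        comment.setPost(null);
    }
}

@Entity
class PostComment {

    @Id
    @GeneratedValue
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    public void setPost(Post post) {
        this.post = post;
    }
}
```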
Embeddable collections (`@ElementCollection`) are unidirectional associations, hence `Set`(s) are the most efficient, followed by ordered `List`(s), whereas bags (unordered `List`(s)) are the least efficient.
The `@ManyToMany` annotation is rarely a good choice because it treats both sides as unidirectional associations. For this reason, it’s much better to map the link table as depicted in the Bidirectional many-to-many with link entity lifecycle section of the User Guide.
This way, each `FOREIGN KEY` column of the link table will be mapped as a `@ManyToOne` association.
On each parent-side, a bidirectional `@OneToMany` association is going to map to the aforementioned `@ManyToOne` relationship in the link entity.
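A compact sketch of such a link entity, using hypothetical `Post` and `Tag` entities (both assumed to have `Long` identifiers) and an `@IdClass` composite identifier:

```java
import java.io.Serializable;
import java.util.Objects;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.IdClass;
import javax.persistence.ManyToOne;

// the link table is promoted to a dedicated entity, so each
// FOREIGN KEY column becomes a @ManyToOne association
@Entity
@IdClass(PostTag.PostTagId.class)
public class PostTag {

    @Id
    @ManyToOne(fetch = FetchType.LAZY)
    private Post post;

    @Id
    @ManyToOne(fetch = FetchType.LAZY)
    private Tag tag;

    public static class PostTagId implements Serializable {

        private Long post;
        private Long tag;

        // equals and hashCode are mandatory for composite identifiers
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof PostTagId)) return false;
            PostTagId that = (PostTagId) o;
            return Objects.equals(post, that.post)
                && Objects.equals(tag, that.tag);
        }

        @Override
        public int hashCode() {
            return Objects.hash(post, tag);
        }
    }
}
```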
Just because you have support for collections, it does not mean that you have to turn every one-to-many database relationship into a collection.
JPA offers `SINGLE_TABLE`, `JOINED`, and `TABLE_PER_CLASS` to deal with inheritance mapping, and each of these strategies has advantages and disadvantages.
`SINGLE_TABLE` performs the best in terms of executed SQL statements. However, you cannot use `NOT NULL` constraints at the column level. You can still use triggers and rules to enforce such constraints, but it’s not as straightforward.
`JOINED` addresses the data integrity concerns because every subclass is associated with a different table. Polymorphic queries or `@OneToMany` base class associations don’t perform very well with this strategy. However, polymorphic `@ManyToOne` associations are fine, and they can provide a lot of value.
`TABLE_PER_CLASS` should be avoided since it does not render efficient SQL statements.
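As an illustration, a `JOINED` hierarchy can enforce `NOT NULL` constraints on subclass columns because each subclass gets its own table (the `Topic` hierarchy below is hypothetical):

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

@Entity
@Inheritance(strategy = InheritanceType.JOINED)
public class Topic {

    @Id
    @GeneratedValue
    private Long id;

    private String title;
}

@Entity
class Post extends Topic {

    // a column-level constraint that SINGLE_TABLE could not enforce
    @Column(nullable = false)
    private String content;
}
```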
Fetching too much data is the number one performance issue for the vast majority of JPA applications.
Hibernate supports both entity queries (JPQL/HQL and Criteria API) and native SQL statements. Entity queries are useful only if you need to modify the fetched entities, therefore benefiting from the automatic dirty checking mechanism.
For read-only transactions, you should fetch DTO projections because they allow you to select just as many columns as you need to fulfill a certain business use case. This has many benefits like reducing the load on the currently running Persistence Context because DTO projections don’t need to be managed.
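As a sketch, a JPQL constructor expression can populate such a projection directly; the `PostSummary` DTO and its `com.example` package are hypothetical, and constructor expressions require the fully qualified class name:

```java
import java.util.List;
import javax.persistence.EntityManager;

public class PostSummary {

    private final Long id;
    private final String title;

    public PostSummary(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    // only the columns required by the use case are selected,
    // and the resulting objects are never attached to the Persistence Context
    public static List<PostSummary> fetch(EntityManager entityManager) {
        return entityManager.createQuery(
            "select new com.example.PostSummary(p.id, p.title) " +
            "from Post p", PostSummary.class)
        .getResultList();
    }
}
```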
Related to associations, there are two major fetch strategies:

- `EAGER`
- `LAZY`

Prior to JPA, Hibernate used to have all associations as `LAZY` by default. However, the JPA specification made `@ManyToOne` and `@OneToOne` associations `EAGER` by default. `EAGER` fetching is to be avoided. For this reason, it’s better if all associations are marked as `LAZY` by default.
However, `LAZY` associations must be initialized prior to being accessed. Otherwise, a `LazyInitializationException` is thrown.
There are good and bad ways to treat the `LazyInitializationException`. The best way to deal with it is to fetch all the required associations prior to closing the Persistence Context.
The `JOIN FETCH` directive is good for `@ManyToOne` and `@OneToOne` associations, and for at most one collection (e.g. a `@OneToMany` association). If you need to fetch multiple collections, to avoid a Cartesian Product, you should use secondary queries which are triggered either by navigating the `LAZY` association or by calling `Hibernate.initialize()`.
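A brief sketch of both approaches, reusing the hypothetical `Post` entity with a `comments` collection:

```java
import java.util.List;
import javax.persistence.EntityManager;
import org.hibernate.Hibernate;

public class FetchingExamples {

    // JOIN FETCH initializes the collection in the same SQL statement
    public static List<Post> postsWithComments(EntityManager entityManager) {
        return entityManager.createQuery(
            "select distinct p from Post p join fetch p.comments", Post.class)
        .getResultList();
    }

    // a secondary query initializes a LAZY collection while the
    // Persistence Context is still open
    public static void loadComments(Post post) {
        Hibernate.initialize(post.getComments());
    }
}
```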
Hibernate has two caching layers:

- the first-level cache (Persistence Context), which is a transactional write-behind cache providing application-level repeatable reads.
- the second-level cache, which, unlike application-level caches, doesn’t store entity aggregates but normalized, dehydrated entity entries.
The first-level cache is not a caching solution per se, being more useful for ensuring `REPEATABLE READ`(s) even when using the `READ COMMITTED` isolation level.
While the first-level cache is short-lived, being cleared when the underlying `EntityManager` is closed, the second-level cache is tied to an `EntityManagerFactory`.
Some second-level caching providers offer support for clusters. Therefore, a node needs to store only a subset of the whole cached data.
Although the second-level cache can reduce transaction response time since entities are retrieved from the cache rather than from the database, there are other options to achieve the same goal, and you should consider these alternatives prior to jumping to a second-level cache layer:
- tuning the underlying database cache so that the working set fits into memory, therefore reducing Disk I/O traffic;
- optimizing database statements through JDBC batching, statement caching, and indexing, which can reduce the average response time, therefore increasing throughput as well;
- using database replication, which is also a very valuable option to increase read-only transaction throughput.
After properly tuning the database, to further reduce the average response time and increase the system throughput, application-level caching becomes inevitable.
Typically, a key-value application-level cache like Memcached or Redis is a common choice for storing data aggregates. If you can duplicate all data in the key-value store, you have the option of taking down the database system for maintenance without completely losing availability since read-only traffic can still be served from the cache.
One of the main challenges of using an application-level cache is ensuring data consistency across entity aggregates. That’s where the second-level cache comes to the rescue. Being tightly integrated with Hibernate, the second-level cache can provide better data consistency since entries are cached in a normalized fashion, just like in a relational database. Changing a parent entity only requires a single entry cache update, as opposed to cache entry invalidation cascading in key-value stores.
The second-level cache provides four cache concurrency strategies:

- `READ_ONLY`
- `NONSTRICT_READ_WRITE`
- `READ_WRITE`
- `TRANSACTIONAL`
`READ_WRITE` is a very good default concurrency strategy since it provides strong consistency guarantees without compromising throughput. The `TRANSACTIONAL` concurrency strategy uses JTA; hence, it’s more suitable when entities are frequently modified.
Both `READ_WRITE` and `TRANSACTIONAL` use write-through caching, while `NONSTRICT_READ_WRITE` is a read-through caching strategy.
For this reason, `NONSTRICT_READ_WRITE` is not very suitable if entities are changed frequently.
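For example, assuming a second-level cache provider is already configured, an entity can opt into `READ_WRITE` caching like this (the `Post` entity is, again, illustrative):

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
// READ_WRITE provides strong consistency guarantees without requiring JTA
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;
}
```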
When using clustering, the second-level cache entries are spread across multiple nodes.
When using the Infinispan distributed cache, only `READ_ONLY` and `NONSTRICT_READ_WRITE` are available for read-write caches.
Bear in mind that `NONSTRICT_READ_WRITE` offers a weaker consistency guarantee since stale updates are possible.
For more about Hibernate Performance Tuning, check out the High-Performance Hibernate presentation from Devoxx France.