In Relation To

The Hibernate team blog on everything data.

Hi, I’m Mincong, an engineering student from France. I’m glad to present my Google Summer of Code 2016 project, which provides an alternative to the current mass indexer implementation of Hibernate Search, using the Java Batch architecture (JSR 352). I’ve been working on this project for 4 months. Before getting started, I want to thank Google for sponsoring the project, the Hibernate team for accepting my proposal and my mentors Gunnar and Emmanuel for their help during this period. Now, let’s begin!

What is it about?

Hibernate Search brings full-text search capabilities to your Hibernate/JPA applications by synchronizing the state of your entities with a search index maintained by Lucene (or Elasticsearch as of Hibernate Search 5.6!). Index synchronization usually happens on the fly as entities are modified, but there may be cases where an entire index needs to be re-built, e.g. when enabling indexing for an existing entity type or after changes have been applied directly to the database, bypassing Hibernate (Search).

Hibernate Search provides the mass indexer for this purpose. It was the goal of my GSoC project to develop an alternative using the API for Java batch applications standardized by JSR 352.

What do we gain from JSR 352?

Implementing the mass indexing functionality using the standardized batching API allows you to use the existing tools of your runtime environment for starting/stopping and monitoring the status of the indexing process. E.g. in WildFly you can use the CLI to do so.

Also, JSR 352 provides a way to restart specific job runs. This is very useful if re-indexing of an entity type failed mid-way, for instance due to connectivity issues with the database. Once the problem is solved, the batch job can be restarted and will continue where it left off, without re-processing the items that were already indexed successfully.
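To illustrate, a restart only requires the execution ID of the failed run and the standard JobOperator API (javax.batch.operations.JobOperator, obtained from javax.batch.runtime.BatchRuntime). Here is a minimal sketch, where failedExecutionId is assumed to be the ID returned when the job was originally started:

// restart the interrupted indexing job; it resumes from the last checkpoint
long restartedExecutionId = BatchRuntime.getJobOperator()
        .restart( failedExecutionId, new Properties() );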

As JSR 352 defines common concepts of batch-oriented applications, such as item readers, processors and writers, the job architecture and workflow are very easy to follow. In JSR 352, the workflow is written in an XML file (the "job XML") which specifies a job and its steps and directs their execution. So you can understand the process without diving into the code.

<job id="massIndex">
    <step id="beforeChunk" next="produceLuceneDoc">
        <batchlet ref="beforeChunkBatchlet"/>
    </step>

    <step id="produceLuceneDoc" next="afterChunk">
        <chunk checkpoint-policy="custom">
            <reader ref="entityReader"/>
            <processor ref="luceneDocProducer"/>
            <writer ref="luceneDocWriter"/>
            <checkpoint-algorithm ref="checkpointAlgorithm"/>
        </chunk>
        ...
    </step>

    <step id="afterChunk">
        <batchlet ref="afterChunkBatchlet"/>
    </step>
</job>

As you can see, it follows a typical batch-processing workflow. Anyone who has experience with ETL processes should have no difficulty understanding our new implementation.
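To give an idea of what such a chunk artifact looks like, here is a simplified, hypothetical item processor following the javax.batch.api.chunk contracts. It is only a sketch of the pattern, not the project’s actual luceneDocProducer, and the class and method names are placeholders:

import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;

@Named // referenced from the job XML by its bean name
public class MyLuceneDocProcessor implements ItemProcessor {

    @Override
    public Object processItem(Object entity) throws Exception {
        // turn the entity loaded by the item reader into whatever
        // the item writer expects, e.g. a Lucene document
        return buildDocument( entity );
    }

    private Object buildDocument(Object entity) {
        // placeholder for the actual document-building logic
        return entity;
    }
}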

Example usages

Here are some example usages of the new mass indexer in its current draft version. It allows you to index one or multiple entity types. If you have more than one root entity to index, you can pass them all to the addRootEntity(Class<?>…) method.

How to use the new MassIndexer
long executionId = new MassIndexer()
        .addRootEntity( Company.class )
        .start();

Another example with a more customized configuration:

long executionId = new MassIndexer()
        .addRootEntity( Company.class, Employee.class )
        .cacheable( false )
        .checkpointFreq( 1000 )
        .rowsPerPartition( 100000 )
        .maxThreads( 10 )
        .purgeAtStart( true )
        .optimizeAfterPurge( true )
        .optimizeAtEnd( true )
        .start();
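The start() method returns the JSR 352 execution ID, so you can check the job’s progress afterwards through the standard batch runtime. A minimal sketch, using javax.batch.runtime.BatchRuntime and javax.batch.runtime.BatchStatus:

JobOperator jobOperator = BatchRuntime.getJobOperator();
BatchStatus status = jobOperator.getJobExecution( executionId ).getBatchStatus();
// STARTED while indexing, then COMPLETED (or FAILED / STOPPED)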

Parallelism

To maximize performance, we highly recommend speeding up the mass indexer using parallelism, which is activated by default. In the JSR 352 standard, the exact term is "partitioning". The indexing step may run as multiple partitions, one per thread. Each partition has its own partition ID and parameters. If there are more partitions than threads, partitions are treated as a queue to consume: each thread runs only one partition at a time and won’t pick up the next partition until the previous one is finished.

massIndexer = massIndexer.rowsPerPartition( 500 );

Checkpoints

The mass indexer supports a checkpoint algorithm. If the job is interrupted for any reason, it can be restarted from the last checkpoint, which is stored by the batch runtime. The entities already indexed won’t be lost, because they have already been flushed to the directory provider. If N is the checkpoint frequency, a partition reaches a checkpoint every N items processed within that partition. You can override this value to suit your business requirements.

massIndexer = massIndexer.checkpointFreq( 1000 );
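Under the hood, this frequency is enforced by a JSR 352 checkpoint algorithm (javax.batch.api.chunk.CheckpointAlgorithm). The following is only a simplified, hypothetical item-count implementation to show the contract; the actual checkpointAlgorithm bean referenced in the job XML is more elaborate:

import javax.batch.api.chunk.CheckpointAlgorithm;
import javax.inject.Named;

@Named
public class ItemCountCheckpointAlgorithm implements CheckpointAlgorithm {

    private final int checkpointFreq = 1000; // N, the checkpoint frequency
    private int itemsSinceLastCheckpoint;

    @Override
    public int checkpointTimeout() {
        return 0; // use the batch runtime's default timeout
    }

    @Override
    public void beginCheckpoint() {
        itemsSinceLastCheckpoint = 0;
    }

    @Override
    public boolean isReadyToCheckpoint() {
        // called by the runtime after each item; checkpoint every N items
        return ++itemsSinceLastCheckpoint >= checkpointFreq;
    }

    @Override
    public void endCheckpoint() {
        // nothing to clean up in this sketch
    }
}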

Run

For further usage, please check my GitHub repo gsoc-hsearch. If you want to play with it, you can download the code and build it with Maven:

$ git clone -b 1.0 git://github.com/mincong-h/gsoc-hsearch.git
$ cd gsoc-hsearch
$ mvn install

Current status and next steps

Currently, the new implementation accepts different entity types as input, provides a high level of customization of the job properties, and indexes entities in parallel. The job periodically saves its progress so it can be restarted from the last point of consistency. Load balancing has been considered to avoid overloading any single thread. This indexing batch job is available under both Java SE and Java EE.

There are still many things to do, e.g. related to performance improvements, integration into WildFly, monitoring, more fine-grained selection of entities to be re-indexed etc. Here are some of the ideas:

  • Core: partition mapping for composite ID

  • Integration: package the batch job as a WildFly module

  • Integration: start the indexing batch job from FullTextSession and FullTextEntityManager

  • Integration: embed this project into Hibernate Search

  • Monitoring: enhance the basic monitoring, e.g. progress status for restarted job

  • Performance: Ensure a great performance of this implementation

These tasks are tracked as GitHub issues; you can check the complete TODO list here.

Feedback

If you are using Hibernate Search and ever wished for a more standardized approach to mass indexing, this project clearly is for you.

We still need to apply some improvements and polishing before integrating it as a module into the Hibernate Search core code base, but any bug reports or comments on the project will be very helpful. So please give it a try and let us know about your feedback. Just drop a comment below or raise an issue on GitHub.

Looking forward to hearing from you!

Hibernate ORM 5.1.1.Final

Tagged as Hibernate ORM Releases

Hibernate ORM 5.1.1.Final has just been tagged and published.

The complete list of changes can be found here (or here for people without a Hibernate Jira account).

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

The release bundles can be obtained from: SourceForge or BinTray.

Hibernate Community Newsletter 16/2016

Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

Hibernate OGM basics with MongoDB is a short tutorial that demonstrates how easily you can persist JPA entities into MongoDB.

The Java In Code blog offers a very good article about the benefits of the Hibernate bootstrap mechanism.

If you’re running Windows and you happen to have an Nvidia video card, you might run into problems when running WildFly with Arquillian. This blog post shows you how you can tackle this issue.

Anghel Leonard wrote a series of articles related to Hibernate:

I’ve noticed many questions related to mapping Composite Keys on our forum and StackOverflow. For this reason, I decided to write an article about the best way to map a Composite Key with JPA and Hibernate.

In Hibernate 5.2.2, we improved the way DISTINCT is handled in order to boost performance. If you’re interested in this topic, check out the article on our blog.

Debezium is a Change Data Capture tool which scans the database transaction log and extracts all data changes. For more on this topic, check out this post about capturing changes from MySQL.

Projects

Mihai Costin developed a library for using the Hibernate second level cache mechanism with Memcached. If your application already uses Memcached, it makes sense to store the second level cache entries in the same distributed caching system. This library is hosted on GitHub and available on Maven Central, so you can easily grab it and give it a try.

Time to upgrade

We have released two versions of Hibernate ORM: 5.0.10 and 5.2.2.

Second bug-fix release for ORM 5.2

Tagged as Hibernate ORM Releases

The second bug-fix release for Hibernate ORM 5.2 has just been published. It is tagged at https://github.com/hibernate/hibernate-orm/releases/tag/5.2.2

The complete list of changes can be found here (or here for people without a Hibernate Jira account).

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

The release bundles can be obtained from SourceForge or BinTray.

Introduction

When it comes to optimizing queries, you should listen to Markus Winand. While reading this great article about Nested Loops, I stumbled on the following remark:

The query has the expected left join but also an unnecessary distinct keyword. Unfortunately, JPA does not provide separate API calls to filter duplicated parent entries without de-duplicating the child records as well. The distinct keyword in the SQL query is alarming because most databases will actually filter duplicate records. Only a few databases recognize that the primary keys guarantees uniqueness in that case anyway.

— Markus Winand
Nested Loops

Markus is right. Not only is the DISTINCT keyword redundant, but it affects performance as well. Because we want Hibernate to be a solution for high-performance data access systems, I created the HHH-10965 Jira issue, which is fixed in Hibernate 5.2.2. This article demonstrates the redundant DISTINCT problem and how you can overcome it in Hibernate 5.2.2.

Domain Model

For the sake of testing, let’s assume that we want to develop a book store application. The two most important entities within our system are the Book and the Person who wrote the book.

To simplify our design, we’ll assume that every Book is written by a single author, although in reality, this might not always be the case. Therefore, the Domain Model looks like this:

@Entity(name = "Person") @Table( name = "person")
public class Person {

    @Id @GeneratedValue
    private Long id;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    @OneToMany(mappedBy = "author", cascade = CascadeType.ALL)
    private List<Book> books = new ArrayList<>(  );

    public Person() {}

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public void addBook(Book book) {
        books.add( book );
        book.setAuthor( this );
    }

    //Getters and setters omitted for brevity
}

@Entity(name = "Book") @Table( name = "book")
public class Book {

    @Id @GeneratedValue
    private Long id;

    private String title;

    @ManyToOne
    private Person author;

    public Book() {}

    public Book(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}

Every Book has an author property indicating the Person who wrote it.

Using DISTINCT

Supposing we need to select a particular author, we might want to provide a list of last names so that the user can trim down the author list. This can be done easily with the following query:

List<String> lastNames = entityManager.createQuery(
    "select distinct p.lastName " +
    "from Person p", String.class)
.getResultList();

assertTrue(
    lastNames.size() == 2 &&
    lastNames.contains( "King" ) &&
    lastNames.contains( "Mihalcea" )
);

As expected, Hibernate generates the following SQL query:

SELECT DISTINCT
    p.last_name as col_0_0_
FROM person p

In this context, the use of DISTINCT is understandable since we want duplicates to be removed.

So far, so good. But let’s now assume we want to fetch all Person entities along with their Book(s):

List<Person> authors = entityManager.createQuery(
    "select p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.getResultList();

authors.forEach( author -> {
    log.infof( "Author %s wrote %d books",
       author.getFirstName() + " " + author.getLastName(),
       author.getBooks().size()
    );
} );

When executing the query above, Hibernate generates the following output:

SELECT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

As we can see, we have a duplicated entry for Gavin King. This is because there are 4 records in our ResultSet, the count being given by the number of books associated with the currently selected authors.

To remove duplicated entries, JPA offers the DISTINCT keyword:

List<Person> authors = entityManager.createQuery(
    "select distinct p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.getResultList();

authors.forEach( author -> {
    log.infof( "Author %s wrote %d books",
       author.getFirstName() + " " + author.getLastName(),
       author.getBooks().size()
    );
} );

This time, Hibernate generates the following output:

SELECT DISTINCT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

There are no duplicates anymore. Duplicates are removed by Hibernate when building the graph of entities, after the ResultSet is fetched from JDBC.

Spotting the problem

However, have you spotted the DISTINCT keyword in the SQL query? In this particular case, DISTINCT does not bring any value since we don’t have duplicated entries in the JDBC ResultSet. Not only is it redundant, but DISTINCT can cause performance problems because the result set must be sorted by the query executor.

For instance, running the previous query on PostgreSQL generates the following execution plan:

PostgreSQL execution plan with DISTINCT

The HashAggregate stage is associated with sorting the result set and ensuring there is no duplicate entry. There’s a very good explanation of how HashAggregate works internally on depesz.com. Basically, the result set is sorted so that duplicated rows come one after another, and a row is discarded if it’s identical to the previous record.

When running EXPLAIN FORMAT=JSON against the same DISTINCT query on MySQL, we get the following execution plan:

{
   "query_block": {
     "select_id": 1,
     "cost_info": {
       "query_cost": "5.02"
     },
     "duplicates_removal": {
       "using_temporary_table": true,
       "using_filesort": false,
       "nested_loop": [
         {
           "table": {
             "table_name": "p",
             "access_type": "ALL",
             "rows_examined_per_scan": 3,
             "rows_produced_per_join": 3,
             "filtered": "100.00",
             "cost_info": {
               "read_cost": "1.00",
               "eval_cost": "0.60",
               "prefix_cost": "1.60",
               "data_read_per_join": "4K"
             },
             "used_columns": [
               "id",
               "first_name",
               "last_name"
             ]
           }
         },
         {
           "table": {
             "table_name": "b",
             "access_type": "ALL",
             "possible_keys": [
               "FKi7lkcmacourlqkkn4uo1s4svl"
             ],
             "rows_examined_per_scan": 4,
             "rows_produced_per_join": 12,
             "filtered": "100.00",
             "using_join_buffer": "Block Nested Loop",
             "cost_info": {
               "read_cost": "1.02",
               "eval_cost": "2.40",
               "prefix_cost": "5.02",
               "data_read_per_join": "9K"
             },
             "used_columns": [
               "id",
               "title",
               "author_id"
             ]
           }
         }
       ]
     }
   }
}

When using MySQL 5.6.5 or later, you have the option of using the JSON EXPLAIN format, which provides much more information than the TRADITIONAL EXPLAIN output.

You can easily spot the duplicates_removal and using_temporary_table attributes, which are associated with the sorting phase incurred by the DISTINCT keyword. MySQL creates a temporary table to store the intermediate result set and applies a sorting algorithm to discover duplicates. MySQL offers a DISTINCT/GROUP BY optimization technique that avoids the temporary table, but it requires an index featuring the same columns in the exact same order as the SELECT query. In our case, this is not very feasible.

The DISTINCT SQL keyword is not cost-free. For this reason, it should only be used when we really need to filter out duplicated records from the statement result set.

Fixing the problem

That was the main motivation behind the HHH-10965 issue, which was recently fixed and which adds the following JPA-level query hint:

List<Person> authors = entityManager.createQuery(
    "select distinct p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.setHint( QueryHints.HINT_PASS_DISTINCT_THROUGH, false )
.getResultList();

The hibernate.query.passDistinctThrough hint tells Hibernate to avoid passing the DISTINCT keyword to the actual SQL query. Therefore, we get the following output:

SELECT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

This way, the Person entities are de-duplicated by Hibernate while the SQL query no longer features the useless DISTINCT keyword.

Now, the query plan for PostgreSQL becomes:

PostgreSQL execution plan without DISTINCT

There’s no longer a HashAggregate stage, meaning that the database is not doing any redundant result set sorting.

The query plan for MySQL looks like this:

{
   "query_block": {
     "select_id": 1,
     "cost_info": {
       "query_cost": "5.02"
     },
     "nested_loop": [
       {
         "table": {
           "table_name": "p",
           "access_type": "ALL",
           "rows_examined_per_scan": 3,
           "rows_produced_per_join": 3,
           "filtered": "100.00",
           "cost_info": {
             "read_cost": "1.00",
             "eval_cost": "0.60",
             "prefix_cost": "1.60",
             "data_read_per_join": "4K"
           },
           "used_columns": [
             "id",
             "first_name",
             "last_name"
           ]
         }
       },
       {
         "table": {
           "table_name": "b",
           "access_type": "ALL",
           "possible_keys": [
             "FKi7lkcmacourlqkkn4uo1s4svl"
           ],
           "rows_examined_per_scan": 4,
           "rows_produced_per_join": 12,
           "filtered": "100.00",
           "using_join_buffer": "Block Nested Loop",
           "cost_info": {
             "read_cost": "1.02",
             "eval_cost": "2.40",
             "prefix_cost": "5.02",
             "data_read_per_join": "9K"
           },
           "used_columns": [
             "id",
             "title",
             "author_id"
           ],
           "attached_condition": "<if>(is_not_null_compl(b), (`hibernate_orm_test`.`b`.`author_id` = `hibernate_orm_test`.`p`.`id`), true)"
         }
       }
     ]
   }
}

This fix is going to be available in Hibernate 5.2.2, so one more reason to upgrade!

If you enjoyed this article, you might be interested in reading my book as well.

Hibernate ORM 5.0.10.Final

Tagged as Hibernate ORM Releases

Hibernate ORM 5.0.10.Final has just been tagged and published.

The complete list of changes can be found here (or here for people without a Hibernate Jira account).

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

The release bundles can be obtained from: SourceForge or BinTray.

hibernate-orm-modules

As previously explained, the hibernate-orm-modules project integrates the latest version of Hibernate into WildFly 10.

While testing this module works just fine on Linux and OS X, on Windows we used to get the following test failure:

:hibernate-orm-modules:test

org.hibernate.wildfly.integrationtest.HibernateModulesOnWildflyTest > classMethod FAILED
    org.jboss.arquillian.container.spi.client.container.LifecycleException

1 test completed, 1 failed
:hibernate-orm-modules:test FAILED

Finding the culprit

Looking into the test report, there is little clue about what’s going on:

org.jboss.arquillian.container.spi.client.container.LifecycleException: The server is already running! Managed containers do not support connecting to running server instances due to the possible harmful effect of connecting to the wrong server. Please stop server before running or change to another type of container.
To disable this check and allow Arquillian to connect to a running server, set allowConnectingToRunningServer to true in the container configuration
        at org.jboss.as.arquillian.container.managed.ManagedDeployableContainer.failDueToRunning(ManagedDeployableContainer.java:321)
        at org.jboss.as.arquillian.container.managed.ManagedDeployableContainer.startInternal(ManagedDeployableContainer.java:81)
        at org.jboss.as.arquillian.container.CommonDeployableContainer.start(CommonDeployableContainer.java:115)

The error message says that the application server is already running, so there can be two ways to explain this behavior:

  • either the set-up logic is flawed and the server is starting twice, which cannot be the case since tests run just fine on Linux and OSX.

  • or one of the ports required by the application server is taken by some other process, so Arquillian assumes we already started a server instance.

In this kind of situation, a good old debugging session can reveal the actual culprit, and in our case, I could identify the problem in org.jboss.as.arquillian.container.managed.ManagedDeployableContainer:

private boolean isServerRunning() {
    Socket socket = null;
    try {
        socket = new Socket(
                getContainerConfiguration().getManagementAddress(),
                getContainerConfiguration().getManagementPort());
    } catch (Exception ignored) { // nothing is running on defined ports
        return false;
    } finally {
        if (socket != null) {
            try {
                socket.close();
            } catch (Exception e) {
                throw new RuntimeException("Could not close isServerStarted socket", e);
            }
        }
    }
    return true;
}

The WildFly Arquillian container tries to establish a socket connection to the address and the port given by the org.jboss.as.arquillian.container.CommonContainerConfiguration which, by default, uses the following address and port info:

managementAddress = "127.0.0.1";
managementPort = 9990 + Integer.decode(System.getProperty("jboss.socket.binding.port-offset", "0"));

Without providing a jboss.socket.binding.port-offset property, the default port is 9990.

Let’s see who’s already using this port:

for /f "tokens=5" %a in ('netstat -ano ^| findstr 9990') do tasklist /fi "pid eq %a"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
NvNetworkService.exe          2976 Services                   0      4,724 K

The NvNetworkService.exe process is Nvidia’s network service, which takes care of updating the Nvidia driver.

So, we have two options:

  • either we disable the NvNetworkService.exe process and then we’ll no longer get any driver update for our video card.

  • or we use the jboss.socket.binding.port-offset property, so that WildFly uses a different port that’s hopefully not in use.

Fixing the port stealing issue

Because I don’t want to run outdated software on my computer and risk security issues, I’m going to choose the second option. After searching the list of available ports, I’m going to use port 10127, which doesn’t seem to be allocated to any software vendor. Therefore, the offset value should be 137 (9990 + 137 = 10127).

To provide the jboss.socket.binding.port-offset System property, we have to make two changes:

  1. The arquillian.xml should be changed as follows:

    <group qualifier="Grid" default="true">
        <container qualifier="container.active-1" mode="suite" default="true">
            <configuration>
                <property name="jbossHome">
                    ${buildDir}/wildfly-${wildflyVersion}
                </property>
                <property name="javaVmArguments">
                    -Djava.net.preferIPv4Stack=true
                    -Djgroups.bind_addr=127.0.0.1
                    -Djboss.socket.binding.port-offset=137
                </property>
            </configuration>
        </container>
    </group>
  2. The gradle build script needs to provide the jboss.socket.binding.port-offset System property as well:

    test {
        systemProperties['jboss.socket.binding.port-offset'] = 137
    }

That’s it. Now the tests run just fine on Windows too.

Hibernate Community Newsletter 15/2016

Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

Null and not-null @DiscriminatorValue options

Tagged as Discussions Hibernate ORM

Inheritance and discriminator columns

Although it can be used with JOINED table inheritance, the @DiscriminatorValue annotation is more common for SINGLE_TABLE inheritance. For SINGLE_TABLE, the discriminator column tells Hibernate the subclass entity type associated with each particular database row.

If you don’t specify a discriminator column, Hibernate is going to use the default DTYPE column. To visualize how it works, consider the following Domain Model inheritance hierarchy:

@Entity(name = "Account")
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
public static class Account {

    @Id
    private Long id;

    private String owner;

    private BigDecimal balance;

    private BigDecimal interestRate;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getOwner() {
        return owner;
    }

    public void setOwner(String owner) {
        this.owner = owner;
    }

    public BigDecimal getBalance() {
        return balance;
    }

    public void setBalance(BigDecimal balance) {
        this.balance = balance;
    }

    public BigDecimal getInterestRate() {
        return interestRate;
    }

    public void setInterestRate(BigDecimal interestRate) {
        this.interestRate = interestRate;
    }
}

@Entity(name = "DebitAccount")
public static class DebitAccount extends Account {

    private BigDecimal overdraftFee;

    public BigDecimal getOverdraftFee() {
        return overdraftFee;
    }

    public void setOverdraftFee(BigDecimal overdraftFee) {
        this.overdraftFee = overdraftFee;
    }
}

@Entity(name = "CreditAccount")
public static class CreditAccount extends Account {

    private BigDecimal creditLimit;

    public BigDecimal getCreditLimit() {
        return creditLimit;
    }

    public void setCreditLimit(BigDecimal creditLimit) {
        this.creditLimit = creditLimit;
    }
}

For this model, Hibernate generates the following database table:

create table Account (
    DTYPE varchar(31) not null,
    id bigint not null,
    balance decimal(19,2),
    interestRate decimal(19,2),
    owner varchar(255),
    overdraftFee decimal(19,2),
    creditLimit decimal(19,2),
    primary key (id)
)

So when inserting two subclass entities:

DebitAccount debitAccount = new DebitAccount();
debitAccount.setId( 1L );
debitAccount.setOwner( "John Doe" );
debitAccount.setBalance( BigDecimal.valueOf( 100 ) );
debitAccount.setInterestRate( BigDecimal.valueOf( 1.5d ) );
debitAccount.setOverdraftFee( BigDecimal.valueOf( 25 ) );

CreditAccount creditAccount = new CreditAccount();
creditAccount.setId( 2L );
creditAccount.setOwner( "John Doe" );
creditAccount.setBalance( BigDecimal.valueOf( 1000 ) );
creditAccount.setInterestRate( BigDecimal.valueOf( 1.9d ) );
creditAccount.setCreditLimit( BigDecimal.valueOf( 5000 ) );

Hibernate will populate the DTYPE column with the subclass entity name:

INSERT INTO Account (balance, interestRate, owner, overdraftFee, DTYPE, id)
VALUES (100, 1.5, 'John Doe', 25, 'DebitAccount', 1)

INSERT INTO Account (balance, interestRate, owner, creditLimit, DTYPE, id)
VALUES (1000, 1.9, 'John Doe', 5000, 'CreditAccount', 2)

While this is rather straightforward for most use cases, when you have to integrate a legacy database schema, the discriminator column might contain NULL(s) or values that are not associated with any entity subclass.

Consider that our database contains records like these:

INSERT INTO Account (DTYPE, balance, interestRate, owner, id)
VALUES (NULL, 300, 0.9, 'John Doe', 3)

INSERT INTO Account (DTYPE, active, balance, interestRate, owner, id)
VALUES ('Other', true, 25, 0.5, 'Johnny Doe', 4)

INSERT INTO Account (DTYPE, active, balance, interestRate, owner, id)
VALUES ('Unsupported', false, 35, 0.6, 'John Doe Jr.', 5)

With the previous mappings, when trying to fetch all Account(s):

Map<Long, Account> accounts = entityManager.createQuery(
        "select a from Account a", Account.class )
.getResultList()
.stream()
.collect( Collectors.toMap( Account::getId, Function.identity()));

We’d bump into the following kinds of issues:

org.hibernate.WrongClassException: Object [id=3] was not of the specified subclass
[org.hibernate.userguide.inheritance.Account] : Discriminator: null

org.hibernate.WrongClassException: Object [id=4] was not of the specified subclass
[org.hibernate.userguide.inheritance.Account] : Discriminator: Other

org.hibernate.WrongClassException: Object [id=5] was not of the specified subclass
[org.hibernate.userguide.inheritance.Account] : Discriminator: Unsupported

Fortunately, Hibernate allows us to handle these mappings by using NULL and NOT NULL discriminator value mapping.

For the NULL values, we can annotate the base class Account entity as follows:

@Entity(name = "Account")
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorValue( "null" )
public static class Account {

    @Id
    private Long id;

    private String owner;

    private BigDecimal balance;

    private BigDecimal interestRate;

    // Getter and setter omitted for brevity
}

For the Other and Unsupported discriminator values, we can have a miscellaneous entity that handles all values that were not explicitly mapped:

@Entity(name = "MiscAccount")
@DiscriminatorValue( "not null" )
public static class MiscAccount extends Account {

    private boolean active;

    public boolean isActive() {
        return active;
    }

    public void setActive(boolean active) {
        this.active = active;
    }
}

This way, the aforementioned polymorphic query works and we can even validate the results:

assertEquals(5, accounts.size());
assertEquals( DebitAccount.class, accounts.get( 1L ).getClass() );
assertEquals( CreditAccount.class, accounts.get( 2L ).getClass() );
assertEquals( Account.class, accounts.get( 3L ).getClass() );
assertEquals( MiscAccount.class, accounts.get( 4L ).getClass() );
assertEquals( MiscAccount.class, accounts.get( 5L ).getClass() );

I have also updated the Hibernate 5.0, 5.1, and 5.2 documentation with these two very useful mapping options.

How we fixed all database connection leaks

Tagged as Discussions Hibernate ORM

The context

By default, all Hibernate tests run on H2. However, we have lots of database-specific tests as well, so we should also be testing on Oracle, PostgreSQL, MySQL, and possibly SQL Server.

When we tried to set up a Jenkins job that uses PostgreSQL, we realized that the job fails because we ran out of connections. Knowing that the PostgreSQL server has a max_connections setting of 30, we realized the connection leak issue was significant.

Needle in a haystack

Just the hibernate-core module alone has over 5000 tests, and hibernate-envers has around 2500 tests as well. But there are many more modules: hibernate-c3p0, hibernate-ehcache, hibernate-jcache, etc. All in all, we couldn’t just browse the code and spot issues. We needed an automated connection leak detector.

That being said, I came up with a solution that works on H2, Oracle, PostgreSQL, and MySQL alike. Luckily, no problem was spotted in the actual framework code base. All issues were caused by unit tests that did not handle database resources properly.
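One simple way to implement such a detector is to count the connections open on the database server before a test class runs and compare with the count afterwards: if the number keeps growing, that test leaks. The following is only a minimal, PostgreSQL-only sketch of this idea, not the actual implementation; the JDBC URL and credentials are placeholders, and the other databases would need their own system views:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionCountSketch {

    // counts the connections currently open on the PostgreSQL server
    public static int countOpenConnections() throws Exception {
        try (Connection connection = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/hibernate_orm_test", "user", "pass" );
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(
                    "select count(*) from pg_stat_activity" )) {
            resultSet.next();
            // subtract 1 so we do not count the monitoring connection itself
            return resultSet.getInt( 1 ) - 1;
        }
    }
}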

The most common issues

One of the most widespread issues was caused by improper bootstrapping logic:

@Test
public void testInvalidMapping() {
    try {
        new MetadataSources( )
                .addAnnotatedClass( TheEntity.class )
                .buildMetadata();
        fail( "Was expecting failure" );
    }
    catch (AnnotationException ignore) {
    }
}

The issue here is that MetadataSources creates a BootstrapServiceRegistry behind the scenes, which in turn triggers the initialization of the underlying ConnectionProvider. Without closing the BootstrapServiceRegistry explicitly, the ConnectionProvider will not get a chance to close all the currently pooled JDBC Connection(s).

The fix is as simple as that:

@Test
public void testInvalidMapping() {
    MetadataSources metadataSources = new MetadataSources( )
        .addAnnotatedClass( TheEntity.class );
    try {
        metadataSources.buildMetadata();
        fail( "Was expecting failure" );
    }
    catch (AnnotationException ignore) {
    }
    finally {
        ServiceRegistry metaServiceRegistry = metadataSources.getServiceRegistry();
        if(metaServiceRegistry instanceof BootstrapServiceRegistry ) {
            BootstrapServiceRegistryBuilder.destroy( metaServiceRegistry );
        }
    }
}

Another recurring issue was improper transaction handling such as in the following example:

protected void cleanup() {
    Session s = getFactory().openSession();
    s.beginTransaction();

    TestEntity testEntity = s.get( TestEntity.class, "foo" );
    Assert.assertTrue( testEntity.getParams().isEmpty() );

    TestOtherEntity testOtherEntity = s.get( TestOtherEntity.class, "foo" );
    Assert.assertTrue( testOtherEntity.getParams().isEmpty() );

    s.getTransaction().commit();
    s.clear();
    s.close();
}

The first thing to notice is the lack of a try/finally block to close the session even if an exception is thrown. But that’s not all.

Not long ago, I fixed HHH-7412, meaning that, for RESOURCE_LOCAL (e.g. JDBC Connection-bound transactions), the logical or physical Connection is closed only when the transaction ends (either by commit or rollback).

Before HHH-7412 was fixed, the Connection was automatically closed when the Hibernate Session was closed, but this behavior is not supported anymore. Nowadays, aside from closing the underlying Session, you also have to commit or roll back the currently running Transaction:

protected void cleanup() {
    Session s = getFactory().openSession();
    s.beginTransaction();

    try {
        TestEntity testEntity = s.get( TestEntity.class, "foo" );
        Assert.assertTrue( testEntity.getParams().isEmpty() );

        TestOtherEntity testOtherEntity = s.get( TestOtherEntity.class, "foo" );
        Assert.assertTrue( testOtherEntity.getParams().isEmpty() );

        s.getTransaction().commit();
    }
    catch ( RuntimeException e ) {
        s.getTransaction().rollback();
        throw e;
    }
    finally {
        s.close();
    }
}

If you are curious about all the changes that were required, you can check the following two commits: da9c6e1 and f5e10c2. The good news is that the PostgreSQL job is running fine now, and soon we will add jobs for Oracle and MySQL too.
