Red Hat

In Relation To Hibernate ORM

In Relation To Hibernate ORM

Hibernate Community Newsletter 19/2016

Posted by    |       |    Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

Michael Simons wrote a very good article about using Hibernate Search with Elasticsearch on Pivotal Cloud Foundry.

On our blog, I wrote an article about configuring Hibernate to use a custom timezone (e.g. UTC) that’s different than the default JVM one.

For our Russian-speaking readers, Николай Алименков has written a very good post about Hibernate usage scenarios. Nikolai is one of the organizers of JEEConf, and together with Igor Dmitriev, they gave several presentations about Hibernate performance tuning.

If you want to learn what is the best way to handle the infamous LazyInitializationException, check out this article.

Alejandro Gervasio wrote a very good article, Persisting Java Objects the Easy Way on SitePoint.

Hibernate offers a very various ways to load entity attributes lazily. Check out this article for more details.

Thomas Kratz has written a very article about using Hibernate JSON types with Kotlin.

When developing Hibernate, we are using the hibernate-testing module to simplify Session/Transaction management, as described in this article. The hibernate-testing library is available on Maven Central, so it’s very easy to make use of it in your project. Happy testing!

Presentations

Patrycja Wegrzynowicz gave a very interesting presentation at Java One about Second-level Caching. The presentation goes from defining what response time is and how you can level it up using caching. At the end of the presentation, Patrycja discusses how Ehacache and Infinispan implement the Hibernate second-level caching contract.

Hibernate ORM 5.0.11.Final and 5.1.2.Final released

Posted by    |       |    Tagged as Hibernate ORM Releases

Hibernate ORM 5.0.11.Final:

Hibernate ORM 5.1.2.Final:

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

Introduction

Hibernate 5.2 has migrated to Java 1.8. In this article, I’m going to show you how easily you can now test JPA logic using Java 1.8 lambdas.

Integration testing

Hibernate has thousands of integration tests, and each unit test runs in isolation. Traditionally, every test required to open an EntityManager, as well as to coordinate the underlying database transaction by calling begin, commit, and rollback.

EntityManager entityManager = getOrCreateEntityManager();
entityManager.getTransaction().begin();
try {
    entityManager.persist( item );
    assertTrue( entityManager.contains( item ) );
    entityManager.getTransaction().commit();
}
catch (Exception e){
    if ( entityManager.getTransaction() != null &&
         entityManager.getTransaction().isActive() ) {
        entityManager.getTransaction().rollback();
    }
    throw e;
}
finally {
    entityManager.close();
}

That’s very verbose because we need to ensure that the EntityManager always gets closed so that connections are released back to the pool. More, the database transaction must be rolled back on every failure as otherwise locks might be held, therefore preventing a schema-drop process.

For this reason, we decided to extract the whole EntityManager and JPA transaction management logic into a common utility class:

import static org.hibernate.testing.transaction.TransactionUtil.*;

What’s great about these utilities is that you don’t even need to create them. We’ve got you covered!

You only have to add the following dependency to your Maven pom.xml:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-testing</artifactId>
    <version>5.2.2.Final</version>
</dependency>

Therefore, the previous test case is reduced to four lines of code:

doInJPA( this::entityManagerFactory, entityManager -> {
    entityManager.persist( item );
    assertTrue( entityManager.contains( item ) );
} );

Brilliant!

The aforementioned example relies on the presence of the entityManagerFactory() method that returns an EntityManagerFactory instance.

If you prefer the Hibernate native API, you can do it as follows:

doInHibernate( this::sessionFactory, session -> {
    session.persist( item );
    assertTrue( session.contains( item ) );
} );

Analogous to the previous example, a sessionFactory() method is assumed to be existing in the test class. However, you can name these methods any way you like and just make sure to update the first argument of the doInJPA or doInHibernate methods.

Behind the scenes

If you’re interested in the underlying implementations, then check out the source code on GitHub.

Enjoy testing!

Introduction

When it comes to time zones, there is no easy way of handling timestamps generated across the globe. Prior to Java 1.8, the Date and Calendar API was not helping this either.

While the java.time package allows better handling of Date/Time objects when it comes to Relational Databases in general and JDBC in particular, the API is still very much based on the dreadful java.util.Calendar.

In this post, I’m going to unravel a new configuration property that we added so that we can better handle timestamps at the JDBC level.

The problem

Let’s assume we have the following JPA entity:

@Entity(name = "Person")
public class Person {

    @Id
    private Long id;

    private Timestamp createdOn;
}

Since our application is available over the Internet, it is much simpler if every user saves all timestamp values in UTC format. This convention is very common

When Alice, who’s living in Los Angeles, inserts the following Person entity into the database:

doInHibernate( this::sessionFactory, session -> {
    Person person = new Person();
    person.id = 1L;

    //Y2K - 946684800000L
    long y2kMillis = LocalDateTime.of( 2000, 1, 1, 0, 0, 0 )
        .atZone( ZoneId.of( "UTC" ) )
        .toInstant()
        .toEpochMilli();
    assertEquals(946684800000L, y2kMillis);

    person.createdOn = new Timestamp(y2kMillis);
    session.persist( person );
} );

Hibernate executes the following INSERT statement:

INSERT INTO Person (createdOn, id)
VALUES (?, ?)

-- binding parameter [1] as [TIMESTAMP] - [1999-12-31 16:00:00.0]
-- binding parameter [2] as [BIGINT]    - [1]

The timestamp value is set as 1999-12-31 16:00:00.0, and that’s what we get when we query it from the database:

s.doWork( connection -> {
    try (Statement st = connection.createStatement()) {
        try (ResultSet rs = st.executeQuery(
                "SELECT to_char(createdon, 'YYYY-MM-DD HH24:MI:SS.US') " +
                "FROM person" )) {
            while ( rs.next() ) {
                assertEquals(
                    "1999-12-31 16:00:00.000000",
                    rs.getString( 1 )
                );
            }
        }
    }
} );

What’s just happened?

Because Alice’s time zone, Pacific Standard Time, is 8 hours behind UTC (UTC-8), the timestamp value was transposed in the local JVM timezone. But why?

To answer this question, we have to first check how Hibernate saves the underlying timestamp value inside Hibernate 5.2.2 TimestampTypeDescriptor:

st.setTimestamp( index, timestamp );

If we take a look in the PostgreSQL JDBC Driver version 9.4.1210.jre7 (Sep. 2016) PreparedStatement.setTimestamp() implementation, we are going to find the following logic:

public void setTimestamp(int parameterIndex, Timestamp x)
    throws SQLException {
    setTimestamp(parameterIndex, x, null);
}

public void setTimestamp(int i, Timestamp t, java.util.Calendar cal)
    throws SQLException {
    checkClosed();

    if (t == null) {
      setNull(i, Types.TIMESTAMP);
      return;
    }

    int oid = Oid.UNSPECIFIED;
    if (t instanceof PGTimestamp) {
      PGTimestamp pgTimestamp = (PGTimestamp) t;
      if (pgTimestamp.getCalendar() == null) {
        oid = Oid.TIMESTAMP;
      } else {
        oid = Oid.TIMESTAMPTZ;
        cal = pgTimestamp.getCalendar();
      }
    }
    if (cal == null) {
      cal = getDefaultCalendar();
    }
    bindString(i, connection.getTimestampUtils().toString(cal, t), oid);
}

So, if there is no Calendar being passed, the following default Calendar is going to be used:

private Calendar getDefaultCalendar() {
    TimestampUtils timestampUtils = connection.getTimestampUtils();

    if (timestampUtils.hasFastDefaultTimeZone()) {
      return timestampUtils.getSharedCalendar(null);
    }
    Calendar sharedCalendar = timestampUtils
        .getSharedCalendar(defaultTimeZone);
    if (defaultTimeZone == null) {
      defaultTimeZone = sharedCalendar.getTimeZone();
    }
    return sharedCalendar;
}

So, unless we are providing a default java.util.Calendar, PostgreSQL is going to use a default one, which falls back to the underlying JVM time zone.

A workaround

Traditionally, to overcome this issue, the JVM time zone should be set to UTC:

Either declaratively:

java -Duser.timezone=UTC ...

or programmatically:

TimeZone.setDefault( TimeZone.getTimeZone( "UTC" ) );

If the JVM time zone is set to UTC, Hibernate is going to execute the following insert statement:

INSERT INTO Person (createdOn, id)
VALUES (?, ?)

-- binding parameter [1] as [TIMESTAMP] - [2000-01-01 00:00:00.0]
-- binding parameter [2] as [BIGINT]    - [1]

The same is true when fetching the timestamp value from the database:

s.doWork( connection -> {
    try (Statement st = connection.createStatement()) {
        try (ResultSet rs = st.executeQuery(
                "SELECT to_char(createdon, 'YYYY-MM-DD HH24:MI:SS.US') " +
                "FROM person" )) {
            while ( rs.next() ) {
                String timestamp = rs.getString( 1 );
                assertEquals("2000-01-01 00:00:00.000000", timestamp);
            }
        }
    }
} );

Unfortunately, sometimes we cannot change the default time zone of the JVM because the UI requires it to render UTC-based timestamps into the user-specific locale and current time zone.

The JDBC time zone setting

Starting from Hibernate 5.2.3, you’ll be able to provide a JDBC-level time zone so that you don’t have to change the default JVM setting.

This is done via the hibernate.jdbc.time_zone SessionFactory-level configuration property:

settings.put(
    AvailableSettings.JDBC_TIME_ZONE,
    TimeZone.getTimeZone( "UTC" )
);

Once set, Hibernate is going to call the following JDBC PreparedStatement.setTimestamp() method which takes a specific Calendar instance.

Now, when executing the insert statement, Hibernate is going to log the following query parameters:

INSERT INTO Person (createdOn, id)
VALUES (?, ?)

-- binding parameter [1] as [TIMESTAMP] - [1999-12-31 16:00:00.0]
-- binding parameter [2] as [BIGINT]    - [1]

This is expected since the java.sql.Timestamp uses Alice’s JVM Calendar (e.g. Los Angeles) to display the underlying date/time value. When fetching the actual timestamp value from the database, we can see that the UTC value was actually saved:

s.doWork( connection -> {
    try (Statement st = connection.createStatement()) {
        try (ResultSet rs = st.executeQuery(
                "SELECT " +
                "   to_char(createdon, 'YYYY-MM-DD HH24:MI:SS.US') " +
                "FROM person" )) {
            while ( rs.next() ) {
                String timestamp = rs.getString( 1 );
                assertEquals("2000-01-01 00:00:00.000000", timestamp);
            }
        }
    }
} );

You can even override this setting on a per Session level:

Session session = sessionFactory()
    .withOptions()
    .jdbcTimeZone( TimeZone.getTimeZone( "UTC" ) )
    .openSession();

Since many applications tend to use the same time zone (usually UTC) when storing timestamps, this change is going to be very useful, especially for front-end nodes which need to retain the default JVM time zone for UI rendering.

Hibernate Community Newsletter 18/2016

Posted by    |       |    Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Videos

Both Gunnar and I have participated in the wonderful JavaZone conference.

Gunnar’s presentation outlines the benefits of Hibernate Search and demonstrates how easily you can propagate JPA entity changes to ElasticSearch.

My High-Performance Hibernate presentation shows how you can run Hibernate ORM at warp-speed.

Articles

The pick for this newsletter is Michael Simons' article about Hibernate Search. In this blog post, Michael uses Hibernate Search to index his tweets and then demonstrates the power of the Fulltext-Search Query DSL.

Because Concurrency Control is very important for every enterprise application, Hibernate offers both optimistic and pessimistic locking right out of the box. However, sometimes you might need to coordinate the child entity state with a common parent entity, in which case, you need to provide some custom entity listeners. This article demonstrates how you can trigger a parent entity version change whenever a child entity is added, removed or even modified.

The Vertabelo’s blog features an article about Composite Primary Keys and how to map them using Hibernate and jOOQ. The article makes a remark which might confuse the reader into thinking that Hibernate didn’t support Database-First mapping:

Hibernate is a Java object-relational mapper that follows the “Object-First” approach. This means that the appropriate database structures are generated based on the Java code.

Not only Hibernate can be used to generate a database schema from the entity mapping, but in reality, you want to have the database schema managed by an automatic migration tool such a FlywayDB.

Thorben Janssen shows you how you can count queries using Hibernate Statistics. You can take this idea one step further and monitor to any SQL statement type (e.g. insert, update, delete), and you can even build a tool that automatically discovers N+1 statements, as demonstrated in this blog post.

I also wrote an article which aims to warn the reader about the dangers of using the hibernate.enable_lazy_load_no_trans configuration property.

Time to upgrade

Hibernate Search is getting closer and closer to releasing the Final release of the ElasticSearch integration. Now, it’s the right time to shape this new API, so we’d really love you give us feedback on this topic.

Hibernate Validator 5.3.0.CR1 is out, so feel free to try it out and tell us what you think.

Hibernate Community Newsletter 17/2016

Posted by    |       |    Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

Fitbit is using Hibernate for data persistence. On their engineering blog, they published an article about connection provider instrumentation. Although this is a very clever solution, in case you need a solution for monitoring connection pool usage, I’d suggest using FlexyPool instead.

I’ve run a survey on my Twitter account to find out the JPA provider market share in 2016. Just like in 2014 and 2015, Hibernate is leading by a very large margin. Thanks for choosing Hibernate, and stay tuned for even more great features.

You can use the Fluent Interface pattern with Hibernate and JPA. Check out this post for more details.

Anghel Leonard has added comprehensive list of test cases for both Hibernate ORM and OGM on his blog and GitHub repository.

Thorben Janssen wrote a series of articles related to:

Hibernate has great support for concurrency control. In this article, you can find out how you can increment a root entity version whenever any child entity is being added/removed or even modified.

Hibernate ORM 5.1.1.Final

Posted by    |       |    Tagged as Hibernate ORM Releases

Hibernate ORM 5.1.1.Final has just been tagged and published.

The complete list of changes can be found here (or here for people without a Hibernate Jira account).

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

The release bundles can be obtained from: SourceForge or BinTray.

Hibernate Community Newsletter 16/2016

Posted by    |       |    Tagged as Discussions Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

Hibernate OGM basics with MongoDB is a short tutorial that demonstrates how easy you can persist JPA entities into MongoDB.

The Java In Code blog offers a very good article about the benefits of the Hibernate bootstrap mechanism.

If you’re running Windows and you happen to have an Nvidia Video Card, you might get into problems when running Wildfly with Arquillian. This blog post shows you how you can tackle this issue.

Anghel Leonard wrote a series of articles related to Hibernate:

I’ve noticed many questions related to mapping Composite Keys on our forum and StackOverflow. For this reason, I decided to write an article about the best way to map a Composite Key with JPA and Hibernate.

In Hibernate 5.2.2, we added an improvement to the way DISTINCT is handled to improve performance. If you’re interested in this topic, check out the article on our blog.

Debezium is a Change Data Capture tool which scans the database transaction log and extracts all data changes. For more on this topic, check out this post about capturing changes from MySQL.

Projects

Mihai Costin developed a library for using the Hibernate second level cache mechanism with Memcached. If your application already uses Memcached, it makes sense to store the second level cache entries in the same distributed caching system. This library is hosted on GitHub and available on Maven Central, so you can easily grab it and give it a try.

Time to upgrade

We have released two versions of Hibernate ORM: 5.0.10 and 5.2.2.

Second bug-fix release for ORM 5.2

Posted by    |       |    Tagged as Hibernate ORM Releases

The Second bug-fix release for Hibernate ORM 5.2 has just been published. It is tagged at https://github.com/hibernate/hibernate-orm/releases/tag/5.2.2

The complete list of changes can be found here (or here for people without a Hibernate Jira account).

For information on consuming the release via your favorite dependency-management-capable build tool, see http://hibernate.org/orm/downloads/

The release bundles can be obtained from SourceForge or BinTray.

Introduction

When it comes to optimizing queries, you should listen to Markus Winand. While reading this great article about Nested Loops, I stumbled on the following remark:

The query has the expected left join but also an unnecessary distinct keyword. Unfortunately, JPA does not provide separate API calls to filter duplicated parent entries without de-duplicating the child records as well. The distinct keyword in the SQL query is alarming because most databases will actually filter duplicate records. Only a few databases recognize that the primary keys guarantees uniqueness in that case anyway.

— Markus Winand
Nested Loops

Markus is right. Not only that the DISTINCT keyword is redundant, but it’s affecting performance as well. Because we want Hibernate to be a solution for high-performance data access systems, I created the HHH-10965 Jira issue which is fixed in Hibernate 5.2.2. This article is going to demonstrate the redundant DISTINCT problem, and how you can overcome it in Hibernate 5.2.2.

Domain Model

For test sake, let’s assume that we want to develop a book store application. The two most important entities within our system are the Book and the Person who’s written the book.

To simplify our design, we’ll assume that every Book is written by a single author, although in reality, this might not always be the case. Therefore, the Domain Model looks like this:

@Entity(name = "Person") @Table( name = "person")
public class Person {

    @Id @GeneratedValue
    private Long id;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "last_name")
    private String lastName;

    @OneToMany(mappedBy = "author", cascade = CascadeType.ALL)
    private List<Book> books = new ArrayList<>(  );

    public Person() {}

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public void addBook(Book book) {
        books.add( book );
        book.setAuthor( this );
    }

    //Getters and setters omitted for brevity
}

@Entity(name = "Book") @Table( name = "book")
public class Book {

    @Id @GeneratedValue
    private Long id;

    private String title;

    @ManyToOne
    private Person author;

    public Book() {}

    public Book(String title) {
        this.title = title;
    }

    //Getters and setters omitted for brevity
}

Every Book has an author property indicating the Person who wrote it.

Using DISTINCT

Supposing we need to select a particular author, we might want to provide a list of last names so that the user can trim down the author list. This can be done easily with the following query:

List<String> lastNames = entityManager.createQuery(
    "select distinct p.lastName " +
    "from Person p", String.class)
.getResultList();

assertTrue(
    lastNames.size() == 2 &&
    lastNames.contains( "King" ) &&
    lastNames.contains( "Mihalcea" )
);

As expected, Hibernate generates the following SQL query:

SELECT DISTINCT
    p.last_name as col_0_0_
FROM person p

In this context, the use of DISTINCT is understandable since we want duplicates to be removed.

So far, so good, But, let’s now assume we want to fetch all Person entities along with their Book(s):

List<Person> authors = entityManager.createQuery(
    "select p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.getResultList();

authors.forEach( author -> {
    log.infof( "Author %s wrote %d books",
       author.getFirstName() + " " + author.getLastName(),
       author.getBooks().size()
    );
} );

When executing the query above, Hibernate generates the following output:

SELECT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

As we can see we have a duplicated entry for Gavin King. This is because we have 4 records in our ResultSet, the count being given by the number of books associated with the currently selected authors.

To remove duplicated entries, JPA offers the DISTINCT keyword:

List<Person> authors = entityManager.createQuery(
    "select distinct p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.getResultList();

authors.forEach( author -> {
    log.infof( "Author %s wrote %d books",
       author.getFirstName() + " " + author.getLastName(),
       author.getBooks().size()
    );
} );

This time, Hibernate generates the following output:

SELECT DISTINCT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

There are no duplicates any more. Duplicates are removed by Hibernate when building the graph of entities, after the ResultSet is fetched from JDBC.

Spotting the problem

However, have you spot the DISTINCT keyword in the SQL query? In this particular case, DISTINCT does not bring any value since we don’t have duplicated entries in the JDBC ResultSet. Not only this is redundant, but DISTINCT can cause performance problems because the result set must be sorted by the query executor.

For instance, running the previous query on PostgreSQL generates the following execution plan:

PostgreSQL execution plan with DISTINCT

The HashAggregate stage is associated with sorting the result set and ensuring there is no duplicate entry. There’s a very good explanation of HashAggregate inner working on depesz.com. Basically, the result set is sorted so that duplicated rows come one after other, therefore a row is discarded if it’s identical to the previous record.

When running the EXPLAIN FORMAT=JSON against the same query using DISTINCT on MySQL, we get the following execution plan:

{
   "query_block": {
     "select_id": 1,
     "cost_info": {
       "query_cost": "5.02"
     },
     "duplicates_removal": {
       "using_temporary_table": true,
       "using_filesort": false,
       "nested_loop": [
         {
           "table": {
             "table_name": "p",
             "access_type": "ALL",
             "rows_examined_per_scan": 3,
             "rows_produced_per_join": 3,
             "filtered": "100.00",
             "cost_info": {
               "read_cost": "1.00",
               "eval_cost": "0.60",
               "prefix_cost": "1.60",
               "data_read_per_join": "4K"
             },
             "used_columns": [
               "id",
               "first_name",
               "last_name"
             ]
           }
         },
         {
           "table": {
             "table_name": "b",
             "access_type": "ALL",
             "possible_keys": [
               "FKi7lkcmacourlqkkn4uo1s4svl"
             ],
             "rows_examined_per_scan": 4,
             "rows_produced_per_join": 12,
             "filtered": "100.00",
             "using_join_buffer": "Block Nested Loop",
             "cost_info": {
               "read_cost": "1.02",
               "eval_cost": "2.40",
               "prefix_cost": "5.02",
               "data_read_per_join": "9K"
             },
             "used_columns": [
               "id",
               "title",
               "author_id"
             ],
           }
         }
       ]
     }
   }
}

When using MySQL 5.6.5 or later, you have the option of using the JSON EXPLAIN format, which provides lots of information compared to the TRADITIONAL EXPLAIN output.

You can easily spot the duplicates_removal and using_temporary_table attributes that are associated with the sorting phase incurred by the DISTINCT keyword. MySQL creates a temporary table to store the intermediate result set and apply a sorting algorithm to discover duplicates. MySQL offers a DISTINCT/GROUP BY optimization technique to avoid the temporary table creation, but that requires an index that features the same columns in the same exact order with the SELECT query. In our case, this is not very feasible.

The DISTINCT SQL keyword is not cost free. For this reason, it should only be used when we really need to filter out duplicated records from the statement result set.

Fixing the problem

That was the main reason for the HHH-10965 issue that was recently fixed, and which adds the following JPA-level Query Hint:

List<Person> authors = entityManager.createQuery(
    "select distinct p " +
    "from Person p " +
    "left join fetch p.books", Person.class)
.setHint( QueryHints.HINT_PASS_DISTINCT_THROUGH, false )
.getResultList();

The hibernate.query.passDistinctThrough hint tells Hibernate to avoid passing the DISTINCT keyword to the actual SQL query. Therefore, we get the following output:

SELECT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id

-- Author Gavin King wrote 2 books
-- Author Stephen King wrote 1 books
-- Author Vlad Mihalcea wrote 1 books

This way, the Person are de-duplicated by Hibernate while the SQL query features no useless DISTINCT keyword.

Now, the query plan for PostgreSQL becomes:

PostgreSQL execution plan without DISTINCT

There’s no longer a HashAggregate stage, meaning that the database is not doing any redundant result set sorting.

The query plan for MySQL looks like this:

{
   "query_block": {
     "select_id": 1,
     "cost_info": {
       "query_cost": "5.02"
     },
     "nested_loop": [
       {
         "table": {
           "table_name": "p",
           "access_type": "ALL",
           "rows_examined_per_scan": 3,
           "rows_produced_per_join": 3,
           "filtered": "100.00",
           "cost_info": {
             "read_cost": "1.00",
             "eval_cost": "0.60",
             "prefix_cost": "1.60",
             "data_read_per_join": "4K"
           },
           "used_columns": [
             "id",
             "first_name",
             "last_name"
           ]
         }
       },
       {
         "table": {
           "table_name": "b",
           "access_type": "ALL",
           "possible_keys": [
             "FKi7lkcmacourlqkkn4uo1s4svl"
           ],
           "rows_examined_per_scan": 4,
           "rows_produced_per_join": 12,
           "filtered": "100.00",
           "using_join_buffer": "Block Nested Loop",
           "cost_info": {
             "read_cost": "1.02",
             "eval_cost": "2.40",
             "prefix_cost": "5.02",
             "data_read_per_join": "9K"
           },
           "used_columns": [
             "id",
             "title",
             "author_id"
           ],
           "attached_condition": "<if>(is_not_null_compl(b), (`hibernate_orm_test`.`b`.`author_id` = `hibernate_orm_test`.`p`.`id`), true)"
         }
       }
     ]
   }
}

This fix is going to be available in Hibernate 5.2.2, so one more reason to upgrade!

If you enjoyed this article, you might be interested in reading my book as well.

back to top