In Relation To Discussions

Meet Marco Pivetta

Tagged as Discussions, Hibernate ORM, Interview

In this post, I’d like you to meet Marco Pivetta, who is one of the maintainers of Doctrine, a suite of PHP projects inspired by Hibernate ORM.

[Image: Marco Pivetta]

Hi, Marco. Would you like to introduce yourself and tell us a little bit about your developer experience?

I’m Marco "Ocramius" Pivetta, an Italian PHP consultant, currently living in Germany. Yes, the nickname is weird, but it comes from an era of Quake 3 Arena, Unreal Tournament & co.

I’ve been tinkering with computers since I was a child, and have been working with PHP for more than half my life now, developing a love-hate relationship with the language. Interestingly, I didn’t start with the usual Joomla/Wordpress/Drupal/etc, but built a quite complex website that interacted with a browser game called "OGame", scraping game information through a Firefox addon that would then provide additional information to the players.

The reason why this project ("stogame") is important to me is that it included extremely challenging problems for a rookie with no help at all to solve, and it is still one of the most complex projects I have worked on:

  • XSS/SQL injections - had those, wasn’t fun

  • queuing mechanisms to sync browser extensions and the website - invented my own system

  • optimizing queries and indexes on ~60Gb of MySQL MyISAM tables

  • disaster recoveries on such a system - had those too, wasn’t fun either

  • real-time push mechanisms for clients via BOSH/XMPP

  • simplistic prediction engine to aid players in decision making

All of the above were built by 15-year-old me by spending countless sleepless nights on it, also jeopardizing my school evaluations. Still, this was before libraries, design patterns, mentoring, GitHub: only me, some friends, and a good amount of design and prediction work.

I then moved on, gave up on the project, failed university (I’m a terrible student), got a few jobs and started using frameworks. Eventually, I got to work with all of the typical DB abstraction approaches:

  • Active Record (with ZendFramework)

  • Table Data Gateway - in a custom solution

  • Data Mapper - in a Java EE project

I liked the JPA approach in the Java EE project so much that I started looking for a PHP analogue solution for my daytime job, and ended up discovering Doctrine 2.

Since then, I started getting more and more involved with the project, starting from answering questions on the mailing list and StackOverflow. Benjamin Eberlei, who was the lead on the project at that time, pushed me towards contributing with actual code changes back in 2011.

Eventually, I became one of the maintainers of the project, which also boosted my career: it allowed me to become a consultant for Roave, where I get to see dozens of different projects, teams and tools every month, and to become a public speaker.

You are one of the developers of the Doctrine ORM framework. Can you please tell us what’s the goal of Doctrine?

I am actually not one of the developers, but one of the current maintainers. The initial designers of the current Doctrine 2 ORM, as far as I know, are Jonathan Wage, Guilherme Blanco, Benjamin Eberlei and Roman Borschel. I can probably still answer the question: Doctrine ORM tries to abstract the "database thinking" away from PHP software projects, while still being a leaky abstraction on purpose.

To clarify, most PHP developers are used to developing applications from the database up to the application layer, rather than from the domain logic down, and that’s a quite widespread problem that leads to hard-to-maintain, unreadable code. This tool gets rid of most of those problems, while still allowing developers to access the database directly when needed.

Ruby on Rails employs the Active Record pattern. Why did Doctrine choose the ORM paradigm instead?

Interestingly, Doctrine 1.x was an Active Record library, and also a quite good one, but it became evident quite quickly that the JPA specification and Data Mapper plus Unit of Work were better solutions altogether.

Specifically, the Data Mapper approach allows consumers of the library to write abstractions that decouple the tool from the domain almost completely (there are always limitations to this). The Unit of Work pattern has an increased memory impact for PHP applications, but also massively reduces required query operations (via in-memory identity maps) while adding some transactional boundaries, and that is a big win for most PHP apps, which often don’t even use transactions at all.

There are more advantages, but I personally wouldn’t ever consider using Active Record again due to its limitations and inherent framework coupling. This doesn’t mean that Active Record doesn’t work, but I’ve been burnt many more times with AR than with DM.

Since Hibernate ORM has been influencing Doctrine, can you tell us about the similarities and differences between these two frameworks?

Doctrine is hugely inspired by Hibernate and the JPA, although we couldn’t really copy things, both due to licensing issues and life-cycle differences in Java and PHP software.

Doctrine resembles Hibernate in the Unit of Work, mappings, basic event system, second-level cache and the DQL language (HQL in Hibernate). We even designed an annotation system for PHP, since the language doesn’t support annotations natively; it is currently the de-facto standard for custom annotations in PHP libraries, even though we initially only needed it to simulate inline mappings like those Hibernate allows.

Where things differ a lot are flexibility and lifecycle, since Java is an AOT-compiled language with a powerful JIT and generally deployed in long-running applications.

PHP is an interpreted language, and its strength is also its pitfall: the typical shared-nothing architecture allows for short-lived, memory-safe, retry-able application runs. That also means that we have no connection pooling, and the ORM internals are much more inflexible and less event-driven than Hibernate’s due to memory and execution time constraints. On the other hand, we rarely encounter memory issues due to large Unit of Work instances, connections and entity instances aren’t shared across separate web application page loads, and a slow ORM is unlikely to slow down an entire application server.

Another huge difference is managed state: DETACHED makes little sense in the PHP world, since a detached entity may only come from serialized state. In Doctrine 3.x, we are planning to remove support for detaching entities, since storing serialized objects in PHP generally leads to security issues and more trouble.

As you can see, the differences are indeed mostly in the lifecycle, but each language and framework has its strengths and pitfalls.

We always value feedback from our users, so can you tell us what you’d like us to improve or are there features that we should add support for?

I’m probably being weird here, but I don’t lack any particular features from either ORM at this time. What would be interesting is reducing support for entity and transaction lifecycle events, since most consumers of these ORMs tend to code application and domain logic in those, while they were mostly intended for technical tasks, such as creating audit logs and executing pre- and post- DB cleanup tasks.

A possible improvement is to explore saving/loading of single aggregate-root-acting entities attached to a Unit of Work, which is only responsible for tracking state in child aggregates. This is only to prevent sharing entity references across aggregates, and to prevent DB transactions from crossing aggregate root boundaries.

Thank you, Marco, for taking the time. It is a great honor to have you here. To reach Marco, you can follow him on Twitter.

The MySQL Dialect refactoring

Tagged as Discussions, Hibernate ORM

Starting with Hibernate ORM 5.2.8, MariaDB gets its own Hibernate dialects.

Why?

While working on the new MariaDB Dialects, I realized that the MySQL Dialects would benefit from simplifying the version hierarchy.

Previously, the MySQL Dialects used to look like this:

[Diagram: MySQL Dialects before refactoring]

As you can see, because of the various MySQL storage engines (e.g. MyISAM and InnoDB), the class hierarchy has diverged in multiple branches. Once we integrated Hibernate Spatial, the MySQL Dialects have become even more convoluted.

For this reason, we created the HHH-11473 Jira issue, which is fixed in Hibernate 5.2.9.

How do we stand now?

After refactoring, the MySQL Dialects look as follows:

[Diagram: MySQL Dialects after refactoring]

The following Dialects have been deprecated and were therefore not added to the class diagram above:

MySQLMyISAMDialect

Use MySQLDialect instead, as well as the hibernate.dialect.storage_engine=myisam Environment Variable or System Property.

MySQLInnoDBDialect

Use MySQLDialect instead, as well as the hibernate.dialect.storage_engine=innodb Environment Variable or System Property.

MySQL5InnoDBDialect

Use MySQL5Dialect instead, as well as the hibernate.dialect.storage_engine=innodb Environment Variable or System Property.

MySQL57InnoDBDialect

Use MySQL57Dialect instead.

MySQL5InnoDBSpatialDialect

Use MySQL5SpatialDialect instead, as well as the hibernate.dialect.storage_engine=innodb Environment Variable or System Property.

MySQL56InnoDBSpatialDialect

Use MySQL56SpatialDialect, which uses InnoDB by default.

The MySQLStorageEngine abstraction encapsulates the differences between the various storage engines. By delegating this responsibility to a new abstraction, the MySQL Dialect hierarchy became a lot simpler.

Traditionally, MySQL used the non-transactional MyISAM storage engine, and this is the default storage engine for all Dialects that are older than MySQL55Dialect. From MySQL55Dialect onwards, the InnoDB storage engine is used by default.

You can always override the default storage engine by providing the hibernate.dialect.storage_engine Environment Variable or System Property. Unlike other Hibernate configuration properties, this one must not be provided via persistence.xml because the Dialect is bootstrapped prior to the configuration management mechanism.
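For example, assuming a standalone application launched via the java command, the storage engine could be selected like this (an illustrative invocation; the JAR name is made up):

java -Dhibernate.dialect.storage_engine=innodb -jar my-app.jar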

Conclusion

The deprecated Dialects will be available for a while, but they will surely be removed in a future version of Hibernate, so you’d better switch to the new ones. This refactoring is useful for two reasons. First, supporting MySQL 8.0 will require a single new Dialect, not two. Second, it’s easier for our users: the choice is much more straightforward now that there is only one Dialect associated with a given MySQL version.

Hibernate Community Newsletter 4/2017

Tagged as Discussions, Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Looking for your feedback

We are looking for your feedback about Hibernate bootstrap in cloud environments. Check out this article for more details. If you have any ideas or proposals, don’t hesitate to use the comments section below the aforementioned article.

Articles

Sometimes, it’s easy to miss the basic concepts, and relational databases are no exception. Check out this article about how a relational database works.

We released dedicated Dialects for MariaDB, so you don’t have to use the MySQL-specific Dialects when working with MariaDB.

Integration testing is of paramount importance when building an enterprise application. However, many projects rely on in-memory databases (e.g. H2, HSQLDB) for testing, while in production they use Oracle, SQL Server, PostgreSQL or MySQL. In this article, you’ll find out how you can run integration tests faster using tmpfs and Docker.

If you’re using JCache through Spring, Hibernate, and Ehcache, this article explains how you can prevent spontaneous cache creation.

Emmanouil Gkatziouras wrote two articles about Hibernate and Hazelcast as a second-level caching provider.

Russ Thomas wrote a comprehensive list of ORM anti-patterns (https://sqljudo.wordpress.com/2014/12/29/what-every-dba-and-swe-should-know-about-ef/). Although the article was written for Entity Framework, the tips apply to JPA or Hibernate as well.

For our Portuguese readers, Rhuan Henrique Rocha da Silva wrote an article about the meaning of mappedBy in JPA and Hibernate.

Thorben Janssen wrote an article about adding Full-Text Search capabilities to a Hibernate application.

Questions and answers

MariaDB Dialects

Tagged as Discussions, Hibernate ORM

Starting with Hibernate ORM 5.2.8, MariaDB gets its own Hibernate dialects.

About MariaDB

MariaDB is a MySQL fork that emerged in 2009 as a drop-in replacement for MySQL. For a while, MariaDB and MySQL offered similar functionality, but over time the two have diverged.

For this reason, we created the HHH-11457 issue, which is fixed in Hibernate ORM 5.2.8.

Dialect variants

For the moment, you can use one of the following two options:

MariaDBDialect

which is the base class for all MariaDB Dialects and works with any MariaDB version

MariaDB53Dialect

which is intended to be used with MariaDB 5.3 or newer versions

In time, we will add new Dialects based on newer capabilities introduced by MariaDB.

Connection properties

To connect to a MySQL database, the connection properties look as follows:

  • 'db.dialect' : 'org.hibernate.dialect.MySQL57InnoDBDialect',

  • 'jdbc.driver': 'com.mysql.jdbc.Driver',

  • 'jdbc.user' : 'hibernate_orm_test',

  • 'jdbc.pass' : 'hibernate_orm_test',

  • 'jdbc.url' : 'jdbc:mysql://127.0.0.1/hibernate_orm_test'

For MariaDB, the connection properties look like this:

  • 'db.dialect' : 'org.hibernate.dialect.MariaDB53Dialect',

  • 'jdbc.driver': 'org.mariadb.jdbc.Driver',

  • 'jdbc.user' : 'hibernate_orm_test',

  • 'jdbc.pass' : 'hibernate_orm_test',

  • 'jdbc.url' : 'jdbc:mariadb://127.0.0.1/hibernate_orm_test'
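For JPA users, a minimal sketch of the equivalent persistence.xml properties might look like this (the persistence unit name is made up; the javax.persistence.jdbc.* keys are the standard JPA ones):

persistence.xml
<persistence-unit name="mariadb-example">
    <properties>
        <!-- MariaDB-specific Hibernate Dialect -->
        <property name="hibernate.dialect"
                  value="org.hibernate.dialect.MariaDB53Dialect"/>
        <property name="javax.persistence.jdbc.driver"
                  value="org.mariadb.jdbc.Driver"/>
        <property name="javax.persistence.jdbc.url"
                  value="jdbc:mariadb://127.0.0.1/hibernate_orm_test"/>
        <property name="javax.persistence.jdbc.user"
                  value="hibernate_orm_test"/>
        <property name="javax.persistence.jdbc.password"
                  value="hibernate_orm_test"/>
    </properties>
</persistence-unit>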

Aside from the URL using the mariadb database identifier, MariaDB53Dialect supports Time and Timestamp with microsecond precision, just like MySQL57InnoDBDialect.

Conclusion

If you are using MariaDB, it’s best to use the MariaDB-specific Dialects from now on since it’s much easier to match the MariaDB version with its appropriate Hibernate Dialect.

Recently, the team has been discussing improvements around Hibernate (ORM) usage within cloud-based apps and microservices. In particular, the fundamental assumption is that things will break regularly on these platforms and that services should be resilient to failures.

The problem

In microservices or cloud architectures, services are started in different orders (usually beyond your control). It is possible for the app using Hibernate ORM to be started before the database. At the moment, Hibernate ORM does not like that and will fail explicitly with an exception if it can’t connect to the database.

Another related concern is what happens if the database is stopped for a while after the app running Hibernate ORM has started and resumes working shortly after.

Solution 1: Hibernate waits and retries at boot time

Some users have asked us to delay and retry the connection process in case the database is not present at boot time. That would work and solve the bootstrap problem. It would not solve the case of the database going away while the app is running, but there at least the transaction and error propagation mechanisms cover you. Plus, at development time, getting rid of the boot-time problem alone would already be quite nice.
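To make the idea concrete, here is a minimal sketch of such a bounded wait-and-retry loop (purely illustrative; Hibernate ORM exposes no such API today, and all names are made up):

import java.sql.Connection;
import java.sql.SQLException;
import java.time.Duration;

import javax.sql.DataSource;

// Illustrative only: a bounded retry loop around connection acquisition at boot time.
public final class BootRetry {

    public static Connection connect(DataSource ds, int maxAttempts, Duration backoff)
            throws SQLException, InterruptedException {
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return ds.getConnection(); // succeeds once the database is reachable
            } catch (SQLException e) {
                last = e; // remember the failure and wait before the next attempt
                Thread.sleep(backoff.toMillis() * attempt); // linearly increasing delay
            }
        }
        throw last; // give up after maxAttempts and let the bootstrap fail
    }
}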

I understand that this is probably a quick win to implement, but we’d better be sure of the problem before adding that feature. It feels to me that the Hibernate ORM bootstrap is not the ideal area to fix that problem. But at the end of the day, if it helps enough, it would be worth it.

We are exploring that option and considering alternatives and that’s where we need your feedback.

Wait and retry vs platform notification

In this blog post, I mention the wait and retry approach. It can be replaced by a notification from the cloud platform when a service is up / down.

This avoids the regular polling process at the cost of having to rely on various integrations from various cloud platforms.

Solution 1.b: The connection pool waits and retries

It would probably be better if the connection pool Hibernate ORM uses implemented that logic, but Hibernate supports more than one connection pool. That’s a minor variation on solution 1.

Solution 2: Hibernate boots in non-functioning mode

If Hibernate ORM cannot connect to the database, it continues its bootstrap process. If an EntityManager asks for a connection while the database is still unavailable, a well-defined exception is raised. To not flood the system, a wait-and-retry system for connection checking would be in place to only try a few times, even when lots of EntityManagers are requested.

There are some subtle difficulties here on concurrency and on the fact that we use info from the bootstrap connection to configure Hibernate ORM. The most visible option guessed from the connection is the dialect to use. On the other hand, stopping the app boot process while waiting and retrying like solution 1 proposes is probably not without its challenges.

The exception raised by Hibernate ORM upon DB inaccessibility needs to be treated properly by the application (framework) being used, e.g. by a global try/catch that moves the application into degraded mode or propagates the execution error to the client (e.g. HTTP error 500). It might even be helpful if Hibernate ORM exposed the not-ready status via an explicit API.

This could be tied to a health check from the cloud platform. The application would report the not ready but trying status via a /health endpoint that the orchestrator would use.
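As a sketch of that idea, a tiny /health endpoint could report the "not ready but trying" status like this (hypothetical code using the JDK’s built-in HTTP server; the databaseReady flag would be flipped by the bootstrap logic):

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

import com.sun.net.httpserver.HttpServer;

public class HealthEndpoint {

    // Flipped to true by the bootstrap/retry logic once the database is reachable.
    static final AtomicBoolean databaseReady = new AtomicBoolean(false);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health", exchange -> {
            boolean ready = databaseReady.get();
            byte[] body = (ready ? "UP" : "STARTING").getBytes("UTF-8");
            // 200 when ready, 503 while still trying to reach the database
            exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}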

On database connection breaking

There are many reasons for failing to connect to a database:

  1. Host unreachable

  2. DB server denying access temporarily (e.g. load)

  3. Incorrect port setting

  4. Incorrect credentials

  5. And many more

Should the system go into the wait and retry mode for cases 3, 4, 5? Or should it refuse to deploy?

Solution 3: the smart app (framework)

Another solution is for the app to have a smart overall bootstrap logic. It tries to start eagerly, but if a Hibernate ORM connection error occurs, only the inbound request framework is started. It will regularly try to boot and in the meantime return HTTP 500 errors or similar.

This requires an app framework that could handle that. It embeds circuit breaker logic in the app and can better react to specific errors. I wonder how common such frameworks are though.

This is in spirit the same solution as solution 2 except it is handled at the higher level of the app (framework) vs Hibernate ORM.

Solution 4: the cloud / MSA platform restarts the apps

An arguably better solution would be for the cloud platform to handle these cases and restart apps that fail to deploy in these situations. It likely requires some kind of service dependency management and a bit of smartness from the cloud infra. The infrastructure would, upon a specific error code thrown at boot time, trigger a wait-and-retry deployment logic. There is also a risk of a dependency circularity leading to a never-starting system.

I guess not all cloud infras offer this, so we would need an alternative solution. OpenShift lets you express dependencies to make sure a given service is started before another. The user would have to declare that dependency, of course.

Solution 5: proxy!

Another solution is to put proxies either before the app inbound requests and/or between the app and the database. Proxies are the silver bullet that lots of cloud platforms use to solve world hunger in the digital universe.

How many proxies and routing logic does it take to serve a "Hello world!" in the cloud?
Who proxies the proxies?

:)

This approach has the advantage of not needing customized apps or libraries. The inconvenience is more intermediary points between your client and the app or data.

If the proxy is before the application, then it needs a health check or feedback from the boot system to wait and retry the re-deployment of the application on a regular basis. I’m again not certain cloud infrastructures offer all of this.

If the proxy is between Hibernate ORM and the database (like HAProxy for MySQL), you’re still facing some timeout exception on the JDBC side, which means the application will fail to boot. But at least the proxy could implement the wait-and-retry logic.

My questions to you

Do you have any input on this subject:

  • what’s your opinion?

  • what’s your experience when deploying cloud apps?

  • any alternative solution you have in mind?

  • any resource you found interesting covering this subject?

  • would you benefit from solution 1?

  • would you benefit from solution 2?

Any feedback to help us think this problem further is what we need :)

Building Multi-Release JARs with Maven

Tagged as Discussions

Java 9 comes with a new feature very useful to library authors: multi-release JARs (JEP 238).

A multi-release JAR (MR JAR) may contain multiple variants of one and the same class, each targeting a specific Java version. At runtime, the right variant of the class will be loaded automatically, depending on the Java version being used.

This allows library authors to take advantage of new Java versions early on, while keeping compatibility with older versions at the same time. If for instance your library performs atomic compare-and-set operations on variables, you may currently be doing so using the sun.misc.Unsafe class. As Unsafe has never been meant for usage outside the JDK itself, Java 9 comes with a supported alternative for CAS logic in the form of var handles. By providing your library as an MR JAR, you can benefit from var handles when running on Java 9 while sticking to Unsafe when running on older platforms.
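As a brief aside, here is what such a var-handle-based compare-and-set can look like on Java 9 (a minimal sketch, unrelated to the PID example used below):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class Counter {

    private volatile int value;

    // A var handle pointing to the "value" field, the supported replacement
    // for sun.misc.Unsafe-based CAS.
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle( Counter.class, "value", int.class );
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError( e );
        }
    }

    public boolean compareAndSetValue(int expected, int newValue) {
        return VALUE.compareAndSet( this, expected, newValue );
    }
}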

In the following we’ll discuss how to create an MR JAR using Apache Maven.

Structure of a Multi-Release JAR

Multi-Release JARs contain several trees of class files. The main tree is at the root of the JAR, whereas version-specific trees are located under META-INF/versions, e.g. like this:

JAR root
- Foo.class
- Bar.class
+ META-INF
   - MANIFEST.MF
   + versions
      + 9
         - Bar.class

Here the Foo and the Bar class from the JAR root will be used on Java runtimes which are not aware of MR JARs (i.e. Java 8 and earlier), whereas Foo from the JAR root and Bar from META-INF/versions/9 will be used under Java 9 and later. The JAR manifest must contain an entry Multi-Release: true to indicate that the JAR is an MR JAR.
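For reference, the relevant part of such a manifest could look like this (the Main-Class entry anticipates the demo application built below):

META-INF/MANIFEST.MF
Manifest-Version: 1.0
Main-Class: com.example.Main
Multi-Release: true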

Example: Getting the Id of the Current Process

As an example, let’s assume we have a library which defines a class providing the id of the process (PID) it is running in. PIDs shall be represented by a descriptor comprising the actual PID and a String describing the provider of the PID:

src/main/java/com/example/ProcessIdDescriptor.java
package com.example;

public class ProcessIdDescriptor {

    private final long pid;
    private final String providerName;

    // constructor, getters ...
}

Up to Java 8, there is no easy way to obtain the id of the running process. One rather hacky approach is to parse the return value of RuntimeMXBean#getName() which is "pid@hostname" in the OpenJDK / Oracle JDK implementation. While that behavior is not guaranteed to be portable across implementations, let’s use it as the basis for our default ProcessIdProvider:

src/main/java/com/example/ProcessIdProvider.java
package com.example;

import java.lang.management.ManagementFactory;

public class ProcessIdProvider {

    public ProcessIdDescriptor getPid() {
        String vmName = ManagementFactory.getRuntimeMXBean().getName();
        long pid = Long.parseLong( vmName.split( "@" )[0] );
        return new ProcessIdDescriptor( pid, "RuntimeMXBean" );
    }
}

Also let’s create a simple main class for displaying the PID and the provider it was retrieved from:

src/main/java/com/example/Main.java
package com.example;

public class Main {

    public static void main(String[] args) {
        ProcessIdDescriptor pid = new ProcessIdProvider().getPid();

        System.out.println( "PID: " + pid.getPid() );
        System.out.println( "Provider: " + pid.getProviderName() );
    }
}

Note how the source files created so far are located in the regular src/main/java source directory.

Now let’s create another variant of ProcessIdProvider based on Java 9’s new ProcessHandle API, which finally provides a portable way of obtaining the current PID. This source file is located in another source directory, src/main/java9:

src/main/java9/com/example/ProcessIdProvider.java
package com.example;

public class ProcessIdProvider {

    public ProcessIdDescriptor getPid() {
        long pid = ProcessHandle.current().pid();
        return new ProcessIdDescriptor( pid, "ProcessHandle" );
    }
}

Setting up the build

With all the source files in place, it’s time to configure Maven so an MR JAR gets built.

Three steps are required for that. The first thing is to compile the additional Java 9 sources under src/main/java9. I hoped I could simply set up another execution of the Maven compiler plug-in for that, but I could not find a way which would only compile src/main/java9 and not compile the sources from src/main/java a second time.

As a work-around, the Maven Antrun plug-in can be used for configuring a second javac run just for the Java 9 specific sources:

pom.xml
...
<properties>
    <java9.sourceDirectory>${project.basedir}/src/main/java9</java9.sourceDirectory>
    <java9.build.outputDirectory>${project.build.directory}/classes-java9</java9.build.outputDirectory>
</properties>
...
<build>
    ...
    <plugins>
        ...
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-antrun-plugin</artifactId>
            <executions>
                <execution>
                    <id>compile-java9</id>
                    <phase>compile</phase>
                    <configuration>
                        <tasks>
                            <mkdir dir="${java9.build.outputDirectory}" />
                            <javac srcdir="${java9.sourceDirectory}" destdir="${java9.build.outputDirectory}"
                                classpath="${project.build.outputDirectory}" includeantruntime="false" />
                        </tasks>
                    </configuration>
                    <goals>
                        <goal>run</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        ...
    </plugins>
    ...
</build>
...

This uses the target/classes directory (containing the class files emitted by the default compilation) as the classpath, allowing the Java 9 sources to refer to classes common to all Java versions supported by our MR JAR, e.g. ProcessIdDescriptor. The compiled classes go into target/classes-java9.

The next step is to copy the compiled Java 9 classes into target/classes so they will later be put to the right place within the resulting JAR. The Maven resources plug-in can be used for that:

pom.xml
...
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <executions>
        <execution>
            <id>copy-resources</id>
            <phase>prepare-package</phase>
            <goals>
                <goal>copy-resources</goal>
            </goals>
            <configuration>
                <outputDirectory>${project.build.outputDirectory}/META-INF/versions/9</outputDirectory>
                <resources>
                    <resource>
                        <directory>${java9.build.outputDirectory}</directory>
                    </resource>
                </resources>
            </configuration>
        </execution>
    </executions>
</plugin>
...

This will copy the Java 9 class files from target/classes-java9 to target/classes/META-INF/versions/9.

Finally, the Maven JAR plug-in needs to be configured so the Multi-Release entry is added to the manifest file:

pom.xml
...
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
        <archive>
            <manifestEntries>
                <Multi-Release>true</Multi-Release>
                <Main-Class>com.example.Main</Main-Class>
            </manifestEntries>
        </archive>
        <finalName>mr-jar-demo.jar</finalName>
    </configuration>
</plugin>
...

And that’s it, we got everything together to build a multi-release JAR. Trigger the build via mvn clean package (using Java 9) to create the JAR in the target directory.

To check whether the JAR contents are alright, list them via jar -tf target/mr-jar-demo.jar. You should see the following:

...
com/example/Main.class
com/example/ProcessIdDescriptor.class
com/example/ProcessIdProvider.class
META-INF/versions/9/com/example/ProcessIdProvider.class
...
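To double-check the Multi-Release manifest entry itself, you can print the manifest as well (assuming the unzip utility is available):

unzip -p target/mr-jar-demo.jar META-INF/MANIFEST.MF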

Finally, let’s execute the JAR via java -jar target/mr-jar-demo.jar and examine its output. When using Java 8 or earlier, you’ll see the following:

PID: <some pid>
Provider: RuntimeMXBean

Whereas on Java 9, it’ll be this:

PID: <some pid>
Provider: ProcessHandle

I.e. the ProcessIdProvider class from the JAR root will be used on Java 8 and earlier, and the one from META-INF/versions/9 on Java 9.

Conclusion

While javac, jar, java and other JDK tools already support multi-release JARs, build tools like Maven still need to catch up. Luckily, it can be done using some plug-ins for the time being, but it’s my hope that Maven et al. will provide proper support for creating MR JARs out of the box some time soon.

Others have been thinking about the creation of MR JARs, too. E.g. check out this post by my colleague David M. Lloyd. David uses a separate Maven project for the Java 9 specific classes, which are then copied back into the main project using the Maven dependency plug-in. Personally, I prefer to have all the sources within one single project, as I find that a tad simpler, though it’s not without quirks either. Specifically, if you have both src/main/java and src/main/java9 configured as source directories within your IDE, you’ll get an error about the duplicated class ProcessIdProvider. This can be ignored (you might also remove src/main/java9 as a source directory from the IDE if you don’t need to touch it), but it may be annoying to some.

One could think about having the Java 9 classes in another package, e.g. java9.com.example and then using the Maven shade plug-in to relocate them to com.example when building the project, though this seems quite a lot of effort for a small gain. Ultimately, it’d be desirable if IDEs also added support for MR JARs and multiple compilations with different source and target directories within a single project.

Any feedback on this or other approaches for creating MR JARs is welcome in the comments section below. The complete source code of this blog post can be found on GitHub.

Hibernate Community Newsletter 3/2017

Tagged as Discussions, Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Articles

If you’re using MySQL, then the GenerationType.AUTO identifier strategy is not the best option. Check out this article for more details and a very simple workaround.

Injecting JPA/Hibernate Entity Managers with CDI and Weld is extremely easy. Check out this article for more details.

Concurrency control is a very difficult topic, and relational databases are no exception. If you wonder how different database systems prevent Phantom reads, or you are curious about how Two-Phase Locking and MVCC work, you should definitely read this article.

Arno Huetter wrote a list of tips to improve application performance when you’re using JPA and Hibernate.

If you’re working with a database system which does not allow you to create temporary tables, rest assured: Hibernate 5.2.8 adds support for non-temporary table bulk-id strategies.

If you want to separate the entity validation logic from the entity data structures, Hibernate Validator is a very attractive solution.

Time to upgrade

This article is about the HHH-11262 JIRA issue which now allows the bulk-id strategies to work even when you cannot create temporary tables.

Class diagram

Consider that we have the following entities:

[Image: Class diagram]

The Person entity is the base class of this entity inheritance model, and is mapped as follows:

@Entity(name = "Person")
@Inheritance(
    strategy = InheritanceType.JOINED
)
public class Person
    implements Serializable {

    @Id
    private Integer id;

    @Id
    private String companyName;

    private String name;

    private boolean employed;

    //Getters and setters omitted for brevity

    @Override
    public boolean equals(Object o) {
        if ( this == o ) {
            return true;
        }
        if ( !( o instanceof Person ) ) {
            return false;
        }
        Person person = (Person) o;
        return Objects.equals(
            getId(),
            person.getId()
        ) &&
        Objects.equals(
            getCompanyName(),
            person.getCompanyName()
        );
    }

    @Override
    public int hashCode() {
        return Objects.hash(
            getId(), getCompanyName()
        );
    }
}

Both the Doctor and Engineer entity classes extend the Person base class:

@Entity(name = "Doctor")
public class Doctor
    extends Person {
}

@Entity(name = "Engineer")
public class Engineer
    extends Person {

    private boolean fellow;

    //Getters and setters omitted for brevity
}

Inheritance tree bulk processing

Now, when you try to execute a bulk entity query:

int updateCount = session.createQuery(
    "delete from Person where employed = :employed" )
.setParameter( "employed", false )
.executeUpdate();

Hibernate executes the following statements:

create temporary table
    HT_Person
(
    id int4 not null,
    companyName varchar(255) not null
)

insert
into
    HT_Person
    select
        p.id as id,
        p.companyName as companyName
    from
        Person p
    where
        p.employed = ?

delete
from
    Engineer
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )

delete
from
    Doctor
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )

delete
from
    Person
where
    (
        id, companyName
    ) IN (
        select
            id,
            companyName
        from
            HT_Person
    )

HT_Person is a temporary table that Hibernate creates to hold all the entity identifiers that are to be updated or deleted by the bulk id operation. The temporary table can be either global or local, depending on the underlying database capabilities.

What if you cannot create a temporary table?

As the HHH-11262 issue describes, there are use cases when the application developer cannot use temporary tables because the database user lacks this privilege.

In this case, we defined several options which you can choose depending on your database capabilities:

  • InlineIdsInClauseBulkIdStrategy

  • InlineIdsSubSelectValueListBulkIdStrategy

  • InlineIdsOrClauseBulkIdStrategy

  • CteValuesListBulkIdStrategy

InlineIdsInClauseBulkIdStrategy

To use this strategy, you need to set the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsInClauseBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?

delete
from
    Engineer
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )

delete
from
    Doctor
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )

delete
from
    Person
where
        ( id, companyName )
    in (
        ( 1,'Red Hat USA' ),
        ( 3,'Red Hat USA' ),
        ( 1,'Red Hat Europe' ),
        ( 3,'Red Hat Europe' )
    )

So, the entity identifiers are selected first and used for each particular update or delete statement.

The IN clause row value expression has long been supported by Oracle, PostgreSQL, and nowadays by MySQL 5.7. However, SQL Server 2014 does not support this syntax, so you’ll have to use a different strategy.

InlineIdsSubSelectValueListBulkIdStrategy

To use this strategy, you need to set the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsSubSelectValueListBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?

delete
from
    Engineer
where
    ( id, companyName ) in (
        select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )

delete
from
    Doctor
where
    ( id, companyName ) in (
         select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )

delete
from
    Person
where
    ( id, companyName ) in (
        select
            id,
            companyName
        from (
        values
            ( 1,'Red Hat USA' ),
            ( 3,'Red Hat USA' ),
            ( 1,'Red Hat Europe' ),
            ( 3,'Red Hat Europe' )
        ) as HT
            (id, companyName)
    )

The underlying database must support the VALUES list clause, like PostgreSQL or SQL Server 2008. However, this strategy also requires the IN-clause row value expression for composite identifiers, so you can use it only with PostgreSQL.

InlineIdsOrClauseBulkIdStrategy

To use this strategy, you need to set the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.InlineIdsOrClauseBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?

delete
from
    Engineer
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )

delete
from
    Doctor
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )

delete
from
    Person
where
    ( id = 1 and companyName = 'Red Hat USA' )
or  ( id = 3 and companyName = 'Red Hat USA' )
or  ( id = 1 and companyName = 'Red Hat Europe' )
or  ( id = 3 and companyName = 'Red Hat Europe' )

This strategy has the advantage of being supported by all the major relational database systems (e.g. Oracle, SQL Server, MySQL, and PostgreSQL).

CteValuesListBulkIdStrategy

To use this strategy, you need to set the following configuration property:

<property name="hibernate.hql.bulk_id_strategy"
          value="org.hibernate.hql.spi.id.inline.CteValuesListBulkIdStrategy"
/>

Now, when running the previous test case, Hibernate generates the following SQL statements:

select
    p.id as id,
    p.companyName as companyName
from
    Person p
where
    p.employed = ?

with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Engineer
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )

with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Doctor
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )


with HT_Person (id,companyName ) as (
    select id, companyName
    from (
    values
        (?, ?),
        (?, ?),
        (?, ?),
        (?, ?)
    ) as HT (id, companyName) )
delete
from
    Person
where
    ( id, companyName ) in (
        select
            id, companyName
        from
            HT_Person
    )

The underlying database must support CTEs (Common Table Expressions) that can be referenced from non-query statements, like PostgreSQL since 9.1 or SQL Server since 2005. The underlying database must also support the VALUES list clause, like PostgreSQL or SQL Server 2008.

However, this strategy requires the IN-clause row value expression for composite identifiers, so you can use this strategy only with PostgreSQL.

Conclusion

If you can use temporary tables, that’s probably the best choice. However, if you are not allowed to create temporary tables, you must pick one of these four strategies that works with your underlying database. Before making up your mind, you should benchmark which one works best for your current workload. For instance, CTEs are optimization fences in PostgreSQL, so make sure you measure before taking a decision.

If you’re using Oracle or MySQL 5.7, you can choose either InlineIdsOrClauseBulkIdStrategy or InlineIdsInClauseBulkIdStrategy. For older versions of MySQL, you can only use InlineIdsOrClauseBulkIdStrategy.

If you’re using SQL Server, InlineIdsOrClauseBulkIdStrategy is the only option for you.

If you’re using PostgreSQL, then you have the luxury of choosing any of these four strategies.

Tool Time: Preventing leaky APIs with jQAssistant

Tagged as Discussions

If you’ve ever watched the great show "Home Improvement", you’ll know that a fool with a tool is still a fool. At the same time though, the right tool used in the right way can be very effective for solving complex issues.

In this post I’d like to introduce a tool called jQAssistant which I’ve found very useful for running all sorts of analyses of a project’s code base, e.g. for preventing the leakage of internal types in the public API of a library. This is planned to be the first post in a blog series on developer-centric tools we’ve come to value when working on the different libraries of the Hibernate family.

Keeping your API clean

When providing a library it is an established best practice to clearly distinguish between those parts of the code base intended to be accessed by users (the API of the library) and those parts not meant for outside access.

Having a clearly defined API helps to reduce complexity for the user (they only need to learn and understand the API classes but not all the implementation details), while giving the library authors freedom to refactor and alter implementation classes as they deem it necessary. Usually, the separation of API and implementation types is achieved by using specific package names, with all the non-public parts being located under a package named internal, impl or similar.

One thing you have to be very careful about though is to not leak any internal types in the API. E.g. a method definition like the following is something which should be avoided:

package com.example;

import com.example.internal.Foo;

public interface MyPublicService {

    Foo doFoo();
}

MyPublicService is part of the public API (as it is not part of an internal package), but the doFoo() method returns the internal type Foo. Users of doFoo() thus would have to deal with an implementation type, which is exactly what you wanted to prevent when splitting the API and implementation parts of the library.

Unfortunately, inconsistent APIs like this are defined easily if one isn’t very careful. This is where tooling is helpful: by searching for such malformed APIs in an automated way, they can be spotted early on and be avoided.

Introducing jQAssistant

jQAssistant is an open-source tool allowing you to do exactly that. It parses a project’s code base and creates a model of all the classes, methods, fields etc. in a Neo4j graph database. Using Neo4j’s powerful Cypher query language, you then can execute queries to detect specific patterns and structures in your code base which you are interested in.

Setting up jQAssistant for your project is fairly simple. Assuming you are working with Maven, all that’s needed is the following plug-in configuration in your pom.xml (refer to the official documentation for further details and configuration options):

pom.xml
...
<build>
    <plugins>
        <plugin>
            <groupId>com.buschmais.jqassistant.scm</groupId>
            <artifactId>jqassistant-maven-plugin</artifactId>
            <version>1.1.4</version>
            <executions>
                <execution>
                    <goals>
                        <goal>scan</goal>
                        <goal>analyze</goal>
                    </goals>
                    <configuration>
                        <failOnViolations>true</failOnViolations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
...

With the plug-in in place, we can define a Cypher query which finds any API method with an internal return type:

jqassistant/rules.xml
<?xml version="1.0" encoding="UTF-8"?>
<jqa:jqassistant-rules xmlns:jqa="http://www.buschmais.com/jqassistant/core/analysis/rules/schema/v1.0">

    <constraint id="my-rules:PublicMethodsMayNotExposeInternalTypes">
        <description>API/SPI methods must not expose internal types.</description>
        <cypher><![CDATA[
            MATCH
                (class)-[:`DECLARES`]->(method)-[:`RETURNS`]->(returntype)
            WHERE
                NOT class.fqn =~ ".*\\.internal\\..*"
                AND (method.visibility="public" OR method.visibility="protected")
                AND returntype.fqn =~ ".*\\.internal\\..*"
            RETURN
                method
        ]]></cypher>
    </constraint>

    <group id="default">
        <includeConstraint refId="my-rules:PublicMethodsMayNotExposeInternalTypes" />
    </group>

</jqa:jqassistant-rules>

jQAssistant rules are given in a file named jqassistant/rules.xml by default.

The rules themselves are defined using the Cypher query language. If you haven’t used Cypher before, it may feel a bit uncommon at first, but from my own experience I can tell you that one gets the hang of it pretty quickly. Essentially, you describe patterns of graph nodes, their properties, their type (as expressed via "labels", a kind of tag) and their relationships.

In the query above, we use a pattern including three nodes ("class", "method" and "returntype") and two relationships. There must be a relationship of type "DECLARES" from the "class" node to the "method" node and another relationship of type "RETURNS" from the "method" to the "returntype" node.

As we are only interested in methods leaking internal types through the API, the WHERE clause is used to further refine the selection:

  • The declaring class must be part of the API (it must not be in an internal package, as expressed by filtering on the fully-qualified class name),

  • the method must either have public or protected visibility and

  • the return type must be located in an internal package.

To execute the rule, simply build the project with the jQAssistant Maven plug-in configured as above. The plug-in will automatically execute all rules of the default group given in the rules file. If there is any result for any of the executed rules, the build will fail, displaying the result(s) of the affected rules:

[INFO] --- jqassistant-maven-plugin:1.1.4:analyze (default) @ jqassistant-demo ---
...
[ERROR] --[ Constraint Violation ]-----------------------------------------
[ERROR] Constraint: my-rules:PublicMethodsMayNotExposeInternalTypes
[ERROR] Severity: INFO
[ERROR] API/SPI methods must not expose internal types.
[ERROR]   method=com.example.MyPublicService#com.example.internal.Foo doFoo()
[ERROR] -------------------------------------------------------------------
...
[INFO] BUILD FAILURE

API methods should only return API types, but they also should only take API types as parameters. Let’s expand the Cypher query to cover this case, too:

jqassistant/rules.xml
...
<constraint id="my-rules:PublicMethodsMayNotExposeInternalTypes">
    <description>API/SPI methods must not expose internal types.</description>
    <cypher><![CDATA[
      // return values
      MATCH
          (class)-[:`DECLARES`]->(method)-[:`RETURNS`]->(returntype)
      WHERE
          NOT class.fqn =~ ".*\\.internal\\..*"
          AND (method.visibility="public" OR method.visibility="protected")
          AND returntype.fqn =~ ".*\\.internal\\..*"
      RETURN
          method

      // parameters
      UNION ALL
      MATCH
          (class)-[:`DECLARES`]->(method)-[:`HAS`]->(parameter)-[:`OF_TYPE`]->(parametertype)
      WHERE
          NOT class.fqn =~ ".*\\.internal\\..*"
          AND (method.visibility="public" OR method.visibility="protected")
          AND parametertype.fqn =~ ".*\\.internal\\..*"
      RETURN
          method
    ]]></cypher>
</constraint>
...

Similar to SQL, we can add further results using the UNION ALL clause. Searching for leaking parameters is done in a similar way as for return values, the only difference being the node pattern we need to detect: there must be a node (the class) with a DECLARES relationship to another node (the method), which has a relationship of type HAS to a third node (the parameter), which finally has an OF_TYPE relationship to a fourth node representing the parameter’s type. The same rules for package names and the method’s visibility apply as for the return value check.

Browsing the model of your project

When declaring rules as the one above, it is vital to know the meta-model of your project’s graph representation, e.g. which types of nodes there are (i.e. which labels they have), what types of relationships there are, which properties the nodes have and so on. This is described in great depth in the jQAssistant reference documentation.

But as jQAssistant is based on Neo4j, you also can use the browser app coming with the database for interactively exploring your project’s structure. To do so, simply run the following command:

mvn jqassistant:scan jqassistant:server

This will populate jQAssistant’s embedded Neo4j database with your project’s structure and start a Neo4j server. In a browser, go to http://localhost:7474/browser/ and you can explore the project code base, run Cypher queries etc. Start by selecting a node label or relationship type on the left or by submitting a Cypher query:

[Image: Browsing a jQAssistant model in Neo4j]
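For instance, a simple exploratory query listing some types and the methods they declare might look like this (assuming the default labels provided by the jQAssistant Java plug-in):

MATCH (t:Type)-[:DECLARES]->(m:Method)
RETURN t.fqn, m.name
LIMIT 25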

Trying it out yourself

You can find a complete example of using jQAssistant in a Maven project here on GitHub. You also may take a look at the rules file used by Hibernate Validator. Besides the check for methods exposing internal types this file defines some more rules:

  • public fields in API types must not expose implementation types

  • API types must not extend implementation types

These checks run regularly on our CI server, preventing the accidental introduction of leaky APIs very effectively.

Creating a model of a software project in a graph database is a great idea. The powerful Cypher query language allows you to search for interesting structures and patterns in your project in a rather intuitive way. The detection of leaky APIs is just one example. For instance, you may define a layered architecture for your business application and ensure that there are no illegal dependencies between the application layers.

Also, jQAssistant is not limited to Java classes. Besides the Java plug-in, the tool provides many other scanners, e.g. for Maven POM files, JPA persistence units or XML files, allowing you to run all kinds of analyses tailored to the specific needs of your project.

Hibernate Community Newsletter 2/2017

Tagged as Discussions, Hibernate ORM

Welcome to the Hibernate community newsletter in which we share blog posts, forum, and StackOverflow questions that are especially relevant to our users.

Interviews

Don’t miss our Hibernate developer interview with Dmitry Alexandrov.

If you want to share your story about Hibernate, let us know, and we can share it with our huge community of passionate developers.

Articles

I was told about a new blog post which proclaims that lazy loading is a code smell. Well, in my experience, it’s exactly the opposite: I’m a strong believer that EAGER fetching is almost always a bad way of fetching data. After reading Sebastian Malaca’s article, I managed to find a very interesting series of articles on mixing JPA and DDD (Domain-Driven Design).

DDD is a great approach. However, trying to treat a relational database as if it were a document store can be very detrimental to application performance. All in all, JPA entities are not the same as DDD entities. In fact, JPA entities are just the persistent state of the Domain Model.

Orlando L Otero wrote a tutorial about implementing a Multitenant architecture on top of Spring, Hibernate, and PostgreSQL. Related to Multitenancy, I found this Microsoft article from 2006 still very relevant today.

Choosing the right entity identifier strategy requires some knowledge of the underlying JPA provider. For this reason, if you want portability, check out how you can replace the suboptimal TABLE strategy with SEQUENCE or IDENTITY.
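For instance, a portable mapping that avoids the TABLE strategy might look like this (a sketch using the standard JPA annotations; the entity name is made up):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Post {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE) // backed by a database sequence where supported
    private Long id;

    // getters and setters omitted for brevity
}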

Thorben Janssen wrote a short guide which introduces several JPQL query features. For more on this topic, check out the exhaustive JPQL and HQL chapter in the Hibernate 5 User Guide.

Hibernate entity queries are suitable when you want to modify the fetched entities and take advantage of the dirty checking mechanism. However, if you want to use advanced SQL query capabilities, you need native SQL queries. Check out this article to learn why native SQL queries are a Magic Wand.

Time to upgrade
