Showing some love for GraalVM: Hibernate ORM 5.4.12.Final is here

Another packed release in just days

Just a few days after the previous release, we publish now version 5.4.12.Final of our flagship project.

The reason of the expedited timeframe is that we had a regression in the previous release: the new enhanced entity scanner would fail if you had any module-info.class; that felt embarassing enough to publish a fix right away.

Which leads me to consider we didn’t introduce you to some of the enhancements in the previous release; on top of that we introduced an interesting, new hibernate-graalvm artifact: let me tell you about these.

Jandex based entity scanner

When Hibernate ORM boots, you either provide it with a list of mapped entities explicitly, or we need to find the objects having any mapping annotations, such as @Entity.

This internal service is provided by implementations of org.hibernate.boot.archive.scan.spi.Scanner: some frameworks and containers plugin their own implementation when they need particular control around the scanning process, but of course we have a default implementation.

Any Scanner implementation needs to be fast: searching for all objects on the classpath could take a lot of time, depending how you do this.

Until Hibernate ORM 5.4.10.Final the default implementation we provide was based on Javassist, as it provided an efficient solution; this implementation served us well for a long time, but it requires Javassist, while we’re trying hard to limit our dependencies. This wasn’t a big deal when the default Bytecode Enhancer implementation was Javassist, but since we switched to Byte Buddy, needing on the classpath as well, and just for this one use case is feeling like an unnecessary weight.

To be fair, it’s also a drain on the team to ensure such libraries are regularly updated to be compatible with the fast moving JDK releases: reducing the number of such libraries we depend on should help our capability to maintain it all.

So why Jandex?

Jandex is a small library which is purely focused on scanning, with am emphasis on efficiency; we’ve used it successfully in other contexts so it was the perfect candidate to update our scanner implementation; so Javassist is now an optional dependency: another small step to finally remove it in Hibernate 6.

Jandex has another benefit: it allows to store the index within the jarfiles. You might want to consider adding the Jandex build time plugins to your project to take advantage of this.

Metadata for GraalVM native-image

It’s possible to use GraalVM’s native-image tool to compile Java applications to native code; this is straight forward for most Java code, but it gets fairly more complicated when you’re using libraries such as Hibernate ORM.

I discussed the challenges extensively at some conferences such as Devoxx and QCon; if you want to learn the details some of these session are available for free on youtube. Best to read the introduction on the official documentation first though!

In a nutshell, the compiler needs to fully analyze the possible execution paths to be able to apply some brilliant optimisations, including removing all the symbols that it can prove that you’ll never actually going to need at runtime. That’s great as it reduces the size of the binary files significantly, and often will also result in improved runtime performance as, for example, data structures become smaller, there’s less branches, less types to consider for polymorphic calls, and many more. Running this analysis is fully automatic when you have "normal" flow of code, but when the library in question expects to be able to generate or modify bytecode at runtime, or even just use reflection, then the compiler can no longer tell which symbols are safe to remove, and needs a little explicit help.

Luckily, it’s quite straight forward to tell it which methods you intend to invoke reflectively; there’s a number of options, including a simple to craft json file. See Reflection for more details.

But could libraries include such metadata?

This is where it gets trickier. Hibernate ORM could include a list, but this is a static list, which raises a series of questions.

For example, should it include the code to support UUID identity generators?

If it does, those symbols will be included for any application relying on this metadata - including those who aren’t using UUID for their identifiers. You might expect this to be a small price, but you’d be surprised: consider that the native image code needs to include also all code that any of your classes depends on; in this case it would imply to also include a significant amount of code which allows the JDK to generate UUID instances: support for random number generators, and even a bit of cryptographic code.

If we decided to not include support for UUID identity generators in such a list, then any users would still have the option to explicitly add it to their own json file; that’s certainly better from a point of view of efficiency and optimisations, but is less convenient.

And that’s just a single identity generator! It’s easy to see that the wrong approach can lead to a huge overhead, when the problem is scaled to the choice and permutations of all internal components.

What’s the optimal solution? Clearly this could be automated: takes the inconvience away from the end users, and also an application could deal with a far larger set of possibilities, and combinations. Hibernate ORM is complex enough, but it gets worse: in practice when you compile the application you will need to craft a single set of compiler flags which will satisfy all of your code and all of its dependencies.

The amount and complexity of this, also makes it hard for a human to actually manage it all correctly, even when given precise instructions.

In Quarkus, the Hibernate ORM extension will simply analyse the application code and figure out which of the many internal components that Hibernate ORM offers are actually going to be necessary; it goes even further, by optimising for the specific configuration that these will be used in; the architecture based on extensions makes sure that each module has deep, specialized expertise in each library it supports, and yet it’s all coordinated by a central core which helps produce a coherent set of metadata and compiler flags.

This automated, custom build metadata is then fed to GraalVM’s native-image compiler to produce the perfect image: lean, fast and functional.

Still, while that solution is very elegant, its applicability is limited to applications based on Quarkus. What if you want to build your own native image from command line, so to experiment with the various options that GraalVM provides?

To be fair that’s not easy, but we can try making it less complex.

One problem is that Hibernate ORM will likely also need to generate proxies at runtime (depending on configuration), and very likely will need to run some reflective operations on your own code: at very leat it needs to use reflection on the mapped classes to know what properties they have, what annotations they have. I don’t have the solution for that, but it’s certainly possible to plug in a custom metadata source, if someone wanted to do so, or just manually list the domain model as additional types to be registered for reflective access in the json file: maybe a bit tedious, but certainly possible.

Adding the user domain model to the list, making a choice of some internal components are things I consider "dynamic metadata" needs: as they vary from application to application. There’s also a "static list": independently from your configuration and from your model, you will always need to register some key internal classes from Hibernate.

To be clear, I was on the initial R&D team who created Quarkus: much of its design stems from the need to solve these issues while keeping it all simple to use; so don’t be surprised that my opinion is that if you need this all to "just work" at minimal fuss my recommendation would be to use Quarkus, as I don’t know how else you could solve the need, for example, of needing proxies generated at runtime. Happy to discuss such aspects in more detail, as we’d love such solutions evolve further, but please remembed that this is definitely material for advanced users, researchers and framework developers.

Introducing `hibernate-graalvm`

To help people out who are prototyping alternative solutions to such problems, I’ve decided to extract the "static metadata" from Quarkus and publish it as a companion jar to the Hibernate ORM core jar.

Feel free to add this dependency to your project:

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-graalvm</artifactId>
   <version>5.4.12.Final</version>
</dependency>

If you’re not compiling with native-image, this artifact will not be very useful at all; the good news is that it’s harmless. But if you are going to experiment with GraalVM, the compiler will pick it up automatically and register all the internals that are needed to be instantiated via reflection.

A word of caution: the list might be incomplete as it’s based on our knowledge so far, and the experience we’ve got from Quarkus and its community; it certainly is missing at least your own entities, so consider it a helpful starting point but don’t consider it an exhaustive list.

I hope that would make it far easier for more people to experiment with it!

Release changelog

You can find the full list of changes in this version here.

Getting Hibernate ORM 5.4.12.Final

All details are available and up to date on the dedicated page on hibernate.org.

Feedback, issues, ideas?

To get in touch, use the usual channels:

hibernate tag on Stack Overflow (usage questions)
User forum (usage questions, general feedback)
Issue tracker (bug reports, feature requests)
Mailing list (development-related discussions)

In Relation To