In this post, I’d like you to meet Lukas Eder, Java Champion, SQL aficionado, and data access framework developer.
Hi Vlad, I’m @lukaseder, founder and CEO of Data Geekery, the company behind jOOQ. I am a Java Champion and Oracle ACE, and I will surely get all the other cool badges from the other vendors as well, soon. I have millions of Stack Overflow reputation, reddit karma, and other important Internet points, and I was recently endorsed on LinkedIn not only for XML, but also for Microsoft Excel reactive programming - the only tool even more powerful than SQL.
Apart from those remarkable accomplishments ;), I’m mostly coding and tuning Java, SQL, and PL/SQL around jOOQ and for customers of Data Geekery’s consultancy services. I’m also delivering public and in-house SQL trainings, and I’m generally trying to make the lives of Java developers who work with SQL easier.
The spirit of jOOQ was born in the early 1800s when Ada Lovelace essentially invented programming. She surely would have predicted it. Once algorithms were a thing, the world was now ready for SQL, and thus for jOOQ. After her impressive work, however, many detours were taken, and it took another 150 years until the invention of SQL, and roughly, until 2009, when it became clear that SQL has to be part of Java as well. Thus: jOOQ.
When focusing only on the 2009 bit of history, jOOQ happened "by accident". At the time, in essence, everyone was creating home-grown query builders to abstract away the pain of concatenating the strings "A = 1" and "B = 2", connecting them with " AND ", and occasionally, with " OR ". Overall global efforts invested in that area can only be measured in terms of geologic time scales ("person-eons"). Those efforts all went to waste because even within the same company, several teams made the same efforts again and again.
Most of these in-house tools could roughly handle SELECT .. FROM .. WHERE. JOINs were hardly supported (or even understood), and if you added GROUP BY to the mix, there was a Cambrian explosion of bugs, patches, and further person-eons of maintenance work.
All this meant that there had to be an industry need, a market, for a tool that everyone can use. A tool to make sure this problem of building SQL strings will be solved for good. And there were such tools! At the time, there were at least 10 proof-of-concept style open source libraries, and perhaps one that took it a step further called QueryDSL. But none of them went "all in" on the SQL language. Again, most of them handled SELECT .. FROM .. WHERE, and then pretty much stopped at that.
jOOQ had to be made and stay here for good.
When it comes to persisting and fetching data, what’s the approach taken by jOOQ? How does it compare to the transactional write-behind one employed by Hibernate?
jOOQ is "just" typesafe JDBC/SQL, i.e. it is an API that allows its users to write SQL, not as strings (as with JDBC), but as compile-time type checked SQL expression trees that just happen to look an awful lot like actual SQL. So, the question isn’t really what the approach is for persisting and fetching data, but what the approach is for writing the queries to do so.
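To make the idea of queries as type checked expression trees more concrete, here is a minimal Java sketch. It is not jOOQ's API: the names Condition, Column, Combined, and MiniQuery are hypothetical and exist only for illustration. It shows how conditions built as objects can render both a SQL string and the matching list of bind values, instead of concatenating strings by hand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-DSL (not jOOQ's API): a condition is an object, not a string. */
interface Condition {
    /** Appends this condition's SQL to sql and collects its bind values in order. */
    void render(StringBuilder sql, List<Object> binds);

    default Condition and(Condition other) { return new Combined(this, "AND", other); }
    default Condition or(Condition other)  { return new Combined(this, "OR", other); }
}

/** A column typed by the Java type of its values, so eq() rejects mismatched literals at compile time. */
final class Column<T> {
    final String name;

    Column(String name) { this.name = name; }

    Condition eq(final T value) {
        return (sql, binds) -> {
            sql.append(name).append(" = ?");
            binds.add(value);
        };
    }
}

/** Two conditions joined by AND or OR, rendered in parentheses. */
final class Combined implements Condition {
    private final Condition left;
    private final String operator;
    private final Condition right;

    Combined(Condition left, String operator, Condition right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    @Override
    public void render(StringBuilder sql, List<Object> binds) {
        sql.append('(');
        left.render(sql, binds);
        sql.append(' ').append(operator).append(' ');
        right.render(sql, binds);
        sql.append(')');
    }
}

final class MiniQuery {
    // Hypothetical columns of a hypothetical table T.
    static final Column<Integer> A = new Column<>("A");
    static final Column<String> B = new Column<>("B");

    public static void main(String[] args) {
        // A.eq("oops") would not compile, because A is a Column<Integer>.
        Condition where = A.eq(1).and(B.eq("x"));

        StringBuilder sql = new StringBuilder("SELECT * FROM T WHERE ");
        List<Object> binds = new ArrayList<>();
        where.render(sql, binds);

        // Prints: SELECT * FROM T WHERE (A = ? AND B = ?) [1, x]
        System.out.println(sql + " " + binds);
    }
}
```

The SQL string and the bind values come out of the same tree, which is the property that makes it possible to hand both to another API such as JDBC.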
You could compare jOOQ with JPA’s Criteria Query API, which is "just" typesafe JPA/JPQL - although I must say that Criteria Query could have been better executed. Writing an internal DSL (Domain-Specific Language) is really hard. The implications in terms of API maintenance and keeping Backward Compatibility are very tricky. I think Criteria Query was added to the JPA specification prematurely. Now that we have it, it can never really be changed, only, perhaps, superseded.
Now, if you want to compare the approaches of writing queries (SQL, JDBC, jOOQ, JPA native queries) vs the "transactional write-behind" approach (JPA, Hibernate, other ORMs), the discussion gets a bit more complex, and not strictly tool related, but architecture related.
In short, the SQL approach is more bulk data processing oriented (for both reads and writes), whereas the JPA approach is more CRUD oriented. SQL is more stateless/sessionless/side-effect free whereas JPA is stateful/"sessionful"/imperative. SQL runs set based computational logic in the database, giving access to the sophistication of modern SQL optimisers, whereas JPA offers running record based computational logic in the client, giving access to the vast possibilities of Java libraries and client-side processing.
So, clearly, the two things represent not just different technologies, but different paradigms and mindsets.
However, this still doesn’t explain why each approach could be the preferred one in a given situation. And it’s a really tricky question. I have personally only ever worked with systems where the SQL approach was clearly more advantageous (large data sets, thousands of tables, dozens of joins needed per query, thousands of concurrent queries, thousands of stored procedures, few concurrent writes, mixed OLTP and OLAP workloads) - but I know many people and their systems, where pure SQL would be detrimental for the complex transactional patterns they employ and JPA really shines.
Ideally, there’s a mix of both worlds: SQL and JPA - or jOOQ and Hibernate, which are concrete implementations. Because ultimately, one size never fits all, and you will have reporting and analytics (SQL) in an OLTP/CRUD (JPA) application, to give a simple example.
Of course, I have blogged about this in the past.
How easy would it be to integrate jOOQ and Hibernate, and what would be the main benefit of combining both frameworks?
There are different ways to integrate.
Integrating query API
From a mere API perspective, it is very easy. As of jOOQ 3.10, you can simply extract any SQL string and bind variable sets directly from your jOOQ query, send that query to the Hibernate/JPA native query API, and let Hibernate sort out the mapping according to JPA standards. This works with:
mapping ordinary native queries to
mapping "enriched" native queries to
mapping ordinary native queries to entities
Specifically, the latter can be quite powerful. Sometimes, you do want to get managed entities as a result, but you cannot express the complexity of your query in a JPQL query/Criteria Query/named entity graph. Simple examples involve using unions, common table expressions, derived tables, lateral joins, and many other features that are not well supported in JPQL.
In that case, SQL shines, and from a performance and maintainability perspective, it is almost always the better option compared to fetching all data into memory and implementing the logic in Java. All you have to do is make sure you select all the columns needed for the entity graph that you want to materialize, possibly using ResultTransformer (look who wrote about that topic ;) ), and you’re done.
A future jOOQ version, hopefully, version 3.11 (for workgroups ;), will further simplify the integration by binding the jOOQ SPIs directly to an EntityManager. This will remove the need for extracting SQL strings and bind variables from jOOQ queries and allow for executing the query directly using the jOOQ API but on the EntityManager. I’m really looking forward to this feature, which makes using the best of both worlds easier.
Integrating code generation
Another cool integration point is the jOOQ code generation based off JPA annotated entities, with Hibernate being used behind the scenes. Many projects already use Hibernate and want to run a couple of reports or entity queries with SQL, and thus with jOOQ. They can now reverse-engineer the JPA annotated entities using Hibernate, generate an in-memory H2 database from them, and jOOQ can read that H2 database to generate jOOQ code.
Even if I personally prefer working with DDL, many projects see their JPA annotated entities as their primary source of schema information, so that approach suits them really well.
Integrating on a JDBC level
A lesser-known integration point is the fact that jOOQ exposes itself through at least two low-level, JDBC-based SPIs:
the parser API
the mocking API
In both cases, jOOQ can proxy a JDBC Connection and do things like:
Parsing the SQL string that gets sent to jOOQ and transforming the SQL expression tree to something else, e.g. by applying a VisitListener. This could be used to implement client-side row-level security, or sophisticated multi-tenancy, or other things. Also, the parsed SQL string can be translated to other SQL dialects (although that is not really useful in Hibernate, which is already dialect agnostic). A future jOOQ version will be able to apply custom formatting to arbitrary SQL strings, so this could work nicely as a formatting utility for Hibernate-generated SQL, for logging purposes.
The SQL statement can be mocked through a single SPI, returning "fake" results in some cases. In simple setups, this can be quite powerful to intercept queries both for testing and for other purposes.
Again, these features do not expose jOOQ to the client, but hide jOOQ behind JDBC, so that they can work with JDBC directly, or with Hibernate.
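As a rough illustration of what proxying a JDBC Connection can look like, the following sketch uses a JDK dynamic proxy to record every SQL string passed to prepareStatement before delegating to the real connection. It is a toy under stated assumptions, not jOOQ's parser or mocking API; SqlCapturingProxy and its wrap method are hypothetical names, and the stub delegate in main only stands in for a real driver connection.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: hides an interception layer behind the plain JDBC Connection interface. */
final class SqlCapturingProxy {

    /** Returns a Connection that records SQL passed to prepareStatement(...) and then delegates. */
    static Connection wrap(final Connection delegate, final List<String> seenSql) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("prepareStatement")
                    && args != null && args[0] instanceof String) {
                // A real tool could parse, rewrite, or mock the statement at this point.
                seenSql.add((String) args[0]);
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                // Rethrow the delegate's own exception (e.g. SQLException) unchanged.
                throw e.getCause();
            }
        };
        return (Connection) Proxy.newProxyInstance(
                SqlCapturingProxy.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                handler);
    }

    public static void main(String[] args) throws Exception {
        // Stub delegate that answers null to every call; a real setup would pass a driver Connection.
        Connection stub = (Connection) Proxy.newProxyInstance(
                SqlCapturingProxy.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (proxy, method, methodArgs) -> null);

        List<String> seen = new ArrayList<>();
        wrap(stub, seen).prepareStatement("SELECT 1");

        // Prints: [SELECT 1]
        System.out.println(seen);
    }
}
```

Because the caller only ever sees a java.sql.Connection, the same wrapper could sit underneath plain JDBC code or underneath an ORM that obtains its connections from a wrapped source.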
For many Java developers, the level of knowledge of SQL is rather basic. What awesome SQL features would you recommend Java developers to start learning more about?
That’s true, very unfortunately. I would recommend this:
First off, don’t be afraid of SQL. SQL is a very simple functional programming language that just happens to have a rather quirky, arcane syntax. THE MORE YOU YELL, THE FASTER IT RUNS, RIGHT? (Credit for this joke to Aleksey Shipilëv). But in order to truly understand SQL (both basic and sophisticated SQL), I think it is important to remember where it came from: Relational Algebra. If this is properly understood, in particular, the fact that most operations are just syntax sugar over basic set operations like set unions and cartesian products, then the language will make a lot more sense, and it becomes clear how powerful it really is.
Then, I suggest reading this really cool blog about SQL ;) and looking out for a couple of more advanced features. The most important ones are common table expressions (CTE) and window functions. CTE is super easy to understand and will add value immediately when writing complex queries. Window functions are a bit more tricky, but I’d say also much more rewarding on an everyday basis. Once these are understood, a vast number of other features are worth visiting. Sophisticated examples are shown on my post "10 SQL Tricks That You Didn’t Think Were Possible", but there are many other, simpler features that can be used on a daily basis. I will cover more of them on the blog in the future. I’m also writing a book (this does take longer than expected, with 2 kids…), and of course, these topics are covered in my SQL training.
We always value feedback from our community, so can you tell us what features you’d like us to add to make it easier for other data access frameworks to integrate with JPA or Hibernate?
I know we’ve discussed the fact that the existing ResultTransformer will be improved in Hibernate 6.0. This is probably one of the most interesting SPIs for other data access frameworks, like jOOQ. I hope the new version will be standardized in JPA and allow for really easy custom transformation between flat result sets and entity graphs.
From my perspective, I’ve always wondered why popular ORMs like Hibernate do everything in a single tool, mostly:
the modeling part
the mapping part
the querying part
the session/cache management part
If these parts could be split into different and independent JPA/Hibernate modules, the whole toolchain could be even more powerful. For instance, if there was a Hibernate mapping library that cares only about how to map between flat data and annotated entity graphs (but wouldn’t worry about managing such entities, or about fetching them, as that would belong to the session/cache management part, or the querying part), that would be really useful.
Thank you, Lukas, for taking your time. It is a great honor to have you here. To reach Lukas, you can follow him on Twitter.
In this post, I’d like you to meet Rafael Ponte, a software developer, blogger, conference speaker, and Java Persistence aficionado.
Hi, Rafael. Would you like to introduce yourself and tell us a little bit about your developer experience?
First of all, thank you very much for inviting me! My name is Rafael Ponte (@rponte on Twitter), a 33-year-old software developer, instructor, and co-founder at TriadWorks, currently living in Ceará, in the Northeast region of Brazil.
I’ve been working in software development since 2005, and most of my experience is in developing and integrating enterprise systems with Java and relational databases. Although I’ve worked with other languages and technologies, I must be honest: I’m a Java enthusiast. That’s why, during all this time, I’ve worked with many Java technologies and frameworks, and I’ve tried to learn how to make the most of them.
For the last 4 years, I’ve been working in education with my own company, TriadWorks, where we’ve been training hundreds of developers and students on how to develop better software with the most popular Java technologies and approaches, like JPA with Hibernate, Spring, VRaptor, JSF with PrimeFaces and CDI, TDD and automated tests with JUnit, database migrations, continuous integration (CI) and so on.
So, most of my career has been helping developers, teams, and companies to build long-term software through consultancy and training.
You’ve written many articles about JPA and Hibernate on your blog. What motivates you to write on this particular topic?
Since my early days as a junior developer, I felt I could help other developers learn better and more easily what I had a hard time learning. So I started helping them by answering their questions in many discussion lists/forums, giving presentations at popular events in Brazil, and writing articles on my personal blog and on my company’s blog.
In Brazil, most enterprise systems use JPA and Hibernate as their persistence framework. While doing consultancy, I realized that many teams and developers were struggling with the framework just because they didn’t understand how to use it in scenarios other than CRUD or, in some cases, because they didn’t have a solid background in the persistence challenges they could face and in how to get the best out of Hibernate.
So, writing articles on my blog about Hibernate and those common issues is my contribution to helping other developers overcome many of these challenges.
You are also speaking at conferences throughout Brazil. Do you think that JPA and Hibernate presentations are in good demand, even 16 years after Hibernate was first released, and 11 years since we’ve been having JPA?
As I said, in Brazil most of the companies have been choosing Hibernate as the first option when persistence is the focus. But unfortunately, most of the teams don’t have a senior developer with good experience in JPA and Hibernate, so usually, they end up facing the most common problems when building the software.
You know, problems like:
queries fetching more data than needed,
out of memory issues in batch process tasks,
incorrect configuration of connection pool, etc.
Therefore, one way to help them is talking about those problems and how to avoid them through presentations, articles, and books.
In fact, your book, High-Performance Java Persistence, certainly is one of the best ones to get started.
What are the main causes of performance-related issues when using JPA and Hibernate, and what should developers do to overcome these problems?
In enterprise systems, what I’ve noted is that most of the bottlenecks come from the persistence layer, so in my opinion, the main cause is developers not knowing the basics of persistence and how a relational database works.
The lack of this knowledge usually causes several problems when using any kind of persistence framework. So, it doesn’t matter what framework you’re using if you don’t understand the problem you have to solve.
For example, many applications that use JPA and Hibernate still suffer from:
missing connection pooling,
queries written without any concern for the N+1 Select problem,
queries fetching too much data but using only a small part of it,
wrong mapping and fetch type configuration,
manual or poor transaction management,
bad implementation of batch processing, and so on.
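The N+1 Select problem from the list above can be sketched without any framework. The following Java example is only a simulation: FakeDb is a hypothetical in-memory stand-in that counts round trips, the post and comment tables named in the comments are made up, and no JPA or Hibernate API is involved. It contrasts loading children one parent at a time with loading everything in a single joined query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a database that counts round trips. */
final class FakeDb {
    int queries = 0;
    private final Map<Integer, List<String>> commentsByPost = new LinkedHashMap<>();

    FakeDb() {
        commentsByPost.put(1, Arrays.asList("a", "b"));
        commentsByPost.put(2, Arrays.asList("c"));
        commentsByPost.put(3, new ArrayList<String>());
    }

    /** Stands for: SELECT id FROM post */
    List<Integer> findPostIds() {
        queries++;
        return new ArrayList<>(commentsByPost.keySet());
    }

    /** Stands for: SELECT text FROM comment WHERE post_id = ? */
    List<String> findComments(int postId) {
        queries++;
        return commentsByPost.get(postId);
    }

    /** Stands for: SELECT p.id, c.text FROM post p LEFT JOIN comment c ON c.post_id = p.id */
    Map<Integer, List<String>> findPostsWithComments() {
        queries++;
        return new LinkedHashMap<>(commentsByPost);
    }
}

final class NPlusOneDemo {

    /** One query for the posts, then one more query per post: N+1 round trips. */
    static int countCommentsNPlusOne(FakeDb db) {
        int total = 0;
        for (int postId : db.findPostIds()) {
            total += db.findComments(postId).size();
        }
        return total;
    }

    /** A single joined query returns the same data in one round trip. */
    static int countCommentsJoined(FakeDb db) {
        int total = 0;
        for (List<String> comments : db.findPostsWithComments().values()) {
            total += comments.size();
        }
        return total;
    }

    public static void main(String[] args) {
        FakeDb lazy = new FakeDb();
        // Prints: 3 comments, 4 queries
        System.out.println(countCommentsNPlusOne(lazy) + " comments, " + lazy.queries + " queries");

        FakeDb joined = new FakeDb();
        // Prints: 3 comments, 1 queries
        System.out.println(countCommentsJoined(joined) + " comments, " + joined.queries + " queries");
    }
}
```

With three posts the difference is four round trips against one; with thousands of parent rows, the same access pattern grows linearly, which is why it tends to show up only under production-sized data.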
What can developers do?
Well, first off, they must understand the problem they’re dealing with. After that, they can look for solutions inside the persistence framework they use. In case their current framework doesn’t solve it efficiently, they could try other options, like the jOOQ framework, for example.
We always value feedback from our users, so can you tell us what you’d like us to improve or are there features that we should add support for?
Wow, that’s a really good question!
Two years ago, I would have said the documentation could be a little bit better, but you, Vlad, have been doing a great job improving it over the last few years. Indeed, the Hibernate team has done a great job since the beginning, so I can’t think of anything that needs improvement right now.
In my opinion, Hibernate is a complete framework in every way. That’s why it’s the first option in the enterprise world.
Thank you, Rafael, for taking your time. It is a great honor to have you here. To reach Rafael, you can follow him on Twitter.
In this post, I’d like you to meet Jakub Kubryński, a software developer, blogger, conference speaker, and Java Persistence aficionado.
Hi, Jakub. Would you like to introduce yourself and tell us a little bit about your developer experience?
Thank you very much for inviting me here. My name is Jakub Kubryński (@jkubrynski on Twitter).
I live in Poland. I’ve been playing with programming since I was 7 years old.
My first language was Fortran 77, which was introduced to me by my dad, who is a researcher in aerodynamics. In 2004, I started my first regular job as a software developer, and since then I’ve stayed involved in this industry.
My career path has gone through Junior Developer to Architect, Team Leader, Development Manager and Trainer.
For the last 5 years, I’ve been working for my own company, Devskiller, where we’ve created a tool for testing technical skills of professional software developers and DevOps engineers, including practical knowledge of tools, libraries, and frameworks.
I liked your JPA Beyond copy-paste presentation. Why did you decide to cover this topic in a conference presentation?
Apart from my main job at Devskiller, I try to give back some of my experience by training other developers.
During my professional career, I’ve already trained around 700 developers. Among other subjects, I give many trainings related to Java performance and JPA.
During those trainings, I’ve realized that many developers have problems with similar things. Those things are extremely important when working with JPA, so why not cover them in a conference talk to help more people than I can reach during my trainings?
At how many conferences did you give this presentation, and what was the audience’s reaction to so many tips about JPA and Hibernate performance?
As we all know, the Golden Hammer does not exist. While Hibernate is still the most popular persistence technology in the Java ecosystem, we shouldn’t limit ourselves to just one technology.
We also, pretty often, decide to migrate the read model to non-relational databases like MongoDB.
I’m also curious how NewSQL databases (based on the Google Spanner white paper) will change the situation. The goal is to get relational databases with performance and scalability similar to those of NoSQL engines.
Since NewSQL is still SQL-driven, we could probably get Hibernate to be their default interface to the Java world.
What are the main causes of performance-related issues when using JPA and Hibernate, and what should developers do to overcome these problems?
The biggest problem is that developers are trusting Hibernate so much that they don’t even check what queries are executed under the hood.
In fact, both mappings and JPQL queries can be improved a lot by changing just a few lines of code like, for example,
removing joining tables or
adding fetch joins into queries.
There is also some serious incomprehension of the level-one cache behavior, which is especially important when doing batch processing.
Last but definitely not least, there is the Open Session in View anti-pattern. Allowing developers to ignore transactional boundaries can lead to serious performance issues after releasing the product into production.
We always value feedback from our users, so can you tell us what you’d like us to improve or are there features that we should add support for?
First, I want to congratulate you. Hibernate is one of the oldest (or the oldest?) open-source projects related to Enterprise Java that is still alive.
I think the moment you joined the core team refreshed the spirit. I hope you’ll continue working on making Hibernate simpler for newbies - even just by discovering and logging common bad practices related to modeling entities.
Thank you, Jakub, for taking your time. It is a great honor to have you here. To reach Jakub, you can follow him on Twitter.
In this post, I’d like you to meet Arnold Gálovics, a software developer, blogger, and Java Persistence aficionado.
Hi, Arnold. Would you like to introduce yourself and tell us a little bit about your developer experience?
Hi, Vlad. First of all, I’d like to thank you for having me. My name is Arnold Gálovics (@ArnoldGalovics on Twitter). I’m 25 years old and currently living in the capital of Hungary, Budapest.
Since I got my first computer, I’ve always been curious how software works, so I got into the software development industry to see what’s happening under the hood.
In the last 5 years, I worked in the telecommunication industry where I was responsible for developing an application for strictly internal usage which was purely just getting and manipulating the data we needed and at the end wiring it together.
After that, I got into the financial industry and now I’m working on a very interesting project where we are creating a huge platform with all the new fancy tools/technologies.
In the last few years, I was mostly focusing on the standard Java stack, Oracle, JPA/Hibernate, Spring, Eclipse RCP, Angular. At first, when I got into the JPA world, it was like real magic for me what was happening and how clever the people are who wrote such a tool. A couple of months ago, I wanted to try myself out in the open-source community, and I chose to contribute to Hibernate ORM as it was familiar to me and, of course, I wanted to give something back.
In the meantime, I also started a technical blog where I love to publish articles. I don’t have as much time as I’d like to invest in blogging because I’m currently working full-time as a Senior Software Engineer, but I’m trying to work on it as much as I can.
You wrote several articles about JPA and Hibernate on your blog. In your experience, are JPA and Hibernate a valid option for a typical Java EE application?
Absolutely. Especially for smaller projects, but of course it’s reliable for bigger ones as well. You just have to understand the consequences of the choices you make. In good hands, for a relational database, Hibernate is a really great choice.
There are tons of great features which are helping you throughout the evolution of the application, and I strongly feel that especially lately the Hibernate team did an amazing job for making the ORM framework better and better.
What are the main causes of performance-related issues when using JPA and Hibernate, and what should developers do to overcome these problems?
As I mentioned, there are lots of great features offered by Hibernate, but people often think that Hibernate is a silver bullet and they usually don’t understand the concept or how a certain feature should be used.
The main problem is that people tend to forget that under JPA/Hibernate, there are SQL statements executed against the underlying database and they don’t care about this, thus struggling with the performance of the application.
There are simple things which you should look out for, like the merge pitfall of declarative transaction management, entity state transitions, fetching only the data you need, etc.
I’ve seen many projects which are using anti-patterns in the data-access layer like putting every association to EAGER fetching, having a general data-access layer, wrong transaction boundaries.
Usually, after a couple of performance issues, projects want to drop JPA/Hibernate and go with pure SQL to get better performance. However, in my opinion, understanding the problem and verifying the executed SQL statements is the way to go, instead of dropping the ORM completely.
My main message here is: you can use JPA/Hibernate for your project, just understand how it works and how to use it efficiently. There are books and blogs, and the user guide has improved a lot.
I also suggest visiting my blog as I usually write about Hibernate and performance related problems.
When you are developing an enterprise application, do you consider designing for performance from the very beginning, or do you postpone it in the name of "premature optimization"?
I’ve seen many issues popping up during the last few years, and most of them occurred due to not thinking about the performance in advance. I think JDBC batching is the perfect example for this as there are lots of things you have to be careful about, like JDBC driver, underlying DB, not flushing in the middle of a transaction, reordering, etc.
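For readers who have not used JDBC batching directly, here is a minimal sketch with the standard java.sql API (PreparedStatement.addBatch and executeBatch). The person table, its name column, and the BatchInsert class are hypothetical and exist only for illustration; whether the statements are actually sent as one network round trip still depends on the JDBC driver and the underlying database, as noted above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper showing plain JDBC batching. */
final class BatchInsert {

    /**
     * Inserts all names using batches of at most batchSize rows
     * and returns how many times executeBatch() was called.
     */
    static int insertAll(Connection connection, List<String> names, int batchSize)
            throws SQLException {
        int executedBatches = 0;
        // "person" and "name" are made-up identifiers for this sketch.
        try (PreparedStatement statement =
                     connection.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
            int pending = 0;
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();          // queue the row instead of executing it
                if (++pending == batchSize) {
                    statement.executeBatch();  // send the queued rows together
                    executedBatches++;
                    pending = 0;
                }
            }
            if (pending > 0) {                 // flush the last, partially filled batch
                statement.executeBatch();
                executedBatches++;
            }
        }
        return executedBatches;
    }
}
```

For example, inserting five names with a batch size of two results in three executeBatch() calls instead of five single-row executions. In Hibernate, the comparable behavior is driven by configuration such as hibernate.jdbc.batch_size and hibernate.order_inserts rather than by hand-written loops.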
In my opinion, it’s essential to consider performance from the beginning because there are so many things which might go wrong in the data-access layer.
We always value feedback from our users, so can you tell us what you’d like us to improve or are there features that we should add support for?
I’m really thankful for the Hibernate team because you guys are making a great tool for the community. I think Hibernate is currently in a state where most of the features are in place, and only minor adjustments are necessary so I wouldn’t add anything special here.
Thank you, Arnold, for taking your time. It is a great honor to have you here. To reach Arnold, you can follow him on Twitter.
In this post, I’d like you to meet Anghel Leonard, a software developer, blogger, book author, and Java EE aficionado.
Hi, Leonard. Would you like to introduce yourself and tell us a little bit about your developer experience?
Hi Vlad, thanks for having me. My name is Anghel Leonard (@anghelleonard on Twitter), I’m living in a small village in Romania, and I’ve been a software developer for the last 17 years, mainly focusing on Java and Web development. I’m also a blogger and author of several books.
My career started as a Java developer in the oil field (Petrom S.A.). There, I was mainly part of the team whose goal was to develop applications in the oil simulation field (Java desktop applications meant to apply some specific mathematical models). After some time, we switched to web applications (based on HTML, Servlets/JSP and Struts) and we brought databases into the equation as well (especially MySQL and Visual FoxPro). Around then, I started with the Hibernate ORM "native" API.
Shortly after, I started to learn Java EE (mainly JSF, EJB, and JPA) and Spring. Further, I worked for many years in the GIS field developing RIA and SPA web applications. Since then, I’ve been constantly using the Hibernate implementation of the Java Persistence API (JPA) specification. By their nature, GIS RIA/SPA applications process a lot of data (spatial and non-spatial data) and must run fast, so I’ve always been interested in optimizing the performance of the persistence layer. I can say that I’ve seen Hibernate "growing" and I’ve constantly tried to learn about every new feature and improvement it brought :)
Currently, I’m working as a Java CA. My main task here is to perform code reviews on a significant number of projects of different sizes, technologies, and areas of interest. Many of these projects use Hibernate "native" API and Hibernate JPA.
You are a very prolific writer, having published several books about JSF, Omnifaces, Hibernate, MongoDB and even JBoss Tools. Now that you are self-publishing your latest book, can you tell us the difference between working with a publisher and going on your own?
Well, I think that it is obvious that choosing between a publisher and self-publishing is a trade-off matter. From my experience, there are pros and cons on both sides and I can highlight the following aspects:
Publisher vs self-publishing pros:
publishers provide grammar and spelling checks (this can be a major advantage for non-native English speakers, like me) and technical reviewers (usually, authors can recommend reviewers as well) while in self-publishing the author must take care of these aspects
publishers take care of the book’s cover and index while in self-publishing the author must do it
publishers can consider the author very good and contact him for further contracts on the same topic while this is not true in self-publishing (at least, I have not heard of it)
publishers provide constant and consistent assistance during the writing process (e-mail or Skype) via editors, project coordinators, technical staff, etc. while in self-publishing this is not available
for authors, working with a publisher is 100% cost-free while in self-publishing the costs can vary significantly
publishers can be powerful brands on the market
Publisher vs self-publishing cons:
publishers can reject a book proposal for different reasons (e.g. they already have too many books on the suggested topic) while in self-publishing the chances to be accepted are significantly bigger
publishers work only with deadlines (each chapter has a fixed date and the book has a fixed calendar) while in self-publishing the author decides when to release an updated version and how significant the update will be
publishers provide "Errata" for book issues (typos, mistakes, technical leaks, content accuracy issues, etc.) and those issues can be fixed only in subsequent versions of the book while in self-publishing the author can fix issues immediately and repeatedly
publishers usually pay significantly smaller royalties in comparison with self-publishing
typically, publishers pay royalties every 6 months, while in self-publishing payments are more frequent
publishers are not very flexible about the book size, aspect, format, writing style, etc. while in self-publishing these coordinates are very flexible
publishers require being the only owner of the book content and it is forbidden to publish it in any other place or form while in self-publishing this restriction is not always applied
publishers set the price of the book without consulting the author while in self-publishing the author sets the price (moreover, the author can choose the price in a range of values)
publishers decide the discounts and donations policy while in self-publishing the author can provide coupons, discounts and make donations.
Your latest book is called "Java Persistence Performance Illustrated Guide". Could you tell us more about this new project of yours?
Well, in the beginning, the content of this new book was not meant to be part of any book. The story is like this: Over time, I have collected the best online articles about Java persistence layer (JPA, Hibernate, SQL, etc) and, on my desk, I have some of the most amazing books about these topics.
In order to mitigate the performance issues related to the persistence layer, I strongly and constantly recommend these resources to developers working on the persistence layer, but the remaining question is: on a regular day of work, how can the members of a team quickly understand/recognize a performance issue in order to fix it?
Well, there is no time to study at that moment, so I decided to take a new approach: make a drawing of a specific performance issue and talk about that drawing for 5-15 minutes (after all, "a picture is worth a thousand words"). This way, the audience can quickly understand/recognize the issue and get hints on how to fix it.
Further, I’ve published these drawings on Twitter, where I was surprised to see that even without the words (the talk), they were appreciated. Well, over time I’ve collected a significant number of drawings and people started asking me if I would publish them somewhere (I remember that we had a little talk about this on Twitter as well). And, this is how the idea of the book was born. :)
The main reason for choosing the self-publishing approach was the fact that I’m not constrained by fixed deadlines. The only extra effort I made was to find somebody to make the cover - it was designed and drawn by an excellent painter, Mr. Barsan Florian.
Now, the goal of this book is to act as a quick illustrated guide for developers that need to deal with persistence layer performance issues (SQL, JDBC, JPA, Hibernate (most covered) and Hazelcast).
Each drawing is accompanied by a short description of the issue and the solution. It’s like "first-aid", a quick and condensed recipe that can be followed by an extended and comprehensive article with examples and benchmarks, as you have on your blog.
What are the main causes of performance-related issues for a typical Java EE application and what should developers do to overcome these problems?
Most of the applications that I reviewed are Java EE and Spring based applications. Since most of the performance penalties have their roots in the persistence layer, I tried to make a top 10 of the most frequent programming mistakes that cause them (this trend was computed from ~300 PRs in different projects and it is in progress):
Having long or useless transactions (e.g. using @Transactional at class level on Spring controllers that delegate tasks to "heavy" services or never interact with the database)
Avoiding PreparedStatement bind parameters and using "+" concatenations for setting parameters in SQL queries.
Fetching too much data from the database instead of using a combination of DTO, LIMIT/ROWNUM/TOP and JOINs (e.g. in the worst scenario: a read-only query (marked as read-only or not) fetches all entities, a stream is created over the returned collection, and afterwards, the findFirst stream terminal operation is executed in order to fetch and use a single entity).
Wrong configuration of batching (the team lives with the sensation that batching is working behind the scenes, but they don’t check/inspect the actual SQL statements and batch size)
Bad usage or missing transaction boundaries (e.g. omitting @Transactional for read-only queries or executing separate transactions for a bunch of SQL statements that must run in a single transaction)
Ignoring the fact that data is loaded eagerly.
Not relying on a connection pool or not tuning the connection pool (Flexy Pool should be promoted intensively). Even worse, increasing the number of connections to 300 or 400.
Using unidirectional one-to-many associations with entity insert and delete operations
Using CriteriaBuilder for all SQL statements and relying on whatever is generated behind the scenes
Lack of knowledge about Hibernate features (e.g. attribute lazy loading, bytecode enhancement, delayed DB connection acquisition, suppressing the sending of DISTINCT to the database, etc.)
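To make the bind parameter item from the list above more tangible, here is a minimal sketch that contrasts string concatenation with a PreparedStatement bind parameter. The table and column names (post, title) and the class name are hypothetical and only serve the illustration; the exists method assumes a real JDBC Connection and is therefore not executed in main.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BindParameterSketch {

    // Anti-pattern: the parameter value becomes part of the SQL text,
    // so every distinct value yields a different statement.
    public static String concatenatedSql(String title) {
        return "SELECT id FROM post WHERE title = '" + title + "'";
    }

    // The SQL text stays constant; the value travels separately as a bind parameter.
    public static final String PARAMETERIZED_SQL = "SELECT id FROM post WHERE title = ?";

    // Hypothetical usage against a real database (needs a JDBC driver and a connection).
    public static boolean exists(Connection connection, String title) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(PARAMETERIZED_SQL)) {
            ps.setString(1, title);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The input changes the structure of the concatenated query:
        System.out.println(concatenatedSql(malicious));
        // The parameterized text is the same for every input:
        System.out.println(PARAMETERIZED_SQL);
    }
}
```

With concatenation, the input rewrites the WHERE clause, while the parameterized statement keeps one stable SQL text that the database can parse once and reuse.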
First I want to congratulate the whole Hibernate team because it is doing a great job! I really love the latest features and the comprehensive improvements in documentation. Taking into account the type of applications that I’m involved in, I would like to see the Hibernate - CDI integration ready.
Thank you, Leonard, for taking your time. It is a great honor to have you here. To reach Leonard, you can follow him on Twitter.
In this post, I’d like you to meet Kevin Peters, a Software Developer from Germany and Hibernate aficionado.
Hi, Kevin. Would you like to introduce yourself and tell us a little bit about your developer experience?
My name is Kevin Peters, and I live in Germany where I work as a Software Developer. My first contact with the Java language was around 2005 during my vocational training, and I fell in love with it immediately.
I worked for several companies leveraging Java and Spring to implement ERP extensions, customizing eCommerce systems and PIM solutions. Nearly one year ago, I joined the GBTEC Software + Consulting AG, one of the leading suppliers of business process management (BPM) software, and there we are now reimplementing a BPM system in a cloud-based manner using Dockerized Spring Boot microservices.
You have recently mentioned on Twitter a DataSource proxy solution for validating auto-generated statements. Can you tell us about this tool and how it works?
We use Spring Data JPA with Hibernate as JPA provider to implement our persistence layer, and we really enjoy the convenience coming along with it. But we also know about the "common" obstacles like Cartesian Products or the N+1 query problem while working with an ORM framework.
In our daily technical discussions and during knowledge transfer sessions we try to raise awareness for these topics among our colleagues, and in my opinion, the best way to achieve this is implementing tests and real world code examples showing that practically.
I started to prepare a small mapping example for one of our technical meetings, called "techtime", to demonstrate the "unordered element collection recreation" issue, and I wanted to show the unexpected amount of queries fired in this simple use case.
Fortunately, I came across the ttddyy/datasource-proxy GitHub project which helped me a lot to make that problem tangible.
The datasource-proxy project empowers you to wrap your existing datasource with a proxy and allows you to count all executed queries separated by query type (e.g.
With that opportunity you can not only write tests which assert that you are doing the right thing within your use cases, you can also check if you are doing it in an efficient way and avoid the traps I mentioned before.
At the time when our Coding Architect Ingo Griebsch suggested using this approach to enhance our test environment by automating the hunt for performance penalties, you caught us talking about your article on Twitter.
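The idea of counting executed statements per query type can be sketched with a plain JDK dynamic proxy. This is a hand-rolled illustration and not the datasource-proxy API: the class QueryCountingSketch, the QueryCounter type, and the no-op Statement that stands in for a real JDBC driver are all assumptions made so the example runs without a database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

public class QueryCountingSketch {

    public enum QueryType { SELECT, INSERT, UPDATE, DELETE, OTHER }

    /** Tallies executed statements by the first keyword of their SQL text. */
    public static final class QueryCounter {
        private final Map<QueryType, Integer> counts = new EnumMap<>(QueryType.class);

        void record(String sql) {
            counts.merge(classify(sql), 1, Integer::sum);
        }

        public int count(QueryType type) {
            return counts.getOrDefault(type, 0);
        }

        static QueryType classify(String sql) {
            String head = sql.trim().toUpperCase(Locale.ROOT);
            for (QueryType type : QueryType.values()) {
                if (type != QueryType.OTHER && head.startsWith(type.name())) {
                    return type;
                }
            }
            return QueryType.OTHER;
        }
    }

    /** Wraps a Statement so that every execute*(String sql, ...) call is counted, then delegated. */
    public static Statement counting(Statement target, QueryCounter counter) {
        return proxyOf((proxy, method, args) -> {
            if (method.getName().startsWith("execute") && args != null
                    && args.length > 0 && args[0] instanceof String) {
                counter.record((String) args[0]);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the original exception, e.g. SQLException
            }
        });
    }

    /** Stand-in for a driver Statement: returns default values and touches no database. */
    public static Statement noOpStatement() {
        return proxyOf((proxy, method, args) -> {
            Class<?> type = method.getReturnType();
            if (type == boolean.class) return false;
            if (type == int.class) return 0;
            if (type == long.class) return 0L;
            return null; // e.g. executeQuery yields no ResultSet in this sketch
        });
    }

    private static Statement proxyOf(InvocationHandler handler) {
        return (Statement) Proxy.newProxyInstance(
                QueryCountingSketch.class.getClassLoader(),
                new Class<?>[] {Statement.class},
                handler);
    }

    /** Simulates a use case that fires two SELECTs and one INSERT. */
    public static QueryCounter runDemo() {
        QueryCounter counter = new QueryCounter();
        Statement statement = counting(noOpStatement(), counter);
        try {
            statement.executeQuery("SELECT id FROM post");
            statement.executeQuery("SELECT id FROM post_comment WHERE post_id = 1");
            statement.executeUpdate("INSERT INTO post (id) VALUES (2)");
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        QueryCounter counter = runDemo();
        System.out.println(counter.count(QueryType.SELECT)); // prints 2
        System.out.println(counter.count(QueryType.INSERT)); // prints 1
    }
}
```

In a test, the counts become assertions: if a use case that should need one SELECT suddenly fires several, the test fails and the N+1 query problem surfaces early.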
Proxies are a great way to add cross-cutting concerns without cluttering business logic. For instance, FlexyPool brings Monitoring and Fallback capabilities to connection pools. Are you using Proxies for other concerns as well, like logging statements?
There are many ways to enrich application code with proxies, facades or aspects. Starting with small things like logging with a facade like SLF4J, using Spring Security for access control, Hystrix for service-to-service communication or even "basic" stuff like transactions in Spring Data, all these features are working with proxies, and we wouldn’t want to miss them anymore.
Why did you choose Hibernate for that particular project, and did it meet your expectations, especially when it comes to application performance?
Hibernate provides a lot of convenience to us, especially if we combine it with Spring Data JPA. But the fact I enjoy most is that you can still switch to Hibernate specific features like Hibernate Named Queries or special Hibernate annotations.
It’s important to know when you can relax using "magic" ORM features and when the opposite is needed: forgoing bidirectional relations and writing HQL instead, or even using native database queries to retrieve complex data. In our opinion, Hibernate offers the best balance between convenience and performance if one knows how to use it.
Since we have a quite complex data model and customers who store a lot of data, it’s vital for our software to fetch and write data in a performant way in every one of our use cases. And in case of any doubts, at least your articles help us get things done right.
In general, we love the feature set of Hibernate. Only the support of UNION HQL queries/Criteria API would be an awesome feature that we missed recently.
Thank you, Kevin, for taking your time. It is a great honor to have you here. To reach Kevin, you can follow him on Twitter.
Hi, Marco. Would you like to introduce yourself and tell us a little bit about your developer experience?
I’m Marco "Ocramius" Pivetta, an Italian PHP consultant, currently living in Germany. Yes, the nickname is weird, but it comes from an era of Quake 3 Arena, Unreal Tournament & co.
I’ve been tinkering with computers since I was a child, and have been working with PHP for more than half my life now, developing a love-hate relationship with the language. Interestingly, I didn’t start with the usual Joomla/Wordpress/Drupal/etc, but built a quite complex website that interacted with a browser game called "OGame", and scraped game information through a Firefox addon that would then provide additional information to the players.
The reason why this project ("stogame") is important for me is that it included extremely challenging problems to be solved for a rookie with no help at all, and is still one of the most complex projects that I worked on:
XSS/SQL injections - had those, wasn’t fun
queuing mechanisms to sync browser extensions and the website - invented my own system
optimizing queries and indexes on ~60 GB of MySQL MyISAM tables
disaster recoveries on such a system - had those too, wasn’t fun either
real-time push mechanisms for clients via BOSH/XMPP
simplistic prediction engine to aid players in decision making
All of the above were built by 15-year-old me by just spending countless sleepless nights on it, and also jeopardizing my school evaluations. Still, this was before libraries, design patterns, mentoring, GitHub: only me, some friends, and a good amount of design and prediction work.
I then moved on, gave up on the project, failed university (I’m a terrible student), got a few jobs and started using frameworks. Eventually, I got to work with all of the typical DB abstraction approaches:
Active Record (with Zend Framework)
Table Data Gateway - in a custom solution
Data Mapper - in a Java EE project
I liked the JPA approach in the Java EE project so much that I started looking for a PHP analogue solution for my daytime job, and ended up discovering Doctrine 2.
Since then, I started getting more and more involved with the project, starting from answering questions on the mailing list and StackOverflow. Benjamin Eberlei, who was the lead on the project at that time, pushed me towards contributing with actual code changes back in 2011.
Eventually, I became part of the maintainers of the project, and that also boosted my career, allowing me to become a consultant for Roave, which allows me to see dozens of different projects, teams and tools every month, as well as a public speaker.
You are one of the developers of Doctrine ORM framework. Can you please tell us what’s the goal of Doctrine?
I am actually not one of the developers, but one of the current maintainers. The initial designers of the current Doctrine 2 ORM, as far as I know, are Jonathan Wage, Guilherme Blanco, Benjamin Eberlei and Roman Borschel. I can probably still answer the question: Doctrine ORM tries to abstract the "database thinking" away from PHP software projects, while still being a leaky abstraction on purpose.
To clarify, most PHP developers are used to developing applications from the database up to the application layer, rather than from the domain logic down, and that’s a quite widespread problem that leads to hardly maintainable and unreadable code. This tool gets rid of most of those problems, by still allowing developers to access the database directly when needed.
Interestingly, Doctrine 1.x was an Active Record library, and also a quite good one, but it became evident quite quickly that the JPA specification and Data Mapper plus Unit of Work were better solutions altogether.
Specifically, the Data Mapper approach allows consumers of the library to write abstractions that decouple the tool from the domain almost completely (there are always limitations to this). The Unit of Work pattern has an increased memory impact for PHP applications, but also massively reduces required query operations (via in-memory identity maps) while adding some transactional boundaries, and that is a big win for most PHP apps, which often don’t even use transactions at all.
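As a rough illustration of how an in-memory identity map reduces query operations, consider the following sketch. It is written in Java and is not Doctrine’s or Hibernate’s actual implementation: the Author entity, the UnitOfWork class, and the loader function that stands in for a SQL query are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

public class IdentityMapSketch {

    /** Hypothetical entity used only for this illustration. */
    public static final class Author {
        public final long id;
        public String name;

        public Author(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** A tiny unit of work that remembers every entity it has already loaded. */
    public static final class UnitOfWork {
        private final Map<Long, Author> identityMap = new HashMap<>();
        private final LongFunction<Author> loader; // stands in for a SQL query
        private int queries;

        public UnitOfWork(LongFunction<Author> loader) {
            this.loader = loader;
        }

        public Author find(long id) {
            Author cached = identityMap.get(id);
            if (cached != null) {
                return cached; // no query: the same instance is handed out again
            }
            queries++;
            Author loaded = loader.apply(id);
            identityMap.put(id, loaded);
            return loaded;
        }

        public int queryCount() {
            return queries;
        }
    }

    public static void main(String[] args) {
        UnitOfWork unitOfWork = new UnitOfWork(id -> new Author(id, "author-" + id));
        Author first = unitOfWork.find(1L);
        Author second = unitOfWork.find(1L);
        System.out.println(first == second);          // prints true
        System.out.println(unitOfWork.queryCount());  // prints 1
    }
}
```

The trade-off described above is visible here: the map holds on to every loaded entity (more memory), but a repeated lookup costs no additional query.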
There are more advantages, but I personally wouldn’t ever consider using Active Record again due to its limitations and inherent framework coupling. This doesn’t mean that Active Record doesn’t work, but I’ve been burnt many more times with AR than with DM.
Since Hibernate ORM has been influencing Doctrine, can you tell us the similarities and differences between these two frameworks?
Doctrine is hugely inspired by Hibernate and the JPA, although we couldn’t really copy things, both due to licensing issues and life-cycle differences in Java and PHP software.
Doctrine resembles Hibernate in the Unit of Work, mappings, basic event system, second level cache and the DQL language (HQL in Hibernate). We even designed an annotation system for PHP, since the language doesn’t support them, and it currently is the de-facto standard for custom annotations in PHP libraries, and we initially only needed this to simulate inline mappings like Hibernate allows them.
Where things differ a lot are flexibility and lifecycle, since Java is an AOT-compiled language with a powerful JIT and generally deployed in long-running applications.
PHP is an interpreted language, and its strength is also its pitfall: the typical share-nothing architecture allows for short-lived, memory-safe, retry-able application runs. That also means that we have no connection pooling, and the ORM internals are much more inflexible and less event-driven than Hibernate’s due to memory and execution time constraints. It also means that we rarely encounter memory issues due to large Unit of Work instances, that connections and entity instances aren’t shared across separate web application page loads, and that a slow ORM is unlikely to slow down an entire application server.
Another huge difference is managed state: DETACHED makes little sense in the PHP world, since a detached entity may only come from serialized state. In Doctrine 3.x, we are planning to remove support for detaching entities, since storing serialized objects in PHP is generally leading to security issues and more trouble.
As you can see, the differences are indeed mostly in the lifecycle, but each language and framework has its strengths and pitfalls.
I’m probably being weird here, but I don’t lack any particular features from either ORM at this time. What would be interesting is reducing support for entity and transaction lifecycle events, since most consumers of these ORMs tend to code application and domain logic in those, while they were mostly intended for technical tasks, such as creating audit logs and executing pre- and post- DB cleanup tasks.
A possible improvement is to explore saving/loading of single aggregate-root-acting entities attached to a Unit of Work, which is only responsible for tracking state in child aggregates. This is only to prevent sharing entity references across aggregates, and to prevent DB transactions from crossing aggregate root boundaries.
Thank you, Marco, for taking your time. It is a great honor to have you here. To reach Marco, you can follow him on Twitter.
In this post, I’d like you to meet Dmitry Alexandrov, who is not only a well-known Java technologist and conference speaker but also a polyglot, speaking 6 languages (Russian, Bulgarian, Ukrainian, English, German, and French).
Hi, Dmitry. Would you like to introduce yourself and tell us a little bit about your developer experience?
Hi! My name is Dmitry Aleksandrov, and for more than a year now I’ve been a Principal Expert Developer and Architect at T-Systems. I’ve got 10 years of experience, mainly in the Java EE/Spring stack.
You have recently published an article about a major performance optimization you underwent in one enterprise project. Can you tell us what are the most common performance issues in enterprise systems?
Surprisingly, or actually not so surprisingly, most of the optimizations in enterprise projects are made on the persistence layer. The way the data is stored and accessed is essential, as most of the latency may come from there.
The other source of latency may be the remote calls, but the only way to gain performance there is to reduce their quantity and upgrade the hardware architecture. As for the persistence, much more can be done in this field. It is essential to really pay attention to what is taken out of the DB and what is shown to the user. Heavy CPU processing is rarely seen, at least from my experience.
So, it is really important to invest time in a good design of the persistence layer. ORMs are doing a really great job, and the automation they have brought saves tremendous effort, time and money. But at the same time, the users of the ORMs are a little bit spoiled by the magic they bring.
The developers and architects tend to design the object model as the primary source of data and the DB schemas as a product of the model and heavily rely on the ORM to manage this. This quite often leads to a very suboptimal data representation in the RDBMS and thus to performance issues, since the mathematics of relational databases is much different from that of programming language objects. And those mistakes are often very hard and expensive to fix, as DB schemas are extremely hard to change, especially when they are in production already.
And the ORM, although it is an extremely smart tool nowadays, is still not an AI (yet). So to deal with those problems, I believe that every enterprise or full stack developer should invest more time in learning about databases and the way their programming language interacts with them. A good persistence layer design may solve most of the performance issues or even fully prevent them from happening.
Hibernate offers many optimizations that aim to increase application performance. Has Hibernate met your goals in the projects you’ve been involved with?
Yes, definitely. Although we try to use as much standard JPA as possible, on our final customer deployments we also do Hibernate specific optimizations, like pre-build code instrumentation if we use Hibernate version 5. In one of my previous projects we used some second-level caching, and Hibernate integrated almost seamlessly.
You are a Java EE aficionado and international speaker. How do you see the future of Java EE and JPA in the context of cloud computing and Microservices architectures?
Java EE has been the subject of many discussions recently. Quite a few prognoses, some of them even fatal, were made, but I personally believe Java EE will still be there and make big progress. There is a huge number of companies and enterprises that build their business with Java EE technologies, and they won’t disappear soon since EE is a proven standard.
Actually, this is the main advantage of Java EE: it is a standard. It means it is a guaranteed, reliable and tested set of functionalities that have the same behavior and results on all supported platforms. And a standard is not something that is just assigned; standards are established based on what’s the best and most valuable in current technology at the moment. And the establishment of these technologies most often comes from the community.
A good example of community effort is exactly the MicroProfile initiative, which is driven by independent Java EE server vendors. As Microservices are now very popular, the activists try to create a really common solution for the best utilization of this architecture on Java EE.
Although there are some controversies about what this profile should include, there is a starting point. The discussion is open, and everybody is welcome to contribute. Actually, it is very interesting to see how a standard is being born! The guys are doing a great job! Another example is the Java EE Guardians, who are providing great input in all aspects of the Java EE evolution!
As for the cloud, Oracle has made some promises that they will put more effort into a better Java EE cloud integration. But as for now in our environments, we have a mixture of PaaS and IaaS solutions. For example, some of the servers are Dockerized or packed as executable jars and running somewhere in the cloud, and the databases are provided as services. But there we have some issues with the latency.
I am now waiting for the full support of the Entity Graph functionality. I personally believe that’s a very handy way to have fine-grained control over what you fetch, and it can give some really good performance improvements, especially on systems which are in production already.
Thank you, Dmitry, for taking your time. It is a great honor to have you here. To reach Dmitry, you can follow him on Twitter.
In this post, I’d like you to meet Simon Martinelli, who, among many other things, is teaching Java EE and JPA at the University of Applied Sciences in Berne.
Hi, Simon. Would you like to introduce yourself and tell us a little bit about your developer experience?
I’m a software architect, developer, and trainer from Switzerland working for simas GmbH. Besides that, I’m a lecturer for Java EE, architecture and design and persistence technologies at the University of Applied Sciences in Berne. In my spare time, I’m working on some open source projects, and I’m an expert group member of JSR 352 Batch and JSR 354 Money and Currency API.
I started my IT career in 1995 as a Cobol developer on IBM mainframe. Since 2000, I’ve been working in many enterprise projects as a developer, architect, and technical lead, using J2EE/Java EE, Spring framework, and from time to time .NET. My first contact with OR-Mapping was in 2000 when we used TopLink in a project for Swiss Railways.
You have an open-source project on GitHub called Query Language Result Mapper. Can you tell us what’s the goal of this framework?
I love the JPA constructor expression. In my opinion, it’s the best way to get around the common performance problems when using JPA in a naive way. But the constructor expression only works with JPQL or Criteria API and sometimes you need to execute a SQL query but don’t want to use a fully featured SQL framework like jOOQ.
Sure, JPA comes with the ConstructorResult but I find it too complicated, and it was not available with JPA 1.0. Hibernate has the ResultTransformer, but this only works with Hibernate. So I decided to start Query Language Result Mapper (QLRM).
QLRM simply tries to find a matching constructor based on a JPA native query result or, when using plain JDBC, a JDBC ResultSet. It’s simple, small and not related to a specific JPA implementation.
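The constructor matching idea can be sketched with plain reflection. The following is only an illustration under stated assumptions and not QLRM’s actual code: the class ConstructorMappingSketch, its map method, and the EmployeeDto type are hypothetical, and the rows are given as Object[] arrays, which is the shape a multi-column JPA native query returns.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

public class ConstructorMappingSketch {

    /** Hypothetical DTO; its constructor mirrors the columns of the query. */
    public static final class EmployeeDto {
        public final Long id;
        public final String name;

        public EmployeeDto(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Maps each row using the first public constructor whose parameters accept the row's values. */
    public static <T> List<T> map(List<Object[]> rows, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object[] row : rows) {
            result.add(instantiate(type, row));
        }
        return result;
    }

    private static <T> T instantiate(Class<T> type, Object[] row) {
        for (Constructor<?> constructor : type.getConstructors()) {
            if (matches(constructor.getParameterTypes(), row)) {
                try {
                    return type.cast(constructor.newInstance(row));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not instantiate " + type.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException("No matching constructor in " + type.getName());
    }

    private static boolean matches(Class<?>[] parameterTypes, Object[] row) {
        if (parameterTypes.length != row.length) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            Object value = row[i];
            // null fits any reference type; primitive parameters are not supported in this sketch
            boolean fits = value == null
                    ? !parameterTypes[i].isPrimitive()
                    : parameterTypes[i].isInstance(value);
            if (!fits) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1L, "Alice"});
        rows.add(new Object[] {2L, "Bob"});
        List<EmployeeDto> dtos = map(rows, EmployeeDto.class);
        System.out.println(dtos.get(1).name); // prints Bob
    }
}
```

A real mapper would also have to deal with primitive parameters and numeric type conversions between the database driver and the DTO, which this sketch leaves out.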
You are also teaching Java EE and Java Persistence API at the University of Applied Science in Berne. Is JPA easy to learn? What do your students think of this Java Persistence standard?
JPA is by far the most complicated part of Java EE to learn. Because it’s a leaky abstraction, you have to know a lot about SQL databases and what’s going on behind the scenes. It’s also the most common source of performance problems. While teaching, I always try to focus on how to avoid these performance problems.
My students usually like JPA because it makes data access much easier. For those who don’t know the history of OR-Mapping, it’s hard to understand that the JPA implementation behaves differently in some situations. What they don’t like is that (currently) JPA does not support the Java 8 Date Time API.
Since I started blogging, I realized that explaining a subject helps me better understand it. Do you think all developers should be involved in teaching or writing articles or books about the subjects they want to master?
Absolutely! I’ve been teaching JPA since 2007, and this forces me to get a deep understanding of the topic because the questions of the students can be very challenging. Sometimes I even have to look at the code of the JPA implementation to understand what happens under the hood. In return, this know-how helps me to write better and faster applications.
As Albert Einstein said: "If you can’t explain it simply, you don’t understand it well enough."
I think the whole Hibernate team is doing a great job! It’s more than feature-complete (it’s hard to know every feature, though).
Just one minor thing: When reading the documentation it’s often hard to differentiate what’s JPA standard and what’s Hibernate specific. But I don’t think that this is very important because not many developers are switching the JPA implementation in a project.
Thank you, Simon, for taking your time. It is a great honor to have you here. To reach Simon, you can follow him on Twitter.
In this post, I’d like you to meet Christian Beikov, who is one of the most active Hibernate contributors.
Hi, Christian. Would you like to introduce yourself and tell us a little bit about your developer experience?
Hey, Vlad. My name is Christian Beikov, and I am 25. I’m living with my girlfriend in Vienna, the capital city of Austria. I started working with Java EE technologies in school and continued to do so at my first job where I am still employed part-time. Next to my job, I am doing my master studies in Software Engineering at TU Wien which I am hopefully finishing next year.
My main interests are in distributed systems, database technologies and everything Java/JVM-related. In school, I came into contact with Java EE for the first time when I was developing a JSF-based web app with Hibernate on top of GlassFish with NetBeans. When I started my job at Curecomp GmbH, I mainly worked with Eclipse and WebSphere and about 2 years ago, I managed to fully migrate the company’s development stack to WildFly and IntelliJ IDEA. During these migrations and the countless university assignment projects in which I have used Hibernate, I’ve stumbled upon one or another bug.
You’ve been very active in the Hibernate ecosystem, sending Pull Requests and getting involved in future design discussions. How do you manage to blend the open-source involvement with your day job?
The work I am doing in open-source projects happens mostly in my free time. I like to give back something to the community, even if it’s just bug reports. Since I use Hibernate in so many projects, I also see my contributions as an investment in improving the overall quality of the projects I do.
At my day job, I am sometimes facing problems that I simply can’t work around, or where a proper workaround seems just as hard as a fix, which is how I justify fixing the Hibernate bug in the core. The deep knowledge that I gain from analyzing bugs and discussing features also helps me in my day job when reasoning about the behavior of Hibernate in certain situations, which is a big plus.
You are also developing Blaze Persistence. Can you tell us a little bit about this framework and how does it compare to Criteria API?
Blaze-Persistence is a library on top of the JPA APIs. The core module provides a fluent query builder API that allows you to express queries in a Java DSL which should feel mostly intuitive. In addition to the standard features that are defined in JPA 2.1, it also implements support for some common functionality that every JPA provider already supports, for example aliasing fetch joins or entity joins. On top of that, Blaze-Persistence also provides deep integration with the JPA provider to support features like (recursive) CTEs or set operations like UNION, etc. Beware that the deep integration is currently only available for Hibernate since it is the provider I am mostly familiar with, but support for others is planned.
One of the greatest features that Blaze-Persistence makes possible is Entity Views, which are to JPA entities roughly what views are to tables in the RDBMS sense. An Entity View is an interface or abstract class that represents the structure of a projection for an entity. It’s basically the definition of a DTO, with the difference that you only need to specify getter methods along with the projection for that attribute as a JPQL expression. When you then apply the Entity View on a base query, it will contribute the JPQL expressions as select items, thus creating an optimized JPQL and SQL query. The result of such a query, of course, is a list of objects that are a subtype of the Entity View. Apart from avoiding all the manual plumbing code to get the data into shape, you can make use of features like Collection mappings, Subviews or SubqueryProviders which let you define complex projections that one would normally not do.
The Criteria API provided by JPA is hard to use as it requires a lot of typing and also some kind of skill. You need to know how to wire things up which is one of the big pain points that I tried to solve by introducing a fluent API. Sure the JPA Criteria API is type-safe, but that comes at the cost of obfuscating your query. A type-safe variant of the Blaze-Persistence core API or maybe even just some additional methods in the existing API are already on my roadmap, so I will also try to fill this gap while retaining readability.
Since I don’t expect everyone to rewrite their existing JPA Criteria API based queries, I also implemented the JPA Criteria API on top of the Blaze-Persistence core API. You can even let your existing code build the queries with the JPA Criteria API and retrieve a Blaze-Persistence query builder from it. The resulting query builder can be used just like any other query builder, which means you can use CTEs and all the other great features.
Blaze Persistence works with any JPA provider. From your experience, how does Hibernate ORM compare to EclipseLink or OpenJPA?
Just as a disclaimer, I haven’t dug too much into the communities of the other JPA providers as I don’t use them in any of my projects. Also, beware that I might be biased now since I know people from the Hibernate team and know who to contact if I have a problem, but I’ll try to be as neutral as possible.
I got the feeling that the EclipseLink community didn’t care about the bugs I reported or forum posts I did, but apart from that, the implementation seems ok. It has some quirks like e.g. allowing lazy loading although the underlying entity manager is closed, but maybe that’s a feature :D
DataNucleus which is one of the lesser known JPA providers is actually pretty good and the main developer there reacts super fast to bug reports. I found some bugs and also proposed some features to increase Hibernate compatibility and as far as I know, all of these issues have been resolved by now.
I can’t tell you much about OpenJPA except that it seems rather dead or in maintenance mode only to me. The latest version is only JPA 2.0 compatible and unfortunately, lacks even proprietary ways to do certain things that are possible with other JPA providers.
The thing I am mostly unsatisfied with is that most of the issues I found with any JPA provider are pretty basic things and should be asserted by the JPA TCK. I hope some Oracle guy who can actually do something reads this and pushes harder to make the JPA TCK open-source :)
I think Hibernate already does a very good job. What I really would like to see is the decoupling of the SQL generation and execution from the ORM specifics. This is something I often would have needed in one way or another to work around bugs or simply to execute the SQL that is needed for a specific task. Imagine you could specify an HQL query that just describes how the result mapping should be done, but specify your own SQL. This is something I am doing internally in Blaze-Persistence all the time for advanced queries. I hope the SQM feature that is planned for Hibernate 6 will allow me to do that so I can get rid of the dirty tricks I have to do right now to get stuff done.
Thank you, Christian, for taking your time. It is a great honor to have you here. To reach Christian, you can follow him on Twitter.