Today someone asked us to add some documentation explaining how to deal with addition of elements to very large collections. I’m not sure if this is a topic I really want to talk about in the documentation, but it’s definitely worth a blog.
The problem
The context is the following: suppose I have an entity Book with thousands of Pages.
We might choose to make Page an @Embeddable record, and map this with @ElementCollection.
@Entity
class Book {
    @Id String isbn;
    @ElementCollection
    @OrderColumn(name="number")
    List<Page> pages;
    ...
}

@Embeddable
record Page(String text) {}
That looks nice and clean.
Now, imagine that we add a new Page to our book:
emf.runInTransaction(em -> {
    var book = em.find(Book.class, isbn);
    book.getPages().add(new Page(text));
});
This code fragment results in the execution of three SQL statements:
- the Book is retrieved at the first line,
- the collection is fetched at the call to add() on the second line, and
- an insert occurs when the transaction commits.
That’s very inefficient: the second statement loads every existing Page just so that we can append one more.
Attempted solution
We can improve this in the usual way, by requesting that the collection be fetched upfront:
emf.runInTransaction(em -> {
    var bookWithPages = em.createEntityGraph(Book.class);
    bookWithPages.addAttributeNode(Book_.pages);
    var book = em.find(bookWithPages, isbn);
    book.getPages().add(new Page(text));
});
Now we have only two SQL statements being executed.
On the other hand, the first statement — the select — now has to join the Page table.
But a Book can have thousands of pages, so retrieving them all is still costly.
What we would really like to do is avoid initializing the collection of Pages at all.
So using EntityGraph wasn’t really a proper fix.
A proper solution
Instead, we’re going to make the following changes to our model:
- make Page an @Entity instead of an @Embeddable, and
- map pages as an unowned @OneToMany instead of an owned @ElementCollection.
Let’s begin with the Page class:
@Entity // important
class Page {
    @Id String isbn; // important!
    @Id int number;
    String text;
    ...
}
The Page entity has fields holding:
- the isbn of its book, and
- the page number, its position in the List.
Notice that we didn’t need to give Page a direct reference to Book (though we could have).
The isbn field is good enough.
Now, on the Book side of things we need to change the collection mapping to:
@Entity
class Book {
    @Id String isbn;
    @OneToMany(cascade = PERSIST,
               mappedBy = Page_.ISBN) // important!
    @OrderBy(Page_.NUMBER)
    List<Page> pages;
    ...
}
Notice here that mappedBy = Page_.ISBN is a typesafe reference to the isbn field of Page.
This is crucial—we need the collection to be "unowned".
Now when we execute:
emf.runInTransaction(em -> {
    var book = em.find(Book.class, isbn);
    book.getPages().add(new Page(isbn, pageNumber, text));
});
the collection of Page objects does not need to be fetched from the database.
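To get a feel for why an append can be deferred like this, here is a toy model in plain Java. It is only a sketch under stated assumptions: LazyBag, its loader, and loadCount are hypothetical names invented for illustration, and this is not Hibernate's actual collection wrapper. The idea it shows is that an add() whose result is always true can be queued, while any operation that reads the elements forces the load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical toy model for illustration only; not Hibernate's real implementation.
class LazyBag<T> {
    private final Supplier<List<T>> loader;       // stands in for the SQL select
    private final List<T> queuedAdds = new ArrayList<>();
    private List<T> loaded;                       // null until the first read
    int loadCount = 0;                            // how many times we "hit the database"

    LazyBag(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Appending never needs the existing elements: the result is always true.
    boolean add(T element) {
        if (loaded == null) {
            queuedAdds.add(element);              // defer until someone reads
        } else {
            loaded.add(element);
        }
        return true;
    }

    // Reading the size forces the load, then replays the queued additions.
    int size() {
        initialize();
        return loaded.size();
    }

    boolean isInitialized() {
        return loaded != null;
    }

    private void initialize() {
        if (loaded == null) {
            loaded = new ArrayList<>(loader.get());
            loadCount++;
            loaded.addAll(queuedAdds);
            queuedAdds.clear();
        }
    }

    public static void main(String[] args) {
        LazyBag<String> pages = new LazyBag<>(() -> List.of("page 1", "page 2"));
        pages.add("page 3");
        System.out.println(pages.isInitialized()); // false
        System.out.println(pages.size());          // 3
        System.out.println(pages.isInitialized()); // true
    }
}
```

In this model the add() call leaves the bag uninitialized, and only the later size() call triggers the single load.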
I’ve said it many times before, but I’m going to say it again now: entities with one-to-many associations are much more flexible than more exotic things like @ElementCollection or @ManyToMany.
You will have more success with Hibernate if you map entities in the most boring way.
In the past, this problem was sometimes addressed using @LazyCollection(LazyCollectionOption.EXTRA), but this feature was deprecated in Hibernate 6, and has been removed in Hibernate 7.
Dealing with sets
There’s one caveat to be aware of.
The add() method of Set returns a boolean value that can’t be computed without fetching all the elements of the Set.
Therefore, the solution above does not work for Sets.
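The difference comes straight from the java.util contracts, and a small plain-Java example makes it concrete. This sketch involves no Hibernate at all; AddContractDemo and its helper methods are names made up for illustration. It shows that List.add() always returns true, whereas the result of Set.add() depends on what the set already contains.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of the Collection.add() contract; no Hibernate involved.
class AddContractDemo {

    // List.add() appends unconditionally, so its result is known in advance.
    static boolean addToList(List<String> pages, String page) {
        return pages.add(page);
    }

    // Set.add() must report whether the element was already present,
    // which requires knowing the current contents of the set.
    static boolean addToSet(Set<String> pages, String page) {
        return pages.add(page);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("page one"));
        Set<String> set = new HashSet<>(Set.of("page one"));
        System.out.println(addToList(list, "page one")); // true: duplicates are allowed
        System.out.println(addToSet(set, "page one"));   // false: already present
        System.out.println(addToSet(set, "page two"));   // true: new element
    }
}
```

For an uninitialized lazy Set, the only way to answer that second kind of question honestly is to consult the database.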
However, there’s a much hackier solution which we tolerate but don’t necessarily encourage: just don’t add the new Page to the Set collection, and instead persist() it directly:
emf.runInTransaction(em -> {
em.persist(new Page(isbn, pageNumber, text));
});
With this approach, if the set of Page objects is held in the second-level cache, you will need to explicitly invalidate the cache by calling Cache.evictCollectionData().