When we profiled our application’s data import pipeline, we expected the bottleneck to be in our business logic: JQ filters, JavaScript evaluation, recursive SQL queries. Instead, Hibernate’s internal JSON handling consumed roughly half of CPU time and allocated about 18 GB of unnecessary objects during a routine import of 5 records.
Two small annotation changes fixed it. No code changes, no architectural redesign — just telling Hibernate what it needs to know about how our application uses its data.
The Application
h5m is a lightweight rewrite of Horreum, a performance regression detection system. It accepts JSON benchmark results, transforms them through a DAG (directed acyclic graph — a pipeline of dependent computation steps where each node’s output feeds into downstream nodes) and detects regressions using statistical methods. The core entity stores computed results as JSONB:
@Entity
public class ValueEntity extends PanacheEntity {
    @Column(columnDefinition = "JSONB")
    @JdbcTypeCode(SqlTypes.JSON)
    @Basic(fetch = FetchType.LAZY)
    public JsonNode data;
    // ... other fields
}
Each upload creates dozens of ValueEntity instances through a work queue — JQ filter results, fingerprints, change detection outputs. A typical import processes thousands of these entities per minute.
Profiling: Where Is the Time Going?
We profiled a 5-record import from a production dataset using async-profiler with CPU sampling:
java -agentpath:libasyncProfiler.so=start,event=cpu,file=profile.jfr,jfr \
-jar app.jar import-data ...
The results were surprising:
| Category | CPU Samples | % |
|---|---|---|
| Hibernate dirty-checking | 6,087 | 34% |
| GC | 4,397 | 25% |
| Hibernate deepCopy | 2,361 | 13% |
| JIT compilation | 2,084 | 12% |
| Actual computation | 1,187 | 7% |
| Hibernate JSON serialize | 507 | 3% |
47% of CPU was spent on Hibernate’s internal JSON dirty-checking and snapshot management — not on our application logic, not on database queries, but on Hibernate comparing JSON columns to decide whether they changed.
The Problem: FormatMapperBasedJavaType.deepCopy
When Hibernate manages an entity with a JSON column (@JdbcTypeCode(SqlTypes.JSON) with a JsonNode field), it needs to detect changes for dirty-checking. On every em.flush(), Hibernate:
- Deep-copies the JSON value by serializing it to a string and deserializing it back (FormatMapperBasedJavaType.deepCopy); this creates the snapshot
- Compares the current value against the snapshot using ObjectNode.equals(), a recursive tree comparison
For large JSON documents, this means every flush triggers a full serialize-deserialize-compare cycle for every managed entity with a JSON column. In our case, with hundreds of ValueEntity instances in the persistence context, each flush() was doing hundreds of these cycles.
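To make the cost concrete, here is a minimal sketch of a snapshot-based dirty check. It is not Hibernate's code: the SnapshotDirtyCheckSketch class, its Managed holder, and the use of Java serialization as a stand-in for the JSON text round trip are all assumptions made for illustration. It only shows why the number of deep copies grows with entities times flushes even when nothing changed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Toy model of snapshot-based dirty checking. This is NOT Hibernate's code.
class SnapshotDirtyCheckSketch {

    // Hypothetical stand-in for a managed entity: the live value plus its snapshot.
    static final class Managed {
        HashMap<String, Object> data;
        HashMap<String, Object> snapshot;
    }

    private int deepCopies = 0; // number of serialize/deserialize round trips so far

    // Deep copy via a serialize/deserialize round trip. Java serialization stands in
    // for the JSON-text round trip described in the article.
    @SuppressWarnings("unchecked")
    HashMap<String, Object> deepCopy(HashMap<String, Object> value) {
        deepCopies++;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, Object>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // One flush: structurally compare every value with its snapshot, then re-snapshot it.
    int flush(List<Managed> context) {
        int dirty = 0;
        for (Managed m : context) {
            if (!m.data.equals(m.snapshot)) {
                dirty++;
            }
            m.snapshot = deepCopy(m.data);
        }
        return dirty;
    }

    // Returns {dirty entities found, deep copies performed} for unmodified entities.
    static int[] run(int entities, int flushes) {
        SnapshotDirtyCheckSketch sketch = new SnapshotDirtyCheckSketch();
        List<Managed> context = new ArrayList<>();
        for (int i = 0; i < entities; i++) {
            Managed m = new Managed();
            m.data = new HashMap<>();
            m.data.put("id", i);
            m.data.put("tags", new ArrayList<>(List.of("a", "b")));
            m.snapshot = sketch.deepCopy(m.data);
            context.add(m);
        }
        int dirty = 0;
        for (int f = 0; f < flushes; f++) {
            dirty += sketch.flush(context);
        }
        return new int[] {dirty, sketch.deepCopies};
    }

    public static void main(String[] args) {
        int[] result = run(300, 3);
        System.out.println("dirty=" + result[0] + " deepCopies=" + result[1]);
    }
}
```

With 300 unmodified entities and 3 flushes, the model reports zero dirty entities but 1,200 deep copies: 300 initial snapshots plus 300 per flush.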
The allocation profile confirmed this:
# async-profiler allocation profiling
java -agentpath:libasyncProfiler.so=start,event=alloc,file=alloc.jfr,jfr ...
| Allocator | Bytes |
|---|---|
| | 10.7 GB |
| | 7.6 GB |
18.3 GB of allocation just for dirty-checking JSON columns — in a 1-minute import.
Fix 1: @Mutability(Immutability.class)
The key insight: our application never mutates JsonNode objects in-place. When a value changes, we always replace the reference:
// We always do this (reference replacement):
entity.data = newJsonNode;
// We never do this (in-place mutation):
entity.data.put("key", "value");
This means Hibernate doesn’t need to deep-copy the JSON for snapshot comparison — it can use reference equality (==) instead. The @Mutability annotation tells Hibernate exactly this:
@Column(columnDefinition = "JSONB")
@JdbcTypeCode(SqlTypes.JSON)
@Basic(fetch = FetchType.LAZY)
@Mutability(Immutability.class) // <-- this one annotation
public JsonNode data;
org.hibernate.type.descriptor.java.Immutability tells Hibernate: "this value is never mutated in-place; use the original reference as the snapshot." Dirty-checking becomes a reference comparison (==) instead of a full tree serialization and comparison.
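The difference between the two snapshot strategies can be sketched in plain Java. This is a toy model of the idea described above, not Hibernate's implementation; the class and method names are hypothetical, and a flat Map stands in for the JSON tree.

```java
import java.util.HashMap;
import java.util.Map;

// Toy contrast between a copying snapshot and a reference snapshot.
// This is NOT Hibernate's implementation, only a model of the idea.
class ReferenceSnapshotSketch {

    // Copying strategy: the snapshot is an independent copy, compared structurally.
    static Map<String, String> copySnapshot(Map<String, String> value) {
        return new HashMap<>(value);
    }

    static boolean dirtyByEquals(Map<String, String> snapshot, Map<String, String> current) {
        return !current.equals(snapshot);
    }

    // Reference strategy: the snapshot IS the original object, compared with ==.
    static Map<String, String> referenceSnapshot(Map<String, String> value) {
        return value;
    }

    static boolean dirtyByReference(Map<String, String> snapshot, Map<String, String> current) {
        return current != snapshot;
    }

    public static void main(String[] args) {
        Map<String, String> original = Map.of("metric", "throughput");
        Map<String, String> snapshot = referenceSnapshot(original);

        // Untouched field: same reference, so not dirty, and nothing was copied.
        System.out.println(dirtyByReference(snapshot, original));

        // Replaced field: a new reference, so the change is detected.
        Map<String, String> replacement = Map.of("metric", "latency");
        System.out.println(dirtyByReference(snapshot, replacement));
    }
}
```

In this model the reference strategy does no copying and no tree walk. The trade-off is that a replacement with identical content is still reported as dirty, which costs at most one extra update.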
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| dirty-checking CPU samples | 3,946 | 2 | 1,973x fewer (-99.95%) |
| deepCopy CPU samples | 2,169 | 0 | eliminated (-100%) |
| Wall-clock time | 1m36s | 1m20s | 1.2x faster (-17%) |
| CPU user time | 2m02s | 0m59s | 2x faster (-51%) |
One annotation eliminated deepCopy entirely and reduced dirty-checking to 2 CPU samples. All existing tests passed without modification.
When Can You Use This?
@Mutability(Immutability.class) is safe when:
- You never mutate the field value in-place (no jsonNode.put(), arrayNode.add(), etc.)
- All changes are reference replacements (entity.data = newValue)
- The field type is naturally immutable or you treat it as immutable by convention
This applies to JsonNode, String, and any custom type where you follow a replace-not-mutate pattern. If you’re unsure, search your codebase for in-place mutations: grep -r "\.data\.put\|\.data\.add\|\.data\.remove".
This pattern isn’t limited to JSON columns. @Mutability(Immutability.class) works on any @Basic field where Hibernate uses deepCopy for snapshot comparison — custom @Type mappings, large byte[] fields, or any JavaType with an expensive copy implementation. JSON columns are the most common and painful case because deepCopy involves a full Jackson serialize-deserialize roundtrip, but the annotation eliminates snapshot overhead for any immutable-by-convention field.
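The reason in-place mutation is unsafe can be shown with a small sketch. It assumes a reference snapshot as described above; the Entity class and the withEntry helper are hypothetical names, and a Map stands in for the JSON value. This is not Hibernate code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of why in-place mutation is unsafe with a reference snapshot.
// Names are hypothetical; this is not Hibernate code.
class ReplaceNotMutateSketch {

    // Hypothetical entity with one field and a reference snapshot of it.
    static final class Entity {
        Map<String, String> data;
        Map<String, String> snapshot;

        Entity(Map<String, String> data) {
            this.data = data;
            this.snapshot = data; // reference snapshot: no copy is taken
        }

        boolean isDirty() {
            return data != snapshot;
        }
    }

    // Replace-not-mutate helper: builds a new map instead of changing the old one.
    static Map<String, String> withEntry(Map<String, String> source, String key, String value) {
        Map<String, String> copy = new HashMap<>(source);
        copy.put(key, value);
        return copy;
    }

    public static void main(String[] args) {
        Entity mutated = new Entity(new HashMap<>(Map.of("a", "1")));
        mutated.data.put("b", "2");            // in-place mutation
        System.out.println(mutated.isDirty()); // false: the change goes unnoticed

        Entity replaced = new Entity(new HashMap<>(Map.of("a", "1")));
        replaced.data = withEntry(replaced.data, "b", "2"); // reference replacement
        System.out.println(replaced.isDirty());             // true: the change is seen
    }
}
```

A helper in the style of withEntry keeps the convention easy to follow: callers that need a changed value get a new object and assign it, so the snapshot reference and the field reference diverge exactly when the content does.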
Fix 2: Remove CascadeType.MERGE from Join Tables
The allocation profile also revealed an unexpected hotspot:
10,689 MB JacksonJsonFormatMapper.toString (JSON serialize)
+-- 9,442 MB from EntityUpdateAction (UPDATE statements)
+-- 1,247 MB from EntityInsertAction (INSERT statements)
9.4 GB of JSON serialization was from UPDATEs, not INSERTs. We were only creating new entities — where were the updates coming from?
The answer: CascadeType.MERGE on a join table relationship.
@Entity
public class Work extends PanacheEntity {
    @ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE}, fetch = FetchType.LAZY)
    @JoinTable(name = "work_values", ...)
    public List<ValueEntity> sourceValues;
}
When we called em.merge(work) to persist a new Work entity, Hibernate cascaded the merge to every ValueEntity in sourceValues. Each of these was an already-managed, unmodified entity in the persistence context — but the cascade merge caused Hibernate to re-attach them as if they were detached instances, triggering a second dirty-check cycle on top of the normal session flush. For each re-attached entity, Hibernate serialized the JSONB data column to create a new snapshot for comparison.
This is where the two optimizations compound: CascadeType.MERGE was causing redundant dirty-checking, and the JSON deepCopy made each of those checks expensive. Removing the cascade eliminates the redundant cycles; @Mutability(Immutability.class) makes the remaining necessary dirty-checks cheap.
The fix: remove CascadeType.MERGE, keep only CascadeType.PERSIST:
@ManyToMany(cascade = {CascadeType.PERSIST}, fetch = FetchType.LAZY)
@JoinTable(name = "work_values", ...)
public List<ValueEntity> sourceValues;
This is safe because all entities in the relationship are already managed in the persistence context when em.merge() is called. The cascade merge was a no-op that triggered expensive side effects.
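How the redundant work scales can be sketched with a toy model. It is not Hibernate's merge algorithm: the Context class is hypothetical and simply counts one unit of work per merged entity, standing in for the per-entity state copy and comparison described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of merge cascading. This is NOT Hibernate's implementation.
class CascadeMergeSketch {

    static final class Value { }

    static final class Work {
        final List<Value> sourceValues = new ArrayList<>();
    }

    // Hypothetical persistence context that only counts per-entity merge work.
    static final class Context {
        int mergeVisits = 0;

        // Merging returns the instance; each call stands for one state copy and compare.
        <T> T merge(T entity) {
            mergeVisits++;
            return entity;
        }

        Work mergeWork(Work work, boolean cascadeMerge) {
            if (cascadeMerge) {
                for (Value value : work.sourceValues) {
                    merge(value); // redundant when the value is already managed
                }
            }
            return merge(work);
        }
    }

    // Total merge visits for `works` new Work entities that share `values` managed values.
    static int visits(int works, int values, boolean cascadeMerge) {
        List<Value> managedValues = new ArrayList<>();
        for (int i = 0; i < values; i++) {
            managedValues.add(new Value());
        }
        Context context = new Context();
        for (int i = 0; i < works; i++) {
            Work work = new Work();
            work.sourceValues.addAll(managedValues);
            context.mergeWork(work, cascadeMerge);
        }
        return context.mergeVisits;
    }

    public static void main(String[] args) {
        System.out.println(visits(10, 50, true));
        System.out.println(visits(10, 50, false));
    }
}
```

For 10 Work entities that each reference 50 values, the model counts 510 visits with the cascade and 10 without it. The extra 500 are the visits to entities that were already managed.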
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| JSON serialize (toString) | 10,689 MB | 1,237 MB | 8.6x less (-88%) |
| UPDATE serialization | 9,442 MB | 0 MB | eliminated (-100%) |
| Total allocation | 31.6 GB | 20.8 GB | 1.5x less (-34%) |
| Wall-clock time | 1m24s | 1m14s | 1.1x faster (-12%) |
When Can You Remove CascadeType.MERGE?
CascadeType.MERGE is safe to remove when:
- The related entities are always managed (loaded from DB or created in the same transaction) when the parent is merged
- You never pass detached, modified entities through the relationship expecting cascade to persist their changes
- The relationship is a reference (like "this Work uses these Values") rather than an ownership (like "this Order owns these LineItems")
Check your em.merge() call sites: if the related entities are loaded via queries or em.find() in the same transaction, the cascade merge is redundant.
Note that this optimization has nothing to do with JSON specifically. CascadeType.MERGE causes Hibernate to re-merge and therefore re-dirty-check every related entity on em.merge(), regardless of their column types. In our case, the JSON data column made the cost highly visible because each dirty-check involved expensive JSON serialization. But even with simple scalar columns, unnecessary cascade merges add flush overhead proportional to the number of related entities — each one gets a full dirty-check cycle.
Combined Impact
Both fixes together, measured on our production import pipeline (5 records, 4 worker threads, PostgreSQL):
| Metric | Original | + @Mutability | + No MERGE | Total Improvement |
|---|---|---|---|---|
| Total allocation | 31.8 GB | 31.6 GB | 20.6 GB | 1.5x less (-35%) |
| JSON serialize | 10,717 MB | 10,689 MB | 1,237 MB | 8.7x less (-88%) |
| dirty-checking CPU | 6,087 samples | 2 samples | 2 samples | 3,044x fewer (-99.97%) |
| Wall-clock | 5m38s | 1m20s | 1m12s | 4.7x faster (-79%) |
The wall-clock improvement from 5m38s to 1m12s includes other optimizations (worker parallelism, query improvements), but the Hibernate annotation changes alone account for a 1.4x wall-clock speedup (-29%) and 1.5x allocation reduction (-35%).
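The figures in the Total Improvement column follow from two formulas: factor = before / after, and percent change = (after - before) / before × 100. A small sketch for recomputing them from the Original and final columns; the class and method names are hypothetical.

```java
import java.util.Locale;

// Recomputes "Nx (-P%)" style figures from a before/after pair.
class ImprovementMath {

    static double factor(double before, double after) {
        return before / after;
    }

    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    static String describe(double before, double after) {
        return String.format(Locale.ROOT, "%.1fx (%.0f%%)",
                factor(before, after), percentChange(before, after));
    }

    public static void main(String[] args) {
        System.out.println(describe(5 * 60 + 38, 1 * 60 + 12)); // wall-clock seconds
        System.out.println(describe(31.8, 20.6));               // total allocation in GB
        System.out.println(describe(10_717, 1_237));            // JSON serialize in MB
    }
}
```

Run as written, this prints 4.7x (-79%), 1.5x (-35%), and 8.7x (-88%), matching the wall-clock, total allocation, and JSON serialize rows.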
How to Find This in Your Application
- Profile with async-profiler: look for deepCopy, isDirty, findDirty, and performDirtyCheck in CPU flamegraphs. For JSON columns specifically, FormatMapperBasedJavaType.deepCopy and JacksonJsonFormatMapper.toString/fromString are the hotspots, but any expensive deepCopy implementation will show up here.
- Check fields with expensive snapshot comparison: any @JdbcTypeCode(SqlTypes.JSON), custom @Type, or large binary field is a candidate for @Mutability(Immutability.class) if you never mutate the value in-place. JSON columns are the most common case, but the annotation works on any @Basic field.
- Check your CascadeType.MERGE relationships: if related entities are always managed when the parent is merged, the cascade is pure overhead regardless of what columns those entities have.
- Use --alloc --total with async-profiler: allocation profiling reveals the true cost. Look for EntityUpdateAction dominating over EntityInsertAction in the serialization stack; that’s a strong signal that cascade merges are generating unnecessary UPDATE dirty-checks, independent of column types.
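If a flamegraph is hard to read, the same check can be done on text output. The sketch below assumes the profile was exported in collapsed-stack format, one line per stack with semicolon-separated frames followed by a sample count. The HotFrameCounter class and the frame names in the sample data are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sums samples per frame-name fragment over collapsed-stack lines
// of the form "frameA;frameB;frameC 123".
class HotFrameCounter {

    static Map<String, Long> count(List<String> collapsedLines, List<String> needles) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String needle : needles) {
            totals.put(needle, 0L);
        }
        for (String line : collapsedLines) {
            String trimmed = line.trim();
            int separator = trimmed.lastIndexOf(' ');
            if (separator < 0) {
                continue; // no sample count on this line
            }
            long samples;
            try {
                samples = Long.parseLong(trimmed.substring(separator + 1));
            } catch (NumberFormatException e) {
                continue; // last token is not a number, skip the line
            }
            String stack = trimmed.substring(0, separator);
            for (String needle : needles) {
                if (stack.contains(needle)) {
                    totals.merge(needle, samples, Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real frame names will differ.
        List<String> lines = List.of(
            "Worker.run;Session.flush;FormatMapperBasedJavaType.deepCopy 120",
            "Worker.run;Session.flush;EntityListener.isDirty 300",
            "Worker.run;JqFilter.apply 40");
        System.out.println(count(lines, List.of("deepCopy", "isDirty")));
    }
}
```

A stack that contains several of the searched fragments is counted once under each, so the totals are per-fragment sums rather than a partition of all samples.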
Key Takeaways
- Hibernate’s dirty-checking uses deepCopy to snapshot field values for comparison. For JSON columns this means a full serialize-deserialize roundtrip, but any field with an expensive copy implementation pays this cost on every flush.
- @Mutability(Immutability.class) eliminates snapshot overhead for any field you treat as replace-not-mutate. JSON columns are the highest-impact case, but the annotation works on any @Basic field.
- CascadeType.MERGE on relationships causes cascade re-merging and dirty-checking of all related entities, regardless of their column types. Remove it when related entities are always already managed.
- These are one-line annotation changes with no behavioral impact when the conditions above hold, and in our import they cut CPU user time roughly in half.