Options for Entity Dirtness Checking

At flush-time, Hibernate needs to know what entity state has become dirty (changed) so that it knows what data needs to be written to the database and what data does not. Historically Hibernate only defined one means of such dirtiness checking, but has since added multiple additional schemes which seem to be not as well known. The intent for this blog is to start a base for improving the documentation around dirty checking and these various options.

flush-time state comparison strategy

In such a scheme, the loaded state of an entity is kept around as part of the persistence context (Session/EntityManager). This loaded state is the last known state of the entity data as it exists in the database. That happens whenever the entity is loaded or whenever we update the entity from that persistence context. But either way, we have a last, well-known state of the entity in the database as part of this transaction.

This strategy, then, at flush time, performs a deep comparison between that loaded state and the entity's current state. This happens for every entity associated with the persistence context on every flush, which can be a bit of a performance drain in long running persistence contexts and batch processing use-cases. In fact this is why the documentation discusses the importance of evicting as part of batch processing use-cases, but the same thing applies to long running persistence contexts. The best way to mitigate the performance impact of this strategy is to either:

minimize the number of flushes that occur on a given persistence context (always a good idea anyway)
minimize the number of entities that are associated with the persistence context during each flush

Such a strategy falls into a category of dirtiness checking that I call computed. Initially, this was the only strategy supported by Hibernate. However even as far back as 3.0 (which dates back 8-9 years ago!) Hibernate started branching out into tracked dirtiness checking strategies. The most well known general approach to this is using bytecode enhancement.

Tracked approach : bytecode enhancement

Hibernate's first foray into tracked strategies was through bytecode enhancement, which was how most other JPA providers did tracked dirty checking. This is a process whereby your Java Class's bytecode is read and enhanced (changed) to introduce new bytecode level instructions. Hibernate supports this as an Ant task and also through runtime enhancment, although runtime enhancement currently requires a JPA container and container-managed EntityManagerFactory bootstrapping. Basically, Hibernate does not provide a Java agent to perform the enhancement at classload time. We are not against it so much as we just do not have time and no one has contributed such support to-date; though if you are interested... ;)

The enhancement weaves in support for the entity itself keeping track of when its state has changed. At flush time we can then check that flag rather than performing the state comparison computation.

To apply the build-time enhancement using the Ant task, you'd specify the Ant task like:

<target name="instrument" depends="compile">
    <taskdef name="instrument" classname="org.hibernate.tool.instrument.javassist.InstrumentTask">
        <classpath ... >
    </taskdef>

    <instrument verbose="true">
        <fileset dir="${yourClassesDir}">
            <include name="*.class"/>
        </fileset>
    </instrument>
</target>

which performs the enhancement/instrumentation on your classes as defined by the nested Ant fileset. It is important that the classpath used to define the task contain the Hibernate jar, all its dependencies and your classes.

To use runtime enhancement, simply enable the setting hibernate.ejb.use_class_enhancer to true. Again, this requires that your application be using JPA container-managed EMF bootstrapping.

There are also third party maven plugins support Hibernate bytecode enhancement within a Maven build.

Be aware that work is under way (actually its first steps are already in the 4.2 release) to revamp bytecode enhancement support. See HHH-7667 and HHH-7963 for details.

Tracked approach : delegation

A new feature added in 4.1 allows all dirtiness checking to be delegated to application code. See HHH-3910 and HHH-6998 for complete design discussions. But essentially this feature allows your application to control how dirtiness checking happens. The idea is that your application model classes are monitoring their own dirtiness. Imagine an entity class like:

@Entity
public class MyEntity {
    private Long id;
    private String name;

    @Id 
    public Long getId() { ... }
    public void setId(Long id) { ... }

    public Long getName() { ... }
    public void setName(String name) {
        if ( ! areEqual( this.name, name ) ) {
            trackChange( "name", this.name );
        }
        this.name = name;
    }

    private Map<String,?> tracker;

    private void trackChange(String attributeName, Object value) {
        if ( tracker == null ) {
            tracker = new HashMap<String,Object>();
        }
        else if ( tracker.containsKey( attributeName ) {
            // no need to re-put, we want to keep the original value
            return;
        }
        tracker.put( attributeName, value );
    }

    public boolean hadDirtyAttributes() {
        return tracker != null && ! tracker.isEmpty();
    }

    public Set<String> getDirtyAttributeNames() {
        return tracker.keySet();
    }

    public void resetDirtiness() {
        if ( tracker != null ) {
            tracker.clear();
        }
    }
}

Using the org.hibernate.CustomEntityDirtinessStrategy introduced in 4.1 as part of HHH-3910 we can easily tie the entity's intrinsic dirty checking into Hibernate's dirty checking:

public class CustomEntityDirtinessStrategyImpl implements CustomEntityDirtinessStrategy {
    @Override
    public boolean canDirtyCheck(Object entity, EntityPersister persister, Session session) {
        // we only manage dirty checking for MyEntity instances (for this example; a CustomEntityDirtinessStrategy
        // manage dirty checking for any number of entity types).
        return MyEntity.class.isInstance( entity );
    }

    @Override
    public boolean isDirty(Object entity, EntityPersister persister, Session session) {
        return ( (MyEntity) entity ).hadDirtyAttributes();
    }

    @Override
    public void findDirty(Object entity, EntityPersister persister, Session session, DirtyCheckContext dirtyCheckContext) {
        final MyEntity myEntity = (MyEntity) entity;
        final Set<String> dirtyAttributeNames = entity.getDirtyAttributeNames();

        dirtyCheckContext.doDirtyChecking(
                new AttributeChecker() {
                        @Override
                        public boolean isDirty(AttributeInformation attributeInformation) {
                            return dirtyAttributeNames.contains( attributeInformation.getName() );
                        }
                }
        );
    }

    @Override
    public void resetDirty(Object entity, EntityPersister persister, Session session) {
        return ( (MyEntity) entity ).resetDirtiness();
    }
}

This delegation could even be combined with some custom bytecode enhancement that you apply to your domain model in order to weave in dirtiness tracking. It also fits very nicely with dynamic models (using Map-backed proxies, etc).

There can currently be only one CustomEntityDirtinessStrategy associated with a SessionFactory/EntityManagerFactory. The CustomEntityDirtinessStrategy implementation to use is defined by the hibernate.entity_dirtiness_strategy setting.

Conclusion

Hopefully this gives a more clear picture of the possibilities for managing dirtiness checking in Hibernate. If things are still unclear discuss in comments and I'll try to aggregate all constructive criticisms and suggestions into the docs.

In Relation To