Bulk Operations

Posted by    |      

The EJB3 persistence specification calls for implementors to support Bulk Operations in EJB-QL (the EJB Query Language). As part of Hibernate's implementation of EJB3 persistence, HQL (the Hibernate Query Language : which is a superset of EJB-QL) needed to support these Bulk Operations. This support is now code complete, even going beyond what is offered in the EJB3 persistence specification. There is one task outstanding against this bulk operation support in HQL, but this is completely beyond the scope of the support called for in the EJB3 persistence specification. I'll blog about this one later as it simply rocks ;)

So what exactly are Bulk Operations? Well for those of you familiar with SQL, it is analogous to Data Manipulation Language (DML) but, just like HQL and EJB-QL, defined in terms of the object model. What is DML? DML is the SQL statements which actually manipulate the state of the tabular data: INSERT, UPDATE, and DELETE.

Essentially, all that is to say that EJB-QL and HQL now support UPDATE and DELETE statements (HQL also supports INSERT statements, but more about that at a later time).

In its basic form, this support is not really all that difficult. I mean Hibernate already knows all the information pertaining to tables and columns; it already knows how to parse WHERE-clauses and the like. So what's the big deal? Well, in implementation, we ran across a few topics that make this support more challenging; which of course made it all the more fun to implement ;)

Update Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

The specification-defined psuedo-grammar for the update syntax:

update_statement ::= update_clause [where_clause]

update_clause ::=UPDATE abstract_schema_name [[AS ] identification_variable]
    SET update_item {, update_item}*

update_item ::= [identification_variable.]state_field = new_value

new_value ::=
    simple_arithmetic_expression |
    string_primary |
    datetime_primary |
    boolean_primary

The basic jist is:

  • There can only be a single entity (abstractschemaname) named in the update-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the update. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

Two interesting things to point out:

  • According to the specification, an UPDATE against a versioned entity should not cause the version to be bumped
  • According to the specification, the assigned new_value does not allow subqueries; HQL supports this!

Even though the spec disallows bumping the version on an update of a versioned entity, this is more-often-than-not the desired behavior. Because of the spec, Hibernate cannot do this by default so we introduced a new keyword VERSIONED into the grammar instead. The syntax is update versioned MyEntity ..., which will cause the version column values to get bumped for any affected entities.

Delete Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

A delete operation only applies to entities of the specified class and its 
subclasses. It does not cascade to related entities.

The specification-defined psuedo-grammar for the delete syntax:

delete_statement ::= delete_clause [where_clause]

delete_clause ::= DELETE FROM abstract_schema_name [[AS ] identification_variable]

The basic jist is:

  • There can only be a single entity (abstractschemaname) named in the from-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the delete. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

One very interesting thing to point out there. The specification specifically disallows cascading of the delete to releated entities (not including, abviously, db-level cascades).

Caching

Automatic and transparent object/relational mapping is concerned with the management of object state. This implies that the object state is available in memory. Bulk Operations, to a large extent, undermine that concern. The biggest issue is that of caching performed by the ORM tool/EJB3 persistence implementor.

The spec even makes a point to caution regarding this:

Caution should be used when executing bulk update or delete operations because 
they may result in inconsistencies between the database and the entities in the 
active persistence context. In general, bulk update and delete operations 
should only be performed within a separate transaction or at the beginning of a 
transaction (before entities have been accessed whose state might be affected 
by such operations).

In Hibernate terms, be sure to perform any needed Bulk Operations prior to pulling entities into the session, as failing to do so poses a risk for inconsistencies between the session (the /active persistence context/) and the database.

Hibernate also offers, as do most ORM tools, a shared cache (the second level cache). Executing Bulk Operations also poses a risk of inconsistencies between the shared cache and the database. Hibernate actually takes the responsility of managing this risk for you. Upon completion of a Bulk Operation, Hibernate invalidates any needed region(s) within the shared cache to maintain consistency. It has to be done through invalidation because the UPDATE or DELETE is executed solely on the database server; thus Hibernate has no idea about the ids of any affected entities, nor (in the case of updates) what the new state might be.

Conclusion

Bulk Operations are complimentary to the functionality provided by ORM tools. Especially in the case of batch processes, Bulk Operations coupled with the new StatelessSession functionlity (available > 3.1beta1) offer a more performant alternative to the normal row-based ORM focus.

This-n-that

Entities which are contained across multiple tables (not counting associations) cause particular challenges that I'll blog about later.

Have a look at the reference manual for discussion of these Bulk Operations within HQL.

For those of you familiar with ANTLR and its grammar definitions, the authoritative source for what is supported by HQL is the grammar files themselves.


Back to top