The EJB3 persistence specification calls for implementors to
support Bulk Operations
in EJB-QL (the EJB Query Language
). As part of Hibernate's implementation of
EJB3 persistence, HQL (the Hibernate Query Language
: which is a superset of EJB-QL) needed to support
these Bulk Operations
. This support is now code complete, even going beyond what is offered in the EJB3
persistence specification. There is one task outstanding against this bulk operation support in HQL, but
this is completely beyond the scope of the support called for in the EJB3 persistence specification.
I'll blog about this one later as it simply rocks ;)
So what exactly are Bulk Operations
? Well for those of you familiar with SQL, it is analogous to Data Manipulation Language
(DML) but, just like HQL and EJB-QL, defined in terms of the object model. What is DML? DML is the SQL statements which actually manipulate the state of the tabular data:
INSERT, UPDATE, and DELETE.
Essentially, all that is to say that EJB-QL and HQL now support UPDATE and DELETE statements (HQL also supports INSERT statements, but more about that at a later time).
In its basic form, this support is not really all that difficult. I mean Hibernate already knows all the information pertaining to tables and columns; it already knows how to parse WHERE-clauses and the like. So what's the big deal? Well, in implementation, we ran across a few topics that make this support more challenging; which of course made it all the more fun to implement ;)
Update Statements
From the EJB3 persistence specification:
Bulk update and delete operations apply to entities of a single entity class (together with its subclasses, if any). Only one entity abstract schema type may be specified in the FROM or UPDATE clause.
The specification-defined psuedo-grammar for the update syntax:
update_statement ::= update_clause [where_clause] update_clause ::=UPDATE abstract_schema_name [[AS ] identification_variable] SET update_item {, update_item}* update_item ::= [identification_variable.]state_field = new_value new_value ::= simple_arithmetic_expression | string_primary | datetime_primary | boolean_primary
The basic jist is:
- There can only be a single entity (abstractschemaname) named in the update-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
- No joins (either implicit or explicit) can be specified in the update. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
- The where-clause is also optional.
Two interesting things to point out:
- According to the specification, an UPDATE against a versioned entity should not cause the version to
be bumped
- According to the specification, the assigned new_value does not allow subqueries; HQL supports this!
Even though the spec disallows bumping the version on an update of a versioned
entity, this is more-often-than-not the desired behavior. Because of the
spec, Hibernate cannot do this by default so we introduced a new keyword
VERSIONED into the grammar instead. The syntax is update versioned MyEntity ...
,
which will cause the version column values to get bumped for any affected entities.
Delete Statements
From the EJB3 persistence specification:
Bulk update and delete operations apply to entities of a single entity class (together with its subclasses, if any). Only one entity abstract schema type may be specified in the FROM or UPDATE clause. A delete operation only applies to entities of the specified class and its subclasses. It does not cascade to related entities.
The specification-defined psuedo-grammar for the delete syntax:
delete_statement ::= delete_clause [where_clause] delete_clause ::= DELETE FROM abstract_schema_name [[AS ] identification_variable]
The basic jist is:
- There can only be a single entity (abstractschemaname) named in the from-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
- No joins (either implicit or explicit) can be specified in the delete. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
- The where-clause is also optional.
One very interesting thing to point out there. The specification specifically disallows cascading of the delete to releated entities (not including, abviously, db-level cascades).
Caching
Automatic and transparent object/relational mapping is concerned with the management of
object state. This implies that the object state is available in memory. Bulk Operations
,
to a large extent, undermine that concern. The biggest issue is that of caching performed by the
ORM tool/EJB3 persistence implementor.
The spec even makes a point to caution regarding this:
Caution should be used when executing bulk update or delete operations because they may result in inconsistencies between the database and the entities in the active persistence context. In general, bulk update and delete operations should only be performed within a separate transaction or at the beginning of a transaction (before entities have been accessed whose state might be affected by such operations).
In Hibernate terms, be sure to perform any needed Bulk Operations
prior to pulling
entities into the session, as failing to do so poses a risk for inconsistencies
between the session (the /active persistence context/) and the database.
Hibernate also offers, as do most ORM tools, a shared cache (the second level cache).
Executing Bulk Operations
also poses a risk of inconsistencies between the shared cache
and the database. Hibernate actually takes the responsility of managing this risk for
you. Upon completion of a Bulk Operation
, Hibernate invalidates any needed region(s)
within the shared cache to maintain consistency. It has to be done through invalidation
because the UPDATE or DELETE is executed solely on the database server; thus Hibernate
has no idea about the ids of any affected entities, nor (in the case of updates) what
the new state might be.
Conclusion
Bulk Operations
are complimentary to the functionality provided by ORM tools.
Especially in the case of batch processes, Bulk Operations
coupled with the new
StatelessSession functionlity (available > 3.1beta1) offer a more performant
alternative to the normal row-based ORM focus.
This-n-that
Entities which are contained across multiple tables (not counting associations) cause particular challenges that I'll blog about later.
Have a look at the reference manual for discussion of these Bulk Operations within HQL.
For those of you familiar with ANTLR and its grammar definitions, the authoritative source for what is supported by HQL is the grammar files themselves.