Help

The EJB3 persistence specification calls for implementors to support Bulk Operations in EJB-QL (the EJB Query Language). As part of Hibernate's implementation of EJB3 persistence, HQL (the Hibernate Query Language : which is a superset of EJB-QL) needed to support these Bulk Operations. This support is now code complete, even going beyond what is offered in the EJB3 persistence specification. There is one task outstanding against this bulk operation support in HQL, but this is completely beyond the scope of the support called for in the EJB3 persistence specification. I'll blog about this one later as it simply rocks ;)

So what exactly are Bulk Operations? Well for those of you familiar with SQL, it is analogous to Data Manipulation Language (DML) but, just like HQL and EJB-QL, defined in terms of the object model. What is DML? DML is the SQL statements which actually manipulate the state of the tabular data: INSERT, UPDATE, and DELETE.

Essentially, all that is to say that EJB-QL and HQL now support UPDATE and DELETE statements (HQL also supports INSERT statements, but more about that at a later time).

In its basic form, this support is not really all that difficult. I mean Hibernate already knows all the information pertaining to tables and columns; it already knows how to parse WHERE-clauses and the like. So what's the big deal? Well, in implementation, we ran across a few topics that make this support more challenging; which of course made it all the more fun to implement ;)

Update Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

The specification-defined psuedo-grammar for the update syntax:

update_statement ::= update_clause [where_clause]

update_clause ::=UPDATE abstract_schema_name [[AS ] identification_variable]
    SET update_item {, update_item}*

update_item ::= [identification_variable.]state_field = new_value

new_value ::=
    simple_arithmetic_expression |
    string_primary |
    datetime_primary |
    boolean_primary

The basic jist is:

  • There can only be a single entity (abstractschemaname) named in the update-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the update. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

Two interesting things to point out:

  • According to the specification, an UPDATE against a versioned entity should not cause the version to be bumped
  • According to the specification, the assigned new_value does not allow subqueries; HQL supports this!

Even though the spec disallows bumping the version on an update of a versioned entity, this is more-often-than-not the desired behavior. Because of the spec, Hibernate cannot do this by default so we introduced a new keyword VERSIONED into the grammar instead. The syntax is update versioned MyEntity ..., which will cause the version column values to get bumped for any affected entities.

Delete Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

A delete operation only applies to entities of the specified class and its 
subclasses. It does not cascade to related entities.

The specification-defined psuedo-grammar for the delete syntax:

delete_statement ::= delete_clause [where_clause]

delete_clause ::= DELETE FROM abstract_schema_name [[AS ] identification_variable]

The basic jist is:

  • There can only be a single entity (abstractschemaname) named in the from-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the delete. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

One very interesting thing to point out there. The specification specifically disallows cascading of the delete to releated entities (not including, abviously, db-level cascades).

Caching

Automatic and transparent object/relational mapping is concerned with the management of object state. This implies that the object state is available in memory. Bulk Operations, to a large extent, undermine that concern. The biggest issue is that of caching performed by the ORM tool/EJB3 persistence implementor.

The spec even makes a point to caution regarding this:

Caution should be used when executing bulk update or delete operations because 
they may result in inconsistencies between the database and the entities in the 
active persistence context. In general, bulk update and delete operations 
should only be performed within a separate transaction or at the beginning of a 
transaction (before entities have been accessed whose state might be affected 
by such operations).

In Hibernate terms, be sure to perform any needed Bulk Operations prior to pulling entities into the session, as failing to do so poses a risk for inconsistencies between the session (the /active persistence context/) and the database.

Hibernate also offers, as do most ORM tools, a shared cache (the second level cache). Executing Bulk Operations also poses a risk of inconsistencies between the shared cache and the database. Hibernate actually takes the responsility of managing this risk for you. Upon completion of a Bulk Operation, Hibernate invalidates any needed region(s) within the shared cache to maintain consistency. It has to be done through invalidation because the UPDATE or DELETE is executed solely on the database server; thus Hibernate has no idea about the ids of any affected entities, nor (in the case of updates) what the new state might be.

Conclusion

Bulk Operations are complimentary to the functionality provided by ORM tools. Especially in the case of batch processes, Bulk Operations coupled with the new StatelessSession functionlity (available > 3.1beta1) offer a more performant alternative to the normal row-based ORM focus.

This-n-that

Entities which are contained across multiple tables (not counting associations) cause particular challenges that I'll blog about later.

Have a look at the reference manual for discussion of these Bulk Operations within HQL.

For those of you familiar with ANTLR and its grammar definitions, the authoritative source for what is supported by HQL is the grammar files themselves.

8 comments:
 
20. Jul 2005, 22:24 CET | Link
john hume
Why did the spec authors stipulate that version should not be bumped for bulk updates?

Do they presume that anyone who's working with individual entities affected by the bulk update at the time it occurs knows better than the bulk updater what the state of that entity should be?
ReplyQuote
 
22. Jul 2005, 10:47 CET | Link
Because, efficient implementation will not be able to compare the version with the current one on the persistence context anyway. So, you either increase the version number wo guaranty of consistence or you do not increase it wo guaranty of consistence. This choice is under discussion in the EG.
 
22. Jul 2005, 10:56 CET | Link
Gavin
It's likely that the final spec *will* require version number increment. Personally, I think I'm in favor of this. However, it is still under consideration and we can't speak for the rest of the group.

OTOH, I do kinda like the flexibility of choosing between

   UPDATE Customer

and

   UPDATE VERSIONED Customer
 
22. Jul 2005, 10:56 CET | Link
Gavin
P.S. Steve, I'm looking forward to your next blog about INSERT ... SELECT ;)
 
23. Jul 2005, 06:31 CET | Link
anonymous
Hi

Is the session cache updated with bulk updates and deletes?
 
24. Jul 2005, 20:52 CET | Link
Christian
No, as the Session _is_ the persistence context. Re-read the quote from the spec. Steve even tells you how to avoid any problems: "In Hibernate terms, be sure to perform any needed "Bulk Operations" prior to pulling entities into the session, as failing to do so poses a risk for inconsistencies between the session (the active persistence context) and the database."

 
31. Jul 2009, 08:41 CET | Link
Hi Steve,
I was looking at the Cache Invalidation during bulk operation implementation.
'/** returns true if no queryspaces or if there are a match */
        private boolean affectedEntity(Set querySpaces, Serializable[] entitySpaces) {
                if(querySpaces==null || querySpaces.isEmpty()) {
                        return true;
                }'
Why we are returning true if no queryspaces?
What is the logic behind that?

If query spaces indicate what all tables affected by bulk operation , why we are invalidating all the cache region if there is no queryspeces.

Thanking you advance.

With warm regards
 
09. Jul 2010, 07:11 CET | Link
Rajesh kumar | pbrajesh(AT)gmail.com
The bulkoperation clears all the cache if no query spaces .
For achieving this,synchronize should be added to the sql-query during a bulk operations.
`Eg:
<sql-query name="queryname">
     <synchronize table="tablename"/>
     update tablename set value = ?
 </sql-query>`

The synchronization should be done on the affected tables because of the update. There can be one or more synchronize tags within the same block depending on what all tables got affected. This will result in cache invalidation of only those tables which are synchronized; otherwise the entire cache region is invalidated.

Note: Hibernate does a flush of the pending data for synchronized tables (from insert/update queue) before actually doing the synchronization. So the flush is triggered automatically for those data in the insert/update/delete queues where the affected tables are synchronized.

Please update the hibernate documentation with this informations
Post Comment