In Relation To Steve Ebersole

Hibernate Core 3.3.0 goes GA

Posted by Steve Ebersole    |    Tagged as Releases

Hibernate 3.3.0.GA has been released. A big thanks to everyone who helped us get here.

A few changes from 3.2 worth noting:

  1. Migration to a Maven-based build
  2. Splitting of the project into modular jars (here I mean what Maven calls modules). This allows users to easily see and minimize dependencies.
  3. Redesigned second-level caching SPI
  4. Integration with JBossCache 2.x as a second level cache provider.

I'll be following up with more detailed blogs about some of the specific points.

You can download the full distribution from SourceForge or make use of the Maven artifacts from the JBoss Maven repository.

New (3.2.3) Hibernate identifier generators

Posted by Steve Ebersole

Introducing...

Starting in 3.2.3, I include two new identifier generators targeted at portability. They take a different approach to portability than the older native generator. Typically, using a synthetic identifier generation strategy with an eye toward portability really comes down to wanting the capabilities that a sequence provides, even though the database may not support sequences. Note that I explicitly leave out IDENTITY-style generators because, generally speaking, an Object/Relational Mapping technology will prefer identifier generation strategies where the identifier value can be retrieved before performing (and without having to actually perform) the insert statement. This is certainly true of Hibernate and other transactional write-behind technologies: with IDENTITY columns the insert must be performed immediately, circumventing the transactional write-behind behavior (and furthermore, because JDBC does not define a mechanism to retrieve batches of IDENTITY-generated values, JDBC batching must be implicitly disabled for entities using an IDENTITY generator).

The two generators are:

  • org.hibernate.id.enhanced.SequenceStyleGenerator - the approach it takes to portability is that you really don't care whether you are physically using a SEQUENCE in the database; really you just want sequence-like generation of values. On databases which support SEQUENCEs, SequenceStyleGenerator will in fact use a SEQUENCE as the value generator; for those databases which do not support SEQUENCEs, it will instead use a single-row table as the value generator, but with the same exact characteristics as a SEQUENCE value generator (namely, it deals with the sequence table in a separate transaction at all times).
  • org.hibernate.id.enhanced.TableGenerator - while not specifically targeting portability, TableGenerator can certainly be used across all databases. It uses a multi-row table where the rows are keyed by a (configurable) sequence_name column; one approach would be to have each entity define a unique sequence_name value in the table to segment its identifier values. It grew out of the older org.hibernate.id.MultipleHiLoPerTableGenerator and uses basically the same table structure. However, while MultipleHiLoPerTableGenerator inherently applies a hi-lo algorithm to the value generation, this new TableGenerator was added to be able to take advantage of the pluggable optimizers.

Both generators, in addition to other specific parameters, share 3 useful configuration parameters:

  • optimizer
  • initial_value
  • increment_size

The role of the optimizer is to limit the number of times we actually need to hit the database in order to determine the next identifier value. The exact effect of initial_value and increment_size depends somewhat on the optimizer chosen. optimizer provides three choices (a sample mapping using these parameters follows the list):

  • none - says to hit the database on each and every request
  • hilo - says to use an in-memory pooling technique which is the same basic logic as the older Hibernate hilo or seqhilo generators. In terms of the database values, they are incremented one at a time; in other words, increment_size applies to the in-memory algorithm
  • pooled - says to use a stored pooling technique. Unlike hilo, where incremental values are stored and retrieved from the database sequence/table, pooled stores the actual current hi-value into the database. As an example, consider increment_size=10
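
As an illustration of how these parameters hang together, here is a minimal hbm.xml id mapping wiring up SequenceStyleGenerator (a sketch only; the id property and the sequence_name value are assumptions, while initial_value, increment_size and optimizer are the parameters discussed above):

<id name="id" type="long" column="ID">
    <!-- uses a database SEQUENCE where supported, falling back to a single-row table otherwise -->
    <generator class="org.hibernate.id.enhanced.SequenceStyleGenerator">
        <param name="sequence_name">my_entity_seq</param>
        <param name="initial_value">1</param>
        <param name="increment_size">10</param>
        <param name="optimizer">pooled</param>
    </generator>
</id>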

Under the covers

So, generally speaking, both the hilo and pooled optimizers seek to optimize performance by minimizing the number of times we need to hit the database. Great! So then exactly how are they different? Well, let's take a look at the values stored in the database as a means to illustrate the distinction.

optimizer=hilo (increment_size=10)

After the initial request, we will have:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 1            | 1                   | 11               |

The db-value and hi will remain the same until the 12th request, at which point we would clock over:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 2            | 12                  | 21               |

Essentially, hi defines the clock-over value; once the in-memory value reaches the hi value, we need to hit the database and define a new bucket of values. The major drawback to this approach is when using this strategy with legacy applications that also need to insert values; those other applications must also understand and use this hilo algorithm.

optimizer=pooled (increment_size=10)

After the initial request, we will have:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 11           | 1                   | 11               |

The db-value and hi will remain the same until the 12th request, at which point we would clock over:

|  value (db)  |  value (in-memory)  |  hi (in-memory)  |
| 21           | 12                  | 31               |

As you can see, with this optimizer the increment_size is actually encoded into the database values. This is perfect for databases which support sequences, because they typically also offer an INCREMENT BY option when creating the sequence, such that calls to get the next sequence value automatically apply the proper increment_size. Even if other applications are also inserting values, we'll be perfectly safe because the SEQUENCE itself will handle applying this increment_size. And in practice, it turns out, you will also be safe if SequenceStyleGenerator reverts to using a table in the same situation because of how the clock-over happens.

Conclusion

I would expect that these two new generators actually replace currently existing ones in terms of short-hand names. Specifically, I would expect

  • the implementation behind sequence to change from org.hibernate.id.SequenceGenerator to the new org.hibernate.id.enhanced.SequenceStyleGenerator
  • the implementation behind table to change from org.hibernate.id.TableGenerator to the new org.hibernate.id.enhanced.TableGenerator

The second is the riskier replacement because of the big difference between the two. But we have all along discouraged direct use of the current table generator, so I think we should be safe there. I am still uncertain when that replacement will happen (probably 4.0?), but in the meantime the new generators are available and highly recommended for use.

Subversion move

Posted by Steve Ebersole

Moving the Hibernate source code to our new Subversion home is done. Both developer and anonymous access have been set up. Currently, web access is only available via the Apache module, which is less than ideal. We have been told that either Fisheye or ViewCVS access over the Subversion repository will be set up soon.

For the access details, check out: http://hibernate.org/30.html#A3

Pro Hibernate 3 deadlocking example

Posted by Steve Ebersole    |    Tagged as Hibernate ORM

A bug report was recently opened in Hibernate's JIRA stating that Hibernate incorrectly handles deadlock scenarios. The basis for the report was an example in the Pro Hibernate 3 book (Chapter 9). For those perhaps not familiar with the term deadlock, the basic gist is that two processes each hold resource locks that the other needs to complete processing. While this phenomenon is not restricted to databases, in database terms the idea is that the first process (P1) holds a write lock on a given row (R1) while the second process (P2) holds a write lock on another row (R2). Now, to complete its processing P1 needs to acquire a write lock on R2, but cannot do so because P2 already holds its write lock. Conversely, P2 needs to acquire a write lock on R1 in order to complete its processing, but cannot because P1 already holds its write lock. So neither P1 nor P2 can complete its processing because each is indefinitely waiting on the other to release the needed lock, which neither can do until its own processing is complete. The two processes are said to be deadlocked in this situation.

Almost all databases have support to circumvent this scenario by specifying that locks should be timed-out after a certain period of time; after the time-out period, one of the processes is forced to rollback and release its locks, allowing the other to continue and complete. While this works, it is not ideal as it requires that the processes remained deadlocked until the underlying timeout period is exceeded. A better solution is for the database to actively seek out deadlock situations and immediately force one of the deadlock participants to rollback and release its locks, which most databases do in fact also support.

So now back to the Pro Hibernate 3 example. Let me say up front that I have not read the book and so do not understand the background discussion in the chapter nor the authors' intent/expectations in regards to the particular example code. I only know the expectations of a (quite possibly misguided) reader. What this example attempts to do is to spawn two threads that each use their own Hibernate Session to load the same two objects in reverse order and then modify their properties. The above-mentioned reader expects that this should cause a deadlock scenario to occur. But it does not. Or more correctly, in my running of the example it typically does not, although the results are inconsistent. Sometimes a deadlock is reported, but the vast majority of runs actually just succeed. Why is that the case?

So here is what really happens in this example code. As I mentioned before, the example attempts to load the same two objects in reverse order. The example uses the entities Publisher and Subscriber. The first thread (T1) loads a given Publisher and modifies its state; it is then forced to wait. The second thread (T2) loads a given Subscriber and modifies its state; it is then forced to wait. Then both threads are released from their waiting state. From there, T1 loads the same Subscriber previously loaded by T2 and modifies its state; T2 loads the same Publisher previously loaded by T1 and modifies its state. The thing you need to keep in mind here is that so far neither of these two Sessions has actually been flushed, thus no UPDATE statements have actually been issued against the database at this point. The flush occurs on each Session after each thread's second load-and-modify sequence. Thus, until that point, neither thread (i.e. the corresponding database process) is actually holding any write locks on the underlying data. Clearly, the outcome here is going to depend upon the manner in which the two threads are actually allowed to re-awaken by the underlying threading model, and in particular whether the UPDATE statements from the two sessions happen to get interleaved. If the two threads happen to interleave their requests to the database (i.e. T1's UPDATE PUBLISHER happens first, T2's UPDATE SUBSCRIBER happens second, etc.) then a deadlock will occur; if not interleaved, then the outcome will be success.

There are three ways to unequivocally ensure that lock acquisition errors in the database force one of these two transactions to fail in this example (brief sketches of the last two options follow the list):

  • use of SERIALIZABLE transaction isolation in the database
  • flushing the session after each state change (at the end of the example code's step1() and step2() methods)
  • use of locking (either optimistic or pessimistic)
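
For illustration, rough sketches of the last two options (Publisher, pubId, and setName() here are just stand-ins for the book's example code, not taken from it):

// option 3: pessimistic locking -- acquire the database row lock up front
Publisher pub = (Publisher) session.load( Publisher.class, pubId, LockMode.UPGRADE );

// option 2: flush after each state change so the UPDATE (and its row lock) is issued immediately
pub.setName( "new name" );
session.flush();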

Seems simple enough. Yet apparently not simple enough for the reader of the Pro Hibernate 3 book who opened the previously mentioned JIRA case. After all this was explained to him, he wrote me some ill-tempered, misconception-laden replies in private emails. I am not going to go into all the misconceptions here, but one in particular I think needs to be exposed, as many developers without a lot of database background seem to stumble over various concepts relating to transactions. Isolation and locking are not the same thing. In fact, to a large degree, they actually have completely opposite goals and purposes. Transaction isolation aims to isolate or insulate one transaction from other concurrent transactions, such that operations performed in one transaction do not affect (to varying degrees, based on the exact isolation mode employed) operations performed in others. Locking, on the other hand, has essentially the exact opposite goal; it seeks to ensure that certain operations performed in a transaction do have certain effects on other concurrent transactions. In fact, locking really has nothing to do with transactions at all, except for the fact that lock duration is typically scoped to the transaction in which the locks are acquired and that their presence/absence might affect the outcome of the different transactions. Perhaps, although I cannot say for sure, this confusion comes from the fact that a lot of databases use locking as the basis for their isolation model. But that is just an implementation detail, and some databases such as Oracle, PostgreSQL, and the newest SQL Server have very sophisticated and modern isolation engines not at all based on locking.

Multi-table Bulk Operations

Posted by Steve Ebersole

As I mentioned in my previous blog about Bulk Operations, both UPDATE and DELETE statements are challenging to handle against single entities contained across multiple tables (not counting associations), which might be the case with:

  • inheritance using <joined-subclass/>
  • inheritance using <union-subclass/>
  • entity mapping using the <join/> construct

For illustration purposes, lets use the following inheritance hierarchy:

         Animal
        /      \
   Mammal       Reptile
   /    \
Human    Dog

all of which is mapped using the joined-subclass strategy.
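
A sketch of what such a mapping might look like (abbreviated; the id generator is omitted and the REPTILE table name is an assumption, while the other table/column names follow the SQL used later in this post):

<class name="Animal" table="ANIMAL">
    <id name="id" column="ID">...</id>
    <property name="age" column="AGE"/>

    <joined-subclass name="Mammal" table="MAMMAL">
        <key column="ID"/>
        <property name="firstName" column="F_NAME"/>

        <joined-subclass name="Human" table="HUMAN">
            <key column="ID"/>
        </joined-subclass>
        <joined-subclass name="Dog" table="DOG">
            <key column="ID"/>
        </joined-subclass>
    </joined-subclass>

    <joined-subclass name="Reptile" table="REPTILE">
        <key column="ID"/>
    </joined-subclass>
</class>

The <key/> elements are what give each subclass table the PK-to-super-class FK discussed below.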

Deletes

There are three related challenges with deletes.

  • deletes against a multi-table entity need to recursively cascade to:
      • all sub-class(es) row(s) matched by primary key (PK) value
      • its super-class row
  • all these orchestrated deletes need to occur in an order to avoid constraint violations
  • which rows need to get deleted?

Consider the following code:

session.createQuery( "delete Mammal m where m.age > 150" ).executeUpdate();

Obviously we need to delete from the MAMMAL table. Additionally, every row in the MAMMAL table has a corresponding row in the ANIMAL table; so for any row deleted from the MAMMAL table, we need to delete that corresponding ANIMAL table row. This fulfills cascading to the super-class. If the Animal entity itself had a super-class, we'd need to delete that row also, etc.

Next, rows in the MAMMAL table might have corresponding rows in either the HUMAN table or the DOG table; so, again, for each row deleted from the MAMMAL table, we need to make sure that any corresponding row gets deleted from the HUMAN or DOG table. This fulfills cascading to the sub-class. If either the Human or Dog entities had further sub-classes, we'd need to delete any of those rows also, etc.

The other challenge I mentioned is proper ordering of the deletes to avoid violating any constraints. The typical foreign key (FK) set up in our example structure is to have the FKs pointing up the hierarchy. Thus, the MAMMAL table has a FK from its PK to the PK of the ANIMAL table, etc. So we need to be certain that we order the deletes:

( HUMAN | DOG ) -> MAMMAL -> ANIMAL

Here, it does not really matter whether we delete from the HUMAN table first, or from the DOG table first.

So exactly which rows need to get deleted (a lot of this discussion applies to update statements as well)? Most databases do not support joined deletes, so we definitely need to perform the deletes separately against the individual tables involved. The naive approach is to simply use a subquery returning the restricted PK values, with the user-defined restriction, as the restriction for the delete statement. That actually works in the example given before. But consider another example:

session.createQuery( "delete Human h where h.firstName = 'Steve'" ).executeUpdate();

I said before that we need to order the deletes so as to avoid violating defined FK constraints. Here, that means that we need to delete from the HUMAN table first; so we'd issue some SQL like:

delete from HUMAN where ID IN (select ID from HUMAN where f_name = 'Steve')

So far so good; perhaps not the most efficient way, but it works. Next we need to delete the corresponding row from the MAMMAL table; so we'd issue some more SQL:

delete from MAMMAL where ID IN (select ID from HUMAN where f_name = 'Steve')

Oops! This won't work because we previously deleted any such rows from the HUMAN table.

So how do we get around this? We definitely need to pre-select and store the PK values matching the given where-clause restriction. One approach is to select the PK values through JDBC and store them within the JVM memory space; then, later, the PK values are bound into the individual delete statements. Something like:

PreparedStatement ps = connection.prepareStatement( 
        "select ID from HUMAN where f_name = 'Steve'"
);
ResultSet rs = ps.executeQuery();
HashSet ids = extractIds( rs );
int idCount = ids.size();

rs.close();
ps.close();

....

// issue the delete from HUMAN
ps = connection.prepareStatement(
        "delete from HUMAN where ID IN (" +
        generateCommaSeparatedParameterHolders( idCount ) +
        ")"
);
bindParameters( ps, ids );
ps.executeUpdate();

...

The other approach, the one taken by Hibernate, is to utilize temporary tables, where the matching PK values are stored on the database server itself. This is far more performant in quite a number of ways, which is the main reason this approach was chosen. Now we have something like:

// where HT_HUMAN is the temporary table (varies by DB)
PreparedStatement ps = connection.prepareStatement( 
        "insert into HT_HUMAN (ID) select ID from HUMAN where f_name = 'Steve'"
);
int idCount = ps.executeUpdate();
ps.close();

....

// issue the delete from HUMAN 
ps = connection.prepareStatement(
        "delete from HUMAN where ID IN (select ID from HT_HUMAN)"
);
ps.executeUpdate();

In the first step, we avoid the overhead of potential network communication associated with returning the results; we also avoid some JDBC overhead; we also avoid the memory overhead of needing to store the id values. In the second step, we again minimized the amount of data traveling between us and the database server; the driver and server can also recognize this as a repeatable prepared statement and avoid execution plan creation overhead.

Updates

There are really only two challenges with multi-table update statements:

  • partitioning the assignments from the set-clause
  • which rows need to get updated? This one was already discussed above...

Consider the following code:

session.createQuery( "update Mammal m set m.firstName = 'Steve', m.age = 20" )
        .executeUpdate();

We saw from before that the age property is actually defined on the Animal super-class and thus is mapped to the ANIMAL.AGE column, whereas the firstName property is defined on the Mammal class and thus mapped to the MAMMAL.F_NAME column. So here, we know that we need to perform updates against both the ANIMAL and MAMMAL tables (no other tables are touched, even though the Mammal might further be a Human or a Dog). Partitioning the assignments really just means identifying which tables are affected by the individual assignments and then building appropriate update statements. A minor challenge here was accounting for this fact when actually binding user-supplied parameters. Though, for the most part, partitioning the assignments and parameters was a fairly academic exercise.
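
For illustration, the SQL generated for this update would look roughly like the following (a sketch only; HT_MAMMAL is a hypothetical temporary table name following the HT_ convention shown earlier, and the exact SQL varies by database):

insert into HT_MAMMAL (ID) select ID from MAMMAL

update MAMMAL set F_NAME = 'Steve' where ID IN (select ID from HT_MAMMAL)
update ANIMAL set AGE = 20 where ID IN (select ID from HT_MAMMAL)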

Bulk Operations

Posted by Steve Ebersole

The EJB3 persistence specification calls for implementors to support Bulk Operations in EJB-QL (the EJB Query Language). As part of Hibernate's implementation of EJB3 persistence, HQL (the Hibernate Query Language : which is a superset of EJB-QL) needed to support these Bulk Operations. This support is now code complete, even going beyond what is offered in the EJB3 persistence specification. There is one task outstanding against this bulk operation support in HQL, but this is completely beyond the scope of the support called for in the EJB3 persistence specification. I'll blog about this one later as it simply rocks ;)

So what exactly are Bulk Operations? Well for those of you familiar with SQL, it is analogous to Data Manipulation Language (DML) but, just like HQL and EJB-QL, defined in terms of the object model. What is DML? DML is the SQL statements which actually manipulate the state of the tabular data: INSERT, UPDATE, and DELETE.

Essentially, all that is to say that EJB-QL and HQL now support UPDATE and DELETE statements (HQL also supports INSERT statements, but more about that at a later time).

In its basic form, this support is not really all that difficult. I mean Hibernate already knows all the information pertaining to tables and columns; it already knows how to parse WHERE-clauses and the like. So what's the big deal? Well, in implementation, we ran across a few topics that make this support more challenging; which of course made it all the more fun to implement ;)

Update Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

The specification-defined pseudo-grammar for the update syntax:

update_statement ::= update_clause [where_clause]

update_clause ::= UPDATE abstract_schema_name [[AS] identification_variable]
    SET update_item {, update_item}*

update_item ::= [identification_variable.]state_field = new_value

new_value ::=
    simple_arithmetic_expression |
    string_primary |
    datetime_primary |
    boolean_primary

The basic gist is:

  • There can only be a single entity (abstract_schema_name) named in the update-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the update. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

Two interesting things to point out:

  • According to the specification, an UPDATE against a versioned entity should not cause the version to be bumped
  • According to the specification, the assigned new_value does not allow subqueries; HQL supports this!

Even though the spec disallows bumping the version on an update of a versioned entity, this is more-often-than-not the desired behavior. Because of the spec, Hibernate cannot do this by default so we introduced a new keyword VERSIONED into the grammar instead. The syntax is update versioned MyEntity ..., which will cause the version column values to get bumped for any affected entities.
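
For example, assuming Mammal is mapped with a version property, something along these lines would bump the version of every affected row (a sketch reusing the entity and property names from the earlier examples):

// the 'versioned' keyword forces the version column of affected rows to be bumped
int updatedCount = session.createQuery(
        "update versioned Mammal m set m.firstName = :name" )
        .setString( "name", "Steve" )
        .executeUpdate();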

Delete Statements

From the EJB3 persistence specification:

Bulk update and delete operations apply to entities of a single entity class 
(together with its subclasses, if any). Only one entity abstract schema type 
may be specified in the FROM or UPDATE clause.

A delete operation only applies to entities of the specified class and its 
subclasses. It does not cascade to related entities.

The specification-defined pseudo-grammar for the delete syntax:

delete_statement ::= delete_clause [where_clause]

delete_clause ::= DELETE FROM abstract_schema_name [[AS ] identification_variable]

The basic gist is:

  • There can only be a single entity (abstract_schema_name) named in the from-clause; it can optionally be aliased. If the entity name is aliased, then any property references must be qualified using that alias; if the entity name is not aliased, then it is illegal for any property references to be qualified.
  • No joins (either implicit or explicit) can be specified in the delete. Sub-queries may be used in the where-clause; the subqueries, themselves, can contain joins.
  • The where-clause is also optional.

One very interesting thing to point out here: the specification specifically disallows cascading of the delete to related entities (not including, obviously, db-level cascades).

Caching

Automatic and transparent object/relational mapping is concerned with the management of object state. This implies that the object state is available in memory. Bulk Operations, to a large extent, undermine that concern. The biggest issue is that of caching performed by the ORM tool/EJB3 persistence implementor.

The spec even makes a point to caution regarding this:

Caution should be used when executing bulk update or delete operations because 
they may result in inconsistencies between the database and the entities in the 
active persistence context. In general, bulk update and delete operations 
should only be performed within a separate transaction or at the beginning of a 
transaction (before entities have been accessed whose state might be affected 
by such operations).

In Hibernate terms, be sure to perform any needed Bulk Operations prior to pulling entities into the session, as failing to do so poses a risk of inconsistencies between the session (the active persistence context) and the database.
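
In code, that ordering looks something like this sketch (entity names reused from the earlier examples; the important point is simply that the bulk operation executes before any entities are loaded into the Session):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

// perform the bulk operation first, while the persistence context is still empty
session.createQuery( "delete Mammal m where m.age > 150" ).executeUpdate();

// only now start loading entities; they cannot be stale with respect to the bulk delete
List mammals = session.createQuery( "from Mammal m" ).list();

tx.commit();
session.close();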

Hibernate also offers, as do most ORM tools, a shared cache (the second level cache). Executing Bulk Operations also poses a risk of inconsistencies between the shared cache and the database. Hibernate actually takes on the responsibility of managing this risk for you. Upon completion of a Bulk Operation, Hibernate invalidates any needed region(s) within the shared cache to maintain consistency. It has to be done through invalidation because the UPDATE or DELETE is executed solely on the database server; thus Hibernate has no idea about the ids of any affected entities, nor (in the case of updates) what the new state might be.

Conclusion

Bulk Operations are complementary to the functionality provided by ORM tools. Especially in the case of batch processes, Bulk Operations coupled with the new StatelessSession functionality (available > 3.1beta1) offer a more performant alternative to the normal row-based ORM focus.

This-n-that

Entities which are contained across multiple tables (not counting associations) cause particular challenges that I'll blog about later.

Have a look at the reference manual for discussion of these Bulk Operations within HQL.

For those of you familiar with ANTLR and its grammar definitions, the authoritative source for what is supported by HQL is the grammar files themselves.

Current (Contextual) Sessions

Posted by Steve Ebersole

New to Hibernate 3.0.1 is the SessionFactory.getCurrentSession() method. It allows application developers to delegate tracking of current sessions to Hibernate itself. This is fairly trivial functionality, but it is something just about every user of Hibernate previously had to implement themselves, or rely on third-party code to do for them. Let's take a look at how this is implemented in Hibernate and how it might be useful.

Context Scope

I said that SessionFactory.getCurrentSession() tracks the current session on behalf of the application developer. What exactly does that mean? What is the scope in which a session is considered current? The transaction! More specifically, a JTA transaction.

Another dimension to scoping the current session is to which factory it belongs. Because Hibernate implements this internal to the SessionFactory, the current sessions are inherently tracked by that given factory. Internally, the SessionFactory maintains a Map of sessions keyed by JTA transaction. There is little overhead in this since the Map is built lazily, and only utilized during getCurrentSession() calls: if you don't use this feature, the map is never even built.
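
Conceptually (and this is only a sketch to illustrate the idea, not Hibernate's actual implementation), the bookkeeping amounts to something like:

// conceptual sketch only -- not the real code
private Map currentSessions;   // built lazily; key = JTA Transaction, value = Session

public Session getCurrentSession() {
    javax.transaction.Transaction txn = getCurrentJtaTransaction();   // hypothetical helper
    if ( currentSessions == null ) {
        currentSessions = new HashMap();
    }
    Session session = (Session) currentSessions.get( txn );
    if ( session == null ) {
        session = openSession();
        currentSessions.put( txn, session );
    }
    return session;
}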

Example Usage

Imagine a simple scenario coordinating efforts between three DAOs:

public Bid myExposedCmtEjbServiceMethod(Long itemId, Double amount) {
    ItemDAO itemDAO = new ItemDAO( getSessionFactory() );
    BidDAO bidDAO = new BidDAO( getSessionFactory() );
    UserDAO userDAO = new UserDAO( getSessionFactory() );

    Item item = itemDAO.load( itemId );
    User bidder = userDAO.load( getCurrentUsername() );
    return bidDAO.create( item, amount, bidder );
}

How should each of the DAOs utilize the same session to perform their work? The typical pattern is to use ThreadLocals or similar contextual storage (perhaps a JBoss TransactionLocal) to maintain the current session within that context. Furthermore, how do we know when this current session should be cleaned up?

The usual pattern to implement these functionalities is that a top-level service/method is defined as the service controller, which is responsible for opening a session at the start, binding it to the contextual storage (so other collaborators can find it), and cleaning up the session at the end of the service processing. A slight twist on this is to use method interception to apply those behaviours (or aspects) on top of the service controller method. Either way, this can be a lot of work to set up, requiring that we either:

  • modify all the service controller points to perform the open-bind-cleanup functionality
  • wrap all our services (sometimes spuriously) in proxies so that we can intercept the method execution and apply those behavioural aspects

So instead, let's look at using the SessionFactory.getCurrentSession() approach:

public class ItemDAO {
    private SessionFactory sf;

    public ItemDAO(SessionFactory sf) { this.sf = sf; }

    public Item load(Long itemId) {
        return ( Item ) sf.getCurrentSession().load( Item.class, itemId );
    }

    ...
}

Here, each of the DAO collaborators simply use the getCurrentSession() method; the things collaborating with the DAOs do not need to perform anything extra and we do not need to generate proxies and method interceptors just to apply the notion of contextual sessions.

So now, by using getCurrentSession() we can easily scope the notion of a current session to the JTA transaction and reuse the same session throughout that JTA transaction. But how do we clean up the session? And how do we manage flushing of the session state with the database?

Auto flush and close

Two new configuration options introduced in Hibernate3 are extremely powerful, especially when combined with SessionFactory.getCurrentSession(). Both of these are available in JTA environments, as well as in scenarios where the application is utilizing the Hibernate transaction-abstraction API.

The first is flush_before_completion, which forces a flush of the session just prior to transaction completion (think Synchronization.beforeCompletion()...). With this setting enabled, we do not have to worry about flushing the session after we are done in order to synchronize in-memory state with the database; Hibernate does it for us (just prior the transaction commit).

The second is auto_close_session, which forces the session to be closed after transaction completion. In JTA environments, this setting has an additional effect; it forces Hibernate to release JDBC connections much more aggressively. Basically, Hibernate will obtain a connection, use it, and then immediately release it back to the datasource. This allows better integration into JTA environments which implement some form of connection containment check (i.e. the JBoss CachedConnectionManager).
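
As a minimal sketch, enabling both options in hibernate.cfg.xml would look something like the following (assuming the hibernate.transaction.* property names; check the configuration reference for your version):

<hibernate-configuration>
    <session-factory>
        ...
        <property name="hibernate.transaction.flush_before_completion">true</property>
        <property name="hibernate.transaction.auto_close_session">true</property>
    </session-factory>
</hibernate-configuration>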

Conclusion

All of these together allow application developers to free themselves from managing session lifecycle and have Hibernate do it for them.

Hibernate3 Events

Posted by Steve Ebersole    |    Tagged as Hibernate ORM

Another major change in Hibernate3 is the evolution to use an event and listener paradigm as its core processing model. This allows very fine-grained hooks into Hibernate's internal processing in response to external, application-initiated requests. It even allows customization or complete overriding of how Hibernate reacts to these requests. It really serves as an expansion of what Hibernate tried to achieve through the earlier Interceptor, Lifecycle, and Validatable interfaces.

Note: The Lifecycle and Validatable interfaces have been moved to the new classic package in Hibernate3. Their use is not encouraged, as it introduces dependencies on the Hibernate library into the user's domain model and can be handled by a custom Interceptor or through the new event model, external to the domain classes. This is nothing new, as the same recommendation was made for Hibernate2 usage.

So what types of events does the new Hibernate event model define? Essentially all of the methods of the org.hibernate.Session interface correlate to an event. So you have a LoadEvent, a FlushEvent, etc. (consult the configuration DTD or the org.hibernate.event package for the full list of defined event types). When a request is made of one of these methods, the Hibernate session generates an appropriate event and passes it to the configured event listener for that type. Out-of-the-box, these listeners implement the same processing in which those methods always resulted. However, the user is free to implement a customization of one of the listener interfaces (i.e., the LoadEvent is processed by the registered implementation of the LoadEventListener interface), in which case their implementation would be responsible for processing any load() requests made of the Session.

These listeners should be considered effectively singletons; meaning, they are shared between requests, and thus should not save any state as instance variables. The event objects themselves, however, do hold a lot of the context needed for processing as they are unique to each request. Custom event listeners may also make use of the event's context for storage of any needed processing variables. The context is a simple map, but the default listeners don't use the context map at all, so don't worry about over-writing internally required context variables.

A custom listener should implement the appropriate interface for the event it wants to process and/or extend one of the convenience base classes (or even the default event listeners used by Hibernate out-of-the-box, as these are declared non-final for this purpose). Custom listeners can either be registered programmatically through the Configuration object, or specified in the Hibernate configuration XML (declarative configuration through the properties file is not supported). Here's an example of a custom load event listener:

public class MyLoadListener extends DefaultLoadEventListener {
    // this is the single method defined by the LoadEventListener interface
    public Object onLoad(LoadEvent event, LoadEventListener.LoadType loadType) 
            throws HibernateException {
        if ( !MySecurity.isAuthorized( event.getEntityName(), event.getEntityId() ) ) {
            throw new MySecurityException("Unauthorized access");
        }
        return super.onLoad(event, loadType);
    }
}

Then we need a configuration entry telling Hibernate to use our listener instead of the default listener:

<hibernate-configuration>
    <session-factory>
        ...
        <listener type="load" class="MyLoadListener"/>
    </session-factory>
</hibernate-configuration>

Or we could register it programmatically:

Configuration cfg = new Configuration();
cfg.getSessionEventListenerConfig().setLoadEventListener( new MyLoadListener() );
....

Listeners registered declaratively cannot share instances. If the same class name is used in multiple <listener/> elements, each reference will result in a separate instance of that class. If you need the capability to share listener instances between listener types, you must use the programmatic registration approach.

Why implement an interface and define the specific type during configuration? Well, a listener implementation could implement multiple event listener interfaces. Having the type additionally defined during registration makes it easier to turn custom listeners on or off during configuration.

Hibernate3 Filters

Posted by Steve Ebersole    |    Tagged as Hibernate ORM

Hibernate3 adds the ability to pre-define filter criteria and attach those filters at both a class and a collection level. What's a pre-defined filter criterion? Well, it's the ability to define a limit clause very similar to the existing where attribute available on the class and various collection elements, except that these filter conditions can be parameterized! The application can then make the decision at runtime whether given filters should be enabled and what their parameter values should be.

Configuration

In order to use filters, they must first be defined and then attached to the appropriate mapping elements. To define a filter, use the new <filter-def/> element within a <hibernate-mapping/> element:

<filter-def name="myFilter">
    <filter-param name="myFilterParam" type="string"/>
</filter-def>

Then, this filter can be attached to a class:

<class name="myClass" ...>
    ...
    <filter name="myFilter" condition=":myFilterParam = my_filtered_column"/>
</class>

or, to a collection:

<set ...>
    <filter name="myFilter" condition=":myFilterParam = my_filtered_column"/>
</set>

or, even to both (or multiples of each) at the same time!

Usage

In support of this, a new interface was added to Hibernate3, org.hibernate.Filter, and some new methods were added to org.hibernate.Session. The new methods on Session are: enableFilter(String filterName), getEnabledFilter(String filterName), and disableFilter(String filterName). By default, filters are not enabled for a given session; they must be explicitly enabled through use of the Session.enableFilter() method, which returns an instance of the new Filter interface. Using the simple filter defined above, this would look something like:

session.enableFilter("myFilter").setParameter("myFilterParam", "some-value");

Note that methods on the org.hibernate.Filter interface do allow the method-chaining common to much of Hibernate.

Big Deal

This is all functionality that was available in Hibernate before version 3, right? Of course. But before version 3, these were all manual processes performed by application code. To filter a collection you'd need to load the entity containing the collection and then apply the collection to the Session.filter() method. And for entity filtration you'd have to either manually modify the HQL string by hand or write a custom Interceptor.

This new feature provides a clean and consistent way to apply these types of constraints. The Hibernate team envisions the usefulness of this feature in everything from internationalization to temporal data to security considerations (and even combinations of these at the same time) and much more. Of course it's hard to envision the potential power of this feature given the simple example used so far, so let's look at some slightly more in depth usages.

Temporal Data Example

Say you have an entity that follows the effective record database pattern. This entity has multiple rows, each varying based on the date range during which that record was effective (possibly even maintained via a Hibernate Interceptor). An employment record might be a good example of such data, since employees might come and go and come back again. Further, say you are developing a UI which always needs to deal in current records of employment data. To use the new filter feature to achieve these goals, we would first need to define the filter and then attach it to our Employee class:

<filter-def name="effectiveDate">
    <filter-param name="asOfDate" type="date"/>
</filter-def>

<class name="Employee" ...>
    ...
    <many-to-one name="department" column="dept_id" class="Department"/>
    <property name="effectiveStartDate" type="date" column="eff_start_dt"/>
    <property name="effectiveEndDate" type="date" column="eff_end_dt"/>
    ...
    <!--
        Note that this assumes non-terminal records have an eff_end_dt set to a max db date
        for simplicity's sake
    -->
    <filter name="effectiveDate" condition=":asOfDate BETWEEN eff_start_dt and eff_end_dt"/>
</class>

<class name="Department" ...>
    ...
    <set name="employees" lazy="true">
        <key column="dept_id"/>
        <one-to-many class="Employee"/>
        <filter name="effectiveDate" condition=":asOfDate BETWEEN eff_start_dt and eff_end_dt"/>
    </set>
</class>

Then, in order to ensure that you always get back currently effective records, simply enable the filter on the session prior to retrieving employee data:

Session session = ...;
session.enableFilter("effectiveDate").setParameter("asOfDate", new Date());
List results = session.createQuery("from Employee as e where e.salary > :targetSalary")
        .setLong("targetSalary", 1000000L)
        .list();

In the HQL above, even though we only explicitly mentioned a salary constraint on the results, because of the enabled filter the query will return only currently active employees who have a salary greater than a million dollars (lucky stiffs).

Even further, if a given department is loaded from a session with the effectiveDate filter enabled, its employee collection will only contain active employees.

Security Example

Imagine we have an application that assigns each user an access level, and that some sensitive entities in the system are assigned access levels (way simplistic, I understand, but this is just illustration). So a user should be able to see anything where their assigned access level is greater than that assigned to the entity they are trying to see. Again, first we need to define the filter and apply it:

<filter-def name="accessLevel">
    <filter-param name="userLevel" type="int"/>
</filter-def>

<class name="Opportunity" ...>
    ...
    <many-to-one name="region" column="region_id" class="Region"/>
    <property name="amount" type="Money">
        <column name="amt"/>
        <column name="currency"/>
    </property>
    <property name="accessLevel" type="int" column="access_lvl"/>
    ...
    <filter name="accessLevel"><![CDATA[:userLevel >= access_lvl]]></filter>
</class>

<class name="Region" ...>
    ...
    <set name="opportunities" lazy="true">
        <key column="region_id"/>
        <one-to-many class="Opportunity"/>
        <filter name="accessLevel"><![CDATA[:userLevel >= access_lvl]]></filter>
    </set>
    ...
</class>

Next, our application code would need to enable the filter:

User user = ...;
Session session = ...;
session.enableFilter("accessLevel").setParameter("userLevel", user.getAccessLevel());

At this point, loading a Region would filter its opportunities collection based on the current user's access level:

Region region = (Region) session.get(Region.class, "EMEA");
region.getOpportunities().size(); // <- limited to those accessible by the user's level

Conclusion

These were some pretty simple examples. But hopefully they'll give you a glimpse of how powerful these filters can be, and maybe spark some thoughts as to where you might be able to apply such constraints within your own application. This can become even more powerful in combination with various interception methods, like web filters, etc. Also a note: if you plan on using filters with outer joining (either through HQL or load fetching), be careful of the direction of the condition expression. It's safest to set this up for left outer joining; in general, place the parameter first, followed by the column name(s) after the operator.
