In Relation To

The Hibernate team blog on everything data.

EJB 3.0 EDR2

Cedric beat me to this, but if you missed his announcement, the EJB 3.0 second early draft is available. The most interesting new stuff to me is:

  • the new callback listener/interceptor architecture
  • the separate document dealing with entity beans, which will evolve into a complete specification
  • out-of-container operation of the entity manager (the persistence engine)
  • native SQL queries
  • definition of interoperability with legacy EJB 2.1 clients and servers
  • complete specification of the semantics of association mappings
  • complete specification of the semantics of EJBQL

There is also an example of what the XML-based ORM metadata might look like. This is intended to spark discussion and is certainly not final at this stage.

Of course, there are many other revisions compared to EDR1. (Those were just the ones I could think of now.)

The goals for the next draft include:

  • API for obtaining an EntityManager outside the container
  • more complete definition of XML-based deployment descriptors

http://jcp.org/en/jsr/detail?id=220

Before I go, I can't overemphasize how important the new @Interceptor stuff is - for a long time we've been unable to extend the basic set of EJB services in a portable way. Yes, you could do it in JBoss using JBoss-specific APIs, and in WebLogic using BEA-specific APIs, but that just isn't good enough! One immediate consequence of this new feature is that people will be able to build /EJB frameworks/. I anticipate a whole new marketplace for open source add-ons to EJB, just like there is a profusion of web-tier frameworks today. But unlike web-tier frameworks, this architecture lends itself to combining different extensions in the same application! It's interesting: the original EJB vision was for a marketplace of reusable /application components/, which didn't eventuate. /This/ feature will further foster the availability of reusable /infrastructure services/, which actually /has/ worked in practice (though it hasn't worked so well for EJB).

Enjoy!

Hibernate Annotations alpha1

We just released the brand new Hibernate Annotations module as an alpha version. This module provides the facilities to declare Hibernate O/R mapping metadata through JDK 5.0 annotations (instead of XML mapping files or XDoclet preprocessing). We have implemented the EJB3 (JSR-220) O/R metadata annotations defined in early draft 1 of the spec. This set of annotations covers all common mapping cases.

The next step for us is to provide custom annotations covering Hibernate-specific features, while following the evolution of the EJB3 spec.

This tool is designed for ease of development and quick application building (mapping metadata in the sources, no preprocessing step, configuration by exception to minimize the metadata declarations). Give this new programming model a try. Feedback is welcome, especially on the Hibernate-specific features you want to see covered by annotations.
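
For illustration, here is a minimal sketch of what an annotated entity might look like. The entity and column names are hypothetical, and the annotation names follow the javax.persistence style of the later drafts, so they may differ slightly from what EDR1 and this alpha actually expose:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Customer {

    private Long id;
    private String name;

    @Id
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    //the column name is an assumption; by default the property name would be used
    @Column(name="CUST_NAME")
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}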

Download it and have a look at the tutorial and the comprehensive test suite. They provide some good samples.

Hibernate 3.0 goes beta

Tagged as Hibernate ORM

We just released Hibernate 3.0 beta 1. I've no time to list all the many changes since the alpha was released four months ago, let alone everything that is new in Hibernate3, which has been in development for over a year now.

The most exciting new thing from our point of view is the new AST-based HQL parser, written by Joshua Davis. It uses 3 ANTLR grammars to transform HQL/EJBQL to SQL. The work on this is not quite finished, but almost all legacy tests pass. You can try out the new query parser by setting

hibernate.query.factory_class=org.hibernate.hql.ast.ASTQueryTranslatorFactory
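
For example, a minimal sketch of setting the same property programmatically via org.hibernate.cfg.Configuration, assuming you build the SessionFactory yourself:

Configuration cfg = new Configuration().configure(); //reads hibernate.cfg.xml
cfg.setProperty(
    "hibernate.query.factory_class",
    "org.hibernate.hql.ast.ASTQueryTranslatorFactory"
);
SessionFactory sessionFactory = cfg.buildSessionFactory();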

I'll try to get Joshua to blog about the design of the parser (very cool stuff).

Reattaching an object having no modification

Posted by    |       |    Tagged as

It sometimes happens that a domain model object detached from a previous session needs to (or can) be reattached without triggering an UPDATE (whether it uses optimistic locking or not). Hibernate supports this kind of feature by providing:

session.lock(myEntity, LockMode.NONE);

Hibernate will then only propagate modifications that are made after the reattachment to the session.

Don't use it as an optimization

First of all, try to avoid this feature. Usually, reattaching an object using session.saveOrUpdate(), or simply retrieving it again through session.get()/session.load(), is fast enough. I'm not kidding: do not assume that skipping the update (especially when versioning is switched on) or skipping the DB retrieval will be significantly faster in your real application; do some benchmarking before making any decision.

Beware of boundaries

Associated objects are not always managed the way you want: uncascaded properties are not reattached to the session.

I am currently working on a project where some entities need to be reattached using lock mode NONE. In my case, the entity is a Contact. Contact is linked to some rarely updated (and cached) entities. In a particular but very frequent use case, I know for sure that the Contact instance I get is identical to the DB version. It looked to me like the best use case for session.lock(contact, LockMode.NONE);

Unfortunately, the contact I retrieve is associated with (but does not cascade to) some lazily loaded entities (at the second level of the graph), so I cannot properly walk through my object graph.

The solution I used was to properly redefine my object graph boundary, by refreshing some of the Contact properties with fresh entities.

Contact contact = ... //get it from somewhere (detached instance)
Session session = sf.openSession();
Transaction tx = session.beginTransaction();
ContactType type = (ContactType) session.load( 
        ContactType.class, 
        contact.getContactType().getId() 
    );
contact.setContactType(type);
session.lock(contact, LockMode.NONE);
//work with contact safely

I'm now able to walk my graph through contact, eg:

contact.getContactType().getFavoriteChannel();

Notice several things:

  • I used load() instead of get(); this gives me a proxy if one is available and thus avoids hitting the DB
  • I set the refreshed property before reattaching the object, to avoid useless updates.

Know what you do

This feature is quite complex because it may break ACID semantics (contact and the other entities were not loaded in the same transaction) and may lead to very tricky issues. I used this optimization (fewer DB hits) because I knew my application's business process very well, I was sure it wouldn't break any data, and it saved me significant time. I did seriously consider the following solution:

session.get(Contact.class, contact.getId());

This one line of code is your friend; don't ban it too early. It's the most natural way of working with the object.

Don't get tricked

Yesterday, another vendor marketing statement was posted on TSS. I usually ignore these, but when it is about data management, I just have to reply. What is always surprising to me is how little we Java developers still know about data management. Here is a statement made by Maxim Kramarenko in the discussion thread:

"OO/network DBMS can be very useful when you need to handle large 
hierarchies - simple graph navigation can be times more fast and simple 
way to fetch data then SQL. Even ORACLE START WITH/CONNECT BY statement 
works VERY slow for hierarchies." 

Now, this statement has several fundamentally different (and important) things mixed together, and in the end confuses everyone more than it helps. I expressed my disbelief (yes, I should have been more verbose and friendly...) and then was asked to contribute to the discussion. Here is my (rather long) response:

I'm using the EXPLODE operator provided by my database management system if I'd like to get a hierarchical view of my data (which is the most common operation when it comes to data hierarchies, I think). If no such operator is available, or if I find its performance characteristics unacceptable, I call my DBMS vendor and complain until they fix it. Of course, I only do that after optimizing my query execution plans, etc. (this should eliminate most issues).

If all of this is not sufficient, I might consider using a nested set model or materialized path to represent my tree information, instead of the simple adjacency list. Again, this is certainly the last step I'd take (which still happens way too often, given the poor quality of the SQL products we have today), and it is most likely only acceptable for read-mostly trees. What we have in SQL, the recursive WITH foo AS () or the proprietary CONNECT BY variation, is not necessarily what I have in mind when I think about a good EXPLODE operator. But see below for a reference with a better and more complete explanation.

I would certainly not sacrifice any of the advantages I get from the relational data model (rare as they may be with SQL) just because I can't find a good operator for my implementation in my DBMS product. After all, it's a logical data model, and any performance concern is a physical implementation detail. I don't see how the two are related, but I know we are in bad shape if we start blurring the differences between them. There is no reason why a network/graph-based logical model or a relational model couldn't be implemented with the same performance characteristics. Just because some products are not implemented optimally doesn't mean we should ditch the whole concept of data independence!

Complain to your DBMS vendor. Ask them why their product doesn't fully support state-of-the-art relational data management, such as a relational (and possibly polymorphic) data type system, or a query language that supports closures/explosion for data hierarchies. The list of deficiencies in today's SQL products is unfortunately much longer than this. It's not the fault of the data model if you can't do something in a particular product, or if a specification has serious problems.

It's easy for the snake oil salesman to sell you his old wine if you let yourself get confused about logical data models and physical implementations. It hurts everyone in the end, as we all have to tell our software vendors what we would like to see and what support we need in a product. If we are not able to clearly articulate our needs and if we forget history (ie. what worked and what didn't work in the past), we might get tricked. I'm not feeling comfortable with that.

Finally, a recommendation I can make is the book Practical Issues in Database Management by Fabian Pascal. It is a small book, having only 250 pages. Fabian shows you 10 common issues you will face in your daily practice, but instead of simply explaining how to work your way through SQL, he first explains the relational data model basics for each problem. He then looks at the current practice and explains what you can do in SQL (or what we would need in a DBMS) to solve or work around the issue. A quick read for a weekend and definitely recommended. If you want to brush up your data management basics, buy it in a bundle with Chris Date's Introduction to Database Systems.

Discrimination

Type - not sex, or race - discrimination is what we do when we read a row from a SQL query result set, and determine what Java class we should instantiate to hold the data from that row. Type discrimination is needed by any ORM solution or handwritten persistence layer that supports polymorphic queries or associations.

The simplest way to visualize type discrimination is to consider the following result set:

( 123, '1234-4567-8910-1234', 'Foo Bar', 'VISA' )
( 111, '4321-7654-0198-0987', 'Baz Qux', 'MCD'  )

This result set contains details of two credit cards. Suppose our application has a different Java class to represent each type of credit card: two subclasses, VisaCard and MasterCard, of the CreditCard class. Then we can check the last column of the result set to decide which class to instantiate for each row. This column is the discriminator column.
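
To make this concrete, here is a hand-rolled sketch of what an ORM or persistence layer does under the covers when it discriminates; the CreditCard, VisaCard and MasterCard classes and their setters are the hypothetical ones from the example above:

import java.sql.ResultSet;
import java.sql.SQLException;

public class CreditCardLoader {

    //CreditCard, VisaCard, MasterCard and their setters are hypothetical
    CreditCard load(ResultSet rs) throws SQLException {
        //instantiate the right subclass based on the discriminator column
        String discriminator = rs.getString(4); //'VISA' or 'MCD'
        CreditCard card = "VISA".equals(discriminator)
                ? new VisaCard()
                : new MasterCard();
        card.setId( rs.getLong(1) );
        card.setNumber( rs.getString(2) );
        card.setHolderName( rs.getString(3) );
        return card;
    }
}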

You're probably wondering why I'm talking about result sets instead of tables. Well, there are various ways to map a Java class hierarchy to a relational database schema: table-per-hierarchy, table-per-concrete-class, table-per-class. So the actual table structure may be quite complex. But the only way to actually get data efficiently out of the database is to denormalize it into one big square result set. In fact, I usually think of the job of the SQL query as transforming table-per-concrete-class or table-per-class mapped data into an intermediate table-per-hierarchy denormalized form. (This is, incidentally, why table-per-hierarchy offers the best performance of the three mapping strategies - it is already in a convenient form and does not require unions or joins on the database.) Whatever strategy we choose, we need to perform type discrimination upon this flattened result set.

Now, what I'm most interested in for the moment is the result set discriminator column. Most writings on the topic of inheritance mappings - and there have not been very many - and even most ORM solutions (actually every one that I know of), have assumed that the discriminator column is an actual physical column of the root table of the supertype. Indeed, it has usually been further assumed that the mapping from discriminator values to classes is one-to-one. But this need not be the case. In my credit card example, it certainly made sense. But now let's consider a different case. We're storing data relating to specific individuals in the PERSON table; our SQL query might look like this:

SELECT ID, NAME, SPECIES FROM PERSON

and the result set might be:

( 12345, 'Zxychg Ycjzy', 'Martian' )
( 52778, 'Glooble Queghm', 'Venusian' )
( 98876, 'Gavin King', 'Human' )

Now, here on earth, we consider Humans to be quite special and worthy of extra attention compared to other species of intelligent alien. So we might have a specific Human class, and a generic Alien class. Then the mapping from discriminator column values to classes is certainly not one-to-one. Indeed, there is a specific value for Human, and a catchall for Alien.

Actually, it's quite reasonable that we might even have some extra, Human-specific data, in the HUMAN table. To get all the data we need, let's use the following join:

SELECT ID, NAME, SPECIES, COUNTRY 
FROM PERSON 
    NATURAL LEFT OUTER JOIN HUMAN

( 12345, 'Zxychg Ycjzy', 'Martian', null )
( 52778, 'Glooble Queghm', 'Venusian', null )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US' )

In this result set, we have two potential discriminator columns. Either the COUNTRY or SPECIES column could be used to determine if the individual is human. And the COUNTRY column isn't a column of the root PERSON table. Now imagine we introduce a further level of specialization, and include data specific to employees of our organization:

SELECT ID, NAME, SPECIES, COUNTRY, JOB
FROM PERSON 
    NATURAL LEFT OUTER JOIN HUMAN 
    NATURAL LEFT OUTER JOIN EMPLOYEE

( 12345, 'Zxychg Ycjzy', 'Martian', null, null )
( 52778, 'Glooble Queghm', 'Venusian', null, null )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US', null )
( 34556, 'Gavin King', 'Human', 'AU', 'Java Developer' )

Now we can no longer perform type discrimination using just a single column. Ouch! That's messy. Let's change the query slightly:

SELECT ID, NAME, SPECIES, COUNTRY, JOB, 
    CASE 
        WHEN COUNTRY IS NULL THEN 'ALIEN' 
        WHEN JOB IS NULL THEN 'HUMAN' 
        ELSE 'EMPLOYEE' 
    END 
FROM PERSON 
    NATURAL LEFT OUTER JOIN HUMAN 
    NATURAL LEFT OUTER JOIN EMPLOYEE

( 12345, 'Zxychg Ycjzy', 'Martian', null, null, 'ALIEN' )
( 52778, 'Glooble Queghm', 'Venusian', null, null, 'ALIEN' )
( 98876, 'Arnold Schwarzenegger', 'Human', 'US', null, 'HUMAN' )
( 34556, 'Gavin King', 'Human', 'AU', 'Java Developer', 'EMPLOYEE' )

Yay, we've got our nice clean discriminator column back again! But this column most certainly does not correspond to any physical column of a table. It holds a pure derived value.

This is content-based discrimination. Our example uses a table-per-class mapping strategy, but the result sets above could all just as easily have come from some other mapping strategy.

Here's a second example of content-based discrimination:

SELECT TX_ID, ACCOUNT_ID, AMOUNT, 
    CASE 
        WHEN AMOUNT>0 THEN 'CREDIT' 
        ELSE 'DEBIT' 
    END 
FROM TRANSACTIONS

( 12875467987, 98798723, 56.99, 'CREDIT' )
( 09808343123, 87558345, 123.25, 'DEBIT' )

Here, we use a column that is based upon a mathematical expression, AMOUNT>0, to discriminate between DebitTransaction and CreditTransaction. In principle, much more complex expressions are possible. (In practice, they are likely to remain quite simple.)

In Hibernate 2.x the table-per-class mapping strategy always used content-based discrimination, and the table-per-hierarchy strategy always used column-based discrimination. For some reason - that is now kind of obscure to me - that felt quite natural. In Hibernate3, you can use content based discrimination for the table-per-hierarchy mapping strategy:

<class name="Person" table="PERSON" 
    discriminator-value="ALIEN">
    ...
    <discriminator type="string"> 
        <formula>
            CASE 
                WHEN COUNTRY IS NULL THEN 'ALIEN' 
                WHEN JOB IS NULL THEN 'HUMAN' 
                ELSE 'EMPLOYEE' 
            END
        </formula>
    </discriminator>
    ...
    <subclass name="Human" 
        discriminator-value="HUMAN">
        ...
        <subclass name="Employee" 
            discriminator-value="EMPLOYEE">
            ...
        </subclass>
     </subclass>
 </class>

And you can use column-based discrimination for table-per-class mappings:

<class name="Person" table="PERSON" 
    discriminator-value="ALIEN">
    ...
    <discriminator type="string" column="TYPE"/>
    ...
    <subclass name="Human" 
        discriminator-value="HUMAN">
        <join table="HUMAN">
            ...
            <subclass name="Employee" 
                discriminator-value="EMPLOYEE">
                <join table="EMPLOYEE">
                    ...
                </join>
            </subclass>
        </join>
     </subclass>
 </class>

For the table-per-concrete-class strategy (<union-subclass> mapping), only content-based discrimination makes sense.

Hibernate Training in Melbourne

September 20-22 in Melbourne will be the first time we deliver our new three-day Hibernate course. The course has been heavily revised and expanded to include previews of the cool new stuff coming in Hibernate3 and an overview of Hibernate internals (/very/ useful if you ever need to debug a Hibernate application). There are still seats available, if you're quick! This will be the last training we run in Australia for a while, since I won't be in the country much, if at all, over the next six months or so. Email training@jboss.com for more information. (We also have an upcoming course in Paris, November 3-5.)

Batch processing in Hibernate

I gotta preface this post by saying that we are very skeptical of the idea that Java is the right place to do processing that works with data in bulk. By extension, ORM is probably not an especially appropriate way to do batch processing. We think that most databases offer excellent solutions in this area: stored procedure support, and various tools for import and export. Because of this, we've neglected to properly explain to people how to use Hibernate for batch processing if they really feel they /have/ to do it in Java. At some point, we have to swallow our pride, and accept that lots of people are actually doing this, and make sure they are doing it the Right Way.

A naive approach to inserting 100 000 rows in the database using Hibernate might look like this:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
   Customer customer = new Customer(.....);
   session.save(customer);
}
tx.commit();
session.close();

This would fall over with an OutOfMemoryException somewhere after the 50 000th row. That's because Hibernate caches all the newly inserted Customers in the session-level cache. Certain people have expressed the view that Hibernate should manage memory better, and not simply fill up all available memory with the cache. One very noisy guy who used Hibernate for a day and noticed this is even going around posting on all kinds of forums and blog comments, shouting about how this demonstrates what shitty code Hibernate is. For his benefit, let's remember why the first-level cache is not bounded in size:

  • persistent instances are /managed/ - at the end of the transaction, Hibernate synchronizes any change to the managed objects to the database (this is sometimes called /automatic dirty checking/)
  • in the scope of a single persistence context, persistent identity is equivalent to Java identity (this helps eliminate data /aliasing/ effects)
  • the session implements /asynchronous write-behind/, which allows Hibernate to transparently batch together write operations
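
As a quick illustration of the first two points, here is a sketch (Customer is the entity from the examples in this post; the setName() setter is a hypothetical one):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

Customer c1 = (Customer) session.get(Customer.class, new Long(42));
c1.setName("New name");          //no explicit update() call is needed...

Customer c2 = (Customer) session.get(Customer.class, new Long(42));
System.out.println( c1 == c2 );  //true: same persistence context, same instance

tx.commit();                     //...dirty checking flushes the UPDATE here
session.close();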

For typical OLTP work, these are all very, very useful features. Since ORM is really intended as a solution for OLTP problems, I usually ignore criticisms of ORM which focus upon OLAP or batch stuff as simply missing the point.

However, it turns out that this problem is incredibly easy to work around. For the record, here is how you do batch inserts in Hibernate.

First, set the JDBC batch size to a reasonable number (say, 10-20):

hibernate.jdbc.batch_size 20

Then, flush() and clear() the session every so often:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
   Customer customer = new Customer(.....);
   session.save(customer);
   if ( i % 20 == 0 ) {
      //flush a batch of inserts and release memory:
      session.flush();
      session.clear();
   }
}

tx.commit();
session.close();

What about retrieving and updating data? Well, in Hibernate 2.1.6 or later, the scroll() method is the best approach:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
   .scroll(ScrollMode.FORWARD_ONLY);
int count=0;
while ( customers.next() ) {
   Customer customer = (Customer) customers.get(0);
   customer.updateStuff(...);
   if ( ++count % 20 == 0 ) {
      //flush a batch of updates and release memory:
      session.flush();
      session.clear();
   }
}

tx.commit();
session.close();

Not so difficult, or even shitty, I guess. Actually, I think you'll agree that this was much easier to write than the equivalent JDBC code messing with scrollable result sets and the JDBC batch API.

One caveat: if Customer has second-level caching enabled, you can still get some memory management problems. The reason for this is that Hibernate has to notify the second-level cache /after the end of the transaction/, about each inserted or updated customer. So you should disable caching of customers for the batch process.
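
In Hibernate3 you could also do this per session rather than in the mapping; a minimal sketch using the CacheMode API:

Session session = sessionFactory.openSession();
//don't interact with the second-level cache during this batch run
session.setCacheMode(CacheMode.IGNORE);
Transaction tx = session.beginTransaction();
//... batch inserts/updates as above ...
tx.commit();
session.close();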

Hibernate3 Join Tricks

Tagged as Hibernate ORM

One of the joys of working on an open source project with commercial competitors is having to implement features that our users simply don't ask for, and probably won't use in practice, just because those competitors try to spin their useless features as a competitive advantage. We realized ages ago that it's really hard to tell people that they don't need and shouldn't use a feature if you don't have it.

Multi-table mappings started out as a good example of that kind of feature. We have been repeating the your object model should be at /least/ as fine-grained as your relational schema mantra for years now. Unfortunately, we keep hearing this echoed back as Hibernate can't do multitable mappings. Nobody has ever once shown me a truly compelling usecase for multitable mappings in a real application, but apparently, if our competitors are to be believed, it is common to find schemas with attributes of the same entity scattered randomly across several different physical tables. I'll have to take their word on that one. I'm not saying you will /never/ run into this kind of thing and, indeed, I've seen a few borderline cases, though nothing that wasn't at least arguably better represented as an association. But certainly, to my mind, valid usecases for multitable mappings are not something you run into often enough for this to be an important feature. Perhaps the difference in perception is due to the fact that only /sane/ organizations use Hibernate.

Anyway, we introduced the <join/> mapping, just so we could tell people not to use it. Actually, it was fun to implement, and helped me make some really nice refactorings to the EntityPersister hierarchy.

Then a funny thing happened. I started to think of all kinds of useful things to do with <join/>, none of which had anything much to do with multitable mappings, as usually understood. And I'm pretty certain that these things were not what the other guys were talking about!

The first application I came up with is a mixed inheritance mapping strategy. Before, you had a choice between <subclass/> and <joined-subclass/> (now also <union-subclass/>), and you had to stick with that one strategy for the whole hierarchy.

It's now possible to write a mapping like this:

<class name="Superclass" 
        table="parent"
        discriminator-value="0">
    <id name="id">.....</id>
    <discriminator column="type" type="int"/>
    <property ...../>
    ...
    
    <subclass name="Subclass" 
            discriminator-value="1">
        <property ...../>
        ...
    </subclass>
    
    <subclass name="JoinedSubclass" 
            discriminator-value="-1">
        <join table="child">
            <property ...../>
            ....
        </join>
    </subclass>
    
</class>

That's /really/ useful.

The next thing that <join/> can be used for required a little tweak. I added an inverse attribute to the join element, to declare that the joined table should not be updated by the owning entity. Now, it's possible to map an association (link) table - which usually represents a many-to-many association - with one-to-many multiplicity in the domain model. First, we have a basic many-to-many mapping, on the Parent side:

<class name="Parent">
    ...
    <set name="children" table="ParentChild" lazy="true">
        <key column="parentId"/>
        <many-to-many column="childId" class="Child"/>
    </set>
</class>

Now, we use a <join> mapping, to hide the association table from the Child end:

<class name="Child">
    ...
    <join table="ParentChild" inverse="true">
        <key column="childId"/>
        <many-to-one name="parent" column="parentId"/>
    </join>
</class>

Well, I'm not sure really how useful this is, but I was always jealous of the TopLink guys when they bragged how they could do this, and we got it /almost/ for free!

A third trick was also inspired by TopLink. A number of former TopLink users porting code to Hibernate found that Hibernate's table-per-class mapping strategy has significantly different performance characteristics to TopLink's. Hibernate has what seems to be a unique implementation of the table-per-class mapping strategy, in that no discriminator column is required to achieve polymorphism. Instead, Hibernate performs an outer join across all subclass tables, and checks which primary key values are null in each returned row of results in order to determine the subclass that the row represents. In most circumstances, this offers an excellent performance balance, since it is not vulnerable to the dreaded N+1 selects problem. Furthermore, it does not require the addition of a type discriminator column to the table of the root class, which really feels extremely unnatural and redundant for this relational model.

An alternative approach, that TopLink uses, is to perform an initial query, check the value of a discriminator column, and then issue an extra query if the row represents a subclass instance. This isn't usually very efficient for shallow inheritance trees, but what we've seen is that some ex-TopLink users have created very deep or wide inheritance trees, in which case Hibernate's strategy can result in a single query with simply too many joins.

So, I added the outer-join attribute to <join/>. Its effect is slightly subtle. Consider the following mapping:

<class name="Foo" table="foos" discriminator-value="0">
    <id name="id">...</id>
    <discriminator column="type" type="int"/>
    <property name="name"/>
    <subclass name="Bar" discriminator-value="1">
        <join table="bars">
            <key column="fooId"/>
            <property name="amount"/>
        </join>
    </subclass>
</class>

When we execute a HQL query against the subclass Bar, Hibernate will generate SQL with an inner join between foos and bars. If we query against the superclass Foo, Hibernate will use an outer join.

(Note that you would not write the above mapping in practice; instead you would use <joined-subclass/> and eliminate the need for the discriminator.)
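
For example, a sketch of the two cases in application code (session is an open Hibernate Session, and Foo/Bar are the classes mapped above):

//query against the subclass: inner join between foos and bars
List bars = session.createQuery("from Bar b where b.amount > 100").list();

//query against the superclass: outer join to bars
List foos = session.createQuery("from Foo f where f.name like 'f%'").list();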

Suppose we set outer-join="false":

<class name="Foo" table="foos" discriminator-value="0">
    <id name="id">...</id>
    <discriminator column="type" type="int"/>
    <property name="name"/>
    <subclass name="Bar" discriminator-value="1">
        <join table="bars" outer-join="false">
            <key column="fooId"/>
            <property name="amount"/>
        </join>
    </subclass>
</class>

Now, when we query the subclass, the same SQL inner join will be used. But when we query the superclass, Hibernate won't use an outer join. Instead, it will issue an initial query against the foos table, and a sequential select against the bars table, whenever it finds a row with a discriminator value of 1.

Well, that's not such a great idea in this case. But imagine if Foo had a very large number of immediate subclasses. Then we might be avoiding a query with very many outer joins, in favor of several queries with no joins. Well, perhaps some people will find this useful....

Contextual logging

We were doing some work with a customer with a very large project recently, and they were concerned about traceability of the SQL issued by Hibernate. Their problem is one that I guess is common: suppose I see something wrong in the Hibernate log (say, some N+1 selects problem), how do I know which of my business classes is producing it? All I've got in the Hibernate log is org.hibernate.SQL, line 224 as the source of the log message!

I started to explain how Hibernate3 can embed comments into the generated SQL, so you could at least track the problem back to a particular HQL query. But then Steve remembered that log4j provides the /nested diagnostic context/. Now, I've seen a lot of projects using log4j, but I've never actually seen this used anywhere. I think it might be a better alternative to adding entry and exit logging everywhere, since we can see this context even if the entry/exit log categories are disabled. It's a good way to track the source of SQL in the Hibernate log. All you need to do is add calls to push() and pop() in your DAO:

//requires: import org.apache.log4j.NDC;
public List getCustomersByName(String pattern) {
    NDC.push("CustomerDAO.getCustomersByName()");
    try {
        return getSession()
            .createQuery("from Customer c where c.name like :pattern")
            .setString("pattern", pattern)
            .list();
    }
    finally {
        NDC.pop();
    }
}

Then, if I set my pattern right:

log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %c{1}:%L - %m (%x)%n

I'll get a log message like this:

20:59:38,249 DEBUG [=>SQL:244] - select .... like ? (CustomerDAO.getCustomersByName())

Just thought I'd mention it, in case it helps someone.
