Red Hat

In Relation To Gavin King

Don't lock in the middle tier!

One of the reasons we use relational database technology is that existing RDBMS implementations provide extremely mature, scalable and robust concurrency control. This means much more than simple read/write locks. For example, databases that use locking are built to scale efficiently when a particular transaction obtains /many/ locks - this is called /lock escalation/. On the other hand, some databases (for example, Oracle and PostgreSQL) don't use locks at all - instead, they use the multiversion concurrency model. This sophisticated approach to concurrency is designed to achieve higher scalability than is possible using traditional locking models. Databases even let you specify the required level of transaction isolation, allowing you to trade isolation for scalability.

Unfortunately, some Java persistence frameworks (especially CMP engines) assume that they can improve upon the many years of research and development that has gone into these relational systems by implementing their own concurrency control in the Java application. Usually, this takes the form of a comparatively crude locking model, with the locks held in the Java middle tier. There are three main problems with this approach. First, it subverts the concurrency model of the underlying database. If you have spent a lot of money on your Oracle installation, it seems insane to throw away Oracle's sophisticated multiversion concurrency model and replace it with a (less-scalable) locking model. Second, other (non-Java?) applications that share the same database are not aware of the locks. Finally, locks held in the middle tier do not naturally scale to a clustered environment. Some kind of distributed lock will be needed. At best, distributed locking will be implemented using some efficient group communication library like JGroups. At worst (for example, in OJB), the persistence framework will persist the locks to a special database table. Clearly, both of these solutions carry a heavy performance cost. Accordingly, Hibernate was designed to /not require/ any middle-tier locks - even thread synchronization is avoided. This is perhaps the best and least-understood feature of Hibernate and is the key to why Hibernate scales well. So why do other frameworks not just let the database handle concurrency?

Well, the only good justification for holding locks in the middle tier is that we might be using a middle-tier cache. It turns out that the problem of ensuring consistency between the database and the cache is an extremely difficult one and solutions usually do involve some use of middle-tier locking. (Incidentally, most applications which use a cache do not solve this problem correctly, even in a non-clustered environment.)

So, for example, when Hibernate integrates with JBoss Cache, the cache implementation must obtain clustered locks internally (again, using JGroups). In Hibernate, we consider it a quality-of-service concern of the cache implementation to provide this kind of functionality. We can do this because Hibernate, unlike many other persistence layers, features a two-level cache architecture. This design separates the transaction-scoped /session cache/ (which does /not/ require middle-tier locking and delegates concurrency concerns to the database) from the process or cluster scoped /second-level cache/ (which /may/ require middle-tier locks). So when the second-level cache is disabled for a particular class, no middle-tier lock is required. Hence, in this case, the scalability of Hibernate is limited only by the scalability of the underlying database. Our design also allows us to consider other, more sophisticated approaches to ensuring consistency between the second-level cache and database - approaches that do not require the use of middle-tier locking. I'll keep this stuff secret for now; it is an active area of investigation!

Jason's observations

Jason has pointed out some interesting things about the current release of Hibernate.

First, he notes that Hibernate's query cache may not currently be used with SwarmCache. This is due to SwarmCache not being a /replicated/ cache - it uses clustered eviction. Hibernate's query cache may be used with any replicated clustered cache, as long as clocks are synchronized in the cluster. Our documentation should make this requirement clearer. (For most applications, the query cache is not an especially useful feature, so this issue affects only a small number of users.)

Second, he has noticed that Hibernate currently writes to the second-level cache too often. I had not previously been aware of this issue. Hibernate was optimized with local, perhaps disk-backed, cache implementations in mind, where lookups are at least as expensive as (in the case of a disk-backed cache, /more/ expensive than) puts. Jason has pointed out that in the case of a clustered cache, this is reversed and puts are much more expensive than lookups. We will provide a switch in a future point release of Hibernate 2.1 to adjust this behavior.

Finally, Jason is unhappy that we are not supporting JDK dynamic proxies and require CGLIB for lazy association fetching. (CGLIB causes problems if the JVM is running with a restrictive security policy.) Well, as I explained to Jason, Hibernate already provides a hook to allow the proxy implementation to be customized using a custom persister. (Dynamic proxy support could be implemented with a very small amount of code.) This evening I actually refactored this stuff slightly and introduced a ProxyFactory interface to make the hook more obvious.

I'd like to point out that the Hibernate project has many thousands of users and just a handful of active developers. As a result, we must prioritize feature requests by two considerations: first, how many users beg for the feature; second, how strategic a feature is with respect to the future direction of the project. We simply do not implement every requested feature since we have limited resources and are fighting a continual battle against code bloat (bloat is a /very/ serious issue if you are the person maintaining and enhancing the codebase). Hibernate is open source, so users with special requirements can always use a patched version in their application. In addition, we do not usually process feature requests via IM or! The correct process is to submit an enhancement request to JIRA (and let other users comment and vote) or, even better, to start a discussion in the mailing list. We may be an open source project; this does not mean that we are completely unprofessional and disorganized!

Criteria queries reloaded

There were quite a few comments in response to my post about Criteria queries. I finally get around to responding. A number of people suggest a more tree-oriented approach, where we treat all logical operators as binary. For example, anonymous suggests the following:

session.createCriteria( Project.class,
    and(
      eq("name", "Hibernate"),
      like("description", "%ORM%")
    )
);

Now, certainly logical operations are binary. But they are also /associative/, and this seems to be denied by the tree approach. We would never, ever write:

( (x=2 and x=1) and y=3 ) and z=4

We always write:

x=2 and x=1 and y=3 and z=4

This is particularly relevant in the case of Criteria queries since the common case is that we compose together many conditions using conjunction. (There were objections to my use of conjunction and disjunction, but I don't know of any other word for a string of expressions composed with and/or.)

Actually, the current API /does/ already allow this alternative. We have Expression.or() and Expression.and(). But to me they seem to be much messier than add().

Carl Rosenberger, the man behind SODA and db4o, writes:

In my opinion a clean object querying system is the first and foremost 
basis for any standard on object persistence. 

Enhancers/Reflection/BCEL/code calls to make objects persistent 
all this can be very quickly exchanged, if you want to switch the 
underlying system.

Queries can not be exchanged!

And I couldn't possibly agree more! This is absolutely right. He goes on to say:

Could you maybe bring this thought into JDO 2.0 ? Please ?

Besides, I am positive that a de-facto standard for object querying will 
have a much greater impact on the industry than JDO.
Java is not the only programming language on this planet. 

In fact, I've mentioned the idea of adopting a better query approach to a couple of the guys on the expert group, but didn't get much of a positive response. Besides, I'm not at all convinced that it is even possible to design a nice query language or API in a committee environment. These kinds of things need a strong unified vision. Committees are okay at standardizing mature solutions to well-understood problems, but it is my view that object-oriented querying is a far from well understood problem. Especially, it does not seem to be commonly appreciated that an ORM-oriented query language would look quite different to an object database query language!

James Strachan is pimpin' Groovy, his new JVM-compatible language (I'm not sure if it's quite correct to call it a scripting language) that looks quite like Python but features one of the best things about Smalltalk: closures. (Digression: it seems to be widely believed that the JDK Collections API is one of Java's good points. But if you've ever used collections in Smalltalk, you'll know just how impoverished Java actually is when it comes to working with collections. Iterators are far, far uglier than the more functional approach available in languages with closures. If there is /anything/ that I would ask to be fixed in Java, it wouldn't be the lack of generics, the lack of enums, etc, etc - it would be the lack of closures.) Anyway, I'm interested in what James has to say, but I'm skeptical toward the notion that an object-oriented expression language is the right starting point for an ORM query language. Firstly, relational databases implement very different null value semantics to object oriented languages. In fact, SQL's ternary logic is /quite/ different to the binary logic implemented by programming languages. Secondly, equality is a much more slippery concept in the object oriented world than in the database world. For these reasons, and others, we have chosen to base HQL on SQL, not on Java-like syntax. It is my view that this was one of the best decisions we've made.
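To make the null-semantics point concrete, here is a minimal sketch of SQL-style three-valued logic next to Java's two-valued equality. The names Tri and NullSemantics are made up for this illustration and belong to no library; the truth tables follow the usual SQL rules, where a comparison involving NULL yields UNKNOWN and a WHERE clause keeps only rows whose predicate is TRUE.

```java
import java.util.Objects;

/** SQL-style truth values. The type and its name are illustrative only. */
enum Tri {
    TRUE, FALSE, UNKNOWN;

    /** FALSE dominates, then UNKNOWN. */
    Tri and(Tri other) {
        if (this == FALSE || other == FALSE) return FALSE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return TRUE;
    }

    /** TRUE dominates, then UNKNOWN. */
    Tri or(Tri other) {
        if (this == TRUE || other == TRUE) return TRUE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return FALSE;
    }

    /** Negating UNKNOWN is still UNKNOWN. */
    Tri not() {
        switch (this) {
            case TRUE: return FALSE;
            case FALSE: return TRUE;
            default: return UNKNOWN;
        }
    }
}

final class NullSemantics {
    private NullSemantics() {}

    /** SQL "a = b": UNKNOWN as soon as either side is NULL. */
    static Tri sqlEquals(Object a, Object b) {
        if (a == null || b == null) return Tri.UNKNOWN;
        return a.equals(b) ? Tri.TRUE : Tri.FALSE;
    }

    /** Java-style null-safe equality: always a plain boolean. */
    static boolean javaEquals(Object a, Object b) {
        return Objects.equals(a, b);
    }

    /** A WHERE clause keeps a row only when its predicate is TRUE. */
    static boolean rowSelected(Tri predicate) {
        return predicate == Tri.TRUE;
    }
}
```

Under these rules a row with a NULL column is selected neither by "x = 1" nor by "not (x = 1)", which has no counterpart in a Java boolean expression.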

Razvan describes what (s)he calls an associative query model (much like the query in JavaSpaces). If I'm not mistaken, this is what we usually call query by example. I'll get round to discussing Hibernate's new Example query API in a future post...

Query Objects vs Query Languages

Chris Winters doesn't like object-oriented query APIs. Since Hibernate emphasizes the query /language/ approach, I'm not the best person to disagree with him here. Criteria queries are usually noisier, no doubt about that. And the query languages I've seen tend to be more expressive. Writing arithmetic and even logical expressions is a breeze in a query language, but certainly not in an object-oriented Criteria API.

However, there are a couple of advantages of the Criteria approach. First, some folks like the fact that more can be done to validate the query at compile time. This is no big deal to me because I'm very unit test oriented. Much more importantly, object-oriented query APIs are much better for building queries programmatically. String manipulations suck! Getting your parentheses and whitespace right for all combinations of criteria is a pain. I've seen some truly ugly code that builds HQL strings and it could be rewritten /much/ more neatly using the Criteria API.

Especially, the new query by example stuff could potentially reduce 20 lines of code that builds a query containing the needed properties of Person to:

    .add( Example.create(elvis) )

Well, I suppose query by example is a bit of an extreme case, and you could perhaps implement a similar feature in a query language.
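The point about building queries programmatically can be illustrated without any particular API. In the sketch below, each optional filter becomes a value in a list, so the code never has to track whether a "where" or an "and" has already been emitted. ProjectSearch and its fields are hypothetical, and the query text is only HQL-like; a real Criteria API would go further and keep the conditions as objects all the way to execution.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical search form; a null field means "no filter on this property". */
final class ProjectSearch {
    String name;
    String descriptionPattern;

    /** Collects one condition per active filter and joins them in a single step. */
    String toQuery() {
        List<String> conditions = new ArrayList<>();
        if (name != null) {
            conditions.add("p.name = :name");
        }
        if (descriptionPattern != null) {
            conditions.add("p.description like :pattern");
        }

        String query = "from Project p";
        if (!conditions.isEmpty()) {
            // The separator is applied between elements only, so no trailing "and" can appear.
            query += " where " + String.join(" and ", conditions);
        }
        return query;
    }
}
```

Adding a third optional filter means adding one more if block; none of the existing string handling has to change.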

Finally, object oriented query APIs can be more user-extensible. This is what I like best.

P.S. The code example Chris gives is a bit unfair to object APIs. Hibernate's Criteria API is much less verbose than that, partly due to the support for method chaining (which is barely used in Java, so I betray the time I spent with Smalltalk).

Names, what's in 'um?

Names are important in computing. Really Important. I'm not talking about calling classes sensible things like UpdateUserDetailsCommand in preference to UpdUDetsCd. I mean the names of products themselves. A great name tells us lots about a piece of software: it tells us that the creator has some imagination, a sense of style even. It tells us that this person is serious about the success of the product, that they understand that there is more to making great software than writing great code.

Java is a fantastic name for just about anything. I love the ambiguity; coffee, or the largest island of the Indonesian archipelago? There is no way I could hear the name Java and not wonder what it is a name for. C# (pronounced cee-hash by those in the know) is twee and derivative. It's pathetic enough trying to impress girls with the fact that I am a Java developer. Imagine if I had to say I was a cee sharp or dot net programmer? Ugggh. But from this we can learn something of the mindset that created these two languages - C# /is/ actually more derivative than Java.

So, if we have eliminated C# as the next step forward in languages, what about the other two possible Java-replacements: Python and Ruby. Well, Python is great, of course. There's that ambiguity once again: funny Englishmen, or snakes. Snakes are cool. Actually, naming things after animals is /always/ cool, even Ants. Ruby grows on me. I don't /love/ it, but as an English name given by a non-native English speaker, I'm happy to give it a pass. I've noticed that non-native speakers tend to name their projects by horrible acronyms.

The Apache project gives us some great examples of both good and bad. Tapestry is a beautiful, evocative name, just perfect for a web presentation framework. Struts is a good (perhaps a little too obvious) pun on framework. Lucene and Avalon are nice enough. Ant we mentioned already. commons-lang is an abomination, being the concatenation of commons (ugh) with that incredibly romantic English word lang. Worse, the concatenation is performed with a hyphen. Names don't have hyphens in them, guys! (Well, apart from Mike's ). Commons could evoke either typical or communal or both; neither interpretation is inspiring in this age. Actually, commons seems to be Jakarta's moniker for stuff we don't have proper names for or, perhaps, stuff that doesn't really deserve to be a real project. Similarly, commons-primitives could /almost/ conjure up some kind of vision of neolithic tribal society, uncorrupted by modern notions of private property .... but it doesn't. To me this indicates something about the respective quality of the projects. Would any truly committed developer, building something they love and believe in, possibly call their project commons-lang? Nope. They would call it Tapestry. If they had Howard's sense of theater, that is.

In the realm of distributed computing, we have been given CORBA - a nice, easily pronounceable acronym that sounds like a real English word and even better reminds us of snakes - and the ridiculous WSDL (pronounced whizz-dull, which is an oxymoron). I guess this means that Web Services are just another dead end - surely, nothing useful could ever be named WSDL?

Being a contrarian, I named our project with a verb. Hibernate isn't a thing - it's not even a state of mind; it's something you /do/.

disclaimer: this post is meant in good humour, please don't take offense

Designing "query by criteria"

One of the great improvements in Hibernate 2.1 is that we finally have a mature Criteria query API. For a very long time I let this feature languish because I just wasn't sure what it should really look like. Every QBC API I've looked at is designed differently and there is certainly nothing like a standard API to learn from. I've seen everything from this:

new Criteria(Project.class)
  .addEq("name", "Hibernate")
  .addLike("description", "%ORM%")

to this:

Criteria crit = new Criteria(Project.class)

I don't like either of these approaches because the addition of new types of criterion requires the uncontrolled growth of a single central interface (Criteria, in the first case; Property in the second).

I like the second approach even less because it is very difficult to chain method calls. What should the eq() method return? Well, it seems most reasonable that it would return the receiving object (ie. the property). But it is very unusual to apply multiple criteria to the same property! So we would really prefer it to return the Criteria if we wanted to chain method calls. Well, I don't know about you, but I think that any API that returned the receiver from two calls ago might not be considered intuitive. So we are stuck with that evil temp variable.

I'd seriously consider improving this second approach to look like this:

new Criteria(Project.class)

Which is actually very clean. Unfortunately, the interfaces themselves are quite bizarre: and() is an operation defined by .... Criterion? The and() method returns .... the Criteria? This doesn't feel like a very natural OO design. And I think it would confuse new users. I'll come back to another reason why and and or should not be operations at all.

As a variation upon the first approach, I have seen the following:

new Criteria(Project.class)
  .add( new Equals("name", "Hibernate") )
  .add( new Like("description", "%ORM%") )

This avoids the problem of the Criteria interface growing out of control. But I hate Java constructors almost as much as I hate temp variables! The problem with constructors in Java is that they cannot be given meaningful names. We can't call a constructor EqualsIgnoreCase() if the class is named Equals. Secondly, once we start using constructors, we pretty much permanently nail down the Criterion class hierarchy. We tie client code directly to the concrete classes. I can't change my mind later and decide that Equals and EqualsIgnoreCase should be different classes.

Eventually I ended up being most influenced by the Cayenne query API (which I presume was in turn influenced by Apple's WebObjects). Cayenne uses a class with static factory methods to create Criterion instances. Actually, Cayenne misnames the criterion class Expression and I stupidly inherited this misnaming in our (Hibernate) factory class. So, we ended up with:

  .add( Expression.eq("name", "Hibernate") )
  .add( Expression.like("description", "%ORM%") )

Notice that this code does not use any concrete classes other than the static factory class - it's all interfaces!

The downside of this design is that there are more characters in add( Expression.eq()) than in add( new Eq()) or addEq(). So it is certainly more verbose. It is also noisy. What stands out in the code above is the two occurrences of Expression. But they are the least important thing in the code.

Fortunately for me, JDK1.5 will come along and give us static imports. Static imports have been very unfairly maligned in the past, so let me try to set the record straight. If I add import static net.sf.hibernate.expression.Expression.*, the code example above becomes:

  .add( eq("name", "Hibernate") )
  .add( like("description", "%ORM%") )

This is now less verbose and more readable than the version that used constructors. I'm halfway done.
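The static-factory design can be sketched in a few lines of plain Java. This is not Hibernate's code: Criterion here is a stand-in interface, and Conditions and Comparison are names invented for the illustration. It shows the two properties argued for above: factory methods carry meaningful names such as eqIgnoreCase(), and callers see only the interface, so the concrete class hierarchy stays free to change.

```java
/** A condition that can render itself; a stand-in, not Hibernate's interface. */
interface Criterion {
    String toSql();
}

/** Static factory: the only concrete type that client code ever names. */
final class Conditions {
    private Conditions() {}

    static Criterion eq(String property, String value) {
        return new Comparison(property, "=", value, false);
    }

    /** A meaningful name that a constructor of an Equals class could not carry. */
    static Criterion eqIgnoreCase(String property, String value) {
        return new Comparison(property, "=", value, true);
    }

    static Criterion like(String property, String pattern) {
        return new Comparison(property, "like", pattern, false);
    }

    /** Hidden implementation; it could be split into several classes without touching callers. */
    private static final class Comparison implements Criterion {
        private final String property;
        private final String operator;
        private final String value;
        private final boolean ignoreCase;

        Comparison(String property, String operator, String value, boolean ignoreCase) {
            this.property = property;
            this.operator = operator;
            this.value = value;
            this.ignoreCase = ignoreCase;
        }

        @Override
        public String toSql() {
            // Inlined literals keep the sketch short; real code would bind parameters instead.
            String literal = "'" + value.replace("'", "''") + "'";
            if (ignoreCase) {
                return "lower(" + property + ") " + operator + " lower(" + literal + ")";
            }
            return property + " " + operator + " " + literal;
        }
    }
}
```

With a static import of Conditions.*, a call site reads eq("name", "Hibernate"), and deciding later that eq and eqIgnoreCase deserve separate classes is a change local to the factory.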

A second problem is the logical combination of criterions. and and or are each associative, but a string of both ands and ors is certainly not associative. So it seemed critically important to me that the precedence of the logical operators is crystal clear from the structure of the code. I hate the following:

  .addAnd( eq("name", "Hibernate") )
  .addAnd( like("description", "%ORM%") )
  .addOr( like("description", "%object/relational%") )

I've seen a number of APIs like this and I still don't have a clue how the previous code is intended to be read. The same problem applies to this variation:

new Criteria(Project.class)

OK, OK, I actually do know that conjunction usually has a higher precedence than disjunction - but I would never, ever write code that depended upon this. It just isn't readable. And we certainly can't always rely upon operator precedence - we do need some way to express grouping. Anyway, I think this problem would affect any API which offers and() and or() as methods. So let's not make and() and or() be operations at all.

By the way, worst of all is this:

new Criteria(Project.class)
   .and( crit.getProperty("description").like("%ORM%") )

And is a symmetrical operation! This symmetry should be obvious.

The solution is to treat Conjunction and Disjunction in exactly the same way as atomic Criterions. Make them Criterions, not operations.

  .add( Expression.disjunction()
      .add( eq("name", "Hibernate") )
      .add( like("description", "%ORM%") )
  )

Well, that's a couple too many parentheses for my taste. I'm considering supporting something like the following in Hibernate:

      .add( eq("name", "Hibernate") )
      .add( like("description", "%ORM%") )

My big problem here is that createDisjunction() would need to return a new instance of Criteria (wrapping a Disjunction) just so that we can call list() without needing a new temp variable. I'm not sure if I like this. Currently Expression.disjunction() just returns an instance of Disjunction directly - and Disjunction implements only the Criterion interface. I guess we're still searching for perfection...
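The idea of treating Conjunction and Disjunction as ordinary criteria can be sketched outside Hibernate. In the self-contained example below, Junction, Atoms and JunctionDemo are hypothetical names, and Criterion is a stand-in interface. Because a junction is itself a Criterion that takes any number of members, grouping is visible in the nesting of the code and never depends on operator precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Stand-in for a criterion interface; not Hibernate's own type. */
interface Criterion {
    String toSql();
}

/** An n-ary group of criteria joined by a single operator; itself a Criterion. */
final class Junction implements Criterion {
    private final String operator;
    private final List<Criterion> parts = new ArrayList<>();

    private Junction(String operator) {
        this.operator = operator;
    }

    static Junction conjunction() { return new Junction("and"); }
    static Junction disjunction() { return new Junction("or"); }

    /** Returns this junction so that members can be chained. */
    Junction add(Criterion part) {
        parts.add(part);
        return this;
    }

    /** Always parenthesized, so nesting alone decides the grouping. */
    @Override
    public String toSql() {
        return parts.stream()
                .map(Criterion::toSql)
                .collect(Collectors.joining(" " + operator + " ", "(", ")"));
    }
}

/** Atomic criteria; literals are inlined without escaping, for illustration only. */
final class Atoms {
    private Atoms() {}

    static Criterion eq(String property, String value) {
        return () -> property + " = '" + value + "'";
    }

    static Criterion like(String property, String pattern) {
        return () -> property + " like '" + pattern + "'";
    }
}

final class JunctionDemo {
    /** name = ... and (description like ... or description like ...) */
    static Criterion example() {
        return Junction.conjunction()
                .add(Atoms.eq("name", "Hibernate"))
                .add(Junction.disjunction()
                        .add(Atoms.like("description", "%ORM%"))
                        .add(Atoms.like("description", "%object/relational%")));
    }
}
```

The sketch returns the junction itself from add(), so it sidesteps rather than solves the question of what a chained createDisjunction() on the query object should return.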


Posted by    |       |    Tagged as

We are taking a close look at SDO. It's an interesting spec that comes a bit out of left field. My reading is that it provides a mechanism for manipulating and especially for externalizing graphs of objects or things that look sufficiently close to objects to be meaningfully represented as a graph. For example, XML.

This is naturally very important to us, since one of the significant things we have tried to achieve with Hibernate is to get away from the notion of location transparency, and reinvent distributed objects as object graphs which may be moved between different processes. Especially, we are interested in the idea that a graph could be retrieved from the persistent store in one process, modified in another, and then have those modifications propagated back to the database in the first process, all with optimistic semantics.
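Optimistic semantics for a detached object usually come down to a version check at write time. The sketch below is a single-threaded, in-memory illustration of that idea and not a description of Hibernate's implementation: ProjectSnapshot and ProjectStore are hypothetical, and the map stands in for a table with a version column, where the same check would be an update restricted by both id and version.

```java
import java.util.HashMap;
import java.util.Map;

/** A detached, versioned copy of one row; all names here are hypothetical. */
final class ProjectSnapshot {
    final long id;
    final int version;
    String name;

    ProjectSnapshot(long id, int version, String name) {
        this.id = id;
        this.version = version;
        this.name = name;
    }
}

/** In-memory stand-in for a database table with a version column. */
final class ProjectStore {
    private final Map<Long, ProjectSnapshot> rows = new HashMap<>();

    void insert(long id, String name) {
        rows.put(id, new ProjectSnapshot(id, 0, name));
    }

    /** Hands out a copy that can be modified elsewhere; assumes the id exists. */
    ProjectSnapshot load(long id) {
        ProjectSnapshot row = rows.get(id);
        return new ProjectSnapshot(row.id, row.version, row.name);
    }

    /** Applies the change only if the row is unchanged since the snapshot was loaded. */
    void update(ProjectSnapshot detached) {
        ProjectSnapshot current = rows.get(detached.id);
        if (current == null || current.version != detached.version) {
            throw new IllegalStateException("stale data for project " + detached.id);
        }
        // Bump the version so that older snapshots of this row are rejected from now on.
        rows.put(detached.id,
                new ProjectSnapshot(detached.id, detached.version + 1, detached.name));
    }
}
```

No lock is held between load() and update(); a conflicting change is detected only when the second writer tries to store a snapshot whose version is out of date.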

So, our biggest problem in all this is that tracking modifications to typesafe objects precisely is extremely difficult in Java without significant bytecode tricks (which we have been so far unwilling to adopt). SDO bypasses this problem by representing objects in a nontypesafe way (contrary to our notion of transparency). I'm not yet convinced that this is worth dropping the advantages of typesafeness for, though I recognize that the authors of the SDO spec are looking for an approach that abstracts away from POJOs, EJBs, DOMs, whatever.

It has been suggested that Jakarta DynaBeans are a useful analogy when looking at this stuff, but I don't find this very useful at all. We are trying to figure out some kind of relationship to CarrierWave, which is also all about working with object graphs (Christian got excited about CarrierWave a while ago). Our biggest stumbling block so far is that SDO doesn't seem to address one of the main problems solved by CarrierWave: namely specifying where the object graph /ends/. My understanding is that this is left as an exercise for the reader, and for whatever native query language is in use, eg. HQL, XPath, etc. Anyway, whereas DynaBeans are a workaround for a specific limitation in the design of Struts, this is an approach to solving some of the more difficult problems in building distributed systems using domain models.
