Don't lock in the middle tier!

Posted by    |      

One of the reasons we use relational database technology is that existing RDBMS implementations provide extremely mature, scalable and robust concurrency control. This means much more than simple read/write locks. For example, databases that use locking are built to scale efficiently when a particular transaction obtains /many/ locks - this is called /lock escalation/. On the other hand, some databases (for example, Oracle and PostgreSQL) don't use locks at all - instead, they use the multiversion concurrency model. This sophisticated approach to concurrency is designed to achieve higher scalability than is possible using traditional locking models. Databases even let you specify the required level of transaction isolation, allowing you to trade isolation for scalability.

Unfortunately, some Java persistence frameworks (especially CMP engines) assume that they can improve upon the many years of research and development that has gone into these relational systems by implementing their own concurrency control in the Java application. Usually, this takes the form of a comparatively crude locking model, with the locks held in the Java middle tier. There are three main problems with this approach. First, it subverts the concurrency model of the underlying database. If you have spent a lot of money on your Oracle installation, it seems insane to throw away Oracle's sophisticated multiversion concurrency model and replace it with a (less-scalable) locking model. Second, other (non-Java?) applications that share the same database are not aware of the locks. Finally, locks held in the middle tier do not naturally scale to a clustered environment. Some kind of distributed lock will be needed. At best, distributed locking will be implemented using some efficient group communication library like JGroups. At worst (for example, in OJB), the persistence framework will persist the locks to a special database table. Clearly, both of these solutions carry a heavy performance cost. Accordingly, Hibernate was designed to /not require/ any middle-tier locks - even thread synchronization is avoided. This is perhaps the best and least-understood feature of Hibernate and is the key to why Hibernate scales well. So why do other frameworks not just let the database handle concurrency?

Well, the only good justification for holding locks in the middle tier is that we might be using a middle-tier cache. It turns out that the problem of ensuring consistency between the database and the cache is an extremely difficult one and solutions usually do involve some use of middle-tier locking. (Incidently, most applications which use a cache do not solve this problem correctly, even in a non-clustered environment.)

So, for example, when Hibernate integrates with JBoss Cache, the cache implementation must obtain clustered locks internally (again, using JGroups). In Hibernate, we consider it a quality-of-service concern of the cache implementation to provide this kind of functionality. We can do this because Hibernate, unlike many other persistence layers, features a two-level cache architecture. This design separates the transaction-scoped /session cache/ (which does /not/ require middle-tier locking and delegates concurrency concerns to the database) from the process or cluster scoped /second-level cache/ (which /may/ require middle-tier locks). So when the second-level cache is disabled for a particular class, no middle-tier lock is required. Hence, in this case, the scalability of Hibernate is limited only by the scalability of the underlying database. Our design also allows us to consider other, more sophisticated approaches to ensuring consistency between the second-level cache and database - approaches that do not require the use of middle-tier locking. I'll keep this stuff secret for now; it is an active area of investigation!

Back to top