Red Hat

In Relation To Hibernate Shards


Multi-tenancy in Hibernate

Tagged as Hibernate ORM, Hibernate Shards

This has come up a few times, so I thought I'd write up the ways to handle multi-tenancy in Hibernate. This is not an exhaustive list; for example, we won't go into database-vendor-specific features (Oracle VPD, etc.). Generally speaking, there are 3 ways to factor multi-tenancy into your database design:

  1. Separate database instances - This approach gives each tenant their own physical database instance.
  2. Separate schemas - This approach uses the same physical database instance for all the tenants, but each gets its own schema (or catalog) within that instance.
  3. Partitioning - This approach uses the same database instance and the same schema. In other words, a single table holds the data for every tenant, and the tenants are partitioned by some form of discriminator value.

The approaches to handling the first two are pretty much the same, so let's look at the third approach first, as it requires quite different handling.

Partitioning

To be clear, let's look at an example. Let's say that the application in question has a CUSTOMER table:

CUSTOMER (
  ID BIGINT,
  NAME VARCHAR,
  ...
  TENANT_ID VARCHAR
)

Notice the TENANT_ID column, as it is the crux of this design: it identifies which tenant a given row belongs to. Choosing this style of design has important ramifications that are beyond the scope of this discussion (for example, unique keys probably now need to include the TENANT_ID column). For the purposes of this discussion, we are only concerned with how rows are identified as belonging to a particular tenant. Two options for dealing with this are (1) using Hibernate Shards or (2) using the filter feature of Hibernate Core.
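With the Hibernate Core filter option, the usual pattern is to declare a filter whose SQL condition is something like TENANT_ID = :tenantId and to enable it on each Session for the current tenant (Session.enableFilter(...) followed by setParameter(...)). To show the effect of that restriction without needing a database, here is a minimal plain-Java sketch, assuming hypothetical CustomerRow and TenantScopedCustomers classes: every read only ever sees the rows whose discriminator matches the requested tenant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for one row of the CUSTOMER table.
final class CustomerRow {
    final long id;
    final String name;
    final String tenantId; // plays the role of the TENANT_ID discriminator column

    CustomerRow(long id, String name, String tenantId) {
        this.id = id;
        this.name = name;
        this.tenantId = tenantId;
    }
}

// Hypothetical repository: all tenants share one "table", and every read is
// restricted by the discriminator, which is what a TENANT_ID filter does in SQL.
final class TenantScopedCustomers {
    private final List<CustomerRow> table = new ArrayList<CustomerRow>();

    void insert(CustomerRow row) {
        table.add(row);
    }

    // Returns only the rows belonging to the given tenant.
    List<CustomerRow> findAll(String tenantId) {
        List<CustomerRow> visible = new ArrayList<CustomerRow>();
        for (CustomerRow row : table) {
            if (row.tenantId.equals(tenantId)) {
                visible.add(row);
            }
        }
        return Collections.unmodifiableList(visible);
    }
}
```

Note that the same ID (1) can legitimately appear for two tenants, which is why unique keys usually have to include TENANT_ID in this design.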

This approach has the distinct advantage of being able to leverage Hibernate's second-level cache. As we will see below, that is currently not possible with the other approaches.

Separate data

As I mentioned before, the first two options are handled very similarly at the JDBC level, and therefore very similarly in Hibernate. Going back to the CUSTOMER table, here we have:

CUSTOMER (
  ID BIGINT,
  NAME VARCHAR,
  ...
)

This time there is no tenant discriminator column. Instead, the tenant is implied by which tenant's database or schema we are looking at. Again, getting into the pros and cons of this approach compared to partitioning is beyond the scope of this discussion. In terms of JDBC, this really just boils down to different connection URLs that indicate which tenant we are dealing with at a given time. So how can we get Hibernate to manage that for us?
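To make "different connection URLs per tenant" concrete, here is a small sketch assuming a PostgreSQL setup; the TenantUrls class, host names and database name are hypothetical placeholders. With separate database instances the tenant selects the server, while with separate schemas the URL points at one database and selects the schema through the PostgreSQL JDBC driver's currentSchema connection parameter.

```java
// Hypothetical helper deriving a JDBC URL for a tenant under either
// "separate data" layout. Hosts and database names are placeholders.
final class TenantUrls {
    private TenantUrls() {}

    // Separate database instances: each tenant gets its own server/database.
    static String separateDatabaseUrl(String tenantId) {
        return "jdbc:postgresql://db-" + checked(tenantId) + ".example.com:5432/app";
    }

    // Separate schemas: one database, tenant selected by schema
    // (currentSchema is a PostgreSQL JDBC driver connection parameter).
    static String separateSchemaUrl(String tenantId) {
        return "jdbc:postgresql://db.example.com:5432/app?currentSchema=" + checked(tenantId);
    }

    // The tenant id ends up inside a URL, so only accept a conservative character set.
    private static String checked(String tenantId) {
        if (tenantId == null || !tenantId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected tenant id: " + tenantId);
        }
        return tenantId;
    }
}
```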

One approach is to define a SessionFactory for each tenant. However, if you have large schemas and/or a large number of tenants, and all of these SessionFactory instances reside in the same memory space, this approach can become very burdensome in terms of memory footprint.
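The one-SessionFactory-per-tenant approach usually ends up as a lazily populated registry keyed by tenant id. The sketch below is generic and hypothetical (PerTenantRegistry is not a Hibernate class): in a Hibernate application T would be a SessionFactory built from a per-tenant configuration, and each entry would carry its own copy of the mapping metadata, which is where the memory cost grows with the number of tenants.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical registry holding one factory per tenant, built on first use.
// With Hibernate, T would be a SessionFactory; every entry is a full, separate
// instance, so memory use grows linearly with the number of tenants.
final class PerTenantRegistry<T> {
    private final ConcurrentMap<String, T> byTenant = new ConcurrentHashMap<String, T>();
    private final Function<String, T> builder;

    PerTenantRegistry(Function<String, T> builder) {
        this.builder = builder;
    }

    // Builds the tenant's factory once (thread-safely), then reuses it.
    T forTenant(String tenantId) {
        return byTenant.computeIfAbsent(tenantId, builder);
    }

    int size() {
        return byTenant.size();
    }
}
```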

Another approach is to use a feature called application-supplied connections: from a SessionFactory you can open a Session using a Connection you supply. However, this can get unwieldy. A variation is for our application to tell Hibernate which Connection to use for the current context. Internally, Hibernate uses an SPI contract named ConnectionProvider to obtain Connections when it needs them. Although this contract does not account for passing in the tenant identifier, it's pretty trivial to work around that using a ThreadLocal, JNDI/ENC, or some other contextual and accessible mechanism. For the purpose of illustration, let's assume that each tenant's DataSource is bound in JNDI under a name derived from the tenant identifier, and that the identifier of the current tenant is statically available from a custom TenantContext class:

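import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;
import org.hibernate.connection.ConnectionProvider;

public class MyTenantAwareConnectionProvider implements ConnectionProvider {
    public static final String BASE_JNDI_NAME_PARAM = "MyTenantAwareConnectionProvider.baseJndiName";

    private String baseJndiName;

    public void configure(Properties props) {
        // base JNDI name under which one DataSource per tenant is bound
        baseJndiName = props.getProperty( BASE_JNDI_NAME_PARAM );
    }

    public Connection getConnection() throws SQLException {
        // resolve the current tenant, then the DataSource bound for that tenant
        final String tenantId = TenantContext.getTenantId();
        final String tenantDataSourceName = baseJndiName + '/' + tenantId;
        DataSource tenantDataSource = lookupDataSource( tenantDataSourceName );
        return tenantDataSource.getConnection();
    }

    public void closeConnection(Connection conn) throws SQLException {
        conn.close();
    }

    public boolean supportsAggressiveRelease() {
        // so long as the tenant identifier remains available in the ThreadLocal throughout, we can
        return true;
    }

    public void close() {
        // currently nothing to do here
    }

    private static DataSource lookupDataSource(String name) throws SQLException {
        try {
            return (DataSource) new InitialContext().lookup( name );
        }
        catch (NamingException e) {
            throw new SQLException( "Could not locate tenant DataSource [" + name + "]", e );
        }
    }
}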

The essential idea here is that Hibernate continues to do what it normally does, but we plug in new behavior for how it obtains connections, based on our application's notion of a current tenant. Because we use a single SessionFactory, we pay the memory footprint of just one SessionFactory instead of one per tenant.
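The provider relies on a custom TenantContext class that the post does not show. Here is one possible sketch, assuming the tenant is bound per thread (for example by a servlet filter at the start of each request); the setTenantId and clear methods are hypothetical names. It fails fast when no tenant is bound, rather than letting the provider build a JNDI name ending in "null".

```java
// Hypothetical TenantContext: holds the current tenant id per thread.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<String>();

    private TenantContext() {}

    // Called at the start of a unit of work, e.g. by a request filter.
    static void setTenantId(String tenantId) {
        CURRENT.set(tenantId);
    }

    // Used by the ConnectionProvider; fails fast if no tenant is bound.
    static String getTenantId() {
        String tenantId = CURRENT.get();
        if (tenantId == null) {
            throw new IllegalStateException("No tenant bound to the current thread");
        }
        return tenantId;
    }

    // Must be called when the unit of work ends, since pooled threads are reused.
    static void clear() {
        CURRENT.remove();
    }
}
```

Clearing in a finally block matters here: with aggressive connection release, the provider may be asked for a connection at any point during the request, and a stale tenant left on a pooled thread would silently route the next request to the wrong database.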

It was mentioned before, but bears repeating: second-level caching is problematic with this approach and should be disabled. The reason is that Hibernate does not know that Customer#1 from one tenant and Customer#1 from another tenant are actually different data. For that to work, we would have to encode the tenant id into the cache key used when storing data in the second-level cache. That has been discussed as an enhancement, but is not currently implemented.
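To illustrate what "encoding the tenant id into the cache key" would mean, here is a hypothetical key class (TenantAwareCacheKey is not Hibernate's own cache key type, and this is not how Hibernate currently builds keys). Because the tenant participates in equals and hashCode, Customer#1 of one tenant and Customer#1 of another become distinct cache entries.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical second-level cache key that includes the tenant identifier.
final class TenantAwareCacheKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String tenantId;
    private final String entityName;
    private final Serializable id;

    TenantAwareCacheKey(String tenantId, String entityName, Serializable id) {
        this.tenantId = Objects.requireNonNull(tenantId);
        this.entityName = Objects.requireNonNull(entityName);
        this.id = Objects.requireNonNull(id);
    }

    // Two keys are equal only if tenant, entity and id all match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TenantAwareCacheKey)) return false;
        TenantAwareCacheKey that = (TenantAwareCacheKey) other;
        return tenantId.equals(that.tenantId)
                && entityName.equals(that.entityName)
                && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tenantId, entityName, id);
    }

    @Override
    public String toString() {
        return entityName + "#" + id + "@" + tenantId;
    }
}
```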

Article about Hibernate Shards on developerWorks

Tagged as Hibernate Shards

Andy Glover wrote an excellent article about Hibernate Shards a few weeks ago. (It seems we missed linking to it here.)

Hibernate Shards in the blogosphere

Tagged as Hibernate Shards

There have been a couple of interesting blog entries about Hibernate Shards and data sharding/partitioning in the last few days (here and there). Both give a decent five-to-ten-minute overview of Hibernate Shards.

We are also planning to give a full talk about the project at JBoss World Orlando in February.

Hibernate Search and Shards talk at AJUG Sept 18th

Tagged as Hibernate Shards

I will be giving a talk at the Atlanta JUG next Tuesday (18th).

The talk will cover Hibernate Search: why to use it, where to use it, its internal architecture, and how best to use it.

I will also introduce Hibernate Shards, what it is and what it is not :)

Hopefully Q&A will have a decent slice of the time cake.

See you in the Perimeter area :)

Welcome Hibernate Shards!

Tagged as Hibernate Shards

This is a pretty big day for the Hibernate family. We welcome three new top level projects:

  • Hibernate Shards
  • Hibernate Validator
  • Hibernate Search

In total, we have made five new releases today.

Hibernate Shards 3.0.0 Beta1

Contributed by Google, Hibernate Shards is a horizontal partitioning solution built on top of Hibernate Core. When you need to distribute (shard) your data across multiple databases (too much data for a single database instance, regional deployment requirements, etc.), Hibernate Shards is for you. Like all Hibernate projects, Hibernate Shards is released under the LGPL license. Big thanks to Max Ross, Maulik Shah, Tomislav Nad, and Google :-) for contributing their impressive Google 20 percent project back to the community.
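The core idea of horizontal partitioning is that each record lives on exactly one of several databases, chosen by some rule. As a generic illustration only (HashShardRouter is hypothetical and is not the Hibernate Shards API, which defines its own shard strategy interfaces; see its documentation), here is a simple hash-based rule mapping a key to one of N shards.

```java
// Hypothetical shard router: maps a key to one of N shards with a stable hash.
// Generic illustration of horizontal partitioning, not Hibernate Shards' API.
final class HashShardRouter {
    private final int shardCount;

    HashShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // String.hashCode is deterministic, so a key always maps to the same shard;
    // floorMod keeps the result in [0, shardCount) even for negative hash codes.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

A plain modulo rule like this is easy to reason about but remaps most keys when the shard count changes, which is one reason real sharding setups make the selection strategy pluggable.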

Check the documentation for more information.

Hibernate Search 3.0.0.Beta1

Hibernate Search is now a top level project independent of Hibernate Annotations. New in this release:

  • out-of-the-box index clustering through JMS using a master/slave model (maximizing throughput)
  • asynchronous indexing (maximizing application response time)
  • indexing of embedded/associated objects and correlated queries (semantically similar to a SQL JOIN)
  • use of Apache Lucene(tm) 2.1.0 (lots of performance and scalability improvements)

While it is marked as Beta because its scope is rapidly growing and some APIs are still subject to change, Hibernate Search is already used by quite a few people. Check it out.

Hibernate Validator 3.0.0.GA

Hibernate Validator is also a new top level project independent of Hibernate Annotations. New in this release:

  • runs with a pure Java Persistence provider (entity listener provided)
  • more business oriented validators

Check the website and the change log for more information.

Hibernate Annotations and Hibernate EntityManager 3.3.0.GA

A few minor configuration changes (necessary to introduce the previous projects) led us to increase the version number. This version is, however, mostly backward compatible with 3.2.x. Some of the new features:

  • transparent event wiring for Hibernate Validator and Hibernate Search
  • performance improvements during cascading in Hibernate EntityManager
  • more SQL customizations as well as fetching and lazy configurations
  • the usual bunch of bug fixes

The past few months have been pretty busy preparing this uniquely feature-rich Hibernate bundle with a smooth out-of-the-box experience. Enjoy :-)
