Multi-tenancy in Hibernate

This has come up a few times so I thought I'd write up the ways to handle multi-tenancy in Hibernate. This is not an exhaustive list. We wont go into database vendor specific features (Oracle VPD, etc) for example. Generally speaking there are 3 ways to factor multi-tenancy into your database design:

Separate database instances - This approach gives each tenant their own physical database instance.
Separate schemas - This approach uses the same physical database instance for all the tenants, but each gets its own schema (or catalog) within that instance.
Partitioning - This approach uses the same database instance and same schema. In other words a single table holds the data for every tenant. The tenants are partitioned by some form of discriminator value.

The approaches to handling the first two are pretty much the same. So let's look at the third approach first as it requires a much different handling.

Partitioning

To be clear lets look at an example. Lets say that the application in question has a CUSTOMER table:

CUSTOMER (
  ID BIGINT,
  NAME VARCHAR,
  ...
  TENANT_ID VARCHAR
)

Notice the TENANT_ID column as it is the crux to this design. Basically it identifies which tenant the given row belongs to. Choosing this style of design has important ramifications which are beyond the scope of this discussion (like unique keys probably now need to include the TENANT_ID column, etc). For the purposes of this discussion, we are just concerned with how we identify rows as belonging to a particular tenant. Two choices for dealing with this are (1) use of Hibernate Shards or (2) use of the filter feature of Hibernate Core.

This approach has the distinct advantage of being capable of leveraging Hibernate second level caching. As we will see below, that is currently not possible with the other approaches.

Separate data

As I mentioned before, the first two options are pretty similar in terms of handling from JDBC, so therefore pretty similar in terms of handling from Hibernate. Going back to the CUSTOMER table, here we have:

CUSTOMER (
  ID BIGINT,
  NAME VARCHAR,
  ...
)

This time, we have no tenant discriminator as far as column. The discriminator comes from the fact of which tenant's database/schema we are looking at. Again getting into the pros and cons of this approach compared to partitioning is beyond the scope of this discussion. In terms of JDBC, this really just boils down to different connection urls that indicate the tenant we are dealing with at that time. So how can we get Hibernate to manage that for us?

One approach is to define a SessionFactory for each tenant. However, if you have large schemas and/or a large number of tenants and these SessionFactorys all reside in the same memory space, this approach can become very burdensome in terms of the memory footprint.

Another approach is to utilize a feature called application-supplied connections. Notice that from a SessionFactory you can open a Session using a Connecton you supply. However this can get unwieldy. A variation of this is for our application to tell Hibernate which Connection to use for the current context. Internally Hibernate makes use of an SPI contract named ConnectionProvider for obtaining Connections when it needs them. And although this contract does not account for passing in the tenant identifier it's pretty trivial to account for that using a ThreadLocal, JNDI/ENC or some other contextual and accessible manner. For the purpose of illustration, lets assume a DataSource JNDI names based on the tenant for look-ups and that the identifier of the current tenant is statically available from a custom TenantContext class:

public class MyTenantAwareConnectionProvider implements ConnectionProvider {
    public static final String BASE_JNDI_NAME_PARAM = "MyTenantAwareConnectionProvider.baseJndiName";

    private String baseJndiName;

    public void configure(Properties props) {
        baseJndiName = props.getProperty( BASE_JNDI_NAME_PARAM );
    }

    public Connection getConnection() throws SQLException {
        final String tenantId = TenantContext.getTenantId()
        final String tenantDataSourceName = baseJndiName + '/' + tenantId;
        DataSource tenantDataSource = JndiHelper.lookupDataSource( tenantDataSourceName );
        return tenantDataSource.getConnection();
    }

    public void closeConnection(Connection conn) throws SQLException {
        conn.close();
    }

    public boolean supportsAggressiveRelease() {
        // so long as the tenant identifier remains available in TL throughout, we can
        return true;
    }

    public close() {
        // currently nothing to do here
    }
}

The essential idea here is that Hibernate continues with what it normally does, but that we plug in a new behavior here so far as how it obtains connections in relation to our application's understanding of a current tenant. We are using a single SessionFactory and so get the benefit of the memory footprint of just one SessionFactory instead of one per tenant.

It was mentioned before, but bears repeating, that second level caching is problematic using this approach and should be disabled. The reason being that Hibernate does not know that Customer#1 from one tenant and Customer#1 from another tenant are actually different data. For that to work we'd have to encode the notion of tenant id into the cache key we use when storing into the second level cache. That has been discussed as an enhancement, but is not currently implemented.

In Relation To