
Hibernate Search 4.4.0.Beta1 is ready for download! You can get it either from the Maven repositories or from SourceForge.

Index Sharding

Sharding is a common practice among Apache Lucene users, and Hibernate Search has supported it for years. It means that we split the index storage into multiple Lucene indexes, while hiding the logical complexity. This is most commonly used to:

Keep individual index sizes reasonable: handy for backups and performance
Specialize individual indexes for a different language or terminology (more on this below)
Separate master nodes to scale writing throughput across multiple nodes
Meet legal requirements to store some data on physically independent media

Until now, however, you had to set the number of shards in the Hibernate Search configuration, which essentially required advance knowledge of which shards your application would use.

Dynamic Sharding

With the new feature added in this 4.4.0.Beta1 release, you don't have to know in advance which shards you might need at runtime. For example, if you are sharding your entities by the language of their description, just storing an entity in a new language can trigger the creation of the new index infrastructure on the fly.

All details can be found in the reference documentation.

With the previous sharding feature, which we now call static sharding and which is deprecated, you might have been used to dealing with an array of indexes. Shards were identified by their position in the array. In the new model, shards are identified by name: a simple String which maps to their IndexManager name.
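To make the difference between the two addressing models concrete, here is a minimal, JDK-only sketch. It does not use the Hibernate Search API: the class `ShardAddressing` and both of its methods are hypothetical names invented for this illustration, and picking the static position from a hash of the identifier is only an assumption about how a fixed array of shards might be addressed.

```java
import java.util.Locale;

/** Hypothetical, JDK-only contrast of static and dynamic shard addressing. */
class ShardAddressing {

    /** Static model: shards live in a fixed-size array and are picked by position. */
    static int staticShardIndex(String entityId, int numberOfShards) {
        // floorMod keeps the position non-negative even when hashCode() is negative.
        return Math.floorMod(entityId.hashCode(), numberOfShards);
    }

    /** Dynamic model: the shard is a name derived from the data, here a language code. */
    static String dynamicShardName(String descriptionLanguage) {
        // Normalize so that "FR" and " fr " end up in the same shard.
        return descriptionLanguage.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(staticShardIndex("animal-42", 4));
        System.out.println(dynamicShardName("FR"));
    }
}
```

In the static variant the number of shards has to be known up front, while in the dynamic variant a previously unseen language simply yields a new name.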

Implementors will need to create a ShardIdentifierProvider, which fulfills the following needs:

Discover existing shards at boot time

Since the shards are not defined in the configuration, you need to provide the list of known shards through code. A new mechanism was set up to allow, for example, querying the database using a Hibernate Session during the initialization phase. See also the AnimalShardIdentifierProvider example implementation.
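The boot-time step could be sketched roughly as follows. This is not the real ShardIdentifierProvider contract: `BootTimeShardDiscovery` and its method signatures are hypothetical, and the `Supplier` is an assumed stand-in for whatever database query returns the shard names that already exist.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: seed the set of known shard names once, at initialization. */
class BootTimeShardDiscovery {

    // Concurrent set, because indexing threads may read it while it is being filled.
    private final Set<String> knownShards = ConcurrentHashMap.newKeySet();

    /**
     * Stand-in for the provider's initialization hook. The supplier plays the
     * role of a database query returning the distinct sharding values in use.
     */
    void initialize(Supplier<List<String>> existingShardNames) {
        knownShards.addAll(existingShardNames.get());
    }

    /** Read-only view of the shard names discovered so far. */
    Set<String> getAllShardIdentifiers() {
        return Collections.unmodifiableSet(knownShards);
    }
}
```

Passing the lookup in as a `Supplier` keeps the sketch independent of any persistence API; in a real provider that lookup would be the query run during the initialization phase.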

Discover new shards at runtime

The second operation that a ShardIdentifierProvider needs to provide is to watch for new shard identifiers and notify the framework when one appears.

List the known shard identifiers

Finally, the ShardIdentifierProvider implementation needs to keep a record of the known shard names. That requires a bit of concurrent code; hopefully the example implementation will serve as inspiration.
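As a rough idea of what that concurrent bookkeeping might look like, here is a JDK-only sketch. `ShardNameRegistry`, `shardFor` and `allShards` are hypothetical names, and the `Consumer` callback is only an assumed placeholder for however the framework learns about a new shard; the actual provider interface is described in the reference documentation.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical, JDK-only sketch of a thread-safe registry of shard names. */
class ShardNameRegistry {

    private final Set<String> knownShards = ConcurrentHashMap.newKeySet();
    private final Consumer<String> onNewShard;

    ShardNameRegistry(Consumer<String> onNewShard) {
        this.onNewShard = onNewShard;
    }

    /** Called for each indexed entity; returns the shard the entity belongs to. */
    String shardFor(String shardingValue) {
        // add() is atomic: for a given new name, exactly one caller gets 'true',
        // so the callback fires once per shard even under concurrent indexing.
        if (knownShards.add(shardingValue)) {
            onNewShard.accept(shardingValue);
        }
        return shardingValue;
    }

    /** Read-only view of every shard name seen so far. */
    Set<String> allShards() {
        return Collections.unmodifiableSet(knownShards);
    }
}
```

Relying on the atomic `add` of a concurrent set avoids explicit locking on the indexing path, which is the part most likely to be called from many threads at once.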

Optionally, you can make your implementation smarter by inspecting the custom FullTextFilters applied to a query, to narrow down the set of shards on which the query is executed. See more at Using filters in a sharded environment.
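The narrowing idea can be illustrated with a small JDK-only sketch. Everything here is an assumption made for the example: `FilterAwareShardSelector` is a hypothetical class, the active filters are modelled as a plain map from filter name to parameter value rather than as real FullTextFilter objects, and the filter name `language` is invented.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: pick the shards a query must visit from its active filters. */
class FilterAwareShardSelector {

    /**
     * @param allShards     every shard name currently known
     * @param activeFilters filter name mapped to its parameter value (stand-in for
     *                      the filters enabled on the query)
     */
    static Set<String> shardsForQuery(Set<String> allShards, Map<String, String> activeFilters) {
        String language = activeFilters.get("language");
        if (language != null && allShards.contains(language)) {
            // The filter pins the query to one language, so one shard is enough.
            return Collections.singleton(language);
        }
        // No usable hint: fall back to searching every shard.
        return allShards;
    }
}
```

Falling back to all shards when the filter gives no usable hint keeps the sketch conservative: the query may touch more indexes than strictly needed, but it never skips one that could hold matches.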

In Relation To

Hibernate Search 4.4.0.Beta1: Index Sharding is now dynamic
