Hibernate Search 4.4.0.Beta1 is ready for download! You can get it either from the Maven repositories or from SourceForge.
Index Sharding
Sharding is a common practice among Apache Lucene users, and Hibernate Search has supported it for years. It means splitting the index storage across multiple Lucene indexes while hiding the added complexity from the application. It is most commonly used to:
- Keep individual index sizes reasonable, which helps with backups and performance
- Specialize individual indexes for different languages or terminologies (more on this below)
- Use separate master nodes to scale write throughput across multiple nodes
- Meet legal requirements to store some data on physically independent media
So far, however, you had to configure the number of shards in the Hibernate Search configuration, which basically required knowing in advance which shards your application would use.
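As a rough illustration of the static model, here is a minimal sketch, assuming shards are chosen by hashing the entity id modulo a fixed shard count. The class names (StaticShardRouter, StaticShardIndex) are hypothetical stand-ins for Hibernate Search's own index infrastructure; this is not its actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a single Lucene index; not a Hibernate Search type.
class StaticShardIndex {
    final List<String> documentIds = new ArrayList<>();
}

// Sketch of static sharding: the shard count is fixed at construction time
// and a shard is identified only by its position in the array.
class StaticShardRouter {
    private final StaticShardIndex[] shards;

    StaticShardRouter(int numberOfShards) {
        shards = new StaticShardIndex[numberOfShards];
        for (int i = 0; i < numberOfShards; i++) {
            shards[i] = new StaticShardIndex();
        }
    }

    // Derive a non-negative array position from the hash of the entity id.
    int shardPositionFor(String id) {
        return Math.floorMod(id.hashCode(), shards.length);
    }

    // Route the document to the shard at the computed position.
    void index(String id) {
        shards[shardPositionFor(id)].documentIds.add(id);
    }

    StaticShardIndex shardAt(int position) {
        return shards[position];
    }
}
```

Because the shard count is part of the modulo, adding a shard later changes the position computed for most existing ids, which is one reason the number of shards had to be decided up front.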
Dynamic Sharding
With the new feature added in this 4.4.0.Beta1 release, you no longer need to know in advance which shards you might need at runtime. For example, if you shard your entities by the language of their description, simply storing an entity in a new language can trigger the creation of the new index infrastructure on the fly.
All the details can be found in the reference documentation.
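The language example can be sketched as follows, assuming an entity whose description language is used as the shard key. Article, LanguageIndex and DynamicLanguageSharding are hypothetical names, and the in-memory map only mimics what the framework does when it creates index infrastructure on the fly.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical entity: only the field used as the shard key is modeled.
class Article {
    final long id;
    final String descriptionLanguage;

    Article(long id, String descriptionLanguage) {
        this.id = id;
        this.descriptionLanguage = descriptionLanguage;
    }
}

// Hypothetical stand-in for the index infrastructure of one shard.
class LanguageIndex {
    final String name;
    final Set<Long> ids = ConcurrentHashMap.newKeySet();

    LanguageIndex(String name) {
        this.name = name;
    }
}

// Sketch of dynamic sharding: no shard list is configured up front;
// the first entity stored in a new language creates that language's shard.
class DynamicLanguageSharding {
    private final Map<String, LanguageIndex> shardsByName = new ConcurrentHashMap<>();

    void store(Article article) {
        // computeIfAbsent creates each shard at most once, even under concurrent stores.
        LanguageIndex shard = shardsByName.computeIfAbsent(
                article.descriptionLanguage, LanguageIndex::new);
        shard.ids.add(article.id);
    }

    Set<String> knownShards() {
        return new TreeSet<>(shardsByName.keySet());
    }

    LanguageIndex shard(String name) {
        return shardsByName.get(name);
    }
}
```

Shards here are identified by a plain String rather than by an array position, which is what makes creating them lazily straightforward.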
With the previous sharding feature, which we now call static sharding and which is deprecated, you were probably used to dealing with an array of indexes: shards were identified by their position in the array. In the new model, shards are identified by a name: a simple String which maps to their IndexManager name.
Implementors will need to create a ShardIdentifierProvider, which fulfills the following needs:
Discover existing shards at boot time
Since the shards are not defined in the configuration, your code needs to provide the list of known shards. A new mechanism was set up that allows you, for example, to query the database using a Hibernate Session during the initialization phase. See also the AnimalShardIdentifierProvider example implementation.
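A minimal sketch of boot-time discovery follows, assuming the existing shard names can be obtained from some lookup at startup. The Supplier is a hypothetical placeholder for that lookup (for instance a query run through a Hibernate Session); BootTimeShardDiscovery is not part of the Hibernate Search SPI.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Sketch of boot-time shard discovery. The Supplier stands in for whatever
// lookup is available at initialization (e.g. a database query); it is a
// hypothetical placeholder, not the real ShardIdentifierProvider contract.
class BootTimeShardDiscovery {
    private final Set<String> knownShards = new TreeSet<>();

    // Runs once during startup, before any indexing happens.
    void initialize(Supplier<List<String>> existingShardNames) {
        for (String name : existingShardNames.get()) {
            // Skip null or blank values that a query might return.
            if (name != null && !name.trim().isEmpty()) {
                knownShards.add(name);
            }
        }
    }

    Set<String> knownShards() {
        return Collections.unmodifiableSet(knownShards);
    }
}
```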
Discover new shards at runtime
The second responsibility of a ShardIdentifierProvider is to watch for new shard identifiers and, when one appears, notify the framework.
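The watch-and-notify step can be sketched like this, assuming the shard key has already been extracted from the entity being indexed. The Consumer callback is a hypothetical stand-in for "notify the framework"; the actual notification mechanism is described in the reference documentation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Sketch of runtime shard discovery. onNewShard is a hypothetical stand-in
// for the framework notification, not a Hibernate Search API.
class RuntimeShardWatcher {
    private final Set<String> knownShards = ConcurrentHashMap.newKeySet();
    private final Consumer<String> onNewShard;

    RuntimeShardWatcher(Consumer<String> onNewShard) {
        this.onNewShard = onNewShard;
    }

    // Called for every entity being indexed; returns the shard to write to.
    String shardFor(String shardKey) {
        // Set.add returns true only for the caller that inserts the key first,
        // so the callback runs exactly once per new shard.
        if (knownShards.add(shardKey)) {
            onNewShard.accept(shardKey);
        }
        return shardKey;
    }
}
```

Note that in this sketch another thread may already see the key as known while the notification is still running; a real implementation has to make sure the new index exists before documents are written to it.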
List the known shard identifiers
Finally, the ShardIdentifierProvider implementation needs to keep track of the known shard names. This requires a bit of concurrent code; hopefully the example will serve as inspiration.
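One way to keep that record is a copy-on-write snapshot, sketched below under the assumption that shards are added rarely and listed often (for example on every query). KnownShardRegistry is a hypothetical name; the AnimalShardIdentifierProvider example may solve this differently.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of thread-safe bookkeeping for known shard names: readers get an
// immutable snapshot without locking, writers publish a new Set atomically.
class KnownShardRegistry {
    private final AtomicReference<Set<String>> snapshot =
            new AtomicReference<>(Collections.<String>emptySet());

    // Returns true if the name was not known before this call.
    boolean register(String shardName) {
        while (true) {
            Set<String> current = snapshot.get();
            if (current.contains(shardName)) {
                return false;
            }
            Set<String> updated = new HashSet<>(current);
            updated.add(shardName);
            // Retry if another thread published a new snapshot in the meantime.
            if (snapshot.compareAndSet(current, Collections.unmodifiableSet(updated))) {
                return true;
            }
        }
    }

    // Consistent, immutable view of all shards known at this instant.
    Set<String> allShardNames() {
        return snapshot.get();
    }
}
```

Copying the whole set on each addition is cheap when the number of shards is small, and it means a caller iterating over the returned set never sees it change underneath.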
Optionally, you can make your implementation smarter by inspecting the custom FullTextFilters applied to a query, to narrow down the shards on which the query should be executed. See Using filters in a sharded environment for more details.
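The narrowing idea can be sketched as follows, assuming a filter whose parameter names exactly one shard. Enabled filters are modeled here as a hypothetical map of filter name to parameter value, and the filter name "languageFilter" is made up for illustration; the real contract is defined in terms of Hibernate Search's own filter types.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of query-time shard narrowing based on enabled filters.
// The Map-based filter model is hypothetical, not the Hibernate Search SPI.
class FilterAwareShardSelector {
    // Hypothetical filter name used only for this illustration.
    static final String LANGUAGE_FILTER = "languageFilter";

    private final Set<String> allShards;

    FilterAwareShardSelector(Set<String> allShards) {
        this.allShards = Collections.unmodifiableSet(new TreeSet<>(allShards));
    }

    Set<String> shardsForQuery(Map<String, String> enabledFilters) {
        String language = enabledFilters.get(LANGUAGE_FILTER);
        if (language == null) {
            // No usable filter: the query must run on every shard.
            return allShards;
        }
        // The filter pins the query to a single shard, if that shard exists.
        return allShards.contains(language)
                ? Collections.singleton(language)
                : Collections.<String>emptySet();
    }
}
```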
More links
As usual, the issue tracker is JIRA and all the code is on GitHub: pull requests and feedback are welcome.
For a detailed list of all changes in this release, see the release notes.
The next goal is to work towards a 4.4.0.Final release. If you can help us get there fast, we can finally branch towards the next major release and start the changes needed to support Apache Lucene 4.