After 14 months of hard work, please welcome Hibernate Search 5!
Let's have a look at the highlights of why you should be eager to upgrade:
- Upgraded to Lucene 4.10
- Lots of internal improvements, especially performance
- Thanks to Hibernate Search abstraction, most of your code should be upgradable easily despite the massive changes in Lucene APIs
- Numeric properties now indexed as NumericField by default
- Requires JDK 7
- Compatible with Hibernate ORM 4.3 and WildFly 8.x
- Stable
How to get it
Everything you need is available on Hibernate Search's web site. Download the full distribution from here. And don't hesitate to reach us on our forums.
If you are new to Hibernate Search, the best place to start is our getting started guide.
Feature list
Let's dive into the feature list.
Lucene 4.10
Hibernate Search 4 was stuck on the quite outdated 3.6.x version of Apache Lucene, while the Lucene 4 series introduced lots of improvements. Lucene has now reached version 4.10.3 and is considered stable, reliable and significantly more efficient than previous versions; you can now benefit from all these improvements. Some APIs changed, so you might need to make some adjustments to your code, such as Analyzer class names. Generally, though, if you were using the Hibernate Search API, the trickiest changes in Lucene are encapsulated and won't affect your code directly.
Why version 5.0
The major number was increased because the Lucene upgrade is a significant change, and because it forced us to break our API compatibility promise which we apply on minor versions. Don't assume that this will require Hibernate ORM at version 5 too: it still depends on Hibernate ORM versions 4.3.x (as did Hibernate Search 4.5) and is still compatible with WildFly 8, and we expect it will be compatible with WildFly 9 as well. It is possible that Hibernate Search 5 will be compatible with ORM version 5; we'll certainly aim for that, but cannot guarantee it.
So if you have an application using Hibernate ORM 4.3.x and Hibernate Search 4.5.x, it should be simple to upgrade as you won't have to upgrade ORM and can focus on changes needed for Search and Lucene only.
Indexing Performance
The indexing engine has been revisited, providing great performance enhancements and also simplifying configuration: you no longer need to configure a number of backend workers.
Both asynchronous indexing and synchronous indexing have been redesigned.
For the asynchronous indexing backend you now have a per-index index_flush_interval property, which you can use to limit the delay between your updates being committed to the database and the related index commit.
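As a minimal sketch of what such a configuration could look like, here are the settings assembled as plain java.util.Properties. The key names follow the hibernate.search.[default|<indexname>] prefix convention as we understand it from the reference documentation, and the interval is assumed to be in milliseconds; please double check both against the documentation of the exact version you use. The class and method names are hypothetical, made up for this illustration.

```java
import java.util.Properties;

/** Hypothetical helper, for illustration only: not part of Hibernate Search. */
final class AsyncIndexingSettings {

    /**
     * Builds settings enabling asynchronous indexing for one index.
     * Pass "default" as indexName to apply the settings to all indexes.
     */
    static Properties forIndex(String indexName, long flushIntervalMillis) {
        String prefix = "hibernate.search." + indexName + ".";
        Properties settings = new Properties();
        // Assumed key: selects the asynchronous backend for this index.
        settings.setProperty(prefix + "worker.execution", "async");
        // Assumed key and unit (milliseconds): upper bound on the time
        // between a queued update and the index commit that makes it visible.
        settings.setProperty(prefix + "index_flush_interval",
                Long.toString(flushIntervalMillis));
        return settings;
    }

    public static void main(String[] args) {
        // Flush the index at most half a second after an update is queued.
        forIndex("default", 500L).list(System.out);
    }
}
```

The same keys can of course be set in hibernate.properties or persistence.xml instead of being built in code.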
The synchronous backend is now able to merge write requests from multiple parallel transactions, so it provides the benefits of batched writes on the index while still having synchronous updates. This new model delivers performance similar to what was previously only possible with the NRT backend, without drawbacks such as being incompatible with the Infinispan Directory.
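The batching idea can be pictured with a toy model in plain Java. This is only a sketch of the general technique (a single committer thread drains a queue and commits everything it finds in one go, while each caller blocks until its own change is committed). It is not Hibernate Search's actual backend code, and all class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy model only: NOT the real Hibernate Search backend. */
final class BatchingSyncWriter {

    /** A queued change plus the latch its caller waits on. */
    private static final class Pending {
        final String change;
        final CountDownLatch committed = new CountDownLatch(1);
        Pending(String change) { this.change = change; }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<Pending>();
    private final List<String> index = new ArrayList<String>(); // stands in for the index
    private int commits = 0;

    BatchingSyncWriter() {
        Thread committer = new Thread(new Runnable() {
            @Override public void run() { commitLoop(); }
        }, "toy-index-committer");
        committer.setDaemon(true); // do not keep the JVM alive
        committer.start();
    }

    /** Returns only once the change is part of a committed batch. */
    void write(String change) {
        Pending pending = new Pending(change);
        queue.add(pending); // unbounded queue: never blocks
        try {
            pending.committed.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for commit", e);
        }
    }

    private void commitLoop() {
        List<Pending> batch = new ArrayList<Pending>();
        try {
            while (true) {
                batch.add(queue.take()); // wait for at least one change
                queue.drainTo(batch);    // merge whatever else arrived meanwhile
                synchronized (this) {
                    for (Pending p : batch) index.add(p.change);
                    commits++;           // one "commit" for the whole batch
                }
                for (Pending p : batch) p.committed.countDown(); // release callers
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop the loop
        }
    }

    synchronized int commitCount() { return commits; }
    synchronized int indexedCount() { return index.size(); }

    /** Simulates parallel transactions, each writing one change. */
    static BatchingSyncWriter runConcurrentWriters(int writers) {
        final BatchingSyncWriter writer = new BatchingSyncWriter();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            final String change = "change-" + i;
            threads[i] = new Thread(new Runnable() {
                @Override public void run() { writer.write(change); }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return writer;
    }
}
```

Calling BatchingSyncWriter.runConcurrentWriters(8) always ends with 8 indexed changes, while the number of commits can be anywhere between 1 and 8 depending on how the threads interleave: the more writers overlap, the fewer commits are needed.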
OSGi, Apache Karaf, JBoss FUSE
The project code and build have been refactored to produce nice OSGi compatible libraries. We run integration tests with Apache Karaf, so our artefacts should be safe to consume via JBoss FUSE. The Lucene jars are still a bit troublesome, but if you have any problem with them please let us know: we might be able to find a solution.
JDK 7, 8 and 9 compatibility
Hibernate Search 5 now requires a Java 7 runtime, but we also test regularly with Java 8 and previews of Java 9.
Automatic bridge discovery for property conversion
For those developers defining custom domain types, it's now possible to automatically bind a given Java type to a FieldBridge. You won't have to copy/paste those @FieldBridge annotations all over your model. This feature is explained in the BridgeProvider section of the documentation. You could use it for example to contribute the missing converters for Java 8 Date/Time types.
MoreLikeThis queries
Using the new MoreLikeThis query capabilities you don't have to target specific fields but can provide an instance of an indexed object. This model is also known as query by example and will trigger a similarity query matching all fields (or a subset of your choice).
A full example can be seen in this previous blog post.
Dropped dependency on Apache Solr
Until this version Hibernate Search depended on Apache Lucene for most of the work, and also on Lucene's sister project Apache Solr to provide a richer set of analyzers. Since the Lucene project incorporated this functionality from Solr, there is no longer any need to depend on Solr artifacts.
Improved modularity: clean WildFly integration
Our integration API and modularity were extensively stretched and tested by several requirements: OSGi support, other projects like CapeDwarf and Infinispan integrating Hibernate Search (but excluding dependencies on Hibernate ORM), and the advanced needs of the Hibernate OGM project. This resulted in lots of improvements which you might not directly notice, but which will make it much easier to avoid dependency conflicts with any other library you might use, or to integrate nicely in your favorite container / framework.
One example is the new structure of the modules we provide for easy WildFly integration: highly encapsulated, and with significantly fewer dependencies than previous versions.
For example the JGroups backend can use a JGroups version of your choice, and it doesn't need to match the JGroups version of Infinispan even if Hibernate Search is using Infinispan as well (which depends on its own JGroups version); this will not be a problem, and JGroups wouldn't even be exposed to your application so in theory you could be using a third different version of the clustering library in your app directly. In practice you would probably want to keep the versions aligned, but if you prefer otherwise it won't be a problem.
Numeric Fields
Any numeric property, including Calendar and Date types, is now indexed as a NumericField by default. A NumericField is more efficient for range queries, so we think this is what you should be using in most cases. Of course it's still possible to explicitly annotate the property to revert to the old behaviour: this is just a change in the defaults.
Please keep this change in mind when running queries, as you'll now need to query these as a NumericField. If you use our Query builder DSL this is going to be correct transparently, but if you use the Lucene native APIs to create queries the results won't match and you won't get any kind of warning.
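To picture why numeric and textual representations of the same property are not interchangeable, here is a small sketch in plain Java with no Lucene involved. It does not reproduce Lucene's actual term encoding; it only shows that the same range gives different answers when values are compared as numbers and when they are compared as text, which is the general reason a query written for one representation cannot be expected to behave correctly against the other. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: compares numeric and textual range semantics. */
final class NumericVersusTextRange {

    /** Keeps the values within [low, high] when compared as numbers. */
    static List<Integer> numericRange(int[] values, int low, int high) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int v : values) {
            if (v >= low && v <= high) hits.add(v);
        }
        return hits;
    }

    /** Keeps the values whose decimal text is within [low, high] as strings. */
    static List<Integer> textRange(int[] values, int low, int high) {
        String lowText = Integer.toString(low);
        String highText = Integer.toString(high);
        List<Integer> hits = new ArrayList<Integer>();
        for (int v : values) {
            String text = Integer.toString(v);
            // String.compareTo is lexicographic: "200" sorts before "30".
            if (text.compareTo(lowText) >= 0 && text.compareTo(highText) <= 0) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] ages = {2, 18, 25, 200, 3000};
        System.out.println(numericRange(ages, 18, 30)); // prints [18, 25]
        System.out.println(textRange(ages, 18, 30));    // prints [2, 18, 25, 200]
    }
}
```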
Migration Guide
We normally keep track of any API change in our wiki's migration guide; that's the right place to look for API / compatibility changes between any specific version.
For a summary of the changes for people jumping from version 4.x to 5.x, we created a new dedicated Migration page on the website which you can find from the Documentation page.
Index Migration
Technically it is possible that this latest version of Lucene could read your existing indexes. However, given such a large jump in Lucene versions, the numeric mapping changes, and the many changes in the Analyzers over time, we highly recommend you replace your old indexes and use the MassIndexer to trigger a fresh rebuild.
What's next?
We have several interesting plans ahead, but our priority is defined by feedback. Please let us know what you need; and even if it works great for you, it's nice for us to hear about it and what you do with it. You can get in touch with us through any of these media; the forums are a good starting point.
This is what we hope to work on in the near future:
- Dynamically defined models (not strictly bound to annotated classes)
- Alternatives to embedded Lucene backends: Apache Solr or ElasticSearch seem to be good candidates for this
- Support for the new Java 8 types
- Integration in WildFly 9
- Support for Forge
- Openshift / Docker / Kubernetes templates and guides
- Improve performance (Always!)
- Improved clustering functionality (master election?) on Infinispan/JGroups
- Take better advantage of the new Lucene 4 capabilities (Faceting, query-time join, etc.). Any suggestions?
This list is long, and I could easily expand it. We could really use your help, especially as our small core team is not familiar with many of the other technologies mentioned: even if you don't feel like coding but are in the mood for bleeding edge testing, that would be great.