Hibernate Search CR1: JBoss Modules, Spatial improvements, NRT boosting

Hibernate Search 4.2.0.CR1 has just been released. It brings mostly bug fixes and performance improvements, but also an easier way to deploy on JBoss AS 7 or JBoss EAP 6.

JBoss modules

In JBoss AS 7, rather than a flat lib directory, you'll find modules: each module is organized in its own directory and explicitly defines which jars it exports and which dependencies it has on other modules. It is now possible to bundle Hibernate Search as a JBoss module and add it to the other modules of the application server.

This gets you smaller and quicker deployments, and the ability to share the same artifacts across multiple applications. Other projects that depend on the Hibernate Search engine might use the module too, such as Infinispan, TorqueBox, ModeShape, CapeDwarf, Cloud-TM, ... (and let us know which others!)

How can you benefit from it in your applications? Provided you're deploying on JBoss AS 7.1.x or EAP 6, instead of adding all the Hibernate Search jars to your EAR you can download the pre-packaged modules from either:

Sourceforge
Maven: org.hibernate:hibernate-search-modules

Unpack the modules in your JBoss AS directory; this will create modules for Hibernate Search, Apache Lucene and some useful Solr libraries. Hibernate Search is split into two modules:

org.hibernate.search.orm:main For users of Hibernate Search with Hibernate; this will transitively include Hibernate ORM
org.hibernate.search.engine:main For other projects that depend only on the internal indexing engine and do not wish to pull in other Hibernate dependencies

Using Manifest

The simplest way to have your application declare a dependency on the Hibernate Search module is to add a single attribute to your application's manifest:

Dependencies: org.hibernate.search.orm services
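If your archive is assembled programmatically rather than by a build tool, the same attribute can be set through the JDK's java.util.jar API. The following is only a minimal sketch: the class name ManifestDependencySketch and its helper methods are hypothetical, and most projects would configure the entry in their build tool instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Hibernate Search or JBoss AS.
public class ManifestDependencySketch {

    // Builds a manifest carrying the Dependencies attribute shown above.
    static Manifest buildManifest() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Standard header identifying the manifest format version.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // The module dependency; "services" also imports the module's META-INF/services.
        main.put(new Attributes.Name("Dependencies"), "org.hibernate.search.orm services");
        return manifest;
    }

    // Renders the manifest as the text that would end up in META-INF/MANIFEST.MF.
    static String render(Manifest manifest) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses manifest text back and returns the Dependencies value, or null if absent.
    static String readDependencies(String manifestText) {
        try {
            Manifest parsed = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return parsed.getMainAttributes().getValue("Dependencies");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(buildManifest()));
    }
}
```

The readDependencies helper can double as a quick check that an existing MANIFEST.MF really declares the module before you deploy.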

Using jboss-deployment-structure.xml

As an alternative to the manifest, an XML file can be used: add WEB-INF/jboss-deployment-structure.xml with the following content:

<jboss-deployment-structure>
    <deployment>
        <dependencies>
            <module name="org.hibernate.search.orm" services="export" />
        </dependencies>
    </deployment>
</jboss-deployment-structure>

More details about modules are described in Class Loading in AS7.

(This is just new packaging and doesn't affect any library code: assembling applications in the older, traditional style is still supported.)

Spatial indexing improvements

After the initial round of feedback we changed some methods and hopefully made the API easier to use, at the cost of changing some class names.

Grid to QuadTree

We dropped the grid term in favour of QuadTree, as it better reflects the internals of the algorithm. This is mostly an internal detail, but if you were using any class with the Grid postfix you should be able to find a QuadTree equivalent now.

@Longitude and @Latitude

The attribute names of these annotations changed to be more readable.

Performance

I don't know if these affect you, but if they happen to address your use case the performance boost can be very significant, up to two orders of magnitude.

Near-Real-Time for write-mostly scenarios

I always expected most users would face a read-mostly use case, and the Near-Real-Time IndexManager implementation was biased by this belief: it used to prepare an IndexReader in advance after each write was applied. The reader is now prepared on demand and only if needed, so applications that have bursts of write activity will be much faster when using this backend.

At the same time, Lucene 3.6 behaves slightly differently from previous versions, so we now have the NRT backend try hard to avoid flushing delete operations unless necessary. This also provides a significant performance boost for some scenarios.

The combination of these changes might make the NRT backend suitable for tracking visitor activity, for example to perform efficient data mining on the index.
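To illustrate why preparing the reader on demand helps write-heavy workloads, here is a toy model of the idea. It is not the Hibernate Search implementation: the LazyReaderSketch class, its Reader stand-in and the generation counter are all assumptions made up for this sketch, and no Lucene classes are involved.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: hypothetical names, no relation to the real NRT IndexManager code.
public class LazyReaderSketch {

    /** Stand-in for an index reader that is expensive to open. */
    static final class Reader {
        final int generation;
        Reader(int generation) { this.generation = generation; }
    }

    private final AtomicInteger readersOpened = new AtomicInteger();
    private int writeGeneration = 0;   // bumped on every applied write
    private Reader current = null;     // last reader handed out, if any

    /** Applying a write only makes the current reader stale; nothing is opened here. */
    synchronized void applyWrite() {
        writeGeneration++;
    }

    /** A reader is (re)opened only when a query asks for one and writes happened since. */
    synchronized Reader readerForQuery() {
        if (current == null || current.generation != writeGeneration) {
            current = new Reader(writeGeneration);
            readersOpened.incrementAndGet();
        }
        return current;
    }

    int readersOpened() {
        return readersOpened.get();
    }

    public static void main(String[] args) {
        LazyReaderSketch index = new LazyReaderSketch();
        for (int i = 0; i < 1000; i++) {
            index.applyWrite();          // burst of writes, no reader prepared
        }
        index.readerForQuery();          // first query opens one reader
        index.readerForQuery();          // second query reuses it
        System.out.println("readers opened: " + index.readersOpened()); // prints "readers opened: 1"
    }
}
```

With the eager strategy the same burst would have opened a reader after each of the 1000 writes; in this model the cost is paid once, and only because a query actually arrived.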

MassIndexer

The MassIndexer might now benefit from fetch size hinting to the JDBC driver. This mostly depends on your driver implementation: some ignore the hint, but it shouldn't hurt to try. In my tests on MySQL 5.5 it appeared to provide a small but consistent improvement of around 3.1% in indexing time.

Apache Lucene versions

The Apache Lucene community released version 3.6.2, so that's what we target now. This is mostly a bugfix release: no APIs were changed, so if you wish you can stick to versions 3.6 or 3.6.1.

Thanks

Special thanks to Nicolas Helleringer for implementing the new Spatial features and bearing with me, to Michael Simons for carefully reviewing and fixing the Spatial documentation and for his feedback on the API, and to Guillaume Smet and his team for the very brave debugging and fixing of the MassIndexer.

The usual links:

Download it from Sourceforge or via Maven artifacts
Get in touch on the forums or on the mailing list, or join us for a chat on IRC
Get the spotlight in the next release: have a look at JIRA and get the code from GitHub

In Relation To