Hibernate Search 6.1.0.Alpha1 is out, now with asynchronous, distributed automatic indexing!

We just published Hibernate Search 6.1.0.Alpha1, an alpha release of the next minor version of Hibernate Search.

The main feature of this new version is a new concept of "coordination" to perform automatic indexing in an asynchronous, distributed way. It allows for a new architecture where several risks of out-of-sync indexes are eliminated, and the overhead of automatic indexing on application threads is reduced significantly.

Beyond that, 6.1.0.Alpha1 also includes upgrades to newer versions of Hibernate ORM, Lucene and Elasticsearch, as well as OpenSearch compatibility, search DSL improvements, conditional mass indexing, and more.

What’s new

Hibernate Search 6.1 is still in development: some features are incomplete or may change in a backward-incompatible way.

Dependency upgrades

Hibernate ORM (HSEARCH-4279): Hibernate Search 6.1 relies on Hibernate ORM 5.5.
Lucene (HSEARCH-4169): The Lucene backend now uses Lucene 8.8.
Elasticsearch (HSEARCH-4235): The Elasticsearch backend now works with Elasticsearch 5.6, 6.8 or 7.13.

Asynchronous, distributed automatic indexing

While it’s technically possible to use Hibernate Search 6.0 and Elasticsearch in distributed applications, such a setup suffers from a few limitations.

The main goal of Hibernate Search 6.1 is to eliminate these limitations by introducing coordination between application nodes to implement asynchronous, distributed automatic indexing.

In HSEARCH-3280, we introduce the very first coordination strategy; more should follow in later versions of Hibernate Search (e.g. HSEARCH-3513). This strategy creates a table in the database to push entity change events to, and relies on a background processor to consume these events and perform automatic indexing.

Clustered architecture with database polling and Elasticsearch backend

Besides eliminating the limitations mentioned above, another advantage of this strategy is that Hibernate Search will no longer trigger lazy-loading or build documents in application threads, which can improve the responsiveness of applications (less work to do on commit).
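
The flow described above can be pictured with a small, self-contained sketch. It is an in-memory illustration only, not Hibernate Search's implementation: the class OutboxPollingSketch, its maps and its queue are hypothetical stand-ins for the database, the event table and the index. The point is which thread does which work: the application thread only records an event, and the document is built later by whoever calls pollOnce().

```java
import java.util.Locale;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory illustration; none of these names come from Hibernate Search.
class OutboxPollingSketch {

    // Stand-in for the entity table: entity id -> title.
    private final Map<String, String> database = new ConcurrentHashMap<>();
    // Stand-in for the event table: ids of entities that changed.
    private final Queue<String> events = new ConcurrentLinkedQueue<>();
    // Stand-in for the index: entity id -> indexed document.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // Runs on the application thread: stores the entity and records an event,
    // but builds no document.
    void save(String id, String title) {
        database.put(id, title);
        events.add(id);
    }

    // Runs on the application thread: removes the entity and records an event.
    void delete(String id) {
        database.remove(id);
        events.add(id);
    }

    // Runs in the background processor: consumes pending events, reloads each
    // entity and updates the index. Returns the number of events processed.
    int pollOnce() {
        int processed = 0;
        String id;
        while ((id = events.poll()) != null) {
            String title = database.get(id);
            if (title == null) {
                index.remove(id); // the entity no longer exists
            } else {
                index.put(id, title.toLowerCase(Locale.ROOT)); // "build the document"
            }
            processed++;
        }
        return processed;
    }

    String indexedDocument(String id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        OutboxPollingSketch sketch = new OutboxPollingSketch();
        sketch.save("1", "Dune");
        System.out.println(sketch.indexedDocument("1")); // null: nothing indexed yet
        sketch.pollOnce();
        System.out.println(sketch.indexedDocument("1")); // dune
    }
}
```

In a real application the polling would run on a schedule in a background thread; it is called explicitly here to keep the example deterministic.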

To learn more about an architecture based on database-polling coordination, head to this section of the documentation. You can also get a quick overview of several architectures here.

To jump right in and try the strategy in a single-node application, just set the following property (you will also need to add tables to your database schema):

hibernate.search.coordination.strategy = database-polling

For multi-node applications, only a fixed number of nodes is supported at the moment, but a more dynamic setup will be supported before 6.1.0.Final is released. Head to this section of the documentation for more information on how to configure coordination.

The database-polling coordination strategy can perfectly well be used with a Lucene backend.

You will still be limited to a single application node, but you will benefit from all the other advantages (data safety, increased application responsiveness, …).

OpenSearch compatibility

Since HSEARCH-4212, Hibernate Search is also compatible with OpenSearch 1.0, the Apache 2.0-licensed fork of Elasticsearch, and is regularly tested against it.

To use Hibernate Search with OpenSearch, use the same Maven artifacts, configuration and API that you would have used with Elasticsearch.

The only (minor) difference between using Elasticsearch and OpenSearch arises when you configure the Elasticsearch version explicitly: with OpenSearch, you need to prefix the version with opensearch:, e.g. opensearch:1.0.
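
As a rough illustration of that difference, the sketch below builds the settings you might pass when bootstrapping the application. It assumes the version is configured through the hibernate.search.backend.version property of the default backend. The BackendVersionSettings class and its methods are hypothetical helpers introduced for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the property key (assumed to be that of the default
// backend) and the "opensearch:" prefix relate to Hibernate Search itself.
class BackendVersionSettings {

    static Map<String, String> forElasticsearch(String version) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hibernate.search.backend.version", version);
        return settings;
    }

    static Map<String, String> forOpenSearch(String version) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Same property, but the version is prefixed with "opensearch:".
        settings.put("hibernate.search.backend.version", "opensearch:" + version);
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(forElasticsearch("7.13")); // {hibernate.search.backend.version=7.13}
        System.out.println(forOpenSearch("1.0"));     // {hibernate.search.backend.version=opensearch:1.0}
    }
}
```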

Search DSL improvements

New terms predicate (HSEARCH-2589)

Matches documents for which a given field contains any or all of a given set of terms.

Useful for enum-typed fields, in particular.

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.terms().field( "genre" )
                .matchingAny( Genre.CRIME_FICTION, Genre.SCIENCE_FICTION ) )
        .fetchHits( 20 );
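
To make the "any" versus "all" distinction concrete, here is a plain-Java sketch of the two semantics applied to the terms held by one field of one document. It does not use the Hibernate Search API: TermsMatchingSketch and its methods are hypothetical, and THRILLER is an invented genre value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of "any" versus "all" matching; not Hibernate Search code.
class TermsMatchingSketch {

    // True if the field holds at least one of the requested terms.
    static boolean matchesAny(Set<String> fieldTerms, String... requested) {
        for (String term : requested) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // True if the field holds every requested term.
    static boolean matchesAll(Set<String> fieldTerms, String... requested) {
        return fieldTerms.containsAll(Arrays.asList(requested));
    }

    public static void main(String[] args) {
        Set<String> genres = new HashSet<>(Arrays.asList("CRIME_FICTION", "THRILLER"));
        System.out.println(matchesAny(genres, "CRIME_FICTION", "SCIENCE_FICTION")); // true
        System.out.println(matchesAll(genres, "CRIME_FICTION", "SCIENCE_FICTION")); // false
    }
}
```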

New regexp predicate (HSEARCH-3884)

Matches documents for which a given field contains a word matching the given regular expression.

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.regexp().field( "description" )
                .matching( "r.*t" ) )
        .fetchHits( 20 );
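
Note the wording "a word matching": the sketch below assumes, as is the case for Lucene-style regular expressions in general, that the pattern has to match a whole word rather than a fragment of it. It uses java.util.regex and naive whitespace splitting as stand-ins for the backend's regexp syntax and analyzer, which differ in the details. RegexpTermSketch is a hypothetical name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration of whole-word regexp matching; not Hibernate Search code.
class RegexpTermSketch {

    // True if at least one whitespace-separated word of the text matches
    // the pattern in full (not just a substring of the word).
    static boolean anyWordMatches(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && pattern.matcher(word).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(anyWordMatches("A robot story", "r.*t")); // true ("robot")
        System.out.println(anyWordMatches("Three robots", "r.*t"));  // false ("robots" ends with "s")
        System.out.println(anyWordMatches("A parrot", "r.*t"));      // false ("parrot" starts with "p")
    }
}
```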

New id projection (HSEARCH-4142)

Returns the identifier of the matched entity.

List<Integer> hits = searchSession.search( Book.class )
        .select( f -> f.id( Integer.class ) )
        .where( f -> f.matchAll() )
        .fetchHits( 20 );

Configurable .missing() behavior for distance sort (HSEARCH-3863)

Distance sorts now allow specifying the behavior when encountering documents with missing values (though only .missing().first()/.missing().last() are supported with Elasticsearch).

GeoPoint center = GeoPoint.of( 47.506060, 2.473916 );
List<Author> hits = searchSession.search( Author.class )
        .where( f -> f.matchAll() )
        .sort( f -> f.distance( "placeOfBirth", center )
                .missing().first() )
        .fetchHits( 20 );

Relative field paths (HSEARCH-4245)

The Search DSL now allows creating factories (SearchPredicateFactory, etc.) that accept relative field paths.

This is mostly useful if you pass factories to reusable methods.

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .should( f.nested().objectField( "writers" )
                        .nest( matchFirstAndLastName(
                                f.withRoot( "writers" ),
                                "bob", "kane" ) ) )
                .should( f.nested().objectField( "artists" )
                        .nest( matchFirstAndLastName(
                                f.withRoot( "artists" ),
                                "bill", "finger" ) ) ) )
        .fetchHits( 20 );

private SearchPredicate matchFirstAndLastName(SearchPredicateFactory f,
        String firstName, String lastName) {
    return f.bool()
            .must( f.match().field( "firstName" )
                    .matching( firstName ) )
            .must( f.match().field( "lastName" )
                    .matching( lastName ) )
            .toPredicate();
}

Conditional mass indexing

HSEARCH-499 introduces the ability to apply the mass indexer to a subset of your entities, based on an HQL/JPQL "where" clause.

SearchSession searchSession = Search.session( entityManager );
MassIndexer massIndexer = searchSession.massIndexer();
massIndexer.type( Book.class ).reindexOnly( "e.publicationYear <= 2100" );
massIndexer.type( Author.class ).reindexOnly( "e.birthDate < :birthDate" )
        .param( "birthDate", LocalDate.ofYearDay( 2100, 77 ) );
massIndexer.startAndWait();

Named predicates

HSEARCH-3325 adds named predicates, a way to define the search logic as part of a custom binder/bridge.

This is, in a way, the comeback of the "full-text filters" of Hibernate Search 5.

Many thanks to Waldemar Kłaczyński for his work on this feature!

Custom ES index settings

Since HSEARCH-3934, you can provide Hibernate Search with JSON files containing the desired settings of your indexes, and Hibernate Search will automatically push these settings when it creates/updates the indexes.

# To configure the defaults for all indexes:
hibernate.search.backend.schema_management.settings_file = custom/index-settings.json
# To configure a specific index:
hibernate.search.backend.indexes.<index name>.schema_management.settings_file = custom/index-settings.json

Access to Lucene’s `IndexReader`

Thanks to HSEARCH-4065, you can now retrieve an IndexReader when using the Lucene backend:

SearchMapping mapping = Search.mapping( entityManagerFactory );
LuceneIndexScope indexScope = mapping
        .scope( Book.class ).extension( LuceneExtension.get() );
try ( IndexReader indexReader = indexScope.openIndexReader() ) {
    // work with the low-level index reader:
    int numDocs = indexReader.numDocs();
}

While generally not necessary, this can be useful for advanced, low-level operations.

Lucene low-level hit caching

Since HSEARCH-3880, Hibernate Search allows configuring the QueryCache and QueryCachingPolicy in the Lucene backend, adding one more performance tweak for advanced Lucene users.

Many thanks to Waldemar Kłaczyński for his work on this feature!

Other improvements and bug fixes

HSEARCH-2599: The configuration of the HTTP client for Elasticsearch can now be customized through a dedicated API.
HSEARCH-3608: Binders (bridges) can now be passed simple, string parameters that do not require custom annotations.
HSEARCH-3771: Mass indexing now supports Hibernate ORM’s dynamic-map entity types.
HSEARCH-3878: Mass indexing now maximizes utilization of database connections, performing up to twice as much loading with the same number of connections. Note that throughput will not double, since other bottlenecks exist, but it may get a slight boost.
HSEARCH-4138, HSEARCH-4139: Various performance improvements to automatic indexing.
HSEARCH-4148: @IndexingDependency(derivedFrom = …) can now be applied to implementations of abstract methods on entity types.
HSEARCH-4163: Multi-tenancy no longer requires explicit settings.
HSEARCH-4303: In the right circumstances, insertion/deletion of a contained entity will now trigger reindexing of the containing entity even without the corresponding update in the containing entity, like it used to in Hibernate Search 5.

And more. For a full list of changes since the previous releases, please see the release notes.

How to get this release

All details are available and up to date on the dedicated page on hibernate.org.

Getting started, migrating

For new applications, refer to the getting started guide.

For existing applications, Hibernate Search 6.1 is a drop-in replacement for 6.0, assuming you also upgrade the dependencies. Information about deprecated configuration and API is included in the migration guide.

Feedback, issues, ideas?

To get in touch, use the following channels:

hibernate-search tag on Stack Overflow (usage questions)
User forum (usage questions, general feedback)
Issue tracker (bug reports, feature requests)
Mailing list (development-related discussions)
