Hibernate Search 7.1.0.Alpha2 is out

We just published Hibernate Search 7.1.0.Alpha2, the second alpha release of the next minor version of Hibernate Search.

This version brings further vector search capabilities and improvements, and integrates the vector search capabilities of Elasticsearch and OpenSearch.

What’s new

Hibernate Search 7.1 is still in its early stages of development: some features are still incomplete or may change in a backward-incompatible way.

Dependency upgrades

Hibernate ORM (HSEARCH-5060): Hibernate Search now depends on Hibernate ORM 6.4.2.Final, which brings, among other things, a couple of fixes for possible issues with mass indexing when ORM discriminator-based multi-tenancy is in use.

Lucene (HSEARCH-5043): The Lucene backend now uses Lucene 9.9.1, which, among other improvements, brings better performance for vector search.

Elasticsearch (HSEARCH-5032): The Elasticsearch backend is now also compatible with Elasticsearch 8.12, in addition to the versions it was already compatible with.

Others

HSEARCH-5032: Upgrade to Elasticsearch client 8.12.0
HSEARCH-5057: Upgrade to AWS SDK 2.23.3
HSEARCH-5047: Upgrade to JBeret 2.2.0.Final

Vector search for Lucene and Elasticsearch backends

This version of Hibernate Search builds on top of the previous alpha and integrates the vector search capabilities of Elasticsearch and OpenSearch. To recap: vector search provides the tools to search over binary data (images, audio or video) or text data. External tools convert that data to vectors (arrays of bytes or floats, also called "embeddings"), which Hibernate Search then indexes and uses in queries. Hibernate Search introduces a new field type, @VectorField, and a new predicate, knn, so that vectors can be indexed and then searched.

Vector fields work with vector data represented as byte or float arrays in the documents. Out of the box, byte[] and float[] properties work with the new field type; for any other property type, implement a custom value bridge or value binder. Keep in mind that all indexed vectors must have the same length, and that this length must be specified upfront so that the schema can be created:

@Entity
@Indexed
public class Book {

    @Id
    private Integer id;

    @VectorField(dimension = 512)
    private float[] coverImageEmbeddings;

    // Other properties ...
}
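For a property type other than byte[] or float[], the custom value bridge mentioned above has to turn the property value into an array of the declared dimension. The plain-Java sketch below shows only that conversion logic, for a hypothetical List<Float> property; the class and method names are illustrative assumptions. In an actual mapping, this logic would sit in the toIndexedValue method of a ValueBridge implementation referenced from the mapping.

```java
import java.util.List;

/**
 * Hypothetical helper sketching the conversion a custom value bridge would
 * perform for a List<Float> property, since only float[] and byte[] are
 * supported by vector fields out of the box.
 */
final class EmbeddingConversion {

    private EmbeddingConversion() {
    }

    /**
     * Converts a list of floats to a float[] vector, rejecting vectors whose
     * length differs from the declared dimension (all indexed vectors must
     * have the same length).
     */
    static float[] toVector(List<Float> embedding, int dimension) {
        if (embedding == null) {
            return null; // nothing to index for this document
        }
        if (embedding.size() != dimension) {
            throw new IllegalArgumentException(
                    "Expected a vector of dimension " + dimension + ", got " + embedding.size());
        }
        float[] vector = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            // Unboxing: a null element would throw a NullPointerException.
            vector[i] = embedding.get(i);
        }
        return vector;
    }
}
```

Failing fast on a wrong length keeps the problem close to its source, instead of surfacing later as an indexing error.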

Searching for similar vectors is performed with the knn predicate:

float[] coverImageEmbeddingsVector = /*...*/ ;

List<Book> hits = searchSession.search( Book.class )
    .where( f ->
        // provide the number of similar documents to look for:
        f.knn( 5 )
            // the name of the vector field:
            .field( "coverImageEmbeddings" )
            // matched documents will be the ones whose indexed vector
            // is "most similar" to this vector:
            .matching( coverImageEmbeddingsVector )
            // optionally, a regular predicate can be supplied as a filter
            // to restrict which documents are considered:
            .filter( f.match().field( "authors.firstName" ).matching( "arthur" ) )
    )
    .fetchHits( 20 );
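To make the semantics of knn( 5 ) combined with a filter more concrete, here is an exact, brute-force sketch in plain Java. It keeps the documents matching the filter, ranks them by cosine similarity to the query vector and returns at most k of them. The Doc record, the choice of cosine similarity and modeling the filter as restricting the candidates before ranking are assumptions for illustration. The backends rely on approximate nearest-neighbor indexes (HNSW graphs, which the efConstruction and m parameters configure) rather than scanning every vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Exact (brute-force) k-nearest-neighbor search over in-memory vectors,
 * sketching what a knn predicate with a filter computes.
 */
final class BruteForceKnn {

    /** A hypothetical indexed document: an identifier, a vector and an author name. */
    record Doc(int id, float[] vector, String authorFirstName) {
    }

    private BruteForceKnn() {
    }

    /** Cosine similarity: 1 for identical directions, 0 for orthogonal, -1 for opposite. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns at most k documents that pass the filter, most similar first. */
    static List<Doc> knn(List<Doc> docs, float[] query, int k, Predicate<Doc> filter) {
        List<Doc> candidates = new ArrayList<>();
        for (Doc doc : docs) {
            if (filter.test(doc)) { // the filter restricts the candidate set
                candidates.add(doc);
            }
        }
        candidates.sort(Comparator.comparingDouble((Doc d) -> cosine(d.vector(), query)).reversed());
        return List.copyOf(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Scanning every vector costs time proportional to the number of documents per query, which is why the backends use approximate indexes and trade a little accuracy for speed.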

By its nature, a knn predicate always tries to find the k nearest vectors, even if those vectors are quite far from the searched vector, i.e. not very similar to it. This may lead the query to return irrelevant results.

To address this, a knn predicate allows configuring the minimum required similarity:

float[] coverImageEmbeddingsVector = /*...*/ ;

List<Book> hits = searchSession.search( Book.class )
    .where( f ->
        // Create a knn predicate as usual:
        f.knn( 5 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector )
        // Specify the required minimum similarity value, to filter out irrelevant results:
        .requiredMinimumSimilarity( 5 ) )
    .fetchHits( 20 );
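The effect of such a threshold can be sketched with the same brute-force approach: ranking alone always yields k results, however dissimilar they are, while a minimum similarity drops the ones below the threshold. This plain-Java sketch assumes cosine similarity, where higher means more similar. The actual meaning of the value passed to requiredMinimumSimilarity may depend on the backend and on the configured similarity function, so check the reference documentation before picking a value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of a minimum similarity threshold applied to an exact
 * k-nearest-neighbor search over in-memory vectors.
 */
final class ThresholdedKnn {

    private ThresholdedKnn() {
    }

    /** Cosine similarity: 1 for identical directions, 0 for orthogonal, -1 for opposite. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Returns the positions of at most k vectors, most similar first,
     * keeping only those whose similarity is at least minSimilarity.
     */
    static List<Integer> knn(List<float[]> vectors, float[] query, int k, double minSimilarity) {
        List<Integer> ranked = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ranked.add(i);
        }
        ranked.sort(Comparator.comparingDouble((Integer idx) -> cosine(vectors.get(idx), query)).reversed());
        List<Integer> hits = new ArrayList<>();
        for (int idx : ranked.subList(0, Math.min(k, ranked.size()))) {
            if (cosine(vectors.get(idx), query) >= minSimilarity) {
                hits.add(idx);
            }
        }
        return hits;
    }
}
```

Passing Double.NEGATIVE_INFINITY as the threshold reproduces a plain knn search. For an exact search, filtering the top k by the threshold gives the same result as taking the top k among the vectors above the threshold, since those vectors always rank first.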

Note that each backend may have its own specifics and limitations with regard to vector search. For details, see the sections of the reference documentation on vector fields and on the knn predicate.

This version of Hibernate Search also renames some of the vector search APIs. In particular:

@VectorField#beamWidth is renamed to @VectorField#efConstruction, along with all related API/SPI methods (see HSEARCH-5056).
@VectorField#maxConnections is renamed to @VectorField#m, along with all related API/SPI methods (see HSEARCH-5056).
VectorSimilarity#INNER_PRODUCT is renamed to VectorSimilarity#DOT_PRODUCT, and VectorSimilarity#MAX_INNER_PRODUCT is introduced, to better align the naming of vector similarity functions across backends (see HSEARCH-5038).
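The renamed constants refer to different similarity functions. The plain-Java sketch below only illustrates the underlying math; it is not how the backends compute their scores, which apply their own transformations on top. Once vectors are scaled to unit length, their dot product equals their cosine similarity, which is why a dot product similarity is typically meant for normalized vectors. As far as we understand, MAX_INNER_PRODUCT maps to the maximum inner product similarity that Lucene and Elasticsearch offer for vectors that are not normalized.

```java
/** Plain-Java versions of common vector similarity functions, for illustration only. */
final class Similarities {

    private Similarities() {
    }

    /** Euclidean (L2) distance: smaller means more similar. */
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Dot (inner) product: larger means more similar. */
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: the dot product of the two vectors once scaled to unit length. */
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Scales a vector to unit length, as a dot product similarity typically expects. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dot(v, v));
        float[] result = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            result[i] = (float) (v[i] / norm);
        }
        return result;
    }
}
```

For example, with a = {3, 4} and b = {4, 3}, the dot product is 24 and the cosine similarity is 24 / 25 = 0.96, which is also the dot product of the two normalized vectors.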

Looking up the capabilities of each field in the metamodel

It is now possible to see which capabilities (predicates/sorts/projections/etc.) are available for a field when inspecting the metamodel:

SearchMapping mapping = /*...*/ ;
// Retrieve a SearchIndexedEntity:
SearchIndexedEntity<Book> bookEntity = mapping.indexedEntity( Book.class );
// Get the descriptor for that index.
// The descriptor exposes the index metamodel:
IndexDescriptor indexDescriptor = bookEntity.indexManager().descriptor();

// Retrieve a field by name
// and inspect its capabilities if such field is present:
indexDescriptor.field( "releaseDate" ).ifPresent( field -> {
    if ( field.isValueField() ) {
        // Get the descriptor for the field type:
        IndexValueFieldTypeDescriptor type = field.toValueField().type();
        // Inspect the "traits" of a field type:
        // each trait represents a predicate/sort/projection/aggregation
        // that can be used on fields of that type.
        Set<String> traits = type.traits();
        if ( traits.contains( IndexFieldTraits.Aggregations.RANGE ) ) {
            // ...
        }
        if ( traits.contains( IndexFieldTraits.Predicates.EXISTS ) ) {
            // ...
        }
        // ...
    }
} );

Other improvements and bug fixes

HSEARCH-5034: Hibernate Search now allows passing BeanReference<? extends T> when registering beans through BeanConfigurationContext.
HSEARCH-5004: Hibernate Search now relies on Hibernate ORM’s defaults for the OutboxEvent/Agent ID type instead of forcing SqlTypes.CHAR.

And more. For a full list of changes since the previous release, please see the release notes.

How to get this release

All details are available and up to date on the dedicated page on hibernate.org.

Getting started, migrating

For new applications, refer to the getting started guide.

For existing applications, Hibernate Search 7.1 is a drop-in replacement for 7.0, assuming you also upgrade the dependencies. Information about deprecated configuration and API is included in the migration guide.

Feedback, issues, ideas?

To get in touch, use the following channels:

hibernate-search tag on Stack Overflow (usage questions)
User forum (usage questions, general feedback)
Issue tracker (bug reports, feature requests)
Mailing list (development-related discussions)

In Relation To