We just published Hibernate Search 7.1.0.Alpha2, a second alpha release of the next minor version of Hibernate Search.
This version brings further vector search improvements and integrates Elasticsearch's/OpenSearch's vector search capabilities.
What’s new
Hibernate Search 7.1 is still in its early stages of development: some features are still incomplete or may change in a backward-incompatible way.
Dependency upgrades
- Hibernate ORM (HSEARCH-5060): Hibernate Search now depends on Hibernate ORM 6.4.2.Final, which, among other things, brings a couple of fixes for possible issues with mass indexing when ORM discriminator-based multi-tenancy is in use.
- Lucene (HSEARCH-5043): The Lucene backend now uses Lucene 9.9.1, which, among other improvements, brings better performance for vector search.
- Elasticsearch (HSEARCH-5032): The Elasticsearch backend is now also compatible with Elasticsearch 8.12, in addition to the versions it was already compatible with.
- Others:
  - HSEARCH-5032: Upgrade to Elasticsearch client 8.12.0
  - HSEARCH-5057: Upgrade to AWS SDK 2.23.3
  - HSEARCH-5047: Upgrade to JBeret 2.2.0.Final
Vector search for Lucene and Elasticsearch backends
This version of Hibernate Search builds on top of the previous alpha and integrates Elasticsearch/OpenSearch vector search capabilities.
To recap: vector search provides the tools to search over binary data (images, audio or video) or text data. External tools convert that data into vectors (arrays of bytes or floats, also called "embeddings"), which Hibernate Search then uses for indexing and querying.
Hibernate Search introduces a new field type, @VectorField, and a new predicate, knn, so that vectors can be indexed and then searched.
Vector fields can work with vector data represented as byte or float arrays in the documents.
Out of the box, byte[] and float[] property types work with the new field type. For any other entity property type, a custom value bridge or value binder needs to be implemented.
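As a minimal sketch, assume the embedding is held in a List<Float> property (a hypothetical choice; the text does not prescribe any particular type). The conversion a custom value bridge would then perform is a copy into a float[]. In Hibernate Search this logic would typically be wrapped in the bridge's toIndexedValue method; the bridge class itself and its registration on the field are not shown here, and EmbeddingConversion is an illustrative name, not part of any API.

```java
import java.util.List;

// Hypothetical helper: converts a List<Float> embedding (e.g. as produced by
// some external embedding tool) into the float[] form that @VectorField
// supports out of the box. A custom value bridge could delegate to this.
final class EmbeddingConversion {

    private EmbeddingConversion() {
    }

    static float[] toFloatArray(List<Float> embedding) {
        if (embedding == null) {
            return null; // no vector to index for this entity
        }
        float[] vector = new float[embedding.size()];
        for (int i = 0; i < vector.length; i++) {
            Float component = embedding.get(i);
            if (component == null) {
                // A vector cannot have "holes": reject instead of guessing a value
                throw new IllegalArgumentException("Null vector component at index " + i);
            }
            vector[i] = component; // auto-unboxing Float -> float
        }
        return vector;
    }
}
```

Keeping the conversion in a plain static method like this makes it easy to unit-test independently of the bridge.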
Keep in mind that indexed vectors must be of the same length
and that this length should be specified upfront for the schema to be created:
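```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.VectorField;

@Entity
@Indexed
public class Book {
    @Id
    private Integer id;
    // Every indexed vector must have exactly 512 components:
    @VectorField(dimension = 512)
    private float[] coverImageEmbeddings;
    // Other properties ...
}
```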
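Because every indexed vector must have exactly the declared length, one option is to validate embeddings in application code before assigning them to the entity, so that a mismatch surfaces early with a clear message. The sketch below is a hypothetical helper, not part of Hibernate Search; the constant simply mirrors the dimension = 512 used on Book above and must be kept in sync with the mapping by hand.

```java
// Hypothetical guard: rejects vectors whose length differs from the dimension
// declared on the vector field, before the entity is persisted and indexed.
final class VectorDimensionCheck {

    // Mirrors @VectorField(dimension = 512) on Book.coverImageEmbeddings.
    static final int COVER_IMAGE_DIMENSION = 512;

    private VectorDimensionCheck() {
    }

    static float[] requireDimension(float[] vector, int expectedDimension) {
        if (vector == null) {
            return null; // nothing to index, nothing to check
        }
        if (vector.length != expectedDimension) {
            throw new IllegalArgumentException(
                    "Expected a vector of dimension " + expectedDimension
                            + " but got " + vector.length);
        }
        return vector; // returned as-is so the call can be used inline
    }
}
```

For example, book.setCoverImageEmbeddings( VectorDimensionCheck.requireDimension( embedding, VectorDimensionCheck.COVER_IMAGE_DIMENSION ) ), where the setter is assumed to exist on the entity.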
Searching for vector similarities is performed with the knn predicate:
```java
float[] coverImageEmbeddingsVector = /*...*/ ;
List<Book> hits = searchSession.search( Book.class )
        .where( f ->
                // provide the number of similar documents to look for:
                f.knn( 5 )
                        // the name of the vector field:
                        .field( "coverImageEmbeddings" )
                        // matched documents will be the ones whose indexed vector
                        // is "most similar" to this vector
                        .matching( coverImageEmbeddingsVector )
                        // optionally, a filter can be supplied:
                        // here, a regular full-text search predicate
                        .filter( f.match().field( "authors.firstName" ).matching( "arthur" ) )
        ).fetchHits( 20 );
```
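To make the semantics of knn( 5 ) with a filter concrete, here is an exact, brute-force model over in-memory vectors. It is only a conceptual sketch: the backends typically rely on approximate nearest-neighbor index structures rather than scanning every vector, the filter is modeled as restricting the candidates before the k nearest are picked, and cosine similarity is an assumption made here, since the similarity function actually used depends on the field mapping. The Doc class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Conceptual, exact k-nearest-neighbor search with a filter.
final class BruteForceKnn {

    // Hypothetical document: an id, its vector, and one attribute to filter on.
    static final class Doc {
        final int id;
        final float[] vector;
        final String authorFirstName;

        Doc(int id, float[] vector, String authorFirstName) {
            this.id = id;
            this.vector = vector;
            this.authorFirstName = authorFirstName;
        }
    }

    private BruteForceKnn() {
    }

    // Cosine similarity: 1 for same direction, 0 for orthogonal, -1 for opposite.
    // Zero vectors are not handled (the division would yield NaN).
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to k documents that pass the filter, most similar first.
    static List<Doc> knn(List<Doc> docs, float[] query, int k, Predicate<Doc> filter) {
        List<Doc> candidates = new ArrayList<>();
        for (Doc doc : docs) {
            if (filter.test(doc)) {
                candidates.add(doc);
            }
        }
        candidates.sort(Comparator.comparingDouble((Doc d) -> cosine(d.vector, query)).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Note that the k nearest are returned regardless of how similar they actually are, which is exactly the limitation discussed next.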
By its nature, a knn predicate always tries to find the nearest vectors, even if the vectors it finds are quite far from the query vector, i.e. not that similar. This may lead to irrelevant results being returned by the query.
To address this, the knn predicate allows configuring the minimum required similarity:
```java
float[] coverImageEmbeddingsVector = /*...*/ ;
List<Book> hits = searchSession.search( Book.class )
        .where( f ->
                // Create a knn predicate as usual:
                f.knn( 5 ).field( "coverImageEmbeddings" ).matching( coverImageEmbeddingsVector )
                        // Specify the required minimum similarity value, to filter out irrelevant results:
                        .requiredMinimumSimilarity( 5 ) )
        .fetchHits( 20 );
```
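Conceptually, a minimum similarity acts as a cut-off on top of the k nearest results, so a query may return fewer than k hits. The sketch below models only that cut-off, over similarity scores assumed to be precomputed (higher means more similar). It is an illustration, not the backends' implementation: how similarity is computed, and how a threshold such as the 5 above is interpreted, depends on the backend and on the similarity function configured for the field. MinimumSimilarityCutoff is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model: keep the k most similar documents, then drop any whose
// similarity is below the required minimum.
final class MinimumSimilarityCutoff {

    private MinimumSimilarityCutoff() {
    }

    // Returns the indices of the selected documents, most similar first.
    static List<Integer> topKAboveThreshold(double[] similarities, int k, double minSimilarity) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < similarities.length; i++) {
            indices.add(i);
        }
        // Sort document indices by descending similarity.
        indices.sort((x, y) -> Double.compare(similarities[y], similarities[x]));
        List<Integer> selected = new ArrayList<>();
        for (int index : indices) {
            // Stop once k hits are collected, or once similarities drop below
            // the threshold (the list is sorted, so the rest are lower too).
            if (selected.size() == k || similarities[index] < minSimilarity) {
                break;
            }
            selected.add(index);
        }
        return selected;
    }
}
```

With similarities {0.95, 0.20, 0.71, 0.40} and k = 3, a threshold of 0.0 keeps documents 0, 2 and 3, while a threshold of 0.5 keeps only 0 and 2.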
Note that each backend may have its own specifics and limitations with regard to vector search. For more details, see the related documentation.
See the sections of the reference documentation on vector fields and on the knn predicate for more information.
This version of Hibernate Search brings some renaming related to vector search. In particular:
Looking up the capabilities of each field in the metamodel
It is now possible to see which capabilities (predicates/sorts/projections/etc.) are available for a field when inspecting the metamodel:
```java
SearchMapping mapping = /*...*/ ;
// Retrieve a SearchIndexedEntity:
SearchIndexedEntity<Book> bookEntity = mapping.indexedEntity( Book.class );
// Get the descriptor for that index.
// The descriptor exposes the index metamodel:
IndexDescriptor indexDescriptor = bookEntity.indexManager().descriptor();
// Retrieve a field by name
// and inspect its capabilities if such field is present:
indexDescriptor.field( "releaseDate" ).ifPresent( field -> {
    if ( field.isValueField() ) {
        // Get the descriptor for the field type:
        IndexValueFieldTypeDescriptor type = field.toValueField().type();
        // Inspect the "traits" of a field type:
        // each trait represents a predicate/sort/projection/aggregation
        // that can be used on fields of that type.
        Set<String> traits = type.traits();
        if ( traits.contains( IndexFieldTraits.Aggregations.RANGE ) ) {
            // ...
        }
        if ( traits.contains( IndexFieldTraits.Predicates.EXISTS ) ) {
            // ...
        }
        // ...
    }
} );
```
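One possible use of traits is deciding at runtime which fields to offer for a given feature, for example which fields a search UI could expose as range facets. The sketch below assumes the traits of each field have already been collected into a map, e.g. by walking the index descriptor as in the snippet above. FieldCapabilities is a hypothetical helper, and the trait strings in the usage example are placeholders: real code should compare against the IndexFieldTraits constants rather than literal strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: given the traits collected for each field, list the
// fields whose type supports a particular trait (sorted for stable output).
final class FieldCapabilities {

    private FieldCapabilities() {
    }

    static SortedSet<String> fieldsSupporting(Map<String, Set<String>> traitsByField, String trait) {
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : traitsByField.entrySet()) {
            if (entry.getValue().contains(trait)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For instance, with placeholder traits {releaseDate: [range, exists], title: [match, exists], pageCount: [range]}, asking for "range" yields pageCount and releaseDate.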
Other improvements and bug fixes
- HSEARCH-5034: Hibernate Search now allows passing BeanReference<? extends T> when registering beans to BeanConfigurationContext.
- HSEARCH-5004: Hibernate Search now uses Hibernate ORM's defaults instead of forcing SqlTypes.CHAR for the OutboxEvent/Agent ID.
And more. For a full list of changes since the previous release, please see the release notes.
How to get this release
All details are available and up to date on the dedicated page on hibernate.org.
Getting started, migrating
For new applications, refer to the getting started guides.
For existing applications, Hibernate Search 7.1 is a drop-in replacement for 7.0, assuming you also upgrade the dependencies. Information about deprecated configuration and API is included in the migration guide.
Feedback, issues, ideas?
To get in touch, use the following channels:
- hibernate-search tag on Stack Overflow (usage questions)
- User forum (usage questions, general feedback)
- Issue tracker (bug reports, feature requests)
- Mailing list (development-related discussions)