In Relation To Hibernate Search

Handling queries on complex types in Hibernate Search

Posted by Emmanuel Bernard    |       |    Tagged as Hibernate Search

Writing queries that target complex types can be a bit surprising in Hibernate Search. For these multi-field types, the key is to target each individual field in the query. Let’s discuss how this works.

What’s a complex type?

Hibernate Search lets you write custom types that take a Java property and create Lucene fields in a document. As long as there is a one-property-to-one-field relationship, you are good. It becomes more subtle if your custom bridge stores the property in several Lucene fields. Think of an Amount type which has a numeric part and a currency part.

Let’s take a real example from a user of Infinispan's search engine - proudly served by Hibernate Search.

The FieldBridge
public class JodaTimeSplitBridge implements TwoWayFieldBridge {

    /**
     * Set year, month and day in separate fields
     */
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
        DateTime datetime = (DateTime) value;
        luceneoptions.addFieldToDocument(
            name+".year", String.valueOf(datetime.getYear()), document
        );
        luceneoptions.addFieldToDocument(
            name+".month", String.format("%02d", datetime.getMonthOfYear()), document
        );
        luceneoptions.addFieldToDocument(
            name+".day", String.format("%02d", datetime.getDayOfMonth()), document
        );
    }

    @Override
    public Object get(String name, Document document) {
        IndexableField fieldyear = document.getField(name+".year");
        IndexableField fieldmonth = document.getField(name+".month");
        IndexableField fieldday = document.getField(name+".day");
        String strdate = fieldday.stringValue()+"/"+fieldmonth.stringValue()+"/"+fieldyear.stringValue();
        // Rebuild and return the DateTime object (not its String form) for the two-way bridge
        return DateTime.parse(strdate, DateTimeFormat.forPattern("dd/MM/yyyy"));
    }

    @Override
    public String objectToString(Object date) {
        DateTime datetime = (DateTime) date;
        int year = datetime.getYear();
        int month = datetime.getMonthOfYear();
        int day = datetime.getDayOfMonth();
        return String.format("%02d", day) + "/" + String.format("%02d", month) + "/" + year;
    }
}
The entity using the bridge
[...]
@Indexed
class BlogEntry {
    [...]

    @Field(store=Store.YES, index=Index.YES)
    @FieldBridge(impl=JodaTimeSplitBridge.class)
    DateTime creationdate;
}

Let’s query this field

A naive but intuitive query looks like this.

Incorrect query
QueryBuilder qb = sm.buildQueryBuilderForClass(BlogEntry.class).get();
Query q = qb.keyword().onField("creationdate").matching(new DateTime()).createQuery();
CacheQuery cq = sm.getQuery(q, BlogEntry.class);
System.out.println(cq.getResultSize());

Unfortunately, that query will always return 0 results. Can you spot the problem?

It turns out that Hibernate Search does not know about the subfields creationdate.year, creationdate.month and creationdate.day. A FieldBridge is a bit of a black box for the Hibernate Search query DSL, so the DSL assumes that you index the data under the field name provided by the name parameter (creationdate in this example).

We have plans for a not-so-distant version of Hibernate Search to address that problem. It will only require you to provide a bit of metadata when you write such an advanced custom field bridge. But that’s the future, so what can you do now?

Use a single field

I am cheating a bit here, but as much as you can, try to keep the one property = one field mapping: life will be much simpler for you. In this specific JodaTime example, this is extremely easy. Use the custom bridge, but instead of creating three fields (for year, month and day), keep it as a single field in the form yyyyMMdd.

Let’s again use our user's real-life solution.

A bridge using one field
public class JodaTimeSingleFieldBridge implements TwoWayFieldBridge {

    /**
     * Store the data in a single field in yyyyMMdd format
     */
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
        DateTime datetime = (DateTime) value;
        luceneoptions.addFieldToDocument(
            name, datetime.toString(DateTimeFormat.forPattern("yyyyMMdd")), document
        );
    }


    @Override
    public Object get(String name, Document document) {
        IndexableField strdate = document.getField(name);
        return DateTime.parse(strdate.stringValue(), DateTimeFormat.forPattern("yyyyMMdd"));
    }

    @Override
    public String objectToString(Object date) {
        DateTime datetime = (DateTime) date;
        return datetime.toString(DateTimeFormat.forPattern("yyyyMMdd"));
    }
}

In this case, it would be even better to use a Lucene numeric field: numeric fields are more compact and more efficient at range queries. Use luceneOptions.addNumericFieldToDocument( name, numericDate, document ); instead.
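
For instance, the set() method of the single-field bridge above could switch to the numeric variant roughly as in this sketch (the int encoding of the date is one possible choice, and get() and objectToString() would need to be adapted to the same encoding):

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
        DateTime datetime = (DateTime) value;
        // Encode the date as a single int, e.g. 14 July 2015 -> 20150714
        int numericDate = datetime.getYear() * 10000
                + datetime.getMonthOfYear() * 100
                + datetime.getDayOfMonth();
        luceneoptions.addNumericFieldToDocument(name, numericDate, document);
    }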

The query above will work as expected now.

But my type must have multiple fields!

OK, OK. I won’t avoid the question. The solution is to disable the Hibernate Search query DSL magic and target the fields directly.

Let’s see how to do it based on the first FieldBridge implementation.

Query targeting multiple fields
int year = datetime.getYear();
// month and day are indexed zero-padded by the bridge, so query them in the same format
String month = String.format("%02d", datetime.getMonthOfYear());
String day = String.format("%02d", datetime.getDayOfMonth());

QueryBuilder qb = sm.buildQueryBuilderForClass(BlogEntry.class).get();
Query q = qb.bool()
    .must( qb.keyword().onField("creationdate.year").ignoreFieldBridge().ignoreAnalyzer()
                .matching(year).createQuery() )
    .must( qb.keyword().onField("creationdate.month").ignoreFieldBridge().ignoreAnalyzer()
                .matching(month).createQuery() )
    .must( qb.keyword().onField("creationdate.day").ignoreFieldBridge().ignoreAnalyzer()
                .matching(day).createQuery() )
   .createQuery();

CacheQuery cq = sm.getQuery(q, BlogEntry.class);
System.out.println(cq.getResultSize());

The key is to:

  • target each field directly,

  • disable the field bridge conversion for the query,

  • and disable the analyzer (usually a good idea when matching exact values).

This is a rather advanced topic, and the query DSL will do the right thing most of the time, so no need to panic just yet.

But in case you do hit such a complex type need, it’s useful to understand what is going on underneath.

Updated Roadmap for Hibernate Search

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search

The Hibernate Search project roadmap was quite outdated, so after some team chats on IRC and on our developers’ mailing list I’ve summarized our plans on the project website.

What’s the plan?

Current: 80% progress

Upgrade to Hibernate ORM 5

Coming in Hibernate Search 5.4, which is almost ready; you can try it already.

Transactional improvements for the JMS backend

Also included in version 5.4, with improved documentation and configuration examples to follow, probably in 5.5.

MassIndexer Improvements

The long-standing limitation of transaction timeouts was finally resolved.

Apache Lucene 5 support

We’ve been working on this in separate branches, but we won’t merge it into 5.4 now that this branch is stable. The Lucene 5 update doesn’t look too scary, but we plan for it to be 5.5's highlight so that you can focus on one thing at a time: update to Hibernate ORM 5 now, and to Apache Lucene 5 just after.

Neither upgrade is particularly complex, but I hope you’ll appreciate the effort to let you update step by step and take a break in between in case there is some surprise which needs extra attention.

Consider that the upgrade to Hibernate ORM 5 implies an upgrade to WildFly 10.

Near Future Intentions

Clustering, backends and reliability

We want to make it much simpler for you to use Hibernate Search in a clustered / cloud deployment. This has been possible for years, but the number of requests for help is soaring, and we have to admit that we can make several things easier. This is going to be an ongoing effort, with small improvements like the use of JMSXGroupID already in 5.4, improved JMS examples in 5.5, and the driver for several of the improvements I’ve listed in the roadmap for 5.6… however, depending on your help and feedback, you could read that as "5.6 and beyond".

Java 8, Java 9, …

With Java 8 very popular and Java 9 getting closer, I hope that roadmap items such as out-of-the-box indexing for the new Date/Time types don’t need any explanation. They could use some help though! We made it easy to register new bridges for any type, so you can define your own custom types and decide how they should best be indexed; this implies you can make an independent plug-in package for any type we don’t support yet, and if it’s a popular type like the ones from JSR 310 or JSR 354 we’d love to integrate it.
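
As a small illustration of such an independent plug-in, a bridge for the JSR 310 LocalDate type could be as simple as the following sketch (it assumes Hibernate Search 5's TwoWayStringBridge contract and would be applied with @FieldBridge(impl = LocalDateBridge.class); the yyyyMMdd encoding mirrors the Joda-Time example discussed earlier on this blog):

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.hibernate.search.bridge.TwoWayStringBridge;

public class LocalDateBridge implements TwoWayStringBridge {

    // BASIC_ISO_DATE renders dates as yyyyMMdd, which sorts and range-queries nicely
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    @Override
    public String objectToString(Object object) {
        return object == null ? null : ((LocalDate) object).format(FORMAT);
    }

    @Override
    public Object stringToObject(String stringValue) {
        return stringValue == null ? null : LocalDate.parse(stringValue, FORMAT);
    }
}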

ElasticSearch and Apache Solr

There has been interest in such integrations for a while, but we failed to make any progress as the problem needs to be broken down into smaller steps. Several points on JIRA and the roadmap might seem only tangential, but they resolve roadblocks and pave the road to integrating Hibernate Search with such services. While we’re intimately familiar with Apache Lucene and these Lucene-as-a-service alternatives provide similar features, we don’t have direct experience with them, so some help will be appreciated. The plan is to prefer merging small iterative improvements over stalling development for months, so we’ll see several steps in a 5.6+ series and use that as the foundation to assess how we can deal with API inconsistencies across these slightly different backends.

Next!

After having all the above-listed new features nicely chiseled into our current stable API, I’m confident that, to make the most of features like the ElasticSearch integration, we’ll need to make changes to the API. This last refinement step will define Hibernate Search 6.0.

As always, these plans might need to change, and we always look forward to your suggestions. I’m unable to commit to dates or seasons; I would love to see this all happen before this winter, but we’ll need your contribution for that to be realistic.

Hibernate Search is ready for Hibernate ORM 5

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search

Hibernate Search version 5.4.0.CR1 is now available! It was built and tested with Hibernate ORM 5.0.0.CR2; essentially it’s all ready for ORM 5, and we’re just waiting for that release to be marked Final.

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.4.0.CR1</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>5.0.0.CR2</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>5.0.0.CR2</version>
</dependency>

No longer timeouts when using the MassIndexer in a container

You can now have the MassIndexer set a different timeout for the internal transactions it will start, so if you’re running Hibernate Search in a container like WildFly you no longer have to choose between a deadline of 5 minutes and changing the default timeout of the whole container.

fullTextSession
   .createIndexer( User.class )
   .transactionTimeout( 1800 ) //seconds
   .startAndWait();

Great improvements in the JMS backend

Transactional JMS backend

As explained in more detail in last week’s post, Hibernate Search now provides an option to include its indexing operations within the same transaction as the RDBMS.

In short, it’s enabled by setting this property:

hibernate.search.worker.enlist_in_transaction=true

But keep in mind: it’s a global setting! If you want to use it, all your backends must be set to use an XA-enabled, transactional JMS queue.

Please let us know if you have a great use case which would require us to allow some form of mixed mode.

The JMS message header JMSXGroupID

Hibernate Search will now set the standard JMSXGroupID message header to the index name. That allows users of the JMS backend to take advantage of message grouping.

Modules versions and running in WildFly

The first WildFly version to use Hibernate ORM 5 is 10.0.0.Alpha5, which was released this past weekend. So the JBoss Modules we create for this application server now target WildFly 10, at least version 10.0.0.Alpha5.

Where to download it from

Everything you need is available on Hibernate Search’s web site. Download the full distribution from here, or get it from Maven Central, and don’t hesitate to reach us on our forums.

Hibernate Search, JMS and transactions

Posted by Emmanuel Bernard    |       |    Tagged as Hibernate Search

Hibernate Search sends the indexing requests in the post-transaction phase. Until now. The JMS backend can now send its indexing requests transactionally with the database changes. Why is that useful? Read on.

A bit of context

When you change indexed entities, Hibernate Search collects these changes during the database transaction. It then waits for the transaction to be successful before pushing them to the backend.

Hibernate Search has a few backends:

  • lucene: this one uses Lucene to index the entities

  • JMS: this one sends a JMS message with the list of index changes. This JMS queue is then read by a master which uses the Lucene backend.

  • and a few more that are not interesting here

Running the backend after the transaction (in the afterTransaction phase to be specific) is generally what you want. Just to name a few reasons:

  • you don’t want index changes to be executed if you end up rolling back the database transaction

  • you don’t necessarily want your database changes to fail because the indexing fails: you can always rebuild the index from your database.

  • and most backends don’t support transactions anyway

Hibernate Search lets you register an error callback so that you are notified of these indexing problems when they happen and can react the way you want (log, raise an exception, retry, etc.).
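
In Hibernate Search 5 this callback is the ErrorHandler contract, typically registered through the hibernate.search.error_handler configuration property. A minimal sketch (the logging below is purely illustrative) could look like this:

import org.hibernate.search.exception.ErrorContext;
import org.hibernate.search.exception.ErrorHandler;

public class LoggingErrorHandler implements ErrorHandler {

    @Override
    public void handle(ErrorContext context) {
        // Indexing work failed after the database transaction committed:
        // log it, raise an alert, schedule a re-index, etc.
        System.err.println( "Indexing failed for " + context.getOperationAtFault()
                + ", cause: " + context.getThrowable() );
    }

    @Override
    public void handleException(String errorMsg, Throwable exception) {
        System.err.println( errorMsg + ", cause: " + exception );
    }
}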

So why make the JMS backend join the transaction?

If you make the JMS backend join the transaction, then either the database changes happen and the JMS messages are received by the queue, or nothing happens (no database change and no JMS message).

The non-transactional approach is still our recommended approach, but there are a few reasons why you might want to go transactional.

No code to handle the message failure

It eliminates the need to write an error callback and handle this problematic case.

Simpler operational processes

It simplifies your operational processes: you can focus on monitoring your JMS queue (rate of messages coming in, rate of messages going out), which gives you an accurate picture of the health of Hibernate Search’s work.

Transactional mass indexing

When making changes to lots of indexed entities, it is common to use the following pattern to avoid an OutOfMemoryException:

for (int i = 0 ; i < workLoadSize ; i++) {
    // do changes
    if ( i % 500 == 0 ) {
        fullTextSession.flush();          // flush pending entity changes to the database
        fullTextSession.flushToIndexes(); // push the pending index changes to the backend
        fullTextSession.clear();          // detach processed entities to free memory
    }
}

If you use the transactional JMS backend, then either all the messages will be sent or none of them.

Make sure your JMS implementation and your JTA transaction manager are smart and don’t keep the messages in memory, or you might face an OutOfMemoryException.

More consistent flow with batching frameworks

If you use a batching framework like Spring Batch which keeps its "done" status in a database, you have a guarantee that changes, indexing requests and batch status are all consistent.

How to use the feature

This feature is now integrated in master and will be in a Hibernate Search release soon.

We kept the configuration as simple as possible. Simply add the following property:

hibernate.search.worker.enlist_in_transaction=true

If you try to use this option on a non-transactional backend (i.e. not JMS), Hibernate Search will yell at you.

Make sure to use an XA JMS queue and that your database supports XA, as we are talking about coordinated transactional systems.

Many thanks to Yoann, one of our customers, who helped us refine the why and how of that feature.

Sanne is going to do a virtual JBoss User Group session Tuesday July 14th at 6PM BST / 5PM UTC / 1PM EDT / 10 AM PDT. He is going to talk about Lucene in Java EE.

He will also describe some projects dear to our hearts. If you want to know what Hibernate Search and Infinispan bring to the Lucene table and how they use Lucene internally, that’s the event to attend!

Apache Lucene is the de facto standard open source library for Java developers to implement full-text search capabilities.

While it’s thriving in its field, it is rarely mentioned in the scope of Java EE development.

In this talk we will see which features make many developers love Lucene, walk through some concrete examples of common problems it elegantly solves, and cover some best practices for using it in a Java EE stack.

Finally we’ll see how some popular OSS projects such as Hibernate ORM (JPA provider), WildFly (Java EE runtime) and Infinispan (in-memory datagrid, JCache implementor) actually provide great Lucene integration capabilities.

If you are interested, get some more info on Meetup and sign up.

When creating a bug report for any project within the Hibernate family, it’s extremely helpful (and, frankly, required) to have an adequate test case available. This is obviously important to make reproducing the issue as easy as possible. But it’s also vital longer-term. Nearly every bug fix should include a regression test, which is frequently based on the original reproducer (sometimes, it’s the reproducer, verbatim).

To help create useful test cases, we’re opening up a repo with various templates. Please see the READMEs in each project’s subdir for more info: Hibernate Test Case Templates

As a starting point, the repo contains two templates for ORM:

  • ORMUnitTestCase: By far, this one’s the most helpful. ORM includes a built-in unit test framework that does much of the heavy lifting for you; all that’s required is your entities, logic, and any necessary settings (see the sketch after this list). Since we nearly always include a regression test with bug fixes, providing your reproducer this way simplifies the process: we can commit it directly, without having to mold it first. What’s even better? Fork hibernate-orm itself, add your test case directly to a module’s unit tests (using the template class), then submit it as a PR!

  • ORMStandaloneTestCase: This template is standalone and will look familiar: it simply uses a run-of-the-mill ORM setup. Although it’s perfectly acceptable as a reproducer, lean towards ORMUnitTestCase whenever possible.
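
As an illustration only, a reproducer based on the ORMUnitTestCase template might look roughly like this sketch (it assumes the BaseCoreFunctionalTestCase base class from hibernate-testing; YourEntity and the assertions are placeholders for your own reproducer):

import org.hibernate.Session;
import org.hibernate.Transaction;
import org.hibernate.testing.junit4.BaseCoreFunctionalTestCase;
import org.junit.Test;

public class MyReproducerTest extends BaseCoreFunctionalTestCase {

    // Register the entities the reproducer needs
    @Override
    protected Class<?>[] getAnnotatedClasses() {
        return new Class<?>[] { YourEntity.class };
    }

    @Test
    public void reproducesTheIssue() {
        Session s = openSession();
        Transaction tx = s.beginTransaction();
        // persist your entities and assert the buggy behaviour here
        tx.commit();
        s.close();
    }
}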

The eventual goal is to also include templates for Validator, Search, and OGM.

As always, this is open source for a reason! If the templates can be improved in any way, please let us know (either through our JIRA instance or through GitHub Issues). Better yet, send us a pull request!

Hibernate Search 5.3.0.Final now available!

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search

As suggested last week, today we released Hibernate Search version 5.3.0.Final.

Compared to the previous candidate release, the only changes are some minor clarifications in the documentation.

   <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-orm</artifactId>
      <version>5.3.0.Final</version>
   </dependency>

Faceting API changes

The great new faceting integration comes at a small migration cost: remember you now need to use the new @Facet annotation, as explained in the example of the previous post.

What's next?

Barring maintenance needs on the 5.3 branch, we have no plans for other Hibernate Search releases targeting Hibernate ORM 4.3.x. The focus is now on Hibernate ORM 5 compatibility.

  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.3.0.Final
  • Tarballs and zip bundles can be downloaded from our website
  • Feedback is welcome on the forums, by email, or on IRC

For those of you using Hibernate ORM version 5.0.0.CR1, you can now use the freshly released Hibernate Search 5.4.0.Alpha1.

What's new

Absolutely nothing! This Hibernate Search version is identical in terms of features and API to version 5.3.0.CR1. This should make it easier for you all to upgrade the Hibernate ORM libraries (hibernate-core, hibernate-entitymanager, …) without the distraction of changes in Hibernate Search: focus on the changes you'll need to apply because of the major version upgrade of Hibernate (if any; it's not too complex at all).

WildFly compatibility and JBoss Modules

With every release of Hibernate Search we normally also release a set of modules to run the latest version of it on WildFly, but in this case, since the updated Hibernate ORM 5 integrations for WildFly have yet to be released, we skipped this step. Fear not, the WildFly integration will be finished soon and we'll then resume releasing such module packs as usual. Not least, this very same version of Hibernate Search will soon be available in WildFly 10, so the modules missing today won't actually be needed at all.

This is a great time to try Hibernate 5

While the latest polish is being applied to Hibernate 5, we're all looking forward to your feedback. It is likely that some more changes will be made, but we consider it good enough already not to expect any regression, so please try it and let us know! We're at that sweet spot where you can still propose changes without the chains of strong API compatibility requirements, yet it's stable enough that you won't be wasting time on a quickly changing target.

Versions reminder

This version of Hibernate Search requires:

  • Hibernate ORM 5.0.0.CR1
  • Apache Lucene 4.10.x
  • Java SE 7

Our rules and conventions for versions and compatibility are documented here on the GitHub Wiki.

  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.4.0.Alpha1
  • Zip and tar bundles are available via our website
  • Feedback is welcome on the forums, by email, or on IRC

Tonight we released Hibernate Search version 5.3.0.CR1 (candidate release).

We consider this stable, and besides some pending documentation improvements, we'll re-publish the same implementation as 5.3.0.Final in ten days. Last chance to provide feedback! Please try it out.

Brand-new but proven faceting technology

The technology which Hibernate Search uses under the hood to implement this amazing feature has been improved and polished over several years by the Apache Lucene team. So far Hibernate Search had been using an older, proven technique, but that one had several limitations.

We believe the new implementation is now mature enough for our users too; we probably should have switched earlier if we hadn't had other subjects to tackle.

From a user's perspective, there are fewer limitations and the API is the same, with one migration catch: please remember you now need to explicitly opt in to the fields you want to use for faceting! For that, use the @Facet annotation as shown in the example of the previous post and in the Faceting reference documentation.

What's next?

Our immediate focus is to release an experimental tag to support Hibernate ORM 5: remember, Hibernate Search versions from 4.5 up to (and including) 5.3 are only compatible with Hibernate ORM versions 4.3.x.

Of course, we're also looking forward to your feedback and will try to schedule fixes for any issues you encounter on the latest versions as soon as possible. Suggestions and contributions are welcome as well!

  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.3.0.CR1
  • Zip and tar bundles are available via our website
  • Feedback is welcome on the forums, by email, or on IRC

Following on the heels of 5.2.0.Final, Hibernate Search 5.3.0.Beta1 is now out. This time the faceting engine got an overhaul.

This work was long overdue, because there were several shortcomings in the existing implementation. For example, there were limitations with *ToMany associations. Also, the implementation was based on a custom Lucene Collector making use of the FieldCache API. FieldCache will be removed in Lucene 5, so updating the faceting implementation was also a requirement for upgrading to Lucene 5 in the near future.

What has changed? Actually not much when it comes to the API Hibernate Search exposes. You still create your FacetingRequest using QueryBuilder.facet().... You then enable the facet search by passing it to the FacetManager, from which you also retrieve the list of Facet instances after the query was executed. All this is unchanged and documented in the online documentation.
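
In code, that unchanged flow looks roughly like the following sketch (fullTextSession, the Car entity and its make field are placeholders in the style of the documentation, not from this post):

QueryBuilder builder = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity( Car.class ).get();

// Describe the facet: group results by the discrete values of the make field
FacetingRequest makeFacet = builder.facet()
        .name( "makeFacet" )
        .onField( "make" )
        .discrete()
        .createFacetingRequest();

// Enable it on a full-text query and read the facets after execution
FullTextQuery query = fullTextSession.createFullTextQuery( builder.all().createQuery(), Car.class );
FacetManager facetManager = query.getFacetManager();
facetManager.enableFaceting( makeFacet );
query.list();
List<Facet> facets = facetManager.getFacets( "makeFacet" );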

A few things have changed, though. Most notably, you now need to tell Hibernate Search which properties are used for faceting. You do so by adding @Facet (resp. @Facets) to these properties. The reason for this is that, under the hood, the implementation is now based on Lucene's dynamic faceting capabilities. For this to work, we need to index the facet values using the appropriate DocValues type (SortedSetDocValuesFacetField, NumericDocValuesField or DoubleDocValuesField). Below we see the use of the @Facet(s) annotation:

    @Indexed
    public class Quux {
        @DocumentId
        private Integer id;
        
        @Field(analyze = Analyze.NO)
        @Facets({
                @Facet,
                @Facet(name = "string_facet_value", encoding = FacetEncodingType.STRING)
        })
        private double value;
    }
Notice that in this example the value field is configured with two facet annotations. The reason is that, by default, numbers will be stored using numeric DocValues types (NumericDocValuesField and DoubleDocValuesField), whereas all other types use the string-based SortedSetDocValuesFacetField. Numeric values can then only be used with a range facet, whereas discrete facets require string values. In case you want to use discrete faceting on a numeric field (for example if the field only contains a fixed number of possible values), FacetEncodingType.STRING needs to be used.

This is in line with the fact that Hibernate Search 5.x now indexes numbers numerically by default (see this blog).
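
Since the default numeric encoding only supports range facets, a range facet request on such a field might look like the following sketch (the price field and the bounds are illustrative, in the style of the discrete example below):

    FacetingRequest priceRangeRequest = queryBuilder( Car.class ).facet()
            .name( "priceRangeRequest" )
            .onField( "price" )
            .range()
                .below( 10000 ).excludeLimit()
                .from( 10000 ).to( 20000 )
                .above( 20000 ).excludeLimit()
            .createFacetingRequest();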

A final caveat: there was a change in the default behaviour of includeZeroCounts as part of a facet request. The default used to include zero counts, but it has now changed to not include them. Instead, zero counts must be requested explicitly (calculating them for discrete facets comes with a performance penalty!):

    FacetingRequest request = queryBuilder( Car.class ).facet()
            .name( "quuxFacetRequest" )
            .onField( "string_facet_value" )
            .discrete()
            .includeZeroCounts( true )
            .createFacetingRequest();

Release info for Hibernate Search 5.3.0.Beta1

  • Full change log is available on JIRA
  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.3.0.Beta1
  • Zip and tar bundles on SourceForge

Happy faceting!
