Red Hat

In Relation To Hibernate Search

In Relation To Hibernate Search

In the last couple of months we’ve been working to upgrade Hibernate Search to use Apache Lucene 5, to keep up with the greatest releases from the Lucene community.

Today Hibernate Search 5.5.0.Alpha1 is available!

As the version suggests this is the first cut, but we’ll also need your feedback and suggestions to better assess the needed steps to evolve this into a great, efficient and stable Final release.

A stable API

As Uwe Schindler - a great developer from the Lucene community - had suggested me to try, it turns out it was possible to update from Apache Lucene version 4 to 5 (a major upgrade) in a minor upgrade of Hibernate Search: many things improved, some things are slightly different, but overall the API was not affected by any significant change.

You might be affected by some minor changes of the Lucene API if you were writing power-use extensions to Hibernate Search, in that case:

Please let us know if there are difficulties in upgrading: we can try to improve on that, or at least improve the migration guide. If you don’t tell us, we might not know and many other developers will have a hard time!

Improved Performance

Several changes in the Apache Lucene code relate to improved efficiency, in particular of memory usage.

However our aim for this release was functionality and correctness; we didn’t have time to run performance tests, and to be fair even if we had the time nothing beats feedback from the community of users.

We would highly appreciate your help in trying it out and run some performance tests, especially if you can compare with previous versions of Hibernate Search (based on earlier versions of Apache Lucene).

If you think you can try this, it’s a unique opportunity to both contribute and to get our insight and support to analyse and understand your performance reports!

Sorting and Faceting: use the right types!

It appears that when using Apache Lucene 4, if you were running a Numeric Faceting Query but the target field was not Numeric, it would still work - or appear to work.

Similarly, if you Sort a query using the wrong SortType, Apache Lucene 4 appears to be forgiving.

When using Apache Lucene 5 now, and also because of some internal changes we had to do to keep your migration cost easier, the validation on the right types got much stricter.

Please be aware that you’ll get runtime exceptions like the following when mistmaching the types:

Type mismatch: ageForIntSorting was indexed with multiple values per document, use SORTED_SET instead

I’m aware those are confusing, so we’ll working on improving the errors: HSEARCH-1951; still even with those error messages, the meaning is you have to fix your mapping. No shame in that, we figured this problem out as several unit tests were too relaxed ;-)

How to get it

Everything you need is available on Hibernate Search’s web site. Download the full distribution from here, or get it from Maven Central, and don’t hesitate to reach us in our forums or mailing lists.

While Hibernate ORM applies some more polish before releasing the final version, so did the Hibernate Search team.

Hibernate Search version 5.4.0.CR2 is now available! It was built and tested with Hibernate ORM 5.0.0.CR3, again we’re just waiting for that to be final but decided to release another Candidate Release as some fixes and improvements were recently applied.

The "Meridian crossing issue"

The Spatial query feature was affected by a math sign mistake which would reveal itself on distance based queries which would cross the 180' meridian. Thanks to Alan Field for the test and Nicholas Helleringer for debugging and fixing the issue!

Serialization issue on enabling Term Vectors

Kudos to Benoit Guillon, who debugged and fixed a mistake in the Avro based serializer, for which values of Term Vectors would not be indexed when enabling Term Vectors in combination with a remote backend. See also HSEARCH-1936.

Various small improvements

You can now have the MassIndexer set a different timeout for the internal transactions it will start, so if you’re running Hibernate Search in a container like WildFly you no longer have to make a choice between having a deadline of 5 minutes or changing the default timeout of the whole container.

  • Improved validation when choosing an invalid Sort definition

  • The Infinispan module will now get you a warning if your Maven project is using the old artifact ids

  • The performance integration tests have been re-enabled, and fixed to run on latest WildFly and Java8 and Java9

  • Fixed a potential issue with timezone encoding for indexed dates

  • Improved the code generating the path for index storage to handle "." elements in the configured paths

  • Upgraded to Hibernate ORM 5.0.0.CR3 and WildFly 10.0.0.Alpha6 (WildFly only used for testing)

Where to download it from

Everything you need is available on Hibernate Search’s web site. Download the full distribution from here, or get it from Maven Central, and don’t hesitate to reach us in our forums

What’s next?

Soon we’ll be releasing a preview of 5.5 to have you upgrade to Apache Lucene 5.2. The Hibernate Search version 5.4.0.Final will be released when Hibernate ORM 5 is released as stable.

Handling queries on complex types in Hibernate Search

Posted by Emmanuel Bernard    |       |    Tagged as Hibernate Search

Writing queries using complex types can be a bit surprising in Hibernate Search. For these multi-fields types, the key is to target each individual field in the query. Let’s discuss how this works.

What’s a complex type?

Hibernate Search lets you write custom types that take a Java property and create Lucene fields in a document. As long as there is a one property for one field relationship, you are good. It becomes more subtle if your custom bridge stores the property in several Lucene fields. Say an Amount type which has the numeric part and the currency part.

Let’s take a real example from a user using Infinispan's search engine - proudly served by Hibernate Search.

The FieldBridge
public class JodaTimeSplitBridge implements TwoWayFieldBridge {

    /**
     * Set year, month and day in separate fields
     */
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
        DateTime datetime = (DateTime) value;
        luceneoptions.addFieldToDocument(
            name+".year", String.valueOf(datetime.getYear()), document
        );
        luceneoptions.addFieldToDocument(
            name+".month", String.format("%02d", datetime.getMonthOfYear()), document
        );
        luceneoptions.addFieldToDocument(
            name+".day", String.format("%02d", datetime.getDayOfMonth()), document
        );
    }

    @Override
    public Object get(String name, Document document) {
        IndexableField fieldyear = document.getField(name+".year");
        IndexableField fieldmonth = document.getField(name+".month");
        IndexableField fieldday = document.getField(name+".day");
        String strdate = fieldday.stringValue()+"/"+fieldmonth.stringValue()+"/"+fieldyear.stringValue();
        DateTime value = DateTime.parse(strdate, DateTimeFormat.forPattern("dd/MM/yyyy"));
        return String.valueOf(value);
    }

    @Override
    public String objectToString(Object date) {
        DateTime datetime = (DateTime) date;
        int year = datetime.getYear();
        int month = datetime.getMonthOfYear();
        int day = datetime.getDayOfMonth();
        String value = String.format("%02d",day)+"/"+String.format("%02d",month)+"/"+String.valueOf(year);
        return String.valueOf(value);
    }
}
The entity using the bridge
[...]
@Indexed
class BlogEntry {
    [...]

    @Field(store=Store.YES, index=Index.YES)
    @FieldBridge(impl=JodaTimeSplitBridge.class)
    DateTime creationdate;
}

Let’s query this field

A naive but intuitive query looks like this.

Incorrect query
QueryBuilder qb = sm.buildQueryBuilderForClass(BlogEntry.class).get();
Query q = qb.keyword().onField("creationdate").matching(new DateTime()).createQuery();
CacheQuery cq = sm.getQuery(q, BlogEntry.class);
System.out.println(cq.getResultSize());

Unfortunately that query will always return 0 result. Can you spot the problem?

It turns out that Hibernate Search does not know about these subfields creationdate.year, creationdate.month and creationdate.day. A FieldBridge is a bit of a blackbox for the Hibernate Search query DSL, so it assumes that you index the data in the field name provided by the name parameter (creationdate in this example).

We have plans in a not so future version of Hibernate Search to address that problem. It will only require you to provide a bit of metadata when you write such advanced custom field bridge. But that’s the future, so what to do now?

Use a single field

I am cheating here but as much as you can, try and keep the one property = one field mapping. Life will be much simpler to you. In this specific JodaTime type example, this is extremely easy. Use the custom bridge but instead of creating three fields (for year, month, day), keep it as a single field in the form of yyyymmdd.

Let’s again use our user real life solution.

A bridge using one field
public class JodaTimeSingleFieldBridge implements TwoWayFieldBridge {

    /**
     * Store the data in a single field in yyymmdd format
     */
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
        DateTime datetime = (DateTime) value;
        luceneoptions.addFieldToDocument(
            name, datetime.toString(DateTimeFormat.forPattern("yyyyMMdd")), document
        );
    }


    @Override
    public Object get(String name, Document document) {
        IndexableField strdate = document.getField(name);
        return DateTime.parse(strdate.stringValue(), DateTimeFormat.forPattern("yyyyMMdd"));
    }

    @Override
    public String objectToString(Object date) {
        DateTime datetime = (DateTime) date;
        return datetime.toString(DateTimeFormat.forPattern("yyyyMMdd"));
    }
}

In this case, it would even be better to use a Lucene numeric format field. They are more compact and more efficient at range queries. Use luceneOptions.addNumericFieldToDocument( name, numericDate, document );.

The query above will work as expected now.

But my type must have multiple fields!

OK, OK. I won’t avoid the question. The solution is to disable the Hibernate Query DSL magic and target the fields directly.

Let’s see how to do it based on the first FieldBridge implementation.

Query targeting multiple fields
int year = datetime.getYear();
int month = datetime.getMonthOfYear();
int day = datetime.getDayOfMonth();

QueryBuilder qb = sm.buildQueryBuilderForClass(BlogEntry.class).get();
Query q = qb.bool()
    .must( qb.keyword().onField("creationdate.year").ignoreFieldBridge().ignoreAnalyzer()
                .matching(year).createQuery() )
    .must( qb.keyword().onField("creationdate.month").ignoreFieldBridge().ignoreAnalyzer()
                .matching(month).createQuery() )
    .must( qb.keyword().onField("creationdate.day").ignoreFieldBridge().ignoreAnalyzer()
                .matching(day).createQuery() )
   .createQuery();

CacheQuery cq = sm.getQuery(q, BlogEntry.class);
System.out.println(cq.getResultSize());

The key is to:

  • target directly each field,

  • disable the field bridge conversion for the query,

  • and it’s probably a good idea to disable the analyzer.

It’s a rather advanced topic and the query DSL will do the right thing most of the time. No need to panic just yet.

But in case you hit a complex type needs, it’s interesting to understand what is going on underneath.

Updated Roadmap for Hibernate Search

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search

The Hibernate Search project roadmap was quite outdated, so after some team chats on IRC and our developer’s mailing list I’ve summarized our plans on the project website.

What’s the plan?

Current 80% progress

Upgrade to Hibernate ORM 5

Coming in Hibernate Search 5.4, which is almost ready, and you could try it already.

Transactional improvements for the JMS backend

Also included in version 5.4, with improved documentation and configuration examples to follow probably in 5.4.

MassIndexer Improvements

The long standing limitation of transaction timeouts was finally resolved.

Apache Lucene 5 support

We’ve been working on this on separate branches, but we won’t merge it in 5.4 as this branch is stable now. The Lucene 5 update code doesn’t look too scary but we plan that to be 5.5 's highlight so that you can focus on one thing at a time: update to Hibernate ORM 5 now, and just after to Apache Lucene 5.

Neither upgrade is particularly complex, but I hope you’ll appreciate the effort to let you update step by step and take a break in between in case there is some surprise which needs extra attention.

Consider that the upgrade to Hibernate ORM 5 implies an upgrade to WildFly 10.

Near Future Intentions

Clustering, backends and reliability

We want to make it much simpler for you to use Hibernate Search in a clustered / cloud deployment. This has been possible since years, but the amount of requests for help are soaring and we have to admit that we can make several things easier. This is going to be an ongoing effort, with small improvements like the usage of JMSXGroupID already in 5.4, improved JMS examples in 5.5, and the driver for several of the improvements I’ve listed in the roadmap for 5.6…​ however depending on your help and feedback you could read that as "5.6 and beyond".

Java 8, Java 9, …​

With Java 8 very popular and Java 9 getting closer I hope that roadmap items such as out of the box indexing for the new Date/Time types don’t need any explanation. They could use some help though! We made it easy to register new bridges for any type so you can define your own custom types and define how they should best be indexed; this implies you can make an independent plug-in package for any type we don’t support yet, and if it’s a popular type like the ones from JSR 310 or JSR 354 we’d love to integrate it.

ElasticSearch and Apache Solr

There has been interest in such integrations for a while, but we failed to make any progress as the problem needs to be broken down in smaller steps. Several points on JIRA and the roadmap might seem only tangential but resolve roadblocks and pave the road to integrate Hibernate with such services. While we’re intimately familiar with Apache Lucene and these Lucene-as-a-service alternatives provide similar features, we don’t have direct experience with these, so some help will be appreciated. The plan is to prefer merging small iterative improvements over stalling development for months, so we’ll see several steps in a 5.6+ series and use that as the foundation to assess how we can deal with API inconsistencies across these slightly different backends.

Next!

After having all the above-listed new features nicely chiseled in our current stable API, I’m confident that to make the most of features like the ElasticSearch integration we’ll need to make changes to the API. This last refinement step will define Hibernate Search 6.0.

As always, these plans might need to change and we’ll always look forward to your suggestions. I’m unable to commit on dates or seasons; I would love to see this all happen before this winter but we’ll need your contribution for that to be realistic.

Hibernate Search is ready for Hibernate ORM 5

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search Releases

Hibernate Search version 5.4.0.CR1 is now available! It was built and tested with Hibernate ORM 5.0.0.CR2, essentially it’s all ready for ORM 5 and we’ll just be waiting for this to be marked Final.

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.4.0.CR1</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>5.0.0.CR2</version>
</dependency>
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-entitymanager</artifactId>
    <version>5.0.0.CR2</version>
</dependency>

No longer timeouts when using the MassIndexer in a container

You can now have the MassIndexer set a different timeout for the internal transactions it will start, so if you’re running Hibernate Search in a container like WildFly you no longer have to make a choice between having a deadline of 5 minutes or changing the default timeout of the whole container.

fullTextSession
   .createIndexer( User.class )
   .transactionTimeout( 1800 ) //seconds
   .startAndWait();

Great improvements in the JMS backend

Transactional JMS backend

As explained in more detail in last week’s post, Hibernate Search now provides an option to include its indexing operations within the same transaction as the RDBMS.

In short, it’s enabled by setting this property:

hibernate.search.worker.enlist_in_transaction=true

But keep in mind: it’s a global setting! If you want to use it, all your backends shall be set to use an XA enabled, transactional JMS queue.

Please let us know if you have a great use case which would require us to allow some form of mixed mode.

The JMS message header JMSXGroupID

Hibernate Search will now use the specific JMSXGroupID message header and set it to the index name. That will allow users of the JMS backend to take advantage of message grouping.

Modules versions and running in WildFly

The first WildFly version to use Hibernate ORM 5 is version 10.0.0.Alpha5, which was released the past weekend. So the JBoss Modules we create for this appserver are targeting now WildFly 10, but at least version 10.0.0.Alpha5.

Where to download it from

Everything you need is available on Hibernate Search’s web site. Download the full distribution from here, or get it from Maven Central, and don’t hesitate to reach us in our forums

Hibernate Search, JMS and transactions

Posted by Emmanuel Bernard    |       |    Tagged as Hibernate Search

Hibernate Search sends the indexing requests in the post transaction phase. Until now. The JMS backend can now send its indexing requests transactionally with the database changes. Why is that useful? Read on.

A bit of context

When you change indexed entities, Hibernate Search collects these changes during the database transaction. It then waits for the transaction to be successful before pushing them to the backend.

Hibernate Search has a few backends:

  • lucene: this one uses Lucene to index the entities

  • JMS: this one sends a JMS message with the list of index changes. This JMS queue is then read by a master which uses the Lucene backend.

  • and a few more that are not interesting here

Running the backend after the transaction (in the afterTransaction phase to be specific) is generally what you want. Just to name a few reasons:

  • you don’t want index changes to be executed if you end up rollbacking the database transaction

  • you don’t necessarily want your database changes to fail because the indexing fails: you can always rebuild the index from your database.

  • and most backends don’t support transactions anyways

Hibernate Search lets you enlist an error callback so that you are notified of these indexing problems when they happen and react the way you want (log, raise an exception, retry etc).

So why make the JMS backend join the transaction

If you make the JMS backend join the transaction, then either the database changes happen and the JMS messages are received by the queue, or nothing happens (no database change and no JMS message).

The non transactional approach is still our recommended approach. But there are a few reasons why you want to go transactional.

No code to handle the message failure

It eliminates the need to write an error callback and handle this problematic case.

Simpler exploitation processes

It simplifies your exploitation processes. You can focus on monitoring your JMS queue (rates of messages coming in, rates of messages coming out) which will give you an accurate status of the health of Hibernate Search’s work.

Transactional mass indexing

When doing changes to lots of indexed entities, it is common to use the following pseudo pattern to avoiod OutOfMemoryException

for (int i = 0 ; i < workLoadSize ; i++) {
    // do changes
    if ( i % 500 == 0 ) {
        fullTextSession.flush();
        fullTextSession.flushToIndexes();
        fullTextSession.clear();
    }
}

If you use the transactional JMS backend, then all the message will be sent or none of them.

Make sure your JMS implementation and your JTA transaction manager are smart and don’t keep the messages in memory or you might face an OutOfMemoryException.

More consistent batching frameworks flow

If you use a batching framework like Spring Batch which keeps its "done" status in a database, you have a guarantee that changes, indexing requests and batch status are all consistent.

How to use the feature

This feature is now integrated in master and will be in an Hibernate Search release any time soon.

We kept the configuration as simple as possible. Simply add the following property

hibernate.search.worker.enlist_in_transaction=true

If you try and use this option on a non transactional backend (i.e. not JMS), Hibernate Search will yell at you.

Make sure to use a XA JMS queue and that your database supports XA as we are talking about coordinated transactional systems.

Many thanks to Yoann, one of our customers, who helped us refine the why and how of that feature.

Sanne is going to do a virtual JBoss User Group session Tuesday July 14th at 6PM BST / 5PM UTC / 1PM EDT / 10 AM PDT. He is going to talk about Lucene in Java EE.

He will also describe some projects dear to our heart. If you want to know what Hibernate Search, Infinispan bring to the Lucene table and how they use Lucene internally, that’s the event to be in!

Apache Lucene is the de-facto standard open source library for Java developers to implement full-text-search capabilities.

While it’s thriving in its field, it is rarely mentioned in the scope of Java EE development.

In this talk we will see for which features many developers love Lucene, make some concrete examples of common problems it elegantly solves, and see some best practices about using it in a Java EE stack.

Finally we’ll see how some popular OSS projects such as Hibernate ORM (JPA provider), WildFly (Java EE runtime) and Infinispan (in-memory datagrid, JCache implementor) actually provide great Lucene integration capabilities.

If you are interested, get some more info on Meetup and enlist.

When creating a bug report for any project within the Hibernate family, it’s extremely helpful (and, frankly, required) to have an adequate test case available. This is obviously important to make reproducing the issue as easy as possible. But it’s also vital longer-term. Nearly every bug fix should include a regression test, which is frequently based on the original reproducer (sometimes, it’s the reproducer, verbatim).

To help create useful test cases, we’re opening up a repo with various templates. Please see the READMEs in each project’s subdir for more info: Hibernate Test Case Templates

As a starting point, the repo contains two templates for ORM:

  • ORMUnitTestCase: By far, this one’s the most helpful. ORM includes a built-in unit test framework that does much of the heavy lifting for you. All that’s required is your entities, logic, and any necessary settings. Since we nearly always include a regression test with bug fixes, providing your reproducer using this method simplifies the process. We can then directly commit it, without having to mold it in first. What’s even better? Fork hibernate-orm itself, add your test case directly to a module’s unit tests (using the template class), then submit it as a PR!

  • ORMStandaloneTestCase: This template is standalone and will look familiar. It simply uses a run-of-the-mill ORM setup. Although it’s perfectly acceptable as a reproducer, lean towards ORMUnitTestCase whenever possible.

The eventual goal is to also include templates for Validator, Search, and OGM.

As always, this is open source for a reason! If the templates can be improved in any way, please let us know (either through our JIRA instance or through GitHub Issues). Better yet, send us a pull request!

Hibernate Search 5.3.0.Final now available!

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search

As suggested last week, today we released Hibernate Search version 5.3.0.Final.

Compared to the previous candidate release, the only changes are some minor clarifications in the documentation.

   <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-orm</artifactId>
      <version>5.3.0.Final</version>
   </dependency>

Faceting API changes

The new great faceting integration comes at a small migration cost: remember you now need to use the new @Facet annotation as explained in the the example of the previous post.

What's next?

Barring some maintenance needs on this branch 5.3, we have no plans of other Hibernate Search releases to target Hibernate ORM 4.3.x. The focus is now on Hibernate 5 compatibility.

  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.3.0.Final
  • Tarballs and zip bundles can be downloaded from our website
  • Feedback is welcome on the forums and emails, IRC

For those of you using Hibernate ORM version 5.0.0.CR1, you can now use the freshly released Hibernate Search 5.4 version 5.4.0.Alpha1.

What's new

Absolutely nothing! This Hibernate Search version is identical in terms of features and API to version 5.3.0.CR1: this should make it easier for you all to upgrade the Hibernate ORM libraries (hibernate-core, hibernate-entitymanager,..) without the distraction of changes because of Hibernate Search: focus on the changes you'll need to apply because of the major version upgrade of Hibernate (if any, as it's not too complex at all).

WildFly compatibility and JBoss Modules

With every release of Hibernate Search we normally also release a set of modules to run the latest version of it on WildFly, but in this case since the updated Hibernate ORM 5 integrations for WildFly have yet to be released, we skipped this step. Fear not, the WildFly integration will be finished soon and we'll then resume releasing such module packs as usual. Not least, this very same version of Hibernate Search will soon be available in WildFly 10, so the modules missing today won't actually be needed at all.

This is a great time to try Hibernate 5

While the latest polish is performed on Hibernate 5, we're all looking forward for feedback from you. It is likely that some more changes will be done, but we consider it good enough already to not expect any regression so please try it and let us know! We're at that sweet spot in which you can still propose changes without the chains of strong API compatibility requirements, but good enough for you to not be wasting time on a quickly changing target.

Versions reminder

This version of Hibernate Search requires:

  • Hibernate ORM 5.0.0.CR1
  • Apache Lucene 4.10.x
  • Java SE 7

Our rules and conventions for versions and compatibility are documented here on the GitHub Wiki.

  • Artefact jars are available on Maven Central under the GAV org.hibernate:hibernate-search-orm:5.4.0.Alpha1
  • Zip and tar bundles are available via our website
  • Feedback is welcome on the forums and emails, IRC
back to top