Hibernate Search is a library that integrates Hibernate ORM with Apache Lucene or Elasticsearch by automatically indexing entities, enabling advanced search functionality: full-text, geospatial, aggregations and more. For more information, see Hibernate Search on hibernate.org.

Hibernate Search 4.3.0.Beta1 is now available both in Maven repositories and from SourceForge.

What's new?

  • Performance boosts for the NRT backend
  • Spatial API is getting nicer
  • Modules for deploying on JBoss improved (bugfixes)
  • Compatible with JBoss EAP 6.1

More details can be found on this JIRA filter.

Performance improvements for NRT users

We now have a brand new performance test suite, so we started to play with it and spotted some interesting optimisation opportunities which had eluded us in previous tests. The NRT (near-real-time) backend was affected by some unnecessary lock contention, which in some scenarios could result in significant slowdowns.

So what kind of fix are we talking about? Let's see the performance results of the new tests on the latest Final release first:

Performance Report: Hibernate Search 4.2.0.Final

SUMMARY
    Name   : FileSystemNearRealTimeTestScenario

    Memory usage (total-free):
        before : 37MB
        after  : 40MB

TASKS
    10000x InsertBookTask                      | sum 25:24.769 | avg 00:00.152
    10000x UpdateBookRatingTask                | sum 25:01.950 | avg 00:00.150
    10000x UpdateBookTotalSoldTask             | sum 22:54.125 | avg 00:00.137
    10000x QueryBooksByAuthorTask              | sum 20:22.324 | avg 00:00.122
    10000x QueryBooksByAverageRatingTask       | sum 30:21.692 | avg 00:00.182
    10000x QueryBooksByBestRatingTask          | sum 39:56.530 | avg 00:00.239
    10000x QueryBooksByNewestPublishedTask     | sum 27:02.078 | avg 00:00.162
    10000x QueryBooksBySummaryTask             | sum 27:19.568 | avg 00:00.163
    10000x QueryBooksByTitleTask               | sum 27:49.037 | avg 00:00.166
    10000x QueryBooksByTotalSoldTask           | sum 26:01.403 | avg 00:00.156

TEST CONFIGURATION
    threads              : 10
    measured cycles      : 10000
    warmup cycles        : 100
    initial book count   : 1000000
    initial author count : 10000

Now let's see how much this improved.

Performance Report: Hibernate Search 4.3.0.Beta1

SUMMARY
    Name   : FileSystemNearRealTimeTestScenario

    Memory usage (total-free):
        before : 38MB
        after  : 40MB

TASKS
    10000x InsertBookTask                      | sum 04:53.440 | avg 00:00.029
    10000x UpdateBookRatingTask                | sum 04:32.154 | avg 00:00.027
    10000x UpdateBookTotalSoldTask             | sum 04:41.969 | avg 00:00.028
    10000x QueryBooksByAuthorTask              | sum 01:58.408 | avg 00:00.011
    10000x QueryBooksByAverageRatingTask       | sum 12:02.741 | avg 00:00.072
    10000x QueryBooksByBestRatingTask          | sum 12:26.415 | avg 00:00.074
    10000x QueryBooksByNewestPublishedTask     | sum 12:01.274 | avg 00:00.072
    10000x QueryBooksBySummaryTask             | sum 07:08.790 | avg 00:00.042
    10000x QueryBooksByTitleTask               | sum 02:03.112 | avg 00:00.012
    10000x QueryBooksByTotalSoldTask           | sum 11:54.997 | avg 00:00.071

[same configuration]
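To put a number on the difference, the "sum" columns of the two reports above can be divided by each other. The following is only a minimal sketch: the class and method names (SpeedupReport, parseMillis, speedup) are made up for this illustration and are not part of Hibernate Search or of the performance test suite. It assumes durations are always printed as mm:ss.SSS, as they are in the reports.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class SpeedupReport {

    // Parses a "mm:ss.SSS" duration, as printed in the reports, into milliseconds.
    static long parseMillis(String duration) {
        String[] minutesAndRest = duration.split(":");
        String[] secondsAndMillis = minutesAndRest[1].split("\\.");
        long minutes = Long.parseLong(minutesAndRest[0]);
        long seconds = Long.parseLong(secondsAndMillis[0]);
        long millis = Long.parseLong(secondsAndMillis[1]);
        return (minutes * 60 + seconds) * 1000 + millis;
    }

    // How many times faster the "after" run was compared to the "before" run.
    static double speedup(String before, String after) {
        return (double) parseMillis(before) / parseMillis(after);
    }

    public static void main(String[] args) {
        // "sum" values copied from the 4.2.0.Final and 4.3.0.Beta1 reports.
        Map<String, String[]> sums = new LinkedHashMap<>();
        sums.put("InsertBookTask", new String[] {"25:24.769", "04:53.440"});
        sums.put("QueryBooksByAuthorTask", new String[] {"20:22.324", "01:58.408"});
        sums.put("QueryBooksByBestRatingTask", new String[] {"39:56.530", "12:26.415"});
        sums.put("QueryBooksByTitleTask", new String[] {"27:49.037", "02:03.112"});

        for (Map.Entry<String, String[]> entry : sums.entrySet()) {
            String[] beforeAndAfter = entry.getValue();
            System.out.printf(Locale.ROOT, "%-28s %.1fx faster%n",
                    entry.getKey(), speedup(beforeAndAfter[0], beforeAndAfter[1]));
        }
    }
}
```

For the tasks listed here the ratio ranges from a bit over 3x (QueryBooksByBestRatingTask) to more than 13x (QueryBooksByTitleTask), with inserts landing at roughly 5x.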

And here comes the traditional disclaimer: don't expect the exact same performance benefit to apply to your application. Other applications are very likely to benefit from this too, but the scale will be different. This is why I am not sharing hardware details; they are not relevant. Suffice it to say that these tests were run under the same conditions, so the results are comparable with each other.

We can't test all applications out there, but as an educated guess I don't expect there to be cases in which performance could worsen. Improvements are likely to be measurable for any application using the near-real-time IndexManager, and could be even better than these figures if you have higher contention (more threads), slower storage, or significantly larger indexes.
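If you are not sure whether your application uses the near-real-time IndexManager at all, it is selected through the indexmanager configuration property. The sketch below only builds the settings as a plain java.util.Properties object; the class name NrtSettings and the index path are placeholders made up for this example, and pairing NRT with the filesystem directory provider is an assumption based on the name of the test scenario above. The same keys can go into hibernate.properties or persistence.xml instead.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
class NrtSettings {

    // Builds Hibernate Search settings that enable the near-real-time
    // IndexManager for all indexes ("default" scope).
    static Properties nearRealTimeSettings(String indexBase) {
        Properties settings = new Properties();
        // Switch from the default IndexManager to the NRT one.
        settings.setProperty("hibernate.search.default.indexmanager", "near-real-time");
        // Assumption: indexes are stored on the filesystem.
        settings.setProperty("hibernate.search.default.directory_provider", "filesystem");
        settings.setProperty("hibernate.search.default.indexBase", indexBase);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder path; use whatever location suits your deployment.
        Properties settings = nearRealTimeSettings("/var/lucene/indexes");
        settings.list(System.out);
    }
}
```

To enable it for a single index only, replace "default" in the keys with that index's name.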

Thanks for this

For these exciting figures I would like to thank the whole Apache Lucene development team, who created the Near-Real-Time improvements in Lucene which we're building on to provide this feature. Thanks also go to Tomas Hradec from the JBoss QA team, who created the performance tests which nailed the problem and allowed us to take the measurements needed for these improvements.

If anyone wants to contribute tests, even performance ones, we'll be glad to play with them and use them as a base for future improvements.

As usual, the issue tracker is JIRA and all code is on GitHub: pull requests and any kind of feedback welcome.

Stay tuned and test this soon, as the Final release will arrive very quickly! We're planning a CR (Candidate Release) for next week.

