Hibernate Search 4.3.0.Beta1 is now available both in Maven repositories and from SourceForge.
What's new?
- Performance boosts for the NRT backend
- Spatial API is getting nicer
- Modules for deploying on JBoss improved (bugfixes)
- Compatible with JBoss EAP 6.1
More details can be found on this JIRA filter.
Performance improvements for NRT users
We now have a brand new performance test suite, so we started to play with it and spotted some interesting optimisation opportunities which had eluded us in previous tests. The NRT (near-real-time) backend was affected by some unnecessary lock contention, which in some scenarios could result in significant slowdowns.
So what kind of fix are we talking about? Let's see the performance results of the new tests on the latest Final release first:
Performance Report: Hibernate Search 4.2.0.Final
SUMMARY
  Name : FileSystemNearRealTimeTestScenario
  Memory usage (total-free):
    before : 37MB
    after  : 40MB
TASKS
  10000x InsertBookTask                  | sum 25:24.769 | avg 00:00.152
  10000x UpdateBookRatingTask            | sum 25:01.950 | avg 00:00.150
  10000x UpdateBookTotalSoldTask         | sum 22:54.125 | avg 00:00.137
  10000x QueryBooksByAuthorTask          | sum 20:22.324 | avg 00:00.122
  10000x QueryBooksByAverageRatingTask   | sum 30:21.692 | avg 00:00.182
  10000x QueryBooksByBestRatingTask      | sum 39:56.530 | avg 00:00.239
  10000x QueryBooksByNewestPublishedTask | sum 27:02.078 | avg 00:00.162
  10000x QueryBooksBySummaryTask         | sum 27:19.568 | avg 00:00.163
  10000x QueryBooksByTitleTask           | sum 27:49.037 | avg 00:00.166
  10000x QueryBooksByTotalSoldTask       | sum 26:01.403 | avg 00:00.156
TEST CONFIGURATION
  threads : 10
  measured cycles : 10000
  warmup cycles : 100
  initial book count : 1000000
  initial author count : 10000
Now let's see how much this improved.
Performance Report: Hibernate Search 4.3.0.Beta1
SUMMARY
  Name : FileSystemNearRealTimeTestScenario
  Memory usage (total-free):
    before : 38MB
    after  : 40MB
TASKS
  10000x InsertBookTask                  | sum 04:53.440 | avg 00:00.029
  10000x UpdateBookRatingTask            | sum 04:32.154 | avg 00:00.027
  10000x UpdateBookTotalSoldTask         | sum 04:41.969 | avg 00:00.028
  10000x QueryBooksByAuthorTask          | sum 01:58.408 | avg 00:00.011
  10000x QueryBooksByAverageRatingTask   | sum 12:02.741 | avg 00:00.072
  10000x QueryBooksByBestRatingTask      | sum 12:26.415 | avg 00:00.074
  10000x QueryBooksByNewestPublishedTask | sum 12:01.274 | avg 00:00.072
  10000x QueryBooksBySummaryTask         | sum 07:08.790 | avg 00:00.042
  10000x QueryBooksByTitleTask           | sum 02:03.112 | avg 00:00.012
  10000x QueryBooksByTotalSoldTask       | sum 11:54.997 | avg 00:00.071
[same configuration]
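To make the two reports above easier to compare, here is a small sketch that divides the 4.2.0.Final average by the 4.3.0.Beta1 average for each task. The figures are copied from the reports (converted to milliseconds); the class and method names are made up for illustration and are not part of the performance test suite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search or its test suite.
class NrtSpeedupSketch {

    // Average time per task in milliseconds, copied from the two reports:
    // index 0 = 4.2.0.Final, index 1 = 4.3.0.Beta1.
    static Map<String, long[]> reportedAverages() {
        Map<String, long[]> averages = new LinkedHashMap<>();
        averages.put("InsertBookTask", new long[] {152, 29});
        averages.put("UpdateBookRatingTask", new long[] {150, 27});
        averages.put("UpdateBookTotalSoldTask", new long[] {137, 28});
        averages.put("QueryBooksByAuthorTask", new long[] {122, 11});
        averages.put("QueryBooksByAverageRatingTask", new long[] {182, 72});
        averages.put("QueryBooksByBestRatingTask", new long[] {239, 74});
        averages.put("QueryBooksByNewestPublishedTask", new long[] {162, 72});
        averages.put("QueryBooksBySummaryTask", new long[] {163, 42});
        averages.put("QueryBooksByTitleTask", new long[] {166, 12});
        averages.put("QueryBooksByTotalSoldTask", new long[] {156, 71});
        return averages;
    }

    // Ratio of the old average to the new one: 2.0 means "twice as fast".
    static double speedup(long beforeMillis, long afterMillis) {
        return (double) beforeMillis / afterMillis;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> entry : reportedAverages().entrySet()) {
            long[] avg = entry.getValue();
            System.out.printf("%-32s %4d ms -> %3d ms  (%.1fx)%n",
                    entry.getKey(), avg[0], avg[1], speedup(avg[0], avg[1]));
        }
    }
}
```

On these figures the three write tasks come out roughly five times faster, while the queries range from a bit over two times faster to almost fourteen times faster.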
And here comes the traditional disclaimer: don't expect the exact same performance benefit to apply to your application. Other applications are very likely to benefit from this, but the scale will be different. This is why I am not sharing hardware details; they are not relevant. Suffice it to say these tests were run under the same conditions, so they are comparable with each other.
We can't test every application out there, but as an educated guess I don't expect there to be cases in which performance could worsen. Improvements are likely to be measurable for any application using the near-real-time IndexManager, and could be even better than these figures if you have higher contention (more threads), slower storage, or significantly larger indexes.
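If you are not sure whether your application uses the near-real-time IndexManager, it is selected through configuration. The sketch below collects the relevant settings in a plain java.util.Properties object. The property names are the ones I recall from the 4.x reference documentation, so please verify them against the documentation for your version; the class name, method name and index path are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys and values are meant to match
// Hibernate Search 4.x configuration (check the reference docs to be sure).
class NrtConfigSketch {

    static Properties nearRealTimeSettings(String indexBase) {
        Properties settings = new Properties();
        // Use the near-real-time IndexManager for all indexes ("default" scope).
        settings.setProperty("hibernate.search.default.indexmanager", "near-real-time");
        // Store the index on the filesystem, as the test scenario name suggests.
        settings.setProperty("hibernate.search.default.directory_provider", "filesystem");
        // Base directory for the index files; the path is up to you.
        settings.setProperty("hibernate.search.default.indexBase", indexBase);
        return settings;
    }

    public static void main(String[] args) {
        // Example path only, chosen for illustration.
        Properties settings = nearRealTimeSettings("/var/lucene/indexes");
        for (String key : new java.util.TreeSet<String>(settings.stringPropertyNames())) {
            System.out.println(key + " = " + settings.getProperty(key));
        }
    }
}
```

The same three entries can go straight into your hibernate.properties or persistence.xml instead; building them in code is only a convenient way to show them together.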
Thanks for this
For these exciting figures I would like to thank the whole Apache Lucene development team, who created the Near-Real-Time improvements in Lucene which we're building on to provide this feature. Thanks also to Tomas Hradec from the JBoss QA team for creating the performance tests which nailed the problem and allowed us to take the measurements needed for these improvements.
If anyone wants to contribute tests, even performance ones, we'll be glad to play with them and use them as a base for future improvements.
The usual links
As usual, the issue tracker is JIRA and all code is on GitHub: pull requests and any kind of feedback welcome.
Stay tuned and test this soon, as the Final release will arrive very quickly! We're planning a CR (Candidate Release) next week.