Hibernate Search 4.3.0.Beta1 is now available both in Maven repositories and from SourceForge.
What's new?
- Performance boosts for the NRT backend
- Spatial API is getting nicer
- Improved modules for deploying on JBoss (bugfixes)
- Compatible with JBoss EAP 6.1
More details can be found in this JIRA filter.
Performance improvements for NRT users
We have a brand-new performance test suite, so we started to play with it and spotted some interesting optimisation opportunities which had eluded us in previous tests. The NRT (near-real-time) backend suffered from unnecessary lock contention, which in some scenarios could cause significant slowdowns.
So how big a fix are we talking about? Let's first look at the results of the new tests on the latest Final release:
Performance Report: Hibernate Search 4.2.0.Final
SUMMARY
Name : FileSystemNearRealTimeTestScenario
Memory usage (total-free):
before : 37MB
after : 40MB
TASKS
10000x InsertBookTask | sum 25:24.769 | avg 00:00.152
10000x UpdateBookRatingTask | sum 25:01.950 | avg 00:00.150
10000x UpdateBookTotalSoldTask | sum 22:54.125 | avg 00:00.137
10000x QueryBooksByAuthorTask | sum 20:22.324 | avg 00:00.122
10000x QueryBooksByAverageRatingTask | sum 30:21.692 | avg 00:00.182
10000x QueryBooksByBestRatingTask | sum 39:56.530 | avg 00:00.239
10000x QueryBooksByNewestPublishedTask | sum 27:02.078 | avg 00:00.162
10000x QueryBooksBySummaryTask | sum 27:19.568 | avg 00:00.163
10000x QueryBooksByTitleTask | sum 27:49.037 | avg 00:00.166
10000x QueryBooksByTotalSoldTask | sum 26:01.403 | avg 00:00.156
TEST CONFIGURATION
threads : 10
measured cycles : 10000
warmup cycles : 100
initial book count : 1000000
initial author count : 10000
Now let's see how much this has improved:
Performance Report: Hibernate Search 4.3.0.Beta1
SUMMARY
Name : FileSystemNearRealTimeTestScenario
Memory usage (total-free):
before : 38MB
after : 40MB
TASKS
10000x InsertBookTask | sum 04:53.440 | avg 00:00.029
10000x UpdateBookRatingTask | sum 04:32.154 | avg 00:00.027
10000x UpdateBookTotalSoldTask | sum 04:41.969 | avg 00:00.028
10000x QueryBooksByAuthorTask | sum 01:58.408 | avg 00:00.011
10000x QueryBooksByAverageRatingTask | sum 12:02.741 | avg 00:00.072
10000x QueryBooksByBestRatingTask | sum 12:26.415 | avg 00:00.074
10000x QueryBooksByNewestPublishedTask | sum 12:01.274 | avg 00:00.072
10000x QueryBooksBySummaryTask | sum 07:08.790 | avg 00:00.042
10000x QueryBooksByTitleTask | sum 02:03.112 | avg 00:00.012
10000x QueryBooksByTotalSoldTask | sum 11:54.997 | avg 00:00.071
[same configuration]
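To make the comparison easier to read, here is a small sketch that turns the "sum" columns of the two reports into a speedup factor per task (old time divided by new time, so 2.0 means twice as fast). The class and method names are made up for this example, and the figures are simply copied from the reports above. Since every task ran 10000 times in both reports, the ratio of the sums is also the ratio of the averages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NrtSpeedup {

    // Parses a duration printed as "mm:ss.SSS" in the reports into milliseconds.
    public static long toMillis(String mmssSSS) {
        String[] minSec = mmssSSS.split(":");
        long minutes = Long.parseLong(minSec[0]);
        String[] secMs = minSec[1].split("\\.");
        long seconds = Long.parseLong(secMs[0]);
        long millis = Long.parseLong(secMs[1]);
        return (minutes * 60 + seconds) * 1000 + millis;
    }

    // Old total time divided by new total time: values above 1.0 mean faster.
    public static double speedup(String before, String after) {
        return (double) toMillis(before) / toMillis(after);
    }

    public static void main(String[] args) {
        // "sum" values: {4.2.0.Final, 4.3.0.Beta1}, copied from the reports.
        Map<String, String[]> sums = new LinkedHashMap<>();
        sums.put("InsertBookTask", new String[] {"25:24.769", "04:53.440"});
        sums.put("UpdateBookRatingTask", new String[] {"25:01.950", "04:32.154"});
        sums.put("UpdateBookTotalSoldTask", new String[] {"22:54.125", "04:41.969"});
        sums.put("QueryBooksByAuthorTask", new String[] {"20:22.324", "01:58.408"});
        sums.put("QueryBooksByAverageRatingTask", new String[] {"30:21.692", "12:02.741"});
        sums.put("QueryBooksByBestRatingTask", new String[] {"39:56.530", "12:26.415"});
        sums.put("QueryBooksByNewestPublishedTask", new String[] {"27:02.078", "12:01.274"});
        sums.put("QueryBooksBySummaryTask", new String[] {"27:19.568", "07:08.790"});
        sums.put("QueryBooksByTitleTask", new String[] {"27:49.037", "02:03.112"});
        sums.put("QueryBooksByTotalSoldTask", new String[] {"26:01.403", "11:54.997"});

        for (Map.Entry<String, String[]> e : sums.entrySet()) {
            double factor = speedup(e.getValue()[0], e.getValue()[1]);
            System.out.printf("%-32s %5.1fx%n", e.getKey(), factor);
        }
    }
}
```

Running it shows improvements ranging from roughly 2.2x (QueryBooksByTotalSoldTask) to about 13.6x (QueryBooksByTitleTask), with the write tasks (insert and updates) all close to 5x.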
And here comes the traditional disclaimer: don't expect exactly the same performance benefit in your application. Other applications are very likely to benefit from this too, but the scale will be different. This is why I am not sharing hardware details, as they are not relevant: suffice it to say that these tests were run under the same conditions, so they are comparable with each other.
We can't test every application out there, but as an educated guess I don't expect any case in which performance gets worse. Improvements are likely to be measurable for any application using the near-real-time IndexManager, and could be even larger than these figures if you have higher contention (more threads), slower storage, or significantly larger indexes.
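If you are not using the near-real-time IndexManager yet, it is selected through configuration rather than code. The sketch below assumes plain-properties configuration and builds the relevant settings with `java.util.Properties`: `hibernate.search.default.indexmanager = near-real-time` applies NRT to all indexes, and the filesystem directory settings loosely mirror the `FileSystemNearRealTimeTestScenario` name. The class name, method name and index path are hypothetical placeholders; check the reference documentation for your version before relying on them.

```java
import java.util.Properties;

public class NrtConfig {

    // Builds Hibernate Search settings that select the near-real-time
    // IndexManager for every index (the "default" scope).
    public static Properties nearRealTimeSettings(String indexBase) {
        Properties props = new Properties();
        // Use the NRT IndexManager instead of the default "directory-based" one.
        props.setProperty("hibernate.search.default.indexmanager", "near-real-time");
        // Keep the indexes on the filesystem, under indexBase (placeholder path).
        props.setProperty("hibernate.search.default.directory_provider", "filesystem");
        props.setProperty("hibernate.search.default.indexBase", indexBase);
        return props;
    }

    public static void main(String[] args) {
        Properties props = nearRealTimeSettings("/var/lucene/indexes");
        // In a real application these entries would go into hibernate.properties
        // or persistence.xml, or be passed as the Map argument of
        // Persistence.createEntityManagerFactory(unitName, props).
        props.list(System.out);
    }
}
```

The same `indexmanager` property can also be set per index by replacing `default` with the index name, so you can try NRT on a single index first.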
Thanks for this
For these exciting figures I would like to thank the whole Apache Lucene development team, for creating the Near-Real-Time improvements in Lucene on which we build this feature, and Tomas Hradec from the JBoss QA team, for creating the performance tests which pinpointed the problem and provided the measurements needed to make these improvements.
If anyone wants to contribute tests, including performance tests, we'll be glad to play with them and use them as a basis for future improvements.
The usual links
As usual, the issue tracker is JIRA and all code is on GitHub: pull requests and any kind of feedback are welcome.
Stay tuned, and test this soon: the Final release is coming very quickly! We're planning a CR (Candidate Release) next week.