A few people have asked me to publish my slides on Bean Validation and Hibernate Search. Here we go :)
- Bean Validation: Declare once validate anywhere. A reality?
- Hibernate Search: Human Heaven and Database Savior in the Cloud
Speaking of conferences, I will be presenting Hibernate Search and the magic of analyzers at Jazoon (Zurich) on Thursday 25th at 11:30. See you there.
Aaron and I will be talking about Hibernate Search and how it can complement your database when you need to scale big, like in... ahem a cloud. It's on Wednesday, June 3 at 9:45 am in Hall 134. I know it's early, someone in the program committee did not like us so much ;)
I will also do an author signing session of Hibernate Search in Action the same day Wed, June 3 at the JavaOne bookstore.
I will also discuss Bean Validation (JSR-303): what it does and how it integrates with Java EE 6 (which I will demo on stage) and any other architecture. This will be Thursday, June 4 at 13:30 in Hall 134. The latest version of the spec is always available here, at least till we make it final. Hibernate Validator 4, the reference implementation, is well underway; give it a try.
With work on version 3.2 of Hibernate Search well underway and a range of very interesting features in the pipeline (eg programmatic configuration API, bulk indexing and dynamic boosting), we decided to provide some of the bug fixes also for the 3.1 version of Hibernate Search. Hence here is Hibernate Search 3.1.1 GA. On top of several bug fixes which are listed in the release notes we also upgraded Lucene from 2.4 to 2.4.1.
We recommend users of version 3.1 upgrade to 3.1.1 to benefit from these bug fixes.
You can download the release here.
I am pleased to announce the GA release of Hibernate Search 3.1. This release focuses on performance improvements and code robustness, but also adds interesting new features focused on usability:
- An analyzer configuration model to declaratively use and compose features like phonetic approximation, n-gram approximation, search by synonyms, stop words filtering, elision correction, unaccented search and many more.
- A lot of performance improvements at indexing time (including reduced lock contention, parallel execution).
- A lot of performance improvements at query time (including I/O reduction both for Lucene and SQL, and better concurrency).
- Additional new features both for indexing and querying (including support for term vector, access to scoped analyzer at query time and access to results explanation).
A more detailed overview of the features follows.
- Support for declarative analyzer composition through the Solr library
Analyzers can now be declaratively composed as a tokenizer and a set of filters. This enables easy composition of the following features: phonetic approximation, n-gram approximation, search by synonyms, stop words filtering, elision correction, unaccented search and so on.
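To make the tokenizer + filter-chain idea concrete, here is a plain-Java illustration. This is not the Hibernate Search or Solr API; the class and the filters are made up purely to show how a stream of tokens flows through a composable chain of transformations:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java sketch of the analyzer idea: a tokenizer produces a token
// stream and each filter transforms that stream in turn.
public class AnalyzerChain {
    private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

    public AnalyzerChain addFilter(UnaryOperator<List<String>> filter) {
        filters.add(filter);
        return this;
    }

    public List<String> analyze(String text) {
        // a trivial whitespace "tokenizer"
        List<String> tokens = new ArrayList<>(Arrays.asList(text.split("\\s+")));
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "a", "of");
        AnalyzerChain chain = new AnalyzerChain()
            .addFilter(ts -> {            // lower-case filter
                List<String> out = new ArrayList<>();
                for (String t : ts) out.add(t.toLowerCase(Locale.ROOT));
                return out;
            })
            .addFilter(ts -> {            // stop-word filter
                List<String> out = new ArrayList<>();
                for (String t : ts) if (!stopWords.contains(t)) out.add(t);
                return out;
            });
        System.out.println(chain.analyze("The Lord of the Rings")); // prints [lord, rings]
    }
}
```

In the real feature, the tokenizer and each filter are declared as Solr factory classes rather than coded by hand, but the composition model is the same.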
- Support for dynamic analyzers
Allows a given entity to define the analyzer used at runtime. A typical use case is multi-language support where the language varies from one entity instance to another.
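A minimal sketch of what dynamic analyzer selection looks like, assuming the @AnalyzerDiscriminator annotation and the Discriminator contract; the BlogEntry entity, its language property and the analyzer definition names are hypothetical:

```java
@Entity
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class)),
    @AnalyzerDef(name = "de", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class))
})
public class BlogEntry {
    @Id @GeneratedValue private Long id;

    // The discriminator picks an analyzer definition per entity instance
    @Field
    @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
    private String language;

    @Field private String text;
}

public class LanguageDiscriminator implements Discriminator {
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        // value is the language property; return the matching definition name
        return value == null ? null : value.toString(); // "en" or "de"
    }
}
```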
Indexing performance has been enhanced and new controls have been added:
- Better control over massive manual indexing (flushToIndexes())
- Support for term vector
- Support for custom similarity
- Better control over index writing (RAM consumption, non-compound file format flag, ...)
- Better support for large index replication
- Reduced contention and a narrower lock window during indexing
- Reduced number of index openings and closings
- Indexing is done in parallel per directory
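The flushToIndexes() control mentioned above is used in a periodic index-and-clear loop. A sketch, assuming an open Hibernate Session and a hypothetical Book entity:

```java
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);

ScrollableResults results = fullTextSession.createCriteria(Book.class)
        .scroll(ScrollMode.FORWARD_ONLY);
int index = 0;
while (results.next()) {
    fullTextSession.index(results.get(0));   // index the current entity
    if (++index % 100 == 0) {
        fullTextSession.flushToIndexes();    // apply pending changes to the index
        fullTextSession.clear();             // free memory held by the session
    }
}
```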
New useful features have been added to queries and performance has been improved.
- Expose entity-scoped and named analyzer for easy reuse at query time
- Filter results (DocIdSet) can now be cached declaratively (default)
- Query results Explanation is now exposed for better debugging information
- Reduced number of database roundtrips on multi-entity searches
- Faster Lucene queries on indexes containing a single entity type (generally the case)
- Better performance on projected properties (no noticeable overhead compared to raw Lucene calls)
- Reduction of I/O consumption on Lucene by reading only the necessary fields of a document (when possible)
- Reduction of document reads (during pagination and on getResultSize() calls)
- Faster index reopening (keeps unchanged segments opened)
- Better index reader concurrency (use of the read only flag)
- Migration to Lucene 2.4 (and its performance improvements)
- Upgrade to Hibernate Core 3.3
- Use SLF4J as log facade
- Fix a few race conditions on multi-core machines
- Resources are properly discarded at SessionFactory.close()
- Fix bug related to embedded entities and @Indexed (HSEARCH-142)
- Filter instances are properly cached now
- And more (see the change logs)
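Two of the query-time additions above, scoped analyzer reuse and results explanation, can be sketched as follows (assuming an open Session and a hypothetical Book entity indexed on its title field):

```java
FullTextSession ftSession = Search.getFullTextSession(session);

// Reuse the analyzer Hibernate Search used when indexing Book (scoped analyzer)
Analyzer analyzer = ftSession.getSearchFactory().getAnalyzer(Book.class);
QueryParser parser = new QueryParser("title", analyzer);
org.apache.lucene.search.Query luceneQuery = parser.parse("tolkien");

FullTextQuery query = ftSession.createFullTextQuery(luceneQuery, Book.class);

// Ask Lucene why a given document matched (results explanation)
Explanation explanation = query.explain(0);
System.out.println(explanation.toString());
```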
Many thanks to everyone who contributed to the development and test of this release. Especially, many thanks to Sanne and Hardy who worked tirelessly pushing new features and enhancements and fixing bugs while John and I were finishing Hibernate Search in Action.
We have some features in the works that we could not put in 3.1, so stay tuned.
Hibernate Search 3.1.0.CR1 is out. Download it here.
One of the main pieces of work was to align more closely with the new Lucene features:
- read-only IndexReader for better concurrency at search time
- use of DocIdSet rather than BitSet in filter implementations for greater flexibility
- explicit use of Lucene's commit()
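On the Lucene side, the first two items look roughly like this (a sketch; directory is an assumed Directory instance and PublishedFilter is a made-up filter):

```java
// Read-only IndexReader: skips the bookkeeping needed to support deletes,
// which improves concurrency at search time
IndexReader reader = IndexReader.open(directory, true); // true = read-only

// Filters now return a DocIdSet instead of a BitSet
public class PublishedFilter extends Filter {
    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        // ... set the bit for each matching document ...
        return bits; // OpenBitSet implements DocIdSet
    }
}
```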
We have also added a few performance tricks:
- avoid reading unnecessary fields from Lucene when possible
- use Hibernate queries when projecting the object instance (as opposed to relying on batch size)
@DocumentId is now optional and defaults to the property marked as @Id. Scores are no longer normalized (i.e. no longer <= 1).
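With that change, a minimal mapping can be sketched like this (Book is a hypothetical entity):

```java
@Entity
@Indexed
public class Book {
    @Id @GeneratedValue
    private Long id;       // picked up as the Lucene document id; no @DocumentId needed

    @Field
    private String title;
}
```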
Many thanks to Hardy for literally hammering out fixes after fixes and making the release happen on time.
I just thought that the release of Hibernate Search 3.1.0.Beta2 would be a good time to announce another clustering possibility for Hibernate Search - Terracotta clustering. Why would one use Terracotta? Well, there are several potential benefits of Terracotta clustering over the default JMS clustering currently used by Hibernate Search:
- Updates to the index are immediately visible to all nodes in the cluster
- You don't have the requirement of a shared file system
- The faster RAMDirectory is used instead of the slower FSDirectory
But let's get started. You can download the code for the following example here or you can just download the binary package. At the moment the code is not yet part of the Search codebase, but it probably will be at some stage.
First you will need to download and install Terracotta. I am using the 2.6.2 release. Just unpack the release into an arbitrary directory. I am using /opt/java/terracotta. Next you will need the main Compass jar. You can use this jar. Place this jar into the modules directory of your Terracotta installation. This solution does not rely on any Compass classes per se, but utilizes a custom RAMDirectory implementation - org.compass.needle.terracotta.TerracottaDirectory. This is required since Lucene's RAMDirectory is not Terracotta-clusterable out of the box. Let's start the Terracotta server now. Switch into the bin directory of your Terracotta installation and run ./start-tc-server.sh. Check the log to see whether the server started properly.
Next download and extract hsearch-demo-1.0.0-SNAPSHOT-dist.tar.gz. The dist package currently assumes that you have a MySQL database running with a database hibernate and a username/password of hibernate/hibernate. You can change these settings and use a different database if you build the dist package from the source, but more on this later. The dist further assumes that you have installed Terracotta under /opt/java/terracotta. If this is not the case you can change the repository node in config/tc-config.xml. Provided that you have a running MySQL database and tc-config.xml properly reflects your Terracotta installation directory, things should be as easy as typing ./run.sh. The script will ask you whether you want to start a standalone application or a Terracotta-clustered one. Just press 't' to start a Terracotta-clustered app. A Swing JTable should come up:
Press the index button to create an initial index. The data model is based on the former Seam sample DVD store application. Once the index is created just search for example for Tom. You should get a list of DVDs in the table. Experiment a little with the application and different queries. When you are ready start a second instance of the application by running ./run.sh again. You won't have to create the index again. In the second instance the DVDs should be searchable right away. You can also edit the title field of a DVD in one application and search for the updated title in the other. Also try closing both applications and restarting a new instance. Again DVDs should be searchable right away. The Terracotta server keeps a persistent copy of the clustered Lucene directory.
Ok, now it is time to build the application from the source. This will allow you to actually inspect the code and change things like database settings. Download hsearch-demo-1.0.0-SNAPSHOT-project.tar.gz and unpack the tarball. Import the maven project in your preferred IDE. To build the project you will need to define the following repositories in your settings.xml:
<repository>
  <id>jboss</id>
  <url>http://repository.jboss.com/maven2</url>
</repository>
<repository>
  <id>compass-project.org</id>
  <url>http://repo.compass-project.org</url>
</repository>
If you want to use a different database you can add/modify the profiles section in pom.xml. Also have a look at src/main/scripts/tc-config.xml and adjust any settings which differ in your setup. Once you are happy with everything just run mvn assembly:assembly to build your own version of the application.
I basically just started experimenting with this form of clustering and there are still several open questions:
- How does it perform compared to the JMS clustering?
- What are the limits for the RAMDirectory size?
- How can I add failover capabilities?
I am planning to do some more extensive performance tests shortly. Stay tuned in case you are interested.
P.S. It would be great if someone actually tries this out and let me know if it works. As said, it's still work in progress. Any feedback is welcome :)
Hibernate Search 3.1 beta2 is out with a significant focus on performance improvements, scalability and API clean up.
Here are the main areas of work:
- Upgrade to Lucene 2.4 which opened up a lot of optimization possibilities on the Hibernate Search side.
- Inserts and deletes are now done in a single index opening rather than two.
- The window of locking has been reduced a lot during writes, especially on transactions involving several entities.
- Filter caching configuration has been simplified.
- Expose scoped analyzer for a given class: queries can now use the same analyzer used at indexing time transparently.
- Properly genericized the API (no more raw types used)
- Fix a few bugs around the Solr analyzer integration and moved to Solr 1.3.
- Fix various bugs including the long standing HSEARCH-142.
We have incorporated a lot of enhancements based on our work on the book Hibernate Search in Action and some genius performance ideas from Sanne. This version is still a beta because we still have a few optimizations and enhancements in our pocket, but CR1 should come out mid-November-ish.
Let us know what you think.
Just came back from Nürnberg where I was invited to present Hibernate Search at the Herbstcampus conference. It was the first year for this conference with the hope of making it an established yearly event. Given that already in the first year over 200 people showed up Herbstcampus might be on the right track.
While in Nürnberg I started putting together a simple Hibernate Search demo. What I wanted was just the bare bone minimum to get things running. I ended up with a simple Swing GUI with a couple of buttons and a JTable. In case you are interested check it out on the Hibernate Wiki.
On the culinary side I had the luck that this week was also the yearly Nürnberger Altstadtfest. I ended up spending a couple of hours walking around the stalls and sampling some Nürnberger Rostbratwürste and Lebkuchen. Yummy :)
Finally some pictures from Nürnberg:
It has been a long time since a Hibernate Search release, but we have not been lazy. We are pleased to announce 3.1.0 Beta1 with tons of new features and enhancements. This release uses Lucene 2.3.x and works with Hibernate Core 3.3, Hibernate Annotations 3.4 and Hibernate EntityManager 3.4. Here is a list of some of the major new features and enhancements:
- more flexible analyzer support (see below)
- the Hibernate Search engine is no longer tied to Hibernate Core (see below)
- performance enhancements on projections (Hibernate Search is now as fast as pure Lucene)
- performance enhancements in the object loading algorithm (when multiple object types are requested)
- better memory management on large index copies
- better mass indexing approach by explicitly flushing changes to indexes via a programmatic API (deprecating the old batch_size approach)
- better resource sharing through the shared-segments reader provider strategy
- better and more transparent filter caching solution
- access to more Lucene features including term position, similarity and query explanations
- simplification of configuration (events)
- more built in bridges
Hibernate Search lets you define analyzers declaratively and decouple tokenizer and token filter usage thanks to the Solr analyzer framework. It is now very easy to index a field for phonetic, synonym, snowball (stemming) matching and many more. A small dependency bug has leaked into this beta1 version: you will need to replace apache-solr-analzers.jar with a full Solr distribution jar, which you can download at apache.org, if you want to use @AnalyzerDef on some filters.
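For reference, a declarative analyzer definition looks roughly like this (a sketch: the Book entity and the definition name are made up; the factory classes come from the Solr analysis package, and their exact names depend on the Solr version):

```java
@Entity
@Indexed
@AnalyzerDef(name = "englishSnowball",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                        params = @Parameter(name = "language", value = "English"))
    })
public class Book {
    @Id @GeneratedValue private Long id;

    @Field
    @Analyzer(definition = "englishSnowball")
    private String title;
}
```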
The core engine is now abstracted from Hibernate Core thanks to the work done by Navin, our Google Summer of Code student. Hibernate Search is now the JBoss Cache full-text search engine (more on that in a later post) and is now open to support alternative data stores (including other ORMs).
We will likely post new entries to zoom on some of these features.
Hibernate Search in Action already reflects most of the new features and will describe all of them in the near future.