One of the innovations we have brought to Hibernate Search is an alternative way to define the mapping information: a programmatic API.
The traditional way to map an entity in Hibernate Search is to use annotations, and it's perfectly fine for 95% of the use cases. In some cases though, people have a need for a more dynamic approach:
- they use a metamodel to generate or customize what is indexed in their entities and need to reconfigure things either on redeployment or on the fly based on some contextual information.
- they ship a product to multiple customers that require some customization.
For a while, people with this requirement have asked for an XML format equivalent to what annotations could do. Now the problem with XML is that:
- it's very verbose in its way of duplicating the structural information of your code
<class name="Address">
    <property name="street1">
        <field>
            <analyzer definition="ngram"/>
        </field>
    </property>
    <!-- ... -->
</class>
- while XML itself is type-safe, XML editors are still close to stone age, and developers writing XML in notepad are unfortunately quite common
- even if XML is type-safe, one cannot refactor the Java code and expect to get compile time errors or even better automatic integrated refactoring. For example, if I rename Address to Location, I still need to remember to change this in my xml file
- and finally, dynamically generating an XML stream to cope with the dynamic reconfiguration use case is not what I would call an intuitive solution
So we took a different road.
Instead of writing the mapping in XML, let's write it in Java. And to make things easier let's use a fluent contextual API (have intuitive method names, only expose the relevant operations).
SearchMapping mapping = new SearchMapping();
mapping
    .analyzerDef( "ngram", StandardTokenizerFactory.class )
        .filter( LowerCaseFilterFactory.class )
        .filter( NGramFilterFactory.class )
            .param( "minGramSize", "3" )
            .param( "maxGramSize", "3" )
    .entity(Address.class)
        .indexed()
        .property("addressId", METHOD)
            .documentId()
        .property("street1", METHOD)
            .field()
            .field()
                .name("street1_ngram")
                .analyzer("ngram")
        .property("country", METHOD)
            .indexedEmbedded()
        .property("movedIn", METHOD)
            .dateBridge(Resolution.DAY);
As you can see, it's very easy to figure out what is going on here. But something you cannot see in this example is that your IDE only offers the relevant methods contextually. For example, unless you have just declared a property(), you won't be able to add a field() to it. Likewise, you can set an analyzer on a field only if you are defining a field. It's like dynamic languages' fluent APIs, but better ;)
The next step is to associate the programmatic mapping object to the Hibernate configuration.
//in Hibernate native
Configuration configuration = ...;
configuration.getProperties().put( "hibernate.search.model_mapping", mapping );
SessionFactory factory = configuration.buildSessionFactory();
//in JPA
Map<String,Object> properties = new HashMap<String,Object>(1);
properties.put( "hibernate.search.model_mapping", mapping );
EntityManagerFactory emf = Persistence.createEntityManagerFactory( "userPU", properties );
The beauty of this API is that it's very easy for XML fan boys to create their own XML schema descriptors and use the programmatic API when parsing the XML stream. More interestingly, an application can expose specific configuration options (via a simple configuration file, a UI or any other form) and use this configuration to customize the mapping programmatically.
Please give this API a try, tell us what works and what does not, we are still figuring out things to make it as awesome as possible :)
Many thanks to Amin Mohammed-Coleman for taking my half done initiative and polishing it up.
You can get Hibernate Search 3.2 Beta 1 here; the complete API documentation is in the distribution, chapter 4.4.
It has been quite some time since the latest Hibernate Search release, but we are happy to announce the first beta release of version 3.2.0 with tons of bug fixes and exciting new features. In fact there are so many new features that we are planning to write a whole series of blog entries covering the following topics:
- The new API for programmatic configuration of Hibernate Search via org.hibernate.search.cfg.SearchMapping.
- Ability to rebuild the index in parallel threads using the MassIndexer API. This can be as simple as fullTextSession.createIndexer().startAndWait(), but of course there are plenty of options to fine-tune the behavior.
- Clustering via JGroups as an alternative to the existing JMS solution. The values for the hibernate.search.worker.backend option are jgroupsSlave and jgroupsMaster in this case.
- Dynamic boosting via the new @DynamicBoost annotation.
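To make the MassIndexer item above concrete, here is a minimal sketch of rebuilding an index in parallel threads. The entity class and the option values are illustrative; the fluent methods are those described in the 3.2 documentation:

```java
FullTextSession fullTextSession = Search.getFullTextSession( session );

// rebuild the whole index for the Address entity (example) in parallel threads
fullTextSession.createIndexer( Address.class )
    .batchSizeToLoadObjects( 25 )   // entities loaded per query
    .threadsToLoadObjects( 4 )      // parallel entity-loading threads
    .startAndWait();                // blocks until the rebuild completes
```

The simplest form remains `fullTextSession.createIndexer().startAndWait()`, which reindexes all indexed entities with default settings.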
Most of these new features are already documented in the Hibernate Search documentation available in the distribution packages. However, there might be still some gaps in the documentation. If you find any let us know via the Forum or Jira. Smaller new features are:
- New built-in field bridges for java.util.Calendar and java.lang.Character
- Ability to configure Lucene's LockFactory using hibernate.search.<index>.locking_strategy with the values simple, native, single or none.
- Ability to share IndexWriter instances across transactions. This can be activated via the hibernate.search.<indexname>.exclusive_index_use flag.
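Put together, a node's configuration for these options might look like the following properties fragment (the index name "Address" and the chosen values are examples):

```properties
# use JGroups instead of JMS for clustering (this node is a slave)
hibernate.search.worker.backend = jgroupsSlave

# Lucene lock factory for the Address index
hibernate.search.Address.locking_strategy = native

# keep the IndexWriter open across transactions for this index
hibernate.search.Address.exclusive_index_use = true
```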
Of course we also fixed several bugs of which the following are worth mentioning explicitly:
- HSEARCH-391 Multi level embedded objects don't get an index update
- HSEARCH-353 Removing an entity and adding another with same PK (in same TX) will not add second entity to index
For a full changelog see the Jira release notes. Last but not least, Hibernate Search depends now on Hibernate Core 3.5 beta2 and Lucene 2.4.1 and is aligned with JPA 2.0 CR1.
Special thanks to our contributors Sanne Grinovero and Amin Mohammed-Coleman who put a lot of effort into this release.
A few people have asked me to publish my slides on Bean Validation and Hibernate Search. Here we go :)
- Bean Validation: Declare once validate anywhere. A reality?
- Hibernate Search: Human Heaven and Database Savior in the Cloud
Speaking of conferences, I will be presenting Hibernate Search and the magic of analyzers at Jazoon (Zurich) on Thursday 25th at 11:30. See you there.
Aaron and I will be talking about Hibernate Search and how it can complement your database when you need to scale big, like in... ahem a cloud. It's on Wednesday, June 3 at 9:45 am in Hall 134. I know it's early, someone in the program committee did not like us so much ;)
I will also do an author signing session of Hibernate Search in Action the same day Wed, June 3 at the JavaOne bookstore.
I will also discuss Bean Validation (JSR-303), what it does and how it integrates in Java EE 6 (which I will demo on stage) and any other architecture. This will be Thursday, June 4 at 13:30 in Hall 134. The latest version of the spec is always available here at least till we make it final. Hibernate Validator 4, the reference implementation is well underway, give it a try.
With work on version 3.2 of Hibernate Search well underway and a range of very interesting features in the pipeline (eg programmatic configuration API, bulk indexing and dynamic boosting), we decided to provide some of the bug fixes also for the 3.1 version of Hibernate Search. Hence here is Hibernate Search 3.1.1 GA. On top of several bug fixes which are listed in the release notes we also upgraded Lucene from 2.4 to 2.4.1.
We recommend users of version 3.1 upgrade to 3.1.1 to benefit from these bug fixes.
You can download the release here.
I am pleased to announce the GA release of Hibernate Search 3.1. This release focuses on performance improvements and code robustness, but also adds interesting new features focused on usability:
- An analyzer configuration model to declaratively use and compose features like phonetic approximation, n-gram approximation, search by synonyms, stop words filtering, elision correction, unaccented search and many more.
- A lot of performance improvements at indexing time (including reduced lock contention, parallel execution).
- A lot of performance improvements at query time (including I/O reduction both for Lucene and SQL, and better concurrency).
- Additional new features both for indexing and querying (including support for term vector, access to scoped analyzer at query time and access to results explanation).
A more detailed overview of the features follows.
- Support for declarative analyzer composition through the Solr library
Analyzers can now be declaratively composed as a tokenizer and a set of filters. This enables easy composition of the following features: phonetic approximation, n-gram approximation, search by synonyms, stop words filtering, elision correction, unaccented search and so on.
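As a sketch of what this composition looks like in annotations, an n-gram analyzer could be declared and applied like this (the definition name, parameters and entity are illustrative; the tokenizer and filter factories come from the Solr library):

```java
@Entity
@Indexed
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "3"),
                @Parameter(name = "maxGramSize", value = "3")
            })
    })
public class Address {
    @Field(analyzer = @Analyzer(definition = "ngram"))
    private String street1;
    // ...
}
```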
- Support for dynamic analyzers
Allows a given entity to define the analyzer to use at runtime. A typical use case is multi-language support where the language varies from one entity instance to another.
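A sketch of the multi-language case (class, property and definition names are made up), using the analyzer discriminator support where a Discriminator implementation returns the name of the analyzer definition to apply based on the entity's state:

```java
@Entity
@Indexed
public class BlogEntry {
    @Field
    @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
    private String language;   // e.g. "en", "de"

    @Field
    private String text;
    // ...
}

public class LanguageDiscriminator implements Discriminator {
    // return the name of the analyzer definition to use for this field,
    // or null to fall back to the default analyzer
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        return value == null ? null : value.toString();
    }
}
```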
Indexing performance has been enhanced and new controls have been added
- Better control over massive manual indexing (flushToIndexes())
- Support for term vector
- Support for custom similarity
- Better control over index writing (RAM consumption, non-compound file format flag, ...)
- Better support for large index replication
- Reduced contention and narrowed the lock window during indexing
- Reduced the number of index openings and closings
- Indexing is done in parallel per directory
New useful features have been added to queries and performance has been improved.
- Expose entity-scoped and named analyzer for easy reuse at query time
- Filter results (DocIdSet) can now be cached declaratively (default)
- Query results Explanation is now exposed for better debugging information
- Reduced number of database roundtrips on multi-entity searches
- Faster Lucene queries on indexes containing a single entity type (generally the case)
- Better performance on projected properties (no noticeable overhead compared to raw Lucene calls)
- Reduction of I/O consumption on Lucene by reading only the necessary fields of a document (when possible)
- Reduction of document reads (during pagination and on getResultSize() calls)
- Faster index reopening (keeps unchanged segments opened)
- Better index reader concurrency (use of the read only flag)
- Migration to Lucene 2.4 (and its performance improvements)
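Two of the query-time additions above in code form, as a sketch (entity, field and query values are illustrative): the scoped analyzer lets you build queries with the exact analyzers used at indexing time, and an Explanation can be retrieved per matching document for debugging:

```java
FullTextSession fullTextSession = Search.getFullTextSession( session );

// reuse the analyzer(s) that indexed Address when parsing the query
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer( Address.class );
QueryParser parser = new QueryParser( "street1", analyzer );
org.apache.lucene.search.Query luceneQuery = parser.parse( "main" );

FullTextQuery query = fullTextSession.createFullTextQuery( luceneQuery, Address.class );
// ask Lucene why the document with the given Lucene document id matched
Explanation explanation = query.explain( 0 );
System.out.println( explanation.toString() );
```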
- Upgrade to Hibernate Core 3.3
- Use SLF4J as log facade
- Fix a few race conditions on multi core machines
- Resources are properly discarded at SessionFactory.close()
- Fix bug related to embedded entities and @Indexed (HSEARCH-142)
- Filter instances are properly cached now
- And more (see the change logs)
Many thanks to everyone who contributed to the development and test of this release. Especially, many thanks to Sanne and Hardy who worked tirelessly pushing new features and enhancements and fixing bugs while John and I were finishing Hibernate Search in Action.
We have some features on the work that we could not put in 3.1, so stay tuned.
Hibernate Search 3.1.0.CR1 is out. Download it here.
One of the main pieces of work was to align more closely with the new Lucene features:
- read-only IndexReader for better concurrency at search time
- use of DocIdSet rather than BitSet in filter implementations for greater flexibility
- explicit use of Lucene's commit()
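For filter authors, the DocIdSet change means a custom filter now returns a DocIdSet instead of a BitSet. A minimal Lucene 2.4-style sketch (the field name, term and filter class are placeholders):

```java
public class DraftsOnlyFilter extends Filter {
    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        // OpenBitSet implements DocIdSet; mark the matching documents
        OpenBitSet bits = new OpenBitSet( reader.maxDoc() );
        TermDocs termDocs = reader.termDocs( new Term( "status", "draft" ) );
        while ( termDocs.next() ) {
            bits.set( termDocs.doc() );
        }
        return bits;
    }
}
```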
We have also added a few performance tricks:
- avoid reading unnecessary fields from Lucene when possible
- use Hibernate queries when projecting the object instance (as opposed to relying on batch size)
@DocumentId is now optional and defaults to the property marked as @Id. Scores are no longer normalized (i.e. no longer <= 1).
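In practice this means an entity like the following no longer needs an explicit @DocumentId; the @Id property doubles as the Lucene document id (the entity is illustrative):

```java
@Entity
@Indexed
public class Address {
    @Id
    @GeneratedValue
    private Long addressId;   // used as the Lucene document id by default

    @Field
    private String street1;
    // ...
}
```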
Many thanks to Hardy for literally hammering out fixes after fixes and making the release happen on time.
I just thought that the release of Hibernate Search 3.1.0.Beta2 would be a good time to announce another clustering possibility for Hibernate Search - Terracotta clustering. Why would one use Terracotta? Well, there are several potential benefits of Terracotta clustering over the default JMS clustering currently used by Hibernate Search:
- Updates to the index are immediately visible to all nodes in the cluster
- You don't have the requirement of a shared file system
- The faster RAMDirectory is used instead of the slower FSDirectory
But let's get started. You can download the code for the following example here or you can just download the binary package. At the moment the code is not yet part of the Search codebase, but it probably will be at some stage.
First you will need to download and install Terracotta. I am using the 2.6.2 release. Just unpack the release into an arbitrary directory. I am using /opt/java/terracotta. Next you will need the main Compass jar. You can use this jar. Place this jar into the modules directory of your Terracotta installation. This solution does not rely on any Compass classes per se, but utilizes a custom RAMDirectory implementation - org.compass.needle.terracotta.TerracottaDirectory. This is required since Lucene's RAMDirectory is not Terracotta clusterable out of the box. Let's start the Terracotta server now. Switch into the bin directory of your Terracotta installation and run ./start-tc-server.sh. Check the log to see whether the server started properly.
Next download and extract hsearch-demo-1.0.0-SNAPSHOT-dist.tar.gz. The dist package currently assumes that you have a MySQL database running with a database hibernate and a username/password of hibernate/hibernate. You can change these settings and use a different database if you build the dist package from the source, but more on this later. The dist further assumes that you have installed Terracotta under /opt/java/terracotta. If this is not the case you can change the repository node in config/tc-config.xml. Provided that you have a running MySQL database and tc-config.xml properly reflects your Terracotta installation directory, things should be as easy as just typing ./run.sh. The script will ask you whether you want to start a standalone application or a Terracotta clustered one. Just press 't' to start a Terracotta clustered app. You should get up a Swing JTable:
Press the index button to create an initial index. The data model is based on the former Seam sample DVD store application. Once the index is created just search for example for Tom. You should get a list of DVDs in the table. Experiment a little with the application and different queries. When you are ready start a second instance of the application by running ./run.sh again. You won't have to create the index again. In the second instance the DVDs should be searchable right away. You can also edit the title field of a DVD in one application and search for the updated title in the other. Also try closing both applications and restarting a new instance. Again DVDs should be searchable right away. The Terracotta server keeps a persistent copy of the clustered Lucene directory.
Ok, now it is time to build the application from the source. This will allow you to actually inspect the code and change things like database settings. Download hsearch-demo-1.0.0-SNAPSHOT-project.tar.gz and unpack the tarball. Import the maven project in your preferred IDE. To build the project you will need to define the following repositories in your settings.xml:
<repository>
    <id>jboss</id>
    <url>http://repository.jboss.com/maven2</url>
</repository>
<repository>
    <id>compass-project.org</id>
    <url>http://repo.compass-project.org</url>
</repository>
If you want to use a different database you can add/modify the profiles section in pom.xml. Also have a look at src/main/scripts/tc-config.xml and adjust any settings which differ in your setup. Once you are happy with everything just run mvn assembly:assembly to build your own version of the application.
I basically just started experimenting with this form of clustering and there are still several open questions:
- How does it perform compared to the JMS clustering?
- What are the limits for the RAMDirectory size?
- How can I add failover capabilities?
I am planning to do some more extensive performance tests shortly. Stay tuned in case you are interested.
P.S. It would be great if someone actually tries this out and let me know if it works. As said, it's still work in progress. Any feedback is welcome :)
Hibernate Search 3.1 beta2 is out with a significant focus on performance improvements, scalability and API clean up.
Here is the main area of work:
- Upgrade to Lucene 2.4 which opened up a lot of optimization possibilities on the Hibernate Search side.
- Inserts and deletes are now done in a single index opening rather than two.
- The window of locking has been reduced a lot during writes, especially on transactions involving several entities.
- Filter caching configuration has been simplified.
- Expose scoped analyzer for a given class: queries can now use the same analyzer used at indexing time transparently.
- Properly genericized the API (no more raw types used)
- Fix a few bugs around the Solr analyzer integration and moved to Solr 1.3.
- Fix various bugs including the long standing HSEARCH-142.
We have incorporated a lot of enhancements based on our work on the book Hibernate Search in Action and some genius performance ideas from Sanne. This version is still a beta because we still have a few optimizations and enhancements in our pocket, but CR1 should come out mid-November-ish.
Let us know what you think.
Just came back from Nürnberg where I was invited to present Hibernate Search at the Herbstcampus conference. It was the first year for this conference with the hope of making it an established yearly event. Given that already in the first year over 200 people showed up Herbstcampus might be on the right track.
While in Nürnberg I started putting together a simple Hibernate Search demo. What I wanted was just the bare bone minimum to get things running. I ended up with a simple Swing GUI with a couple of buttons and a JTable. In case you are interested check it out on the Hibernate Wiki.
On the culinary side I had the luck that this week was also the yearly Nürnberger Altstadtfest. I ended up spending a couple of hours walking around the stalls and sampling some Nürnberger Rostbratwürste and Lebkuchen. Yummi :)
Finally some pictures from Nürnberg: