The latest Hibernate Search beta, version 4.2.0.Beta2, is available!

In this iteration we introduce Apache Tika integration and distance sorting for Spatial Queries, along with the usual list of less noticeable improvements.

Apache Tika integration

Apache Tika allows you to extract text from almost any kind of document, and so to index it: MP3 metadata, PDF text, office files. You can annotate a Blob field if the media files are loaded from a database, or a String field that points to a resource or file path.

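import java.sql.Blob;
import javax.persistence.*;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

@Entity
@Indexed
public class Book {

	Integer id;
	Blob content;

	@Id @GeneratedValue
	public Integer getId() {
		return id;
	}

	public void setId(Integer id) {
		this.id = id;
	}

	@Lob @Basic(fetch = FetchType.LAZY)
	@Field @TikaBridge // <- just add the TikaBridge as an adaptor to make the Blob indexable
	public Blob getContent() {
		return content;
	}

	public void setContent(Blob content) {
		this.content = content;
	}
}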

The @TikaBridge annotation supports more options to tune the kind of text extraction; refer to the documentation for more details. Consider this feature experimental for now: we didn't add an option to make the text extraction asynchronous yet, so we might need to change the API to introduce that.

Spatial Queries sorted by distance

Thanks to all of Nicolas Helleringer's work, it's now easy to:

  • Return the distance from the search center to each hit (via a projection)
  • Apply a sort criterion on the distance

Let's see an example from our large collection of self-documenting examples (the testsuite!):

// em is a FullTextEntityManager
QueryBuilder builder = em.getSearchFactory().buildQueryBuilder().forEntity( Cafe.class ).get();

// match every Cafe within 100 km of the search center
org.apache.lucene.search.Query luceneQuery = builder.spatial()
    .onCoordinates( "location" )
    .within( 100, Unit.KM )
        .ofLatitude( centerLatitude )
        .andLongitude( centerLongitude )
    .createQuery();

FullTextQuery hibQuery = em.createFullTextQuery( luceneQuery, Cafe.class );

// sort the hits by their distance from the search center
Sort distanceSort = new Sort( new DistanceSortField( centerLatitude, centerLongitude, "location" ) );
hibQuery.setSort( distanceSort );

// return each entity together with its distance from the search center
hibQuery.setProjection( FullTextQuery.THIS, FullTextQuery.SPATIAL_DISTANCE );
hibQuery.setSpatialParameters( centerLatitude, centerLongitude, "location" );

List results = hibQuery.getResultList();
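
If you want a mental model of what the distance projection and the distance sort give you, here is a minimal plain-Java sketch with no Hibernate Search involved. It assumes the haversine great-circle formula and a mean Earth radius of 6371 km. The Cafe and Hit classes and the nearestFirst method are hypothetical names made up for this illustration, and this is not necessarily how the library computes distances internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: "filter by radius, project the distance, sort on it" in plain Java.
// Cafe, Hit and nearestFirst are hypothetical names, not Hibernate Search API.
class DistanceSortSketch {

    // Assumption: mean Earth radius in kilometres
    static final double EARTH_MEAN_RADIUS_KM = 6371.0;

    // Hypothetical stand-in for an indexed entity with a "location" coordinate pair
    static final class Cafe {
        final String name;
        final double latitude;
        final double longitude;

        Cafe(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // One result row: the entity plus its distance from the search center
    static final class Hit {
        final Cafe cafe;
        final double distanceKm;

        Hit(Cafe cafe, double distanceKm) {
            this.cafe = cafe;
            this.distanceKm = distanceKm;
        }
    }

    // Great-circle distance in kilometres between two points given in degrees
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // clamp to 1.0 to stay inside the domain of asin despite rounding
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    // Keeps the cafes within radiusKm of the center and returns them nearest first
    static List<Hit> nearestFirst(List<Cafe> cafes, double centerLat, double centerLon, double radiusKm) {
        List<Hit> hits = new ArrayList<>();
        for (Cafe cafe : cafes) {
            double distance = haversineKm(centerLat, centerLon, cafe.latitude, cafe.longitude);
            if (distance <= radiusKm) {
                hits.add(new Hit(cafe, distance));
            }
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.distanceKm));
        return hits;
    }

    public static void main(String[] args) {
        List<Cafe> cafes = Arrays.asList(
                new Cafe("far", 24.5, 32.0),
                new Cafe("outside", 30.0, 32.0),
                new Cafe("near", 24.0, 32.01));
        for (Hit hit : nearestFirst(cafes, 24.0, 32.0, 100.0)) {
            System.out.println(hit.cafe.name + " at " + hit.distanceKm + " km");
        }
    }
}
```

Each Hit plays the role of one projected row (the entity plus its distance), which is roughly the shape you get back from the THIS and SPATIAL_DISTANCE projection.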

Several more reasons to upgrade

  • Apache Lucene upgraded to version 3.6.1
  • JMS and JMX integrations improved
  • The MassIndexer now correctly applies EntityIndexingInterceptor
  • Lower memory usage
  • Spatial Queries improved
  • Improved some classloaders for better integration with other libraries

The complete list of changes can be found here. Check the Migration Guide.

It has been a while since 4.2.0.Beta1 but the summer is over, so try these quickly as we'll move to the Final soon! As always, feedback is very welcome.

19. Oct 2012, 08:16 CET | Link

Yah! Against all odds, we are progressing this week :)

19. Oct 2012, 09:32 CET | Link

"The MassIndexer now correctly applies EntityIndexingInterceptor" - good, thanks!

19. Oct 2012, 19:50 CET | Link

"The MassIndexer now correctly applies EntityIndexingInterceptor"

So how does this actually work? I always thought that the MassIndexer worked with raw data and the EntityIndexingInterceptor works with entity references. BTW that captcha is darn hard

20. Oct 2012, 01:48 CET | Link

"I always thought that the MassIndexer worked with raw data and the EntityIndexingInterceptor works with entity references."

It's partially raw, not all of it. It's a concurrent pipeline with different stages of transformation; at some point it actually incarnates the usual entity so we have a chance to apply interception and avoid a good deal of work - but not all of it. It's also nice to take advantage of 2nd level caching.

If HSEARCH-499 were resolved it could even skip some data loading in the first two stages, but these are usually very quick anyway, while the complexity would rise significantly. I'd need time for some experiments, or someone to volunteer to try out the different approaches.
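
For readers who want a picture of the staged design described above, here is a rough plain-Java sketch, not the actual MassIndexer code. It assumes just two stages connected by a queue: one streams raw identifiers, the other materializes the entity and only then asks an interception hook whether to skip it. Blog, load, reindex and the skipIndexing predicate are all hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

// Illustration only: a two-stage pipeline where interception happens once the
// entity exists as an object. Every name here is hypothetical.
class PipelineSketch {

    // Marker telling the consumer that no more ids will arrive
    private static final int END_OF_IDS = -1;

    static final class Blog {
        final int id;
        final boolean published;

        Blog(int id, boolean published) {
            this.id = id;
            this.published = published;
        }
    }

    // Hypothetical loader: pretends that blogs with an even id are published
    static Blog load(int id) {
        return new Blog(id, id % 2 == 0);
    }

    // Stage 1 streams raw ids; stage 2 loads each entity, consults the hook, then "indexes" it
    static List<String> reindex(List<Integer> ids, Predicate<Blog> skipIndexing) {
        BlockingQueue<Integer> idQueue = new ArrayBlockingQueue<>(4);
        List<String> index = new ArrayList<>();

        Thread idProducer = new Thread(() -> {
            try {
                for (Integer id : ids) {
                    idQueue.put(id);
                }
                idQueue.put(END_OF_IDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread entityConsumer = new Thread(() -> {
            try {
                for (int id = idQueue.take(); id != END_OF_IDS; id = idQueue.take()) {
                    Blog entity = load(id);          // raw id becomes a real entity here
                    if (!skipIndexing.test(entity)) { // interception decides on the entity
                        index.add("blog#" + entity.id);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        idProducer.start();
        entityConsumer.start();
        try {
            idProducer.join();
            entityConsumer.join(); // join makes the consumer's writes to index visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reindexing", e);
        }
        return index;
    }

    public static void main(String[] args) {
        // skip every blog that is not published
        System.out.println(reindex(Arrays.asList(1, 2, 3, 4), blog -> !blog.published));
    }
}
```

The point of the sketch is only the ordering: the skip decision needs an entity to look at, so it can save the indexing work but not the loading that came before it.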

