
The latest Hibernate Search beta, version 4.2.0.Beta2, is available!

In this iteration we introduce Apache Tika integration, Spatial Queries can now sort by distance, and, as usual, there is a list of less noticeable improvements.

Apache Tika integration

Apache Tika lets you extract text from, and index, any kind of document, such as MP3 metadata, PDF text, or office files. You can annotate a Blob property if you load the media files from a database, or a String property that points to a resource or file path.

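import java.sql.Blob;
import javax.persistence.Basic;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

@Entity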
@Indexed
public class Book {

	Integer id;
	Blob content;

	@Id @GeneratedValue
	public Integer getId() {
		return id;
	}

	public void setId(Integer id) {
		this.id = id;
	}

	@Lob @Basic(fetch = FetchType.LAZY)
	@Field @TikaBridge // <- just add @TikaBridge as an adaptor to make the Blob indexable like any other field
	public Blob getContent() {
		return content;
	}

	public void setContent(Blob content) {
		this.content = content;
	}
}

The @TikaBridge annotation supports further options to tune text extraction; refer to the documentation for details. Consider this feature experimental for now: there is no option yet to make text extraction asynchronous, and adding one might require API changes.

Spatial Queries sorted by distance

Thanks to all of Nicolas Helleringer's work, it's now easy to:

  • Return the distance from the search center to each hit (via a projection)
  • Sort the results by distance

Let's see an example from our large collection of self-documenting examples (the testsuite!):

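import java.util.List;
import org.apache.lucene.search.Sort;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.hibernate.search.query.dsl.Unit;
import org.hibernate.search.spatial.DistanceSortField;

// em is a FullTextEntityManager, e.g. obtained via Search.getFullTextEntityManager( entityManager )
QueryBuilder builder = em.getSearchFactory().buildQueryBuilder().forEntity( Cafe.class ).get();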

org.apache.lucene.search.Query luceneQuery = builder.spatial()
    .onCoordinates( "location" )
    .within( 100, Unit.KM )
        .ofLatitude( centerLatitude )
        .andLongitude( centerLongitude )
    .createQuery();

FullTextQuery hibQuery = em.createFullTextQuery( luceneQuery, Cafe.class );

Sort distanceSort = new Sort( new DistanceSortField( centerLatitude, centerLongitude, "location" ) );

hibQuery.setSort( distanceSort );

hibQuery.setProjection( FullTextQuery.THIS, FullTextQuery.SPATIAL_DISTANCE );

hibQuery.setSpatialParameters( centerLatitude, centerLongitude, "location" );

// each result row is an Object[]: { the Cafe entity, its distance from the center }
List results = hibQuery.getResultList();
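To make the two features above more concrete, here is a plain-Java sketch of what the radius filter, the SPATIAL_DISTANCE projection and the DistanceSortField compute conceptually: a great-circle distance for each hit, then ordering nearest first. It is not Hibernate Search's implementation. It assumes the haversine formula and a mean Earth radius of 6371 km, it brute-forces over an in-memory array instead of using the index, and the names (DistanceSortSketch, Hit, withinSortedByDistance) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class DistanceSortSketch {

    // Mean Earth radius in kilometres (an assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    // A hit paired with its distance, similar in spirit to the
    // { entity, distance } rows returned by the projection above.
    static final class Hit {
        final String name;
        final double distanceKm;

        Hit(String name, double distanceKm) {
            this.name = name;
            this.distanceKm = distanceKm;
        }
    }

    // Great-circle distance between two points in decimal degrees (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Keeps only the points within radiusKm of the center, then sorts them nearest first.
    static List<Hit> withinSortedByDistance(double centerLat, double centerLon, double radiusKm,
                                            String[] names, double[][] coords) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            double d = haversineKm(centerLat, centerLon, coords[i][0], coords[i][1]);
            if (d <= radiusKm) {
                hits.add(new Hit(names[i], d));
            }
        }
        hits.sort(Comparator.comparingDouble(h -> h.distanceKm));
        return hits;
    }

    public static void main(String[] args) {
        String[] names = { "far", "near", "outside" };
        double[][] coords = { { 0.5, 0.0 }, { 0.1, 0.0 }, { 5.0, 0.0 } };
        for (Hit h : withinSortedByDistance(0.0, 0.0, 100.0, names, coords)) {
            System.out.printf(Locale.ROOT, "%s %.1f km%n", h.name, h.distanceKm);
        }
    }
}
```

Running main prints `near 11.1 km` followed by `far 55.6 km`; the point 5 degrees away (about 556 km) falls outside the 100 km radius and is dropped before sorting.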

Several more reasons to upgrade

  • Apache Lucene upgraded to version 3.6.1
  • JMS and JMX integrations improved
  • The MassIndexer now correctly applies EntityIndexingInterceptor
  • Lower memory usage
  • Spatial Queries improved
  • Improved classloader handling for better integration with other libraries

The complete list of changes can be found here; please also check the Migration Guide.

It has been a while since 4.2.0.Beta1, but summer is over: please try this release soon, as we'll be moving to the Final release shortly! As always, feedback is very welcome.

4 comments:
 
19. Oct 2012, 08:16 CET

Yah! Against all odds, we are progressing this week :)

19. Oct 2012, 09:32 CET
MDMD
"The MassIndexer now correctly applies EntityIndexingInterceptor" - good, thanks!
 
19. Oct 2012, 19:50 CET
The MassIndexer now correctly applies EntityIndexingInterceptor

So how does this actually work? I always thought that the MassIndexer works with raw data and the EntityIndexingInterceptor with entity references. BTW, that captcha is darn hard.

 
20. Oct 2012, 01:48 CET
I always thought that the MassIndexer worked with raw data and the EntityIndexingInterceptor works with entity references.

It's only partially raw. It's a concurrent pipeline with several transformation stages; at some point it actually instantiates the usual entity, so we have a chance to apply interception and avoid a good deal of work, though not all of it. It also lets us take advantage of second-level caching.

If HSEARCH-499 were resolved, it could even skip some data loading in the first two stages, but these are usually very quick anyway, while the complexity would rise significantly. I'd need time for some experiments, or someone to volunteer to try out the different approaches.
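The reply above describes the MassIndexer as a concurrent pipeline whose later stage materializes entities, which is where interception can skip work. The following is a much-simplified plain-Java model of that idea, not the MassIndexer's code: three threads connected by BlockingQueues, a cheap id stage, a stage that builds (here: fakes) each entity and asks a java.util.function.Predicate whether to keep it, and a final stage that does the "indexing". The Predicate is a hypothetical stand-in for EntityIndexingInterceptor, whose real API is richer than a yes/no veto; the Doc class and its published flag are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Predicate;

public class PipelineSketch {

    // Marker telling the next stage that no more items will arrive.
    static final Object POISON = new Object();

    // Hypothetical entity: just an id and a "published" flag.
    static final class Doc {
        final long id;
        final boolean published;

        Doc(long id, boolean published) {
            this.id = id;
            this.published = published;
        }
    }

    // Runs three stages concurrently: ids -> entities -> "index" (collects indexed ids).
    static List<Long> run(long count, Predicate<Doc> interceptor) throws InterruptedException {
        BlockingQueue<Object> ids = new ArrayBlockingQueue<>(16);
        BlockingQueue<Object> entities = new ArrayBlockingQueue<>(16);
        List<Long> indexed = Collections.synchronizedList(new ArrayList<>());

        // Stage 1: cheap scan producing primary keys only (the "raw" part).
        Thread idProducer = new Thread(() -> {
            try {
                for (long id = 1; id <= count; id++) ids.put(id);
                ids.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage 2: turn each id into an entity, then let the interceptor veto it
        // before any indexing work is spent on it.
        Thread loader = new Thread(() -> {
            try {
                for (Object o = ids.take(); o != POISON; o = ids.take()) {
                    long id = (Long) o;
                    Doc doc = new Doc(id, id % 2 == 0); // stand-in for loading the entity
                    if (interceptor.test(doc)) entities.put(doc);
                }
                entities.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage 3: "index" each surviving entity (here: just record its id).
        Thread indexer = new Thread(() -> {
            try {
                for (Object o = entities.take(); o != POISON; o = entities.take()) {
                    indexed.add(((Doc) o).id);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        idProducer.start(); loader.start(); indexer.start();
        idProducer.join(); loader.join(); indexer.join();
        return indexed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Index only "published" docs: the kind of decision an interceptor makes.
        System.out.println(run(10, d -> d.published));
    }
}
```

Because each stage runs on a single thread and the queues are FIFO, main prints `[2, 4, 6, 8, 10]`: the odd ids are dropped in stage 2, so stage 3 never does any work for them.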
