In Relation To Emmanuel Bernard

Full Text search for Hibernate goes final

Tagged as Hibernate Search

The Hibernate Search team is pleased to announce version 3.0 final. Hibernate Search provides full text search (Google-like) capabilities to Hibernate domain model objects. Based on Apache Lucene, Hibernate Search focuses on ease of use and ease of configuration, lowering the barrier to entry of Lucene and its integration with a domain model.

Key features include:

  • Transparent index synchronization: This feature eliminates the need to manually update the index on data change. Events generated by Hibernate Core will trigger the update transparently for the application. Index updates are scoped per transaction to match the application transactional behavior.
  • Seamless integration with the Hibernate and Java Persistence query model: Hibernate Search embraces both the Hibernate and Java Persistence semantics and APIs. As a result, switching from a Hibernate Query Language (HQL) query to a full text query requires only minimal changes to the application.
  • Out-of-the-box asynchronous clustering mode: In addition to handling clustered applications, this out-of-the-box mode gracefully absorbs indexing load peaks, avoiding potential contention on online systems.
  • Product extensibility: Developers can extend Hibernate Search with a series of extension points for deep index interaction customization that helps edge case applications meet their performance and architectural requirements and constraints.

Some additional notable features:

  • query filter (similar to the Hibernate Filter feature): useful for security, temporal data, category filtering, etc.; filters are transparently cached for the user
  • join-style query: ability to query based on associated entities
  • query projection: avoid database roundtrips if the relevant data is also stored in the index
  • access to the result score, boost, total number of results and other Lucene metadata
  • ability to manually (re)index and purge data from the index
  • index sharding: sharing the same index for several classes or splitting (sharding) a given class into several indexes. It is useful for performance when the index becomes /very/ big.
  • transparently optimized access to Lucene both for index update and queries
  • native access to the Lucene resources

Many thanks to the community, which over the past year has shown support and enthusiasm and has helped the product mature in both feature set and stability. You can download Hibernate Search or walk through the documentation and the getting started section. Happy searching :)

Hibernate Search and Shards talk at AJUG Sept 18th

Tagged as Hibernate Shards

I will be giving a talk at the Atlanta JUG next Tuesday (18th).

The talk will go through Hibernate Search: why use it, where to use it, its internal architecture and how to use it best.

I will also introduce Hibernate Shards, what it is and what it is not :)

Hopefully Q&A will have a decent slice of the time cake.

See you in the Perimeter area :)

Release Candidate for Hibernate Search 3.0.0

Tagged as Hibernate Search

Hibernate Search 3.0.0.CR1 is now out. This release is mainly the last bits of new features and polishing before the final version. The next cycle will be dedicated to bug fixes (of any bug that pops up), as well as test suite and documentation improvements.

Thanks to Hardy for the new getting started guide (this should ease the path for newcomers), and to John for hammering the last features we wanted in the GA version:

  • /manual indexing/ you can disable event based indexing: useful when the
  • /purge/ you can remove an entity from the index without affecting the database. This is especially useful if you take care of the indexing manually (using a timestamp method for example)

The next version should be the GA release unless some complex bugs are discovered.

Check the changelogs for a detailed change list.

Hibernate Search 3.0 Beta 4: new features bandwagon

Tagged as Hibernate Search

Hibernate Search has a new beta out and comes with a bunch of interesting new features:

  • Named filters: custom filters on query results (transparently cacheable)
  • Automatic index optimization
  • Access to query metadata (Score, ...)
  • Support for the Java Persistence API
  • Index Sharding (indexing an entity into several underlying Lucene indexes)

Named filters

Based on Lucene filters, named filters provide the ability to apply custom filter restrictions to the query results. Enabled by name and parameters (very much like Hibernate Core filters), filters are cacheable to improve performance. Some notable use cases are security, temporal data, restriction by population, and queries within query results.

Automatic index optimization

Hibernate Search can transparently optimize your index after a certain number of operations (add, delete) or transactions.
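To make the idea concrete, here is a minimal sketch of an operation-count trigger. It is plain Java, not Hibernate Search code: the class name `OptimizationTrigger` and its methods are hypothetical, and the real feature is configured rather than hand-written. The sketch only shows the counting logic, assuming an optimize should run every N index operations.

```java
// Hypothetical illustration only: counts index operations and reports
// when an optimize pass would be due. Not a Hibernate Search API.
final class OptimizationTrigger {
    private final int operationLimit;
    private int operationsSinceOptimize = 0;
    private int optimizeRuns = 0;

    OptimizationTrigger(int operationLimit) {
        if (operationLimit <= 0) {
            throw new IllegalArgumentException("operationLimit must be positive");
        }
        this.operationLimit = operationLimit;
    }

    /** Call once per index add or delete. Returns true when an optimize is triggered. */
    boolean recordOperation() {
        operationsSinceOptimize++;
        if (operationsSinceOptimize < operationLimit) {
            return false;
        }
        // A real implementation would optimize the Lucene index here.
        optimizeRuns++;
        operationsSinceOptimize = 0;
        return true;
    }

    int optimizeRuns() {
        return optimizeRuns;
    }

    public static void main(String[] args) {
        OptimizationTrigger trigger = new OptimizationTrigger(3);
        for (int i = 0; i < 7; i++) {
            trigger.recordOperation();
        }
        System.out.println("optimize runs: " + trigger.optimizeRuns());
    }
}
```

A transaction-based trigger would look the same, with the counter incremented once per commit instead of once per operation.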

Query result metadata

The projection API has been enhanced to return query-specific data like the document score (relevance) and a few other pieces of metadata.

Support for the Java Persistence API

There is now a FullTextEntityManager and FullTextQuery (extending javax.persistence.Query). No need to access entityManager.getDelegate() anymore.

Index sharding

In extreme cases, Lucene indexes need to be split into several physical indexes. Hibernate Search can now index a given entity to several underlying Lucene indexes.
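As a rough illustration of what splitting one entity across several indexes involves, the sketch below routes each entity id to one of N shards by hashing. This is an assumption made for the example: the post does not say how Hibernate Search picks a shard, and `ShardSelector`, `shardFor` and `indexNameFor` are hypothetical names, not part of the library.

```java
// Hypothetical illustration only: routes an entity id to one of N index
// shards by hashing. The selection strategy used by Hibernate Search is
// not described in the post and may differ.
final class ShardSelector {
    private final int shardCount;

    ShardSelector(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Same id always maps to the same shard, so updates and deletes find the document. */
    int shardFor(String entityId) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes
        return Math.floorMod(entityId.hashCode(), shardCount);
    }

    /** Builds a per-shard index name such as "Book.2" (naming scheme is an assumption). */
    String indexNameFor(String baseName, String entityId) {
        return baseName + "." + shardFor(entityId);
    }

    public static void main(String[] args) {
        ShardSelector selector = new ShardSelector(4);
        System.out.println(selector.indexNameFor("Book", "42"));
    }
}
```

The important property is determinism: because the same id always lands in the same shard, a later update or purge knows which physical index to touch. A query, on the other hand, generally has to consult every shard.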

And a few more

There are a few additional features:

  • Ability to index a given property in multiple different fields with different settings (without the need for a custom FieldBridge)
  • Fine grained analyzers (global, per entity, per property or per field)
  • Expose Lucene merge factor, max merge doc and max buffered docs
  • Ships with Lucene 2.2

Thanks to John Griffin and Hardy Ferentschik for stepping up on this release. The feature set is up to what was envisioned for the final release (much more actually) and has proven very stable. We expect a short CR cycle and the GA soon after.

Check out here for more info. The download page is here.

Hibernate Search 3.0.0.Beta2 is out with a bunch of new interesting features:

  • shared Lucene IndexReader, significantly increasing performance, especially in read-mostly applications
  • ability to customize the object graph fetching strategy
  • properties projection from the Lucene index
  • ability to sort queries (thanks to Hardy Ferentschik)
  • expose the total number of matching results regardless of pagination

Performance

Hibernate Search can now share IndexReaders across queries and threads, making them much more efficient especially on applications where the number of reads is much higher than the number of updates. Opening and warming a Lucene IndexReader can be a relatively costly operation compared to a single query. To enable sharing, just add the following property:

hibernate.search.reader.strategy = shared

Object loading has been enhanced as well. You can for example define the exact fetching strategy used to load the expected object graph, pretty much like you would do it in a Criteria or HQL query, allowing per use case optimization.

Some use cases do not mandate a fully loaded object. Hibernate Search now allows property projection from the Lucene index. At the cost of storing the value in the index, you can now retrieve a specific subset of properties. The behavior is similar to HQL or Criteria query projections.

fullTextQuery.setIndexProjection( "id", "summary", "body", "mainAuthor.name" ).list();

Query flexibility

I already talked about the customizable fetching strategy and projection.

The default sorting on Hibernate Search queries is by relevance, but you can now customize this strategy and sort by field(s).

Regardless of the pagination process, it is interesting to know the total number of matching elements. While costly in plain SQL, this information is provided out-of-the-box by Apache Lucene. getResultSize() is now exposed by the FullTextQuery. From this information, you can for example:

  • implement the search engine feature '1-10 of about 888,000,000'
  • implement a fast pagination navigation
  • implement a multi step search engine (gradually enabling approximations if the restricted query(ies) return no or not enough results)
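The first two uses in the list above boil down to simple arithmetic on the total. Here is a minimal sketch in plain Java; `ResultPager` and its methods are hypothetical helpers, not part of Hibernate Search, and the total passed in is assumed to come from something like getResultSize().

```java
// Hypothetical helper, not part of Hibernate Search: it only does the
// arithmetic on a total that a call such as getResultSize() would supply.
final class ResultPager {

    /** Number of pages needed to show totalResults at pageSize results per page. */
    static int pageCount(int totalResults, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalResults <= 0) {
            return 0;
        }
        return (totalResults + pageSize - 1) / pageSize; // ceiling division
    }

    /** Label such as "11-20 of about 95" for a zero-based page index. */
    static String rangeLabel(int totalResults, int pageIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (totalResults <= 0) {
            return "no results";
        }
        if (pageIndex < 0 || pageIndex >= pageCount(totalResults, pageSize)) {
            throw new IllegalArgumentException("page out of range");
        }
        int first = pageIndex * pageSize + 1;
        int last = Math.min(first + pageSize - 1, totalResults);
        return first + "-" + last + " of about " + totalResults;
    }

    public static void main(String[] args) {
        System.out.println(rangeLabel(95, 0, 10));
        System.out.println(pageCount(95, 10) + " pages");
    }
}
```

Because the total is known up front, the navigation bar can be built without running a separate count query.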

For more information, check the release notes and download the bundle. Hibernate Search 3.0.0.Beta2 is compatible with Hibernate Core 3.2.x (from 3.2.2), Hibernate Annotations 3.3.x and Hibernate EntityManager 3.3.x.

Groovy is Seamed

Tagged as Seam

With the new Groovy 1.1 beta out and its support for Java 5 annotations, wouldn't it be great to be able to write Seam applications in Groovy? Indeed it is great and you can do that with JBoss Seam (in CVS HEAD at the time of writing).

What is supported, how does it work?

You can write any entity and any action in Groovy. By simply annotating your Groovy classes with Seam annotations, they become Seam components.

@Scope(ScopeType.SESSION)
@Name("bookingList")
class BookingListAction implements Serializable
{
    @In EntityManager em
    @In User user
    @DataModel List<Booking> bookings
    @DataModelSelection Booking booking
    @Logger Log log

    @Factory public void getBookings()
    {
        bookings = em.createQuery('''
                select b from Booking b
                where b.user.username = :username
                order by b.checkinDate''')
            .setParameter("username", user.username)
            .getResultList()
    }
    
    public void cancel()
    {
        log.info("Cancel booking: #{bookingList.booking.id} for #{user.username}")
        Booking cancelled = em.find(Booking.class, booking.id)
        if (cancelled != null) em.remove( cancelled )
        getBookings()
        FacesMessages.instance().add("Booking cancelled for confirmation number #{bookingList.booking.id}", new Object[0])
    }
}

En passant, you can use Groovy to write your entities; Hibernate supports them out of the box. No constraint, no limitation, no XML ;-)

@Entity
@Name("hotel")
class Hotel implements Serializable
{
    @Id @GeneratedValue
    Long id

    @Length(max=50) @NotNull
    String name

    @Length(max=100) @NotNull
    String address

    @Length(max=40) @NotNull
    String city

    @Length(min=2, max=10) @NotNull
    String state

    @Length(min=4, max=6) @NotNull
    String zip

    @Length(min=2, max=40) @NotNull
    String country

    @Column(precision=6, scale=2)
    BigDecimal price

    @Override
    String toString()
    {
        return "Hotel(${name},${address},${city},${zip})"
    }
}

Groovy files are compiled by the groovyc compiler in your build system; they then appear like regular classes to the container.

Push it even Groovier

Let's go further: when Seam is in development mode, the .groovy files can be deployed as is, with no groovyc build-time compilation involved. Like hot redeployable Java Seam components, deploy (and I mean copy) your .groovy files in your WEB-INF/dev directory.

No need to restart the application (not even speaking of the container): the next hit will reload the Groovy classes transparently, providing a pretty smooth development environment. Fast development time, fast deployment time.

This mode is currently limited to Seam JavaBean components: EJB 3.0 Session Beans and Entities do not (yet) support hot redeployment. We are considering enhancing the JBoss EJB 3 container to get rid of this limitation (you will still hit this limitation in other containers though).

How can I set that up?

By using seam-gen, you can generate a ready to use development environment supporting Groovy in a minute.

./seam setup
# use project type WAR, the rest is at your will

./seam new-project
# that's it

And you are done, feel free to write .groovy code in either src/model or src/action. Remember, in Seam development mode, you don't have to restart the application when you change code in src/action (whether it be Groovy or Java). A simple ./seam explode (to copy the Groovy files) will do the trick.

For a complete working Groovy project, have a look at the groovybooking project in JBoss Seam examples (CVS HEAD at the time of writing).

This feature (already available in CVS HEAD) is expected for the next major JBoss Seam release: we still have some more surprises in our bag :-)

NB: if you are interested in Groovy and are in San Francisco tonight, don't miss the G2One. I will be there if you have any questions.

Hibernate and Seam at JavaOne 2007

Tagged as Seam

Hibernate and JBoss Seam will be covered by some of the JBoss folks at JavaOne.

Hibernate Search

TS-4746 - Hibernate Search: Googling Your Java Technology-Based Persistent Domain Model

I will describe the problem addressed, how Hibernate Search backed by Apache Lucene provides a solution, the internal architecture and some tips and tricks. I will also demo how to add Hibernate Search to an application.

Hibernate Validator and JSR-303

TS-4112 - Declarative Programming: Tighten Enterprise JavaBeans (EJB) 3.0 and JSR 303 Beans Validation

I will provide an update about JSR-303 Bean Validation and Hibernate Validator, how it will fit into the Java SE/Java EE world. I will also demo how to use it today with EJB 3.0 interceptors.

JBoss Seam

TS-4089 - Web Beans Update

Gavin will talk about Web Beans, the standardization process around JBoss Seam.

BOF-4400 - Improve and Expand JavaServer Faces Technology with JBoss Seam

BOF-9792 - Rapid Seam Application Development with the NetBeans IDE

Michael will also talk about JBoss Seam, the product. These tracks are more focused on JBoss Seam ease of use and Rapid Application Development (this is what JBoss Seam is about after all ;-) )

Check the JavaOne website or the JBoss website for more info.

I think a party is planned too, see you there.

Welcome Hibernate Shards!

Tagged as Hibernate Shards

This is a pretty big day for the Hibernate family. We welcome three new top level projects:

  • Hibernate Shards
  • Hibernate Validator
  • Hibernate Search

In total, we have made five new releases today.

Hibernate Shards 3.0.0 Beta1

Contributed by Google, Hibernate Shards is a horizontal partitioning solution built on top of Hibernate Core. When you need to distribute (shard) your data across multiple databases, Hibernate Shards is for you (too much data for a single database instance, regional deployment requirements, etc.). Like all Hibernate projects, Hibernate Shards is released under the LGPL license. Big thanks to Max Ross, Maulik Shah, Tomislav Nad, and Google :-) for contributing their pretty impressive Google 20 percent project back to the community.

Check the documentation for more information.

Hibernate Search 3.0.0.Beta1

Hibernate Search is now a top level project independent of Hibernate Annotations. New in this release:

  • out of the box index clustering through JMS, using a master/slaves model (maximizing throughput)
  • asynchronous indexing (maximizing application response time)
  • indexing of embedded/associated objects and correlated queries (semantically similar to a SQL JOIN)
  • use of Apache Lucene(tm) 2.1.0 (lots of performance and scalability improvements)

While marked as Beta because its scope is rapidly growing and some APIs are still subject to change, Hibernate Search is already used by quite a few people. Check it out.

Hibernate Validator 3.0.0.GA

Hibernate Validator is also a new top level project independent of Hibernate Annotations. New in this release:

  • run with pure Java Persistence Provider (entity listener provided)
  • more business oriented validators

Check the website and the change log for more information.

Hibernate Annotations and Hibernate EntityManager 3.3.0.GA

A few minor configuration changes (necessary to introduce the previous projects) led us to a version number increase. This version is however mostly backward compatible with 3.2.x. Some of the new features are listed:

  • transparent event wiring for Hibernate Validator and Hibernate Search
  • performance improvements during cascading in Hibernate EntityManager
  • more SQL customizations as well as fetching and lazy configurations
  • the usual bunch of bug fixes

The past few months have been pretty busy preparing this unique, feature-rich Hibernate bundle with a smooth out of the box experience. Enjoy :-)

Hibernate Search: Hibernate Apache Lucene integration

Tagged as Hibernate Search

The most significant part, by far, of Hibernate Annotations 3.2.1 is the complete rewriting and feature expansion of Hibernate Search, formerly known as Hibernate Lucene.

Long story short

Hibernate Search allows you to search your domain model (google it) without the hassle and mismatches introduced by the full text technology. Indexing is done automatically, the mapping between the object model and the index documents is described through annotations, and the querying capability is integrated with the regular Hibernate querying system. Hibernate Search uses Apache Lucene underneath and lowers the barrier to entry to such a technology.

What is it all about

In a few words, bringing Google search capabilities to your domain model.

For most people, queries are synonymous with SQL queries, and this is indeed the case in most applications. There is a spectrum of queries, however, that are not handled by SQL (at least without proprietary extensions): free text search, proximity search, phrase search, synonyms, approximate terms, results by relevance... Full-text search engines solve these classes of problems.

Full text search involves two steps. Indexing, i.e. keeping the full text index consistent with the information in the database. Querying, i.e. the ability to query the indexed information in free form.

Integrating such a search capability into a system is not that easy. In most systems, a mismatch exists between the data structure used by the application core and the data structure used by the full text search. For applications using an ORM such as Hibernate, the former is mostly designed around the object model, while the latter is designed around the notion of documents containing several fields of strings. Handling this mismatch and keeping the data synchronized between both parts of the system tends to be too tedious for massive adoption.

Hibernate Search aims to tackle the mismatch complexity for you, and to lower the barrier to entry of full text technology such as Apache Lucene in most applications.

Indexing

Hibernate Search is the glue between Hibernate and Apache Lucene. Apache Lucene is a fantastic full-text indexing Java library, and the de facto standard in the open source world. Hibernate Search listens to any changes made to the domain model thanks to the Hibernate event model. All modifications made to your persistent objects are propagated to the Apache Lucene index transparently. Under the hood, Hibernate Search optimizes indexing by batching the work. The current implementation queues the work per transaction. Other pluggable implementations will be possible shortly. Finally, you can force indexing of a given set of objects, which is particularly useful when initializing the index. How indexes are organized is pretty much up to you: you can have one Directory (index) per entity type (recommended) or share the same Directory for several entities.
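The per-transaction queueing described above can be pictured with a few lines of plain Java. This is only a sketch of the idea, not the actual implementation: `TransactionWorkQueue` and its method names are hypothetical, and a simple list of strings stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: buffers index work during a transaction
// and applies it in one batch on commit. Not the Hibernate Search internals.
final class TransactionWorkQueue {
    private final List<String> pending = new ArrayList<>();
    private final List<String> index; // stand-in for the real Lucene index

    TransactionWorkQueue(List<String> index) {
        this.index = index;
    }

    /** Called for each change event raised while the transaction is open. */
    void onEntityChanged(String change) {
        pending.add(change);
    }

    /** Applies all queued work as a single batch, then clears the queue. */
    void commit() {
        index.addAll(pending);
        pending.clear();
    }

    /** Drops queued work so the index never sees changes the database rejected. */
    void rollback() {
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        TransactionWorkQueue queue = new TransactionWorkQueue(index);
        queue.onEntityChanged("add Book#1");
        queue.onEntityChanged("update Book#1");
        queue.commit();
        System.out.println(index);
    }
}
```

Queueing per transaction has two benefits: the index is touched once per commit instead of once per change, and a rollback leaves the index untouched.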

Metadata

Indexing means translating the Java object attributes to a (potentially degraded) string representation. The bridge between properties and index fields is driven by annotation metadata and defaults to a built-in set of bridges. Flexibility is provided by the ability to use a custom bridge (very similar to the notion of /UserType/).
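To give a feel for what a bridge does, here is a minimal sketch in plain Java. The `ValueBridge` interface and `PaddedIntBridge` class are hypothetical and written for this example; the actual bridge interfaces in Hibernate Search may look different. The sketch zero-pads integers so that the string order in the index matches the numeric order, which is one common reason to control the string form.

```java
import java.util.Locale;

// Hypothetical interface for illustration; the real Hibernate Search
// bridge contracts may differ.
interface ValueBridge<T> {
    String toIndexedString(T value);

    T fromIndexedString(String indexed);
}

// Zero-pads non-negative integers so that lexicographic order in the
// index matches numeric order ("0000000009" sorts before "0000000010").
final class PaddedIntBridge implements ValueBridge<Integer> {
    private static final int WIDTH = 10; // Integer.MAX_VALUE has 10 digits

    @Override
    public String toIndexedString(Integer value) {
        if (value == null || value < 0) {
            throw new IllegalArgumentException("only non-negative values are supported");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    @Override
    public Integer fromIndexedString(String indexed) {
        return Integer.valueOf(indexed);
    }

    public static void main(String[] args) {
        PaddedIntBridge bridge = new PaddedIntBridge();
        System.out.println(bridge.toIndexedString(42));
    }
}
```

The two-way contract matters when a value is read back from the index (for a projection, for example) rather than reloaded from the database.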

Searching

One of the mismatches is that full text queries in Apache Lucene return /Documents/ and not regular domain objects. Hibernate Search implements /org.hibernate.Query/ and gives you a unified query model regardless of the query engine (criteria, HQL, SQL, Lucene). In particular, you have access to pagination and all the query APIs like /scroll()/, /list()/ etc... All queries will return managed objects (i.e. attached to a session) that you will be able to use and modify at will like a regular Hibernate managed object (because it /is/ a regular managed object). You can decide to query on all entities or only a subset of them. Like in Hibernate, querying on a subset of entities is polymorphic.

Try it out

Hibernate Search's goal is really to lower the barrier to entry of full text engine technology. Apache Lucene is often criticized by beginners for its low level API and inherent complexity. Hibernate Search makes it simple to use, removing some of the complexity, but lets you access all the power and flexibility of Apache Lucene if you need to. Hibernate Search is part of Hibernate Annotations. Check it out and download it; using it is much simpler than explaining it :-)

FindBugs(tm) actually does it

I heard about FindBugs(tm) while listening to one of the Java Posse podcasts. Since Hibernate Annotations and Hibernate EntityManager are very close to their respective final releases, I decided to give it a shot.

FindBugs(tm) is basically a static code analyser that tries to find bugs in your code through some pattern recognition. I have been working in the past with both static and dynamic code analysers and I have been pretty disappointed by their false positive ratios (basically a non-bug considered as a bug), and the complexity of their set up process. FindBugs was a refreshing experience.

The cool thing about it is, well, there are several cool things:

  • it's a no-brainer to set up and run
  • the number of false positives is surprisingly low
  • it works at the bytecode level, so you don't have to have the source code (more later)

Set up and Run

I haven't read the documentation, just downloaded the package and ran the .bat file. A couple of clicks and I was analysing Hibernate Annotations and Hibernate EntityManager. No fancy Ant integration required (you can, but you don't have to), no fancy IDE dependency (you can, but you don't have to), no fancy command line requiring you to RTFM (you can, ahem you should, but you don't have to). The provided GUI does the job pretty smoothly, even if a package filtering feature would have been really cool (more later).

A pretty low false positive ratio

THE thing that usually kills such a product is the number of false positive bug claims. You end up scanning hundreds of warnings without paying attention to them and trash the whole product after 30 minutes. FindBugs has a pretty low false positive ratio, which is very good. And if a warning ends up being a false positive, there is usually a good reason that makes your code worth a second look. I must admit I am pretty proud of myself ;-) Of course in HAN and HEM, I found some bugs and suboptimal constructs (no worries, I fixed them), but far fewer than I expected.

Work at the bytecode level

That is probably what makes it easy to use (and hard to develop): FindBugs(tm) works at the bytecode level, not the source level (it highlights the source line if the sources are available). So pointing it at a jar or a directory containing your compiled classes is enough. Actually, what I did was point it at my project root directory, and the job was done.

So while analysing Hibernate Annotations and Hibernate EntityManager, I ended up analysing a bunch of jars (Oh Filter, where art thou?), and I can tell you some guys out there should take a look at FindBugs(tm); this includes a bunch of JDBC drivers and a well-known in-memory database backed by some big company(ies) ;-)

Give it a try

It's free, it's easy to set up, it's going to take two hours of your time and save you much more.
