Red Hat

In Relation To Hibernate OGM

In Relation To Hibernate OGM

Updated OGM kitchensink example

Posted by    |       |    Tagged as Hibernate OGM

Last year I have been giving an introduction to OGM and OpenShift at JUDCon London using a modified version of the kitchensink quickstart. Time has gone and it was time to give the demo a revamp.

Thanks to Sanne the code got updated to use the latest app server (AS 7.1.1.Final Brontes) and naturally the latest OGM and Search versions. Another change is that now Infinispan is not only used to persist the entity data via OGM, but it is also used for storing the Lucene indexes via the Infinispan directory provider (see persistence.xml of the example project). Sanne added also a bunch of new tests show casing Arquillian and different ways to bundle the application. Definitely worth checking out!

Personally, I had a look at the project setup and made some changes there. The original demo assumed you had a local app server installation as a prerequisite on your machine. It then used the jboss-as-maven-plugin to deploy the webapp. Unfortunately, this plugin does not allow me to start and stop the server and it seems redundant to require a local install if the Arquillian tests already download an AS instance (yes, I could run the test against the local instance as well, but think for example continuous integration where I want to manage/control the WHOLE ENVIRONMENT).

In the end I decided to give the cargo plugin another go. A lot has happened there and it supports not only JBoss 7.x, but it also offers a so called artifact installer which allows to download the app server as a managed maven dependency. The relevant pom.xml settings look like this:

          ...
          <plugin>
                <groupId>org.codehaus.cargo</groupId>
                <artifactId>cargo-maven2-plugin</artifactId>
                <version>1.2.1</version>
                <configuration>
                    <container>
                        <containerId>jboss71x</containerId>
                        <artifactInstaller>
                            <groupId>org.jboss.as</groupId>
                            <artifactId>jboss-as-dist</artifactId>
                            <version>7.1.1.Final</version>
                        </artifactInstaller>
                    </container>
                </configuration>
                <executions>
                    <execution>
                        <id>install-container</id>
                        <phase>initialize</phase>
                        <goals>
                            <goal>install</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.codehaus.gmaven</groupId>
                <artifactId>gmaven-plugin</artifactId>
                <version>1.3</version>
                <executions>
                    <execution>
                        <id>copy-modules</id>
                        <phase>initialize</phase>
                        <goals>
                            <goal>execute</goal>
                        </goals>
                        <configuration>
                            <source>
                                def toDir = new File(project.properties['jbossTargetDir'], 'modules')
                                def fromDir = new File(project.basedir, '.openshift/config/modules')
                                log.info('Copying OGM module from ' + fromDir + ' to ' + toDir)
                                ant.copy(todir: toDir) {
                                fileset(dir: fromDir) {
                                exclude(name: 'README')
                                }
                                }
                            </source>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            ...|
 

I am using cargo:install in the initialize phase to install the app server into the target directory. This way I can install a custom module (via the gmaven plugin) before the tests get executed and/or before I start the application via a simple:

    $ mvn package cargo:run

Neat, right?

You find all the code on GitHub in the ogm-kitchensink. The README gives more information about the main maven goals.

Enjoy, Hardy

The OpenBlend conference in Ljubljana, Slovenia will be held the 15th September in the fabulous setting of the Ljubljana Castle.

Since it was incredibly complex to plan my travel to get there, I'll make it worth the effort by having two talks:

  1. Introducing Hibernate OGM: porting JPA applications to NoSQL
  2. Introduction to Byteman and The Jokre

Both projects are very young, in fact I think this is going to be the first time we reveal (1) the Jokre - a very innovative optimisation engine - and Hibernate OGM is definitely a hot topic.

I also look forward to see the other talks of the day, meet team mates such as Bela Ban from JGroups and Infinispan, Adam Warski the creator of Hibernate Envers (but presenting Torquebox & CDI), Aleš Justin the Weld lead and master of the conference, and everyone else meeting there: above all, it's always nice to hear what people do or would like to do with the tools we build, and meeting more people willing to join the open source effort.

1- please don't cheat by downloading the source code yet: it's pointless, you won't understand it. If you do, please add some comments to the code.

This september in Paris, we are lucky to host both the OpenWorldForum and Open Source Developer Conference. And they had the common sense to host them at the same time in the same place. It's September 22th-24th in Paris. I believe it's free but requires registration.

I will personally be there and put to work:

  • I will be presenting Hibernate OGM on Friday afternoon at OSDC
  • I will be part of a panel on Code and Cloud at the OpenWorldForum
  • Vittorio Amos Ziparo a co-member of the Cloud-TM initiative will be talking about how he uses Infinispan to store data for a MMOG

There are other Red Hat folks put to work:

  • Jim Meyering : ‎Goodbye World! The perils of relying on output streams in C
  • Christophe Fergeau : ‎SPICE, affichage distant de machines virtuelles

If you are around Paris at that time and do / use Open Source software, you know what to do.

As every month, we're having a JBoss user group meeting in Newcastle. Next Tuesday 12th July at 6pm I'll be presenting

Hibernate Search and OGM: taking advantage of NoSQL leaving out the complexity

and as always I'm looking forward to talk to all attendees: we'll follow up with free drinks and food to discuss about anything: open questions, suggestions, opportunities to tell your use cases in person, meet other developers.

Discussing code changes is also an option, so don't miss this opportunity to make sure your favourite tools are better fit for your needs!

full abstract and venue details here

Reminder: the location is in a highly secured building, if you arrive late make sure to get in touch so that someone can open you the door.

For more cool developer oriented events in Newcastle, see the JBug homepage

Hibernate OGM: birth announcement

Posted by    |       |    Tagged as Events Hibernate OGM

This is a pretty exciting moment, the first public alpha release of a brand new project: Hibernate OGM. Hibernate OGM stands for Object Grid Mapping and its goal is to offer a full-fledged JPA engine storing data into NoSQL stores. This is a rather long blog entry so I've split it into distinct sections from goals to technical detail to future.

Note that I say it's the first public alpha because the JBoss World Keynote 2011 was powered by Hibernate OGM Alpha 1. Yes it was 100% live and nothing was faked, No it did not crash :) Sanne explained in more detail how we used Hibernate OGM in the demo. This blog entry is about Hibernate OGM itself and how it works.

Why Hibernate OGM

One of the objectives of the Infinispan team is to offer a higher level API for object manipulation. One thing leading to another, we have experimented with JPA and Hibernate Core's engine to see if this could fit the bill: looks like it does. The vision has quickly expanded though and Hibernate OGM is now an independent project aiming to offer JPA on top of other NoSQL stores too.

At JBoss, we strongly believe that provided tools become available, developers, applications and whole corporations will exploit new data usage patterns and create value out of it. We want to speed up this adoption / experimentation and bring it to the masses. NoSQL is like sex for teenagers: everybody is talking about it but few have actually gone for it. It's not really surprising, NoSQL solutions are inherently complex, extremely diverse and come with quite different strengths and weaknesses: going for it implies a huge investment (in time if not money). (One of) JBoss's goal is to help lower the barrier of entry and Hibernate OGM is right inline with this idea.

We want to simplify the programmatic model of various NoSQL approaches by offering a familiar one to many Java developers: JPA. The good thing about a familiar and common programmatic model is that it's easy to try out one data backend and move to another down the road without affecting dramatically the application design.

So when should you use Hibernate OGM? We are not claiming (that would be foolish) that all NoSQL use cases can and will be addressed by JPA and Hibernate OGM. However, if you build a domain model centric application, this will be a useful tool for you. We will also expand Hibernate OGM feature set and use cases to address different areas (like fronting a RDBMS with a NoSQL store). Besides, Hibernate OGM is not an all or nothing tool. You can very well use it for some of the datastore interactions and fall back to your datastore native API for more advanced features or custom query mechanism. No lock-in here.

How does it work

Basically, we are reusing the very mature Hibernate Core engine and trick it into storing data into a grid :) We plug into Hibernate Core via two main contracts named Loader and Persister. As you could expect, these load and persist/update/delete entities and collections.

Early in the project, we have decided to keep as much of the goodies of the relational model as possible:

  • store entities as tuples
  • keep the notion of primary key and foreign key (foreign keys violation are not enforced though - yet)
  • keep an (optional) indirection layer between the application data model and the datastore data model
  • limit ourselves to core data types to maximize portability

Entities are stored as tuples which essentially is a Map<String,Object> where the key is the column name and the value is the column value. In most cases, the property is mapped to a column but this can be de-correlated (@Column). An entity tuple is accessible via a single key lookup. Associations as a bit trickier because unlike RDBMs, many NoSQL stores and Grid specifically do not have build-in query engines and thus cannot query for associated data easily. Hibernate OGM stores navigational information to go from a given entity to its association information. This is achieved by a single key lookup. The drawback here is that writing requires several key lookup / update operations.

Here is a schema representing the object model, the relational model and finally how data is stored in the data grid.

All this means that we do not store entity objects in the grid but rather a dehydrated version of them. And this is a good thing. Blindly serializing entities would have led to many issues including:

  • When entities are pointing to other entities, are you storing the whole graph?
  • How do you guarantee object identity or even consistency amongst duplicated objects?
  • What happens in case of class schema change?
  • How do you query your data?
  • How to replicate class definitions to other nodes?

Unless you store relatively simple and short lived data, never go for blind serialization. Hibernate OGM does this smart dehydration for you.

For more information on how we did things, please have a look at the architecture section of Hibernate OGM's reference documentation.

What is supported today and what about tomorrow?

Today Hibernate OGM is quite stable for everything CRUD (Create, Read, Update, Delete) and of course extremely stable for all the JPA object lifecycle rules as we simply reuse Hibernate Core's engine. Here is an excerpt on the things supported but for the full detail, please go check the reference documentation that list what is not supported.

  • CRUD operations for entities
  • properties with simple (JDK) types
  • embeddable objects
  • entity hierarchy
  • identifier generators (TABLE and all in-memory based generators today)
  • optimistic locking
  • @ManyToOne, @OneToOne, @OneToMany and @ManyToMany associations
  • bi-directional associations
  • Set, List and Map support for collections
  • most Hibernate native APIs (like Session) and JPA APIs (like EntityManager)
  • same bootstrap model found in JPA or Hibernate Core: in JPA, set <provider> to org.hibernate.ogm.jpa.HibernateOgmPersistence and you're good to go

Since Hibernate OGM is a JPA implementation, you simply configure and use it like you would use Hibernate Core and you map your entities with the JPA annotations. Same thing, nothing new here. Check our the getting started section in our reference documentation.

What we do not support today is JP-QL. Support for simple-ish queries is due very soon and will be based on Hibernate Search indexing and querying capabilities. However, you can already use Hibernate Search directly with Hibernate OGM. You can even store your index in Infinispan, in Voldemort or in Cassandra (and others).

Also, the NoSQL solution initially supported is Infinispan. The reasons behind it are quite simple:

  • we had to start with one
  • key/value is a simple model
  • we can kick some JBoss employees in the nuts if we find a bug or need a feature
  • we have a very good in-house knowledge about Infinispan's isolation and transaction model and it happens to be quite close to the relational model

That being said, we do intend to support other NoSQL engines especially other key/value engines very soon. We have some abstractions in place already so they do need polishing. If you know a NoSQL solution and want to write a dialect for Hibernate OGM, please come talk to us and contribute.

Future

We are at the very beginning of the project and the future is literally being shaped. Here are a few thing we have in plan:

  • support for other key/value pair systems
  • support for other NoSQL engine
  • declarative denormalization: we have focused on feature so far, performance check and association denormalization is planned)
  • support for complex JP-QL queries including to-many joins and aggregation
  • fronting existing JPA applications

This is a community driven effort, if one of these areas interest you or if you have other ideas, let us know!

How to download and come contribute!

You can download the jars on JBoss.org's Maven repository or of course download the distribution (I'd recommend you download the distribution as it contains a consistent documentation with the release you wish to use).

Speaking of documentation, we tried to initially release with a fairly advanced reference documentation. In particular, the getting started and the architecture sections are quite complete: check it out.

Last but not least, Hibernate OGM is in its infancy, if you're interested in contributing or have ideas on how things should be done, let us know and speak up!

Many thanks to the team, to external contributors as well as the Cloud-TM project members for their support and tests.

Emmanuel and the team.

Visualizing data structures is not easy, and I'm confident that a great deal of success of the exceptionally well received demo we presented at the JBoss World 2011 keynote originated from the nice web UIs projected on the multiple big screens. These web applications were effectively visualizing the tweets flowing, the voted hashtags highlighted in the tagcloud, and the animated Infinispan grid while the nodes were dancing on an ideal hashweel visualizing the data distribution among the nodes.

So I bet that everybody in the room got a clear picture of the fact that the data was stored in Infinispan, and by live unplugging a random server everybody could see the data reorganize itself, making it seem a simple and natural way to process huge amounts of data. Not all technical details were explained, so in this and the following post we're going to detail what you did not see: how was the data stored, how could Drools filter the data, how could all visualizations load the grid stored data, and still be developed in record time?

JPA on a Grid

All those applications were simply using JPA: Java Persistence API. Think about the name: it's clearly meant to address all persistence needs of Java applications; in fact while it's traditionally associated with JDBC databases, we have just shown it's not necessarily limited to those databases: our applications were running an early preview of the Hibernate Object/Grid Mapper: Hibernate OGM, and the simple JPA model was mapped to Infinispan, a key/value store.

Collecting requirements

The initial plan didn't include Hibernate OGM, as it was very experimental yet, it was never released nor even tagged before, but it was clear that we wanted to use Infinispan: to store and to search the tweets. Kevin Conner was the architect envisioning the technical needs, who managed to push each involved developer to do its part, and finally assembled it all into a working application in record time; so he came to Emmanuel and me with a simple list of requirements:

  • we want to showcase Infinispan
  • we want to store Tweets, many of them, in real time coming in from a live stream from Twitter
  • we need to be able to iterate them all in time order, to rollback the stream and re-process it again (as you can see in the demo recording, we had a fake cheater and want to apply stricter rules to filter out invalid tweets at a second stage, without loosing the originally collected tweets).
  • we need to know which projects are voted the most: people are going to express preferences via hashtags in their tweets
  • we want to know who's voting the most
  • it has to perform well, potentially on a big set of data

Using Lucene

So, to implement these search requirements, you have to consider that being Infinispan a key/value store, performing queries is not as natural as you would do on a database. Infinispan currently provides two main approaches: to use the Query module or to define some simple Map/Reduce tasks.

Also, consider those requirements. Using SQL, how were we going to count all tweets containing a specific hashtag, extract this count for all collected hashtags, and sort them by frequency? On a relational database, that would have been a very inefficient query which involves at least a full table scan, possibly a scan per hashtag, and it would have required a prior list of hashtags to look for. We wanted to extract the most frequently mentioned tags, we didn't really know what to look for as people were free to vote for anything.

A totally different approach is to use an inverted index: every time you save a tweet, you tokenize it, extract all terms and so keep a list of terms with pointers to the containing tweets, and store the frequency as well. That's exactly how full-text search engines like Lucene work; in addition to that Lucene is able to apply state-of-the-art optimizations, caches and filtering capabilities. Both our Infinispan Query and Hibernate Search provide nice and easy integrations with Lucene (they actually share the same engine, one geared towards Infinispan users and one to Hibernate and JPA users).

To count for who voted the most is a problem which is technically comparable to counting for term frequencies, so again Lucene would be perfect. Sorting all data on a timestamp is not a good reason to introduce Lucene, but still it's able to do that pretty well too, so Lucene would indeed solve all query needs for this application.

Hibernate OGM with Hibernate Search

So Infinispan Query could have been a good choice. But we opted for Hibernate OGM with Search as they would provide the same indexing features, but on top of that using a nice JPA interface. Also I have to admit that Hibernate OGM was initially discarded as it was lacking an HQL query parser: my fault, being late with implementing it, but in this case it was not a problem as all queries we needed were better solved using the full text queries, which are not defined via HQL.

Model

So how does our model look like? Very simple, it's a single JPA entity, enhanced with some Hibernate Search annotations.

@Indexed(index = "tweets")
@Analyzer(definition = "english")
@AnalyzerDef(name = "english",
	tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
	filters = {
		@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
		@TokenFilterDef(factory = LowerCaseFilterFactory.class),
		@TokenFilterDef(factory = StopFilterFactory.class, params = {
				@Parameter(name = "words", value = "stoplist.properties"),
				@Parameter(name = "resource_charset", value = "UTF-8"),
				@Parameter(name = "ignoreCase", value = "true")
		})
})
@Entity
public class Tweet {
	
	private String id;
	private String message = "";
	private String sender = "";
	private long timestamp = 0L;
	
	public Tweet() {}
	
	public Tweet(String message, String sender, long timestamp) {
		this.message = message;
		this.sender = sender;
		this.timestamp = timestamp;
	}

	@Id
	@GeneratedValue(generator = "uuid")
	@GenericGenerator(name = "uuid", strategy = "uuid2")
	public String getId() { return id; }
	public void setId(String id) { this.id = id; }

	@Field
	public String getMessage() { return message; }
	public void setMessage(String message) { this.message = message; }

	@Field(index=Index.UN_TOKENIZED)
	public String getSender() { return sender; }
	public void setSender(String sender) { this.sender = sender; }

	@Field
	@NumericField
	public long getTimestamp() { return timestamp; }
	public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

}

Note the uuid generator for the identifier: that's currently the most efficient one to use in a distributed environment. On top of the standard @Entity, @Indexed enables the Lucene indexing engine, the @AnalyzerDef and Analyzer specifies the text cleanup we want to apply to the indexed tweets, @Field selects the property to be indexed, @NumericField makes sure the numeric sort will be performed efficiently treating the indexed value really as a number and not as an additional keyword: always remember that Lucene is focused on natural language matching.

Example queries

This is going to look like a bit verbose as I'm expanding all functions for clarity:

public List<Tweet> allTweetsSortedByTime() {

	//this is needed only once but we want to show it in this context:
	QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity( Tweet.class ).get();

	//Define a Lucene query which is going to return all tweets:
	Query query = queryBuilder.all().createQuery();

	//Make a JPA Query out of it:
	FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery( query );

	//Currently needed to have Hibernate Search work with OGM:
	fullTextQuery.initializeObjectsWith( ObjectLookupMethod.PERSISTENCE_CONTEXT, DatabaseRetrievalMethod.FIND_BY_ID );

	//Specify the desired sort:
	fullTextQuery.setSort( new Sort( new SortField( "timestamp", SortField.LONG ) ) );

	//Run the query (or alternatively open a scrollable result):
	return fullTextQuery.getResultList();
}

Download it

To see the full example, I pushed a complete Maven project to github. It includes a test of all queries and all project details needed to get it running, such as Infinispan and JGroups configurations, the persistence.xml needed to enable HibernateOGM.

Please clone it to start playing with OGM: https://github.com/Sanne/tweets-ogm

And see you on IRC, the Hibernate Search forums, or the brand new Hibernate OGM forums for any clarification.

How are entities persisted in the grid?

Emmanuel is going to blog about that soon, keep an eye on the blog! Update: blog on OGM published

back to top