In Relation To Infinispan

Sanne is going to give a virtual JBoss User Group session on Tuesday July 14th at 6 PM BST / 5 PM UTC / 1 PM EDT / 10 AM PDT. He is going to talk about Lucene in Java EE.

He will also describe some projects dear to our heart. If you want to know what Hibernate Search and Infinispan bring to the Lucene table and how they use Lucene internally, that’s the event to be at!

Apache Lucene is the de-facto standard open source library for Java developers to implement full-text-search capabilities.

While it’s thriving in its field, it is rarely mentioned in the scope of Java EE development.

In this talk we will see which features make so many developers love Lucene, walk through some concrete examples of common problems it elegantly solves, and look at some best practices for using it in a Java EE stack.

Finally we’ll see how some popular OSS projects such as Hibernate ORM (JPA provider), WildFly (Java EE runtime) and Infinispan (in-memory datagrid, JCache implementor) actually provide great Lucene integration capabilities.

If you are interested, get some more info on Meetup and sign up.

JBoss Developer Day in London

Posted by Sanne Grinovero    |       |    Tagged as Events Hibernate Search Infinispan

We'll have a full day dedicated to developers in London on the 5th of December. It's a free event with a high concentration of technical people and passionate developers.

Infinispan & Hibernate integrations

I'll introduce you to several different integration strategies to make the most of both Hibernate and Infinispan, but I'll also be available to discuss technical questions for either project.

Many more interesting talks

There are many interesting talks, covering cool subjects like the latest news in BRMS space, JBoss Fuse & Camel, OpenShift, Arquillian and JBoss Tools, JBoss performance tuning, HTML5 built on JBoss... the full agenda can be found here.

Get in touch

As always, my colleagues and I look forward to interactive sessions and lots of open discussions. Feel free to reach out to discuss anything related to the coolest technologies, in the sessions or after the talks.

If you can make it, please register, help me to advertise the event, and see you there.

Sanne

If you travel in the JBoss universe you should be aware that Red Hat Summit and JUDCon are taking place in Boston between June 10th and June 14th. If you want to meet the people behind the code, that's a pretty good deal.

Sanne, Emmanuel (that's me), Gavin and Stef are going to give various talks on various Hibernate projects and on Ceylon.

Ceylon workshop

Monday June 10th at JUDCon, Stef, Gavin and I are doing a full day workshop on Ceylon. Come with your laptop and code with us. More info.

Hibernate Search live coding

Tuesday June 11th at JUDCon I will be adding full-text search, faceting and geolocation queries to an application live. I am working hard on this code and not at all on my slides, so this should be quite interesting. It is based on the famous TicketMonster app. More info.

Hibernate and Infinispan / JBoss Data Grid

At Red Hat Summit Thursday June 13th, Sanne will talk about the use cases that really benefit from mixing Infinispan / JBoss Data Grid with the Hibernate suite of projects (Hibernate ORM, Hibernate Search and Hibernate OGM). You will see the big work we did to make 1+1=3 with these projects. More info.

Hibernate State of the Union

At Red Hat Summit Wednesday June 12th, I will be giving an overview of what's new in the Hibernate sphere (ORM, Search, OGM, Envers, Validator). A good one to keep your knowledge fresh or if you believe that Hibernate is a JPA compliant ORM and that's it :) More info.

Anyways, feel free to ping us if you are there, we accept chats as long as there is beer, whisky or any kind of good technical content :)

Last week I returned from my trip to Bengaluru, where we had one of our great developer conferences.

JUDCon India 2013

As always at these events the best part was the people attending: a good mix of new users and experts, all sharing a very healthy curiosity and not intimidated at all. That made for a terrific amount of questions and discussions, and gave me a lot of things to think about on my long trip home.

Presentations

I had the honor to present several topics:

  • Hibernate Search: queries for Hibernate and Infinispan
  • Infinispan in 50 minutes
  • Cross data center replication with Infinispan
  • Measuring performance and capacity planning for Data Grids
  • Participating on the JBoss experts panel

The talk about Hibernate Search was a last minute addition: by shuffling the agenda a bit we could insert the additional subject and given the amount of nice feedback I'm happy we did.

The big denormalization problem

An expert Hibernate Search user asked me what would happen with a domain model connecting User types to Addresses when you have many Users and the city name changes. He actually knew what would happen, but was looking for alternatives to compensate for the problem: since Lucene requires denormalization, all User instances in the Lucene index need to be updated, triggering a reload of all Users living in that particular city. Yes, that might be a problem! But that is not something that happens frequently in model schemas, right? I stated that in this example it would take a city to change its name! Well, that caused a good amount of laughter, as Bangalore had just changed its official name to the old traditional Bengaluru. Since they were using Hibernate Search and had not expected this behaviour when the city, with more than 8 million inhabitants, changed its name, the public registry had some servers working very hard!

Obviously this needed specific testing, and possibly better warnings on our part. Such problems are a natural consequence of denormalization and need to be addressed with ad-hoc solutions. In this case I'd suggest using a synonym: register the two names as equivalent for search purposes by configuring synonym support in the Analyzer in use. The city name would then need a single record change in the database, and no reindexing would be needed.
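
To make the synonym idea concrete, here is a minimal sketch in plain Java. It is not the Lucene or Hibernate Search API: the class, the `analyze` method and the synonym table are all made up for illustration, and a real setup would configure synonym support on the Analyzer instead. The point it shows is that when both spellings are reduced to the same token, the entries written under the old name keep matching queries for the new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only: this is not the Lucene or Hibernate Search API.
class CitySynonymSketch {

    // Hypothetical synonym table: every known spelling maps to one canonical token.
    static final Map<String, String> SYNONYMS = Map.of("bengaluru", "bangalore");

    // Stand-in for an analyzer: lowercase the term, then apply the synonym table.
    static String analyze(String term) {
        String token = term.toLowerCase(Locale.ROOT);
        return SYNONYMS.getOrDefault(token, token);
    }

    // Inverted index from the analyzed city token to the ids of the users living there.
    final Map<String, List<Integer>> usersByCity = new HashMap<>();

    void indexUser(int userId, String cityName) {
        usersByCity.computeIfAbsent(analyze(cityName), k -> new ArrayList<>()).add(userId);
    }

    List<Integer> search(String cityName) {
        return usersByCity.getOrDefault(analyze(cityName), List.of());
    }

    public static void main(String[] args) {
        CitySynonymSketch index = new CitySynonymSketch();
        index.indexUser(1, "Bangalore");
        index.indexUser(2, "Bangalore");
        index.indexUser(3, "Mysore");
        // Both the old and the new name find the same users; no entry was rewritten.
        System.out.println(index.search("Bangalore"));
        System.out.println(index.search("Bengaluru"));
    }
}
```

The same table is applied when indexing and when querying, so it does not matter which of the two names a given record or a given query happens to use.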

Hibernate OGM

While I'm part of the OGM team, I had no need to talk about OGM as well because there were other speakers on the subject already. I greatly enjoyed listening to the other presenters, Ramya Subash and Shekhar Gulati: they were extremely well prepared, and even with the most complex questions there was no need for me to help out.

To all attending, and especially all those I've been talking to: thank you so much, it was very interesting and I very much appreciate all the feedback. As always feel free to get more questions flowing on our Hibernate forums or Infinispan forum, and you're all welcome to participate more by sending tests or patches.

Data Grid, Why?

Posted by Strong Liu    |       |    Tagged as Infinispan

NOTE: this post is translated from howtojboss.com; the original author is Shane K Johnson.

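Why use a data grid? This post sets out to answer that question.

First, it is a product of evolution.

Local cache > clustered cache > distributed cache (data grid)

The reasons for using a distributed cache include the reasons for using a clustered cache, which in turn include the reasons for using a local cache. (Translator's note: this sentence reads rather oddly.)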

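Performance

Accessing an object in a local cache is much faster than going directly to a remote data store such as a database.

Accessing an object that already exists is faster than building one from data.

  • The data may be stored in one or several places
  • The data may take multiple queries to retrieve
  • The data may be complex

In addition, data grids support performance tuning features that clustered caches may not. For example, an application can ensure that closely related objects are stored on the same cache node, based on how tightly the data is related.

Going further, JBoss Data Grid has performance tuning options of its own: for example, it can be configured to use asynchronous communication, and it provides an asynchronous API.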
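
As a rough illustration of what an asynchronous API buys the caller, here is a minimal sketch in plain Java. It is not the JBoss Data Grid API: `AsyncCacheSketch`, `putAsync` and `getAsync` are hypothetical names, and an in-memory map stands in for the grid. Each call returns a future immediately, so several operations can be in flight before the caller waits for any of them.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a grid client; not the JBoss Data Grid API.
class AsyncCacheSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Returns immediately; the write completes on a background thread.
    CompletableFuture<Void> putAsync(String key, String value) {
        return CompletableFuture.runAsync(() -> store.put(key, value));
    }

    // Returns immediately; the future completes with the value, or null if absent.
    CompletableFuture<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> store.get(key));
    }

    public static void main(String[] args) {
        AsyncCacheSketch cache = new AsyncCacheSketch();
        // Start two writes without waiting for either, then wait for both at once.
        CompletableFuture<Void> writes = CompletableFuture.allOf(
                cache.putAsync("user:1", "Alice"),
                cache.putAsync("user:2", "Bob"));
        // Chain the read after the writes instead of blocking between the steps.
        String value = writes.thenCompose(done -> cache.getAsync("user:1")).join();
        System.out.println(value);
    }
}
```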

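Consistency

A local cache only makes sense when the application is deployed on a single application server. Once it is deployed on several application servers, a local cache makes no sense at all; the problem is stale data. A clustered cache solves this problem through replication and invalidation of cached data.

Besides JTA transactions, data grids also support XA (distributed) and two-phase commit transactions.

Finally, JBoss Data Grid supports additional consistency guarantees that some other data grid products may not, such as transaction recovery and version-based updates or removals.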
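
A version-based update can be pictured as "write only if the entry is still at the version I read". The following is a minimal sketch of that idea in plain Java. It is not the JBoss Data Grid API: the class and its `replaceIfVersion` / `removeIfVersion` methods are hypothetical, and a single in-memory map stands in for the grid.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of version-checked writes; not the JBoss Data Grid API.
class VersionedCacheSketch {

    // A value together with the version it was written at.
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> store = new ConcurrentHashMap<>();

    // Unconditional write: the first write gets version 1, later writes bump it.
    void put(String key, String value) {
        store.merge(key, new Versioned(value, 1),
                (old, fresh) -> new Versioned(fresh.value, old.version + 1));
    }

    Versioned get(String key) {
        return store.get(key);
    }

    // Updates only if nobody changed the entry since the caller read expectedVersion.
    boolean replaceIfVersion(String key, String newValue, long expectedVersion) {
        Versioned current = store.get(key);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        // replace(key, old, new) is atomic, so a concurrent writer makes this fail.
        return store.replace(key, current, new Versioned(newValue, expectedVersion + 1));
    }

    // Removes only if the entry is still at expectedVersion.
    boolean removeIfVersion(String key, long expectedVersion) {
        Versioned current = store.get(key);
        return current != null
                && current.version == expectedVersion
                && store.remove(key, current);
    }

    public static void main(String[] args) {
        VersionedCacheSketch cache = new VersionedCacheSketch();
        cache.put("city", "Bangalore");
        long seen = cache.get("city").version;
        System.out.println(cache.replaceIfVersion("city", "Bengaluru", seen)); // first writer wins
        System.out.println(cache.replaceIfVersion("city", "Stale", seen));     // stale version is rejected
    }
}
```

A caller whose update is rejected would typically re-read the entry and decide whether to retry, which avoids silently overwriting someone else's change.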

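Scalability

The difference between a clustered cache and a data grid is scalability. A data grid is scalable: cached data is distributed through dynamic partitioning. As a result, adding a cache node increases both throughput and capacity.

JBoss Data Grid uses a consistent hashing algorithm to minimize the impact of adding or removing a node (translator's note: recommended reading is this article, or this one, which is in Chinese). When a node is added or removed, only part of the data is moved to rebalance the grid. Adding or removing a node therefore affects only some of the nodes in the grid, whereas other algorithms may well affect all of them.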
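
The property described above can be seen in a generic hash ring. The following is a minimal sketch in plain Java, not the algorithm JBoss Data Grid or Infinispan actually uses: the class name, the CRC32 hash and the number of virtual nodes are all assumptions chosen to keep the example short. Keys are owned by the first node found clockwise from their hash, so adding a node only takes over the keys that fall just before its positions on the ring.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Generic hash-ring sketch; not JBoss Data Grid's actual algorithm.
class ConsistentHashSketch {
    // Assumed value: each node is placed at this many positions on the ring.
    private static final int VIRTUAL_NODES = 100;

    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    // A key belongs to the first node clockwise from its hash (needs at least one node).
    String ownerOf(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashSketch ring = new ConsistentHashSketch();
        for (String node : List.of("node-a", "node-b", "node-c")) {
            ring.addNode(node);
        }
        List<String> before = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            before.add(ring.ownerOf("key-" + i));
        }
        ring.addNode("node-d");
        int moved = 0;
        for (int i = 0; i < 1000; i++) {
            if (!ring.ownerOf("key-" + i).equals(before.get(i))) {
                moved++;
            }
        }
        // Every key that changed owner went to node-d; the others stayed where they were.
        System.out.println(moved + " of 1000 keys moved");
    }
}
```

By contrast, a scheme such as `hash(key) % nodeCount` would reassign most keys whenever the node count changes, which is the "affects all nodes" case mentioned above.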

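Independence

Another difference between a clustered cache and a data grid is whether it supports standalone access.

If a data grid is embedded in an application, it is coupled to that application: scaling the embedded grid means scaling the application as well. As a result, growing the grid also increases the cost of managing the associated application (and application servers).

In the following example, a web application is deployed on several application servers and uses an embedded data grid. Once the grid has cached enough content, it needs more capacity.

What does that mean?

A new application server has to be installed and configured, and the application deployed to it.

And what does that mean?

IT staff have one more application server to manage, which increases the management cost.

  • If the application is redeployed, the embedded data grid node is redeployed too.*
  • If the data grid is upgraded, the application has to be upgraded (and redeployed) as well.

* When a data grid node is redeployed, the grid topology changes twice: once when the node leaves and again when it is deployed. Each time the topology changes, data is moved between nodes to rebalance the grid (even if only part of the data). Redeploying an application therefore affects the data grid running on the other nodes, and does so twice.

What if the data grid needs tuning to run faster, but the application does not?

Does it make sense to deploy the application to additional application servers when the application is not the bottleneck? What about resource utilization? Does it make sense to add application servers just to raise the grid's throughput, even if those application servers end up idle?

The solution is a standalone data grid.

Consider the following example. Again a web application is deployed on several application servers, but this time it uses a standalone data grid. The grid has reached its maximum capacity and needs to be expanded.

And what does that mean this time?

A new data grid node is installed and configured. That's all.

This architecture lets the data grid scale independently of the application servers. It also lets data grid servers be given different resources from application servers: for example, a data grid node may need more memory but less CPU than an application server.

It also lets the data grid infrastructure be managed and tuned independently of the application servers.

The data grid can be upgraded independently of the application, and redeploying the application has no effect on the data grid.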

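Infrastructure

Compare a standalone data grid that is a first-class system in the infrastructure with a second-class data grid service embedded in other systems.

For example, an enterprise has an application deployed on a cluster of application servers, with an embedded data grid. Then the enterprise…

  • adds an ESB-based service, which also embeds a data grid
  • adds a portal deployed on a portal platform, which also embeds a data grid
  • adds business process services running on a rules management system, which again embeds a data grid

You can see the problem.

The enterprise now has several separate (embedded) data grids to manage, which increases the management cost. If they cache the same data (customer information, say), they face the risk of stale data, just like an application using a local cache deployed on several application servers: once one grid updates the data, the copies in the other grids are no longer valid. And if they all store the same data, efficiency is a problem too: only a fraction of their combined capacity is used effectively because, as with a clustered cache, the data is duplicated.

The solution is a standalone data grid service that is a first-class system in the infrastructure.

Of course, an embedded data grid service has advantages of its own, which may or may not outweigh the benefits of a standalone cache service.

Finally, to recap: the benefits of a data grid are its scalability and independence and, as a first-class infrastructure component, it delivers the performance and consistency that local and clustered caches provide.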

Back from Berlin

Posted by Sanne Grinovero    |       |    Tagged as Events Infinispan

Berlin Buzzwords Barcamp

The main conference was introduced by a barcamp event on Sunday afternoon and night, in a fascinating location!

c-base

The barcamp was at c-base, which I had initially mistaken for a creative design company or an underground disco. Kosch, an Infinispan user and contributor, welcomed me with a nice glass of mead and corrected my blind guess: it actually is a massive space ship being built by hackers in the underground of the city. The place oozes hardcore hacker culture and counts 400 members; it is huge and full of self-made droids, LDAP-verifying doors, and advanced equipment all over, up to self-made 3D printers, scanners, arcade video games and of course connections to The Matrix.

meetings and discussions

There were a lot of people from the Apache communities. I spent almost all evening talking with Lucene developers, but also listened to the experiences people had with HBase, Cassandra, MongoDB, Solr, ElasticSearch, and of course our very own Hibernate Search, Infinispan and JBoss AS.

A recurring subject was the need to use several of these datastores in a better integrated way; mostly it was about integrating {bigdataX} with Lucene.

SQL vs. NoSQL

So this place was packed with NoSQL zealots. You can imagine this strong pack, excited and a bit drunk too. Perfect timing for some members of the SQL standards to show up! They had some reasonable objections to the NoSQL expression, most notably that everything these alternative engines need could be standardized in the new revision of the specification. The answer from Chris Harris was hilarious: you're missing the point.

The Berlin Buzzwords 2012 conference

The main conference started with many interesting talks, beginning with the keynote from my colleague Leslie Hawthorn, but the buzz of the hallway track continued very strongly for me. I've met amazing people and had interesting chats with a lot of users of our technologies. It was easy to meet a lot of well-known community members: I talked with many users whose names I don't remember, but also with Shay Banon (Elastic Search founder), Grant Ingersoll (Chief Scientist at Lucid Imagination, and well known contributor of several Apache projects), Uwe Schindler (Lucene), Robert Muir (Lucene, also Lucid Imagination), Michael Busch (Lucene, at Twitter), Nick Burch (Apache Tika, Alfresco), Christian Moen (Lucene, creator of the awesome Japanese analyzers), Karel Minarik (Ruby client for Elastic Search), Simon Willnauer (Lucene, and conference organizer), Chris Harris (MongoDB), Martijn van Groningen (JOIN implementations on Lucene) and colleagues Mircea Markus (Infinispan) and Lukáš Vlček (search.jboss.org, Elastic Search).

Updatable fields for Lucene

Andrzej Bialecki gave an amazing talk on the codecs coming in Lucene 4, and explained how fields could be made updatable. There are some patches already but there is still lots of work to do, and he is inviting users to help out: LUCENE-3837.

JOINs in Lucene

Martijn van Groningen is working on JOIN functionality in Lucene. It would be very interesting if someone could experiment with support for it in Hibernate Search: such a feature is highly requested and would be very useful for Hibernate OGM too.

How is Infinispan different than key/value store X?

This was a question people frequently asked me. The main point - besides supporting transactions - is that it focuses on in-memory storage while still preserving high availability. It's a good idea to use it together with {your favourite other store here} for disk persistence. Why? Our tests just breached one million operations/sec, and there is still much we can improve...

The people there

The conference was great, as it somehow managed to keep marketing low and keep the spotlight on the developers, the people, and the stuff that really matters. An example of this was that most talks were in 20 minutes slots, forcing speakers to focus very strictly on the juicy aspects and leave everything else out for face to face discussions in the halls. That worked amazingly well for me. I'm glad for all the chats I had with everyone, so thank you all!

Berlin Buzzwords coming soon

I'll be at Berlin Buzzwords 2012, to meet with the awesome community of people interested in scalable search, NoSQL and bigdata in the cloud.

Infinispan Lucene Directory

Of particular interest to Hibernate Search and Infinispan Query users and contributors, I've been given the opportunity to talk about the Infinispan Lucene Directory we built as an extension to the Infinispan project: the capability to store and efficiently replicate Lucene indexes in the Infinispan grid. Of course, this Directory implementation doesn't depend on Hibernate Search or Infinispan Query and can be used to solve the reliable replication problem with Lucene indexes in any other application using Lucene. In fact its development was initially sponsored by Sourcesense to replicate JIRA instances and is now evolving in the Infinispan project as a high performance alternative to the traditional Directory implementations. For more details, come to my talk or come and talk to me at any time.

The conference

It's my first time at Berlin Buzzwords, but I've heard excellent feedback about the past editions so I'm really looking forward to it: the program is full of amazing titles, with many interesting speakers and no doubt attendees to talk with.

As for other JBoss people going, you might meet Mircea Markus from the Infinispan team and Lukáš Vlček, our ElasticSearch expert and the man behind our new search.jboss.org.

Looking forward to meeting you all there!

Hibernate Search version 4.1.0.Beta1 was tagged; the most essential change compared to January's release 4.1.0.Alpha1 was HSEARCH-1034, made to allow Infinispan Query to use the fluent Programmatic Mapping API as already available to Hibernate users.

More changes are being developed: stay tuned for new MassIndexer improvements, some new performance improving tricks, and a fierce discussion is going on to provide a new pragmatic way to define index mappings starting from the Query use cases.

Integrations with Infinispan

The Infinispan project released a new milestone version 5.1.1.FINAL, which is relevant to Hibernate Search users in many ways:

  • Hibernate Search can use Infinispan to distribute the index among several clustered nodes.
  • JBoss AS 7.1 will use this version as the fundamental clustering technology.
  • Hibernate OGM can map JPA entities to Infinispan instead of a database, and use Hibernate Search as query engine and replicate the indexes storing them in Infinispan.
  • Infinispan Query uses the Hibernate Search Engine component to make it possible to search across the values stored in Infinispan. All you need to do is add the dependency to infinispan-query, enable indexing in the configuration and either annotate the objects you store in the grid like you would do with Hibernate Search entities, or define the mappings using the programmatic API.

More details on Infinispan Query can be found in the Infinispan reference, but if you're familiar with Hibernate Search there's not much to learn as they share most features and configuration options as defined on the Hibernate Search reference manual.

JavaOne, JUDCon and Devoxx 2011

Posted by Pete Muir    |       |    Tagged as CDI Events Infinispan

This autumn I'm speaking at JavaOne (2nd - 6th October in San Francisco), JUDCon London (31st October, 1st November) and Devoxx (14th - 18th November).

JavaOne

  • Introducing Contexts and Dependency Injection 1.1 - technical session in which I'll overview some of the changes coming in CDI 1.1
  • CDI Today and Tomorrow - panel session on CDI with David Blevins, Arun Gupta, Sivakumar Thyagarajan and Reza Rahman
  • Making Java EE Cloud-Friendly: JSR 347, Data Grids for the Java Platform - BOF with Manik Surtani

JUDCon London

  • Java EE in the Cloud - a technical session in which I'll show you how to use Java EE in the cloud, using Red Hat's OpenShift Platform-as-a-Service
  • Using Infinispan as a remote data store - a technical session with Galder Zamarreño in which we'll show you how to use Infinispan as a remote data store on Red Hat's OpenShift Platform-as-a-Service, with a client app written using CDI.

JUDCon is the official JBoss Users and Developers Conference, and is great value at £100 for a day - so if you are near London, I recommend registering today!

Devoxx

Enjoy!

Hibernate Search 3.4.0.CR2

Posted by Sanne Grinovero    |       |    Tagged as Hibernate Search Infinispan

We decided to insert another candidate release in the roadmap for two improvements which were too good to leave out:

  • Lucene 3.1
  • Smart object graph analysis to skip expensive operations

As usual, download links are here, as are instructions for Maven users. If you spot an issue, the issue tracker hasn't moved either; for questions, use the forums.

using Apache Lucene 3.1

It's finally released! We had been waiting for it for a long time, so in just a week we were able to provide you with a compatible version of Hibernate Search.

As seems to be usual with Lucene, many APIs changed. The good news is that Hibernate Search appears to shield users from all breaking changes: code-wise, it's a drop-in replacement for previous versions.

Some things to consider during the migration:

  • It's possible that some Analyzers from Lucene and Solr extensions were moved around to other jars, but if you're depending on hibernate-search-analyzers via Maven, again it looks like you shouldn't need to change anything.
  • The max_field_length option is not meaningful anymore, see the docs on how to implement something similar if needed.
  • Hibernate Search 3.4.0.CR2 actually requires Lucene 3.1.

more performance optimizations

Besides the nice boost inherited from the updated Lucene, our internal engine also got smarter.

It now works out which work in the object graph can be skipped, and reacts much better to collection update events. See HSEARCH-679 for the hairy details, and many thanks to Tom Waterhouse for a complex functional test and the hard work of convincing me of the importance of this improvement.

Infinispan integration

There are several interactions between Hibernate Search and Infinispan; beyond the most obvious usage of Infinispan as a second level cache you can also:

Cluster indexes via Lucene

Nothing changed in our code; this is just a reminder that it's possible to replicate or distribute the Lucene indexes on an Infinispan grid, and that this is compatible with both Infinispan 4.2.1.FINAL and 5.0.0.BETA1.

Infinispan Query

In Infinispan 5 the query engine is Hibernate Search itself: the integration just got much better, making it easier to use and exposing all features from latest Search versions, including for example Faceting and clustering via Infinispan itself. More improvements coming, especially documentation.

join us for JUDCon!

I'm going to talk about this integration at JUDCon 2011, in Boston, May 2-3 during the talk Advanced Queries on the Infinispan Data Grid, see you there!
