In this post, I’d like you to meet Jonathan Bregler, a software developer at SAP working on the SAP HANA database.
Hi, Jonathan. Would you like to introduce yourself?
Hi Vlad. Thanks for having me. My name is Jonathan Bregler. I’m a software developer working at SAP in Walldorf, Germany.
I started out as a Java developer working on projects with various open source components, such as Spring, Tomcat, OSGi, CXF, and of course Hibernate. I’ve always been interested in databases and how to use them efficiently, which is why I’ve been fascinated by Hibernate from the very beginning.
Since I wanted to know more about the internal workings of databases and get some hands-on experience, I joined SAP as part of their SAP HANA development team. Over the past few years, I’ve been working on several aspects of the SAP HANA database, such as software logistics, porting the database to IBM PowerPC, and the SAP HANA Deployment Infrastructure.
Last year, I got the opportunity to help improve the support for SAP HANA in popular Java open source frameworks such as Hibernate. Given my background, this opportunity seemed tailor-made for me, so I took it and never looked back.
Can you tell us how SAP HANA works and how it differs from other database systems?
SAP HANA differs from most other databases in that it’s an in-memory columnar database. This unique design enables the processing of transactional and analytical workloads at the same time, thus making SAP HANA a true translytical data platform. Here are some benefits that SAP HANA offers.
In-memory, columnar, massively parallel database processing
The SAP HANA platform permits online transaction processing and online analytical processing workloads using a single data copy on a single platform.
It stores data in high-speed memory, organizes it in columns, and partitions and distributes it among multiple servers. This delivers faster queries that aggregate data more efficiently while avoiding costly full-table scans, materialized views, and analytic indexes.
Data modeling and stored procedures
SAP HANA offers a native language called SQLScript that lets you build stored procedures and use advanced capabilities to build complex logic that runs inside the database.
It includes a business function library with built-in parameter-driven financial functions. In addition, it also includes a framework that lets you build custom algorithms and run them securely inside the database. Core data services, graphical calculation views, and decision tables further simplify and accelerate the creation of database logic.
The use of open standards lets you exchange spatial information with third-party spatial solutions to develop enterprise-wide location intelligence.
SAP HANA includes base maps with political boundaries and points of interest to accelerate development of modern location-aware business applications. Third-party spatial solutions can also use SAP HANA as a high-performance, in-memory data store for managing and processing spatial data.
SAP HANA lets you store and process highly connected data using a dynamic data model called property graph. Storing and querying graph data is supported through SQL and the openCypher project’s Cypher query language.
A graph provides full transactional consistency and guaranteed ACID compliance without replicating live transaction data. Native graph algorithms are provided to uncover relationships in your data in real time.
You can also combine graph data processing with additional advanced analytical processing functionality in SAP HANA, such as text analytics, predictive, and spatial. The graph viewer delivered as part of SAP HANA helps you visualize and explore graph data.
Predictive analytics and machine learning
Predictive analysis with SAP HANA includes native high-performance predictive algorithms for both expert and automated modes.
Additionally, you can run open-source R scripts on SAP HANA through integration with R Server and build machine learning applications with integration to TensorFlow.
Some of the predictive algorithms run on streaming, spatial, and series data and are self-improving.
You can use SQL to locate text quickly across multiple columns and binary files, such as Adobe PDF files, HTML, RTF, MSG, Microsoft Office documents, and flat text files. SAP HANA lets you run both full-text and advanced fuzzy searches for 32 languages.
Text analytics in SAP HANA include advanced natural-language processing and entity extraction capabilities, such as segmentation, stemming, tagging, and sentiment analysis.
SAP HANA also extracts what are often referred to as triples – sequences of subject, verb, and object. These functionalities help extract meaning from unstructured data and transform it into structured data for analysis. SAP HANA also supports text-mining algorithms to mine relevant keywords in a body of documents.
You can capture and process streams of events from many sources in real time using the highly scalable smart data streaming engine inside SAP HANA.
SAP HANA supports an SQL-like processing language to combine streams with contextual data and analyze the result on the fly.
To improve scalability, SAP HANA comes with a streaming-light component you can deploy on the streaming data source to analyze and filter streams before they reach SAP HANA.
Data from the Internet of Things and data from sensors arrive in a time-series format.
SAP HANA processes time-series data and other kinds of series data efficiently to discover trends over a period.
SAP HANA lets you build enterprise-class non-SQL (NoSQL) applications with the support to store schema-flexible data in JSON format. You can combine JSON data with structured data and query or analyze it using SQL.
Data integration, replication, and quality
SAP HANA supports comprehensive features to handle all data integration scenarios. These include real-time data replication as well as bulk-load processing, data transformation, cleansing services, and data enrichment services.
Adapters are available for loading data from several databases, cloud sources, and Apache Hadoop, along with a custom software development kit for building your own adapters.
Apache Hadoop and Apache Spark integration
SAP HANA provides multiple options to analyze Apache Hadoop data, including the SAP Vora engine, SAP Cloud Platform Big Data Services, an Apache Spark adapter, and Apache Hive.
You can access data in the Hadoop distributed file system and access MapReduce functions as data sources in SQL using user-defined virtual functions.
Free developer version
SAP HANA, express edition, a free developer version of SAP HANA, is available to get started with developing applications on SAP HANA.
The express edition is limited to 32GB of memory, but apart from that and a few enterprise features, it offers the full range of functions available in SAP HANA. SAP HANA, express edition can be run on-premise by installing the binaries directly, by running a virtual machine containing the binaries, or by running a Docker image.
SAP HANA, express edition is also available in the cloud via the SAP Cloud Appliance Library (AWS r Microsoft Azure), the Google Cloud Platform, or the Microsoft Azure Marketplace.
You’ve been providing many improvements to the SAP HANA Hibernate Dialect. Is Hibernate a good fit when working with SAP HANA?
Yes, Hibernate works really well with SAP HANA. Over the course of my research, I haven’t found a project that’s as popular as Hibernate and provides the same range of features. Furthermore, these features each complement specific SAP HANA capabilities.
First of all, the core dialect provides the necessary flexibility to deal with the unavoidable differences between databases regarding the SQL syntax.
On top of that, Hibernate offers interesting extensions that work really well with SAP HANA. One such example is the Hibernate Spatial dialect that can be used to process geospatial data leveraging SAP HANA’s advanced geospatial engine. SAP HANA’s text search and analysis features can be consumed via the Hibernate Search project.
And when working with SAP HANA’s NoSQL capabilities, there is the Hibernate OGM project that can be extended to work with the SAP HANA graph engine and the SAP HANA document store.
In my opinion, Hibernate is a perfect fit for building advanced transactional and analytical applications on SAP HANA because it provides interfaces for the most important SAP HANA capabilities thereby allowing the developers to consume these capabilities from within their familiar domain without having to care too much about the database specifics.
Since you’ve been fixing various issues in Hibernate ORM, what was your experience of contributing to the Hibernate ORM project? Is it easy for other developers to start contributing?
My experience contributing to the ORM project has been very positive. The contribution process is simple and clearly described on the Hibernate website and in the GitHub repository.
I especially like the fact that there is only a minimum amount of licensing overhead to consider, so you can get started without consulting a lawyer.
The contribution guidelines are clear, and there are resources available that describe how to get started with popular IDEs complete with code templates and formatters, thus making it easy to stick to the coding conventions.
The Hibernate team regularly checks for new GitHub pull requests, reviews them, and provides helpful feedback if necessary.
We always value feedback from our community, so can you tell us what features you’d like us to add to make easier for other data access frameworks to integrate with JPA or Hibernate?
Overall, I’m very happy with the features provided by Hibernate. I’ve been able to implement all the features I needed one way or another. So really, I can’t complain, but since you asked…
One thing that I’ve come across while implementing the Hibernate dialect for SAP HANA is that sometimes it would be helpful to have a way of adapting the SQL query generation that’s being done by Hibernate to add database-specific syntax. As far as I’ve understood, reworking the SQL query generation is a major topic for Hibernate 6, so I’m looking forward to that.
Another interesting feature would be the ability to easily influence not only the SQL query generation but also the schema generation. That way, even the database schema could be tailored to the specific database, possibly leveraging database-specific features not included in standard SQL.
Thank you, Jonathan, for taking your time. It is a great honor to have you here. To reach Jonathan, you can follow him on GitHub.