SAP HANA is a platform that offers new levels of data modeling that exceed what's possible with traditional relational database management systems (RDBMSes). But it requires data to be handled in more sophisticated ways to achieve maximum performance.
In SAP HANA, data is still stored in tables, but the design of the data storage model differs greatly from what a traditional database requires. Data compresses better and reads complete faster when it is stored in columnar tables. To take full advantage of this structure, the data model has to be much flatter than in a traditional row-based RDBMS, for two reasons.
First, redundant data is not nearly as much of an issue as it is in row-based tables: columnar tables store each repeating value only once and use pointers to reference the duplicates. Second, when data isn't flattened into one table but is instead spread, or normalized, across multiple tables, the cost of joins in SAP HANA can grow. SAP HANA has row- and column-based engines in which different types of processing occur, and join costs arise when data must physically move from one engine to be processed in another. So it is beneficial to keep the processing in one engine if possible, and in SAP HANA that engine should preferably be the column engine.
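The store-once-and-point technique described above is commonly known as dictionary encoding. The following is a minimal Python sketch of the idea, not SAP HANA's actual implementation: each distinct value goes into a small dictionary once, and the column itself becomes a vector of integer pointers into that dictionary.

```python
# Minimal sketch of dictionary encoding, the technique columnar stores
# use to keep each repeating value only once (illustrative only; this
# is not SAP HANA's actual code).

def dict_encode(column):
    """Return (dictionary, index vector) for a column of values."""
    dictionary = []   # each distinct value stored exactly once
    positions = {}    # value -> its position in the dictionary
    indexes = []      # one small integer pointer per row
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])
    return dictionary, indexes

def dict_decode(dictionary, indexes):
    """Rebuild the original column from the encoded form."""
    return [dictionary[i] for i in indexes]

country = ["US", "DE", "US", "US", "DE", "FR"]
dictionary, indexes = dict_encode(country)
# Six string values collapse to three, plus six small integers.
assert dict_decode(dictionary, indexes) == country
```

A column with few distinct values (country codes, status flags, material types) shrinks dramatically under this scheme, which is one reason flat, repetitive columns are cheap in a column store.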
Join cost is not an issue in a traditional database thanks to indexing and normalization, but it can be significant in SAP HANA when a query uses operations that the columnar storage structures do not support. If processing is unsupported in the column-based, in-memory engine, the result set must be physically moved to the row engine, and the performance hit from moving that data in memory can be significant. So, in SAP HANA there is still a need to model the data in the provisioning stage; it's just a much different model from the one you would design for a traditional RDBMS.
After the data is modeled and provisioned, or loaded, companies must begin to deal with metadata. This journey begins with modeling the "attribute" and "analytic" views that are based on the provisioned data in the base column-store tables. These views operate much like traditional database views. Attribute views are designed to give the master data some context: They provide meaningful values, such as descriptions or names in place of raw ID values.
Analytic views are where calculations and aggregations come into play. Attribute views and analytic views are the building blocks for "calculation" views, which combine and extend both: a calculation view is a composite that joins the meaningful descriptions of an attribute view with the calculations and aggregations of an analytic view. This metadata-driven model of runtime, in-memory calculation is where SAP HANA proves most valuable, because the metadata layer often removes the need to persist data at any level beyond the original provisioned tables.
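The layering of the three view types can be sketched in plain Python. This is only an analogy for the behavior described above, not HANA's modeling tools; the table names, IDs and columns are invented for illustration.

```python
from collections import defaultdict

# Hypothetical master data and provisioned fact table.
region_names = {1: "EMEA", 2: "Americas"}
sales = [
    {"region_id": 1, "amount": 100},
    {"region_id": 2, "amount": 250},
    {"region_id": 1, "amount": 50},
]

def attribute_view(rows):
    """Attribute view analogy: give IDs meaningful names."""
    return [dict(r, region=region_names[r["region_id"]]) for r in rows]

def analytic_view(rows):
    """Analytic view analogy: aggregate a measure by an attribute."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def calculation_view(rows):
    """Calculation view analogy: compose both layers and derive
    a new calculated measure (each region's share of the total)."""
    totals = analytic_view(attribute_view(rows))
    grand = sum(totals.values())
    return {region: {"total": t, "share": t / grand}
            for region, t in totals.items()}

result = calculation_view(sales)
# EMEA totals 150 of 400, Americas 250 of 400.
assert result["EMEA"]["total"] == 150
```

Note that nothing here is persisted beyond the base `sales` rows: every derived number is computed at query time, which is the point of the metadata-driven model.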
While modeling in SAP HANA often refers simply to jumping straight to calculation, analytic or attribute views, it cannot be overstated that properly provisioning the data in memory is a crucial exercise. Data must conform to the needs of the storage structures so the capabilities of the database or platform can be exploited most effectively, and SAP HANA is no different in this respect. While data models could be ported directly from a star schema in a traditional RDBMS, there are many rewards in first examining, then designing, a proper base model for provisioning data in SAP HANA.
How does SAP HANA handle data?
In early versions of RDBMS, data was first modeled into physical, row-based tables and stored on disks, since these were the technologies available. The data was then indexed to enable faster access via SQL queries. Indexing was (and still is) necessary because databases were designed around the concept of row-based transactional data.
Reading data was not this platform's primary purpose: The structures were designed around the concept of getting data in, not getting data out. Later, online analytical processing (OLAP) database structures, sometimes referred to as "cubes," were trumpeted as precalculated or pre-aggregated solutions to the performance limitations of row-based reporting. OLAP was the first attempt at addressing getting data out and was focused exclusively on reading data. The drawback was that the data had to be "built" by transforming it into an additional persistent store, a process that is laborious and expensive in terms of both storage and processing.
Columnar-stored, disk-based databases then surfaced as an alternative way to store data structures tuned for efficient reporting. In these, data is stored more efficiently, so building additional layers is not as necessary. Read performance is much better, but getting data in is more time-consuming and difficult because of how traditional columnar databases store it.
Then, there is SAP HANA. In some ways it is the culmination of all of these designs, with one unique distinction: It combines these approaches while keeping the full data sets as close to the CPU as possible, in memory. Data is physically persisted in memory in either row-based or columnar structures, and it can also be modeled in certain types of logical views to emulate cube-based storage structures.
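The difference between the two physical layouts can be shown with a small Python sketch (illustrative only; the table and values are invented). The same records are laid out once row-wise and once column-wise; an analytical scan over a single attribute touches only one contiguous array in the columnar layout.

```python
# The same small table stored two ways (illustrative only).
rows = [("A-1", "US", 250),
        ("A-2", "DE", 120),
        ("A-3", "US", 300)]

# Row store: all values of one record are adjacent, which is
# ideal for inserting or fetching a whole record (OLTP).
row_store = list(rows)

# Column store: all values of one attribute are adjacent, which is
# ideal for scanning, aggregating and compressing one column (OLAP).
ids, countries, amounts = (list(col) for col in zip(*rows))

# An aggregate such as SUM over one attribute reads a single array
# in the column store, instead of skipping through every record.
assert sum(amounts) == 670
```

This is why a workload's read/write profile drives the row-versus-column choice that SAP HANA exposes per table.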
Bio: Don Loden is a principal business intelligence consultant with full lifecycle data warehouse development experience in multiple verticals. He is an SAP-certified application associate on SAP BusinessObjects Data Integrator and speaks globally at numerous SAP and ASUG conferences. Contact him through email at email@example.com or on Twitter @donloden.
This was first published in November 2012