Article

How to Enjoy Hybrid Partitioning with Teradata Columnar

Teradata Vantage's Advanced SQL Engine's performance-enhancing options include column-row hybrid partitioning. Find out how to take advantage of this great feature.

Carrie Ballinger

July 16, 2019 | 4 min read

Hybrids are nothing new. One of my favorite fruits, the nectarine, is a hybrid between a peach and a plum. Hybrid tea roses, one of the most recognized and popular varieties of cut flowers, are a cross between hybrid perpetual roses and old-fashioned tea roses. Look around you and you'll likely see many other examples of successful hybrids, even in a relational database system.

Teradata Columnar, for example, offers a hybrid of two different database table partitioning choices: column partitioning and row partitioning. But before delving into a description of column/row hybrid partitioning, let's first examine row then column partitioning individually.

First Came Row Formatted Data

The Teradata Advanced SQL Engine was architected to store data for a table a row at a time. Each physical record that is loaded into the database is transformed into a row and assigned to one of the AMPs (parallel units) in the configuration. This assignment takes place in a way that maintains an even spread of data across all AMPs systemwide. When data is stored by row, you can retrieve all of a row's column values in one physical I/O.

Figure 1: Data stored in row-based units.
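The assignment of rows to AMPs can be pictured with a small Python sketch. It is a toy model only: the CRC32 hash, the four AMPs, and the `txn-N` keys are assumptions chosen for illustration, not the engine's actual hashing scheme.

```python
import zlib
from collections import Counter

NUM_AMPS = 4  # hypothetical number of AMPs in the configuration


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Map a row's key to an AMP number using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_amps


# Load 1,000 hypothetical rows, each identified by a made-up key.
keys = [f"txn-{i}" for i in range(1000)]
rows_per_amp = Counter(amp_for(k) for k in keys)

# Show how many rows landed on each AMP.
for amp in sorted(rows_per_amp):
    print(f"AMP {amp}: {rows_per_amp[amp]} rows")
```

Because the hash depends only on the key, the same row always lands on the same AMP, and a well-mixed hash tends to spread rows roughly evenly across the AMPs.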

Then the Option of Partitioning the Rows

An additional option exists for tables stored by row, called "row partitioning." Rows can be grouped on disk by a "partitioning column." An example of a partitioning column for a row-formatted table is a date column, such as TxnDate. When a table is partitioned by TxnDate, all the rows for transactions that took place within a specific unit of time, such as a day, week, or month, are stored together in a partition. When using row partitioning, the database will apply "partition elimination" and only read the data rows that meet the query's partitioning column predicate values. The less data that is read, the faster a query will return an answer set.

Figure 2: Data stored using row partitioning based on TxnDate.
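Partition elimination can be illustrated with a minimal Python sketch. The rows, the `TxnNo` column, and the choice of monthly partitions are all hypothetical; the point is only that a query qualified on TxnDate touches one partition and never looks at the others.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (TxnNo, TxnDate, ItemNo, Quantity)
rows = [
    (1, date(2011, 5, 29), 7001, 2),
    (2, date(2011, 5, 29), 7002, 1),
    (3, date(2011, 5, 30), 7001, 5),
    (4, date(2011, 6, 1), 7003, 4),
]


def partition_by_month(rows):
    """Group whole rows into partitions keyed by (year, month) of TxnDate."""
    partitions = defaultdict(list)
    for row in rows:
        txn_date = row[1]
        partitions[(txn_date.year, txn_date.month)].append(row)
    return partitions


def rows_for_month(partitions, year, month):
    """Partition elimination: read only the partition the predicate names."""
    return partitions.get((year, month), [])


partitions = partition_by_month(rows)
may_rows = rows_for_month(partitions, 2011, 5)
print(f"Read {len(may_rows)} of {len(rows)} rows for May 2011")
```

A query for May 2011 reads only the three May rows; the June partition is never opened.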

Columnar

Teradata Columnar is an enhancement that offers the ability to store the data in a table by column, instead of by row. In its simplest form, each column in the table becomes its own column partition. The benefit of Columnar is faster execution time for queries that access a subset of a table's columns, because less data will have to be read from the database.

With a basic Columnar implementation, each physical row that is stored on disk is a collection of values from one column. If your table has 100 columns but each query that accesses the table only needs 5 of those columns, Columnar can greatly reduce the physical I/O required to read the table.

Figure 3: Data stored using column partitioning.
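The 100-column example above can be worked through in a short Python sketch. The table size, the `colN` names, and the idea of counting "values read" as a stand-in for physical I/O are assumptions made for illustration.

```python
NUM_ROWS = 1000   # hypothetical table size
NUM_COLS = 100    # table width from the example in the text

# Column-partitioned layout: one list of values per column.
column_partitions = {
    f"col{c}": [r * NUM_COLS + c for r in range(NUM_ROWS)]
    for c in range(NUM_COLS)
}


def scan_columns(column_partitions, wanted):
    """Read only the requested column partitions and count the values touched."""
    values_read = 0
    result = {}
    for name in wanted:
        values = column_partitions[name]
        values_read += len(values)
        result[name] = values
    return result, values_read


# A query that needs only 5 of the 100 columns.
wanted = ["col0", "col1", "col2", "col3", "col4"]
result, values_read = scan_columns(column_partitions, wanted)

# A row-formatted scan would touch every column value of every row.
row_store_values = NUM_ROWS * NUM_COLS
print(f"Columnar scan read {values_read} of {row_store_values} values")
```

In this toy model the columnar scan touches 5 percent of the values that a full row-formatted scan would touch, which is the source of the I/O savings described above.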

With Columnar, a specific column partition, such as the one for ItemNo, will only be accessed if a query references the ItemNo column; otherwise, that partition will be skipped. In addition, various forms of compression are automatically applied to each physical row that holds these column values as the physical row is being constructed, contributing to space savings in the storage subsystem.

One of the unique advantages of Teradata Columnar is that it is an option you can choose to use or choose not to use. Some tables are more suitable for row formatting, for example tables that are small or tables that have a high percentage of their columns frequently accessed. With Teradata, you have a choice of formatting by row or column, so you can match the physical structure of the table with how the table will be accessed.

To better understand the differences between row and column partitioning, consider an architectural example where there are two different, contrasting approaches to constructing living spaces.

Partitioning by family—separate dwellings

This is similar to row partitioning. Each unit contains all the necessary components of household living—a living room, a bedroom, a bathroom, a kitchen, a laundry room. Each house, apartment, or any other bundled set of these different components is located physically separate from any other such dwelling. Once you are in the house, you can easily move from kitchen to living room, to bedroom.

Partitioning by function—dormitories

Dormitories are like column partitioning. Various rooms that people live in are grouped together by function. Bedrooms are congregated in one section of the structure. There is a separate shared dining area and a large group laundry area in the basement. When you enter a dormitory, you can either enter the sleeping area, the eating area, or the shared living area, but moving between the different functional areas is more of an effort.

As with database design, there are tradeoffs in selecting the right living space for you. Dormitories have economy of scale advantages, particularly in the area of energy conservation, landscaping, mail delivery, more efficient use of space, and cost to live there. But separate dwellings offer more privacy, more control over your environment, ease of moving between functional areas, and the ability to customize your surroundings.

Introducing Hybrid Partitioning

In Teradata Vantage you can combine both types of partitioning into "hybrid partitioning." Partition by column, and then on top of that partition by row, using a column such as TxnDate to define the borders of the row partitioning. This creates smaller partitions, each of which represents the intersection of a column partition and a row partition.

The table below illustrates hybrid partitioning. In that table, all the column values for Quantity will be segregated by the TxnDate of their logical row. If the query requests Quantity data for transactions with a TxnDate of "05-29-2011", then only the first three Quantity values will be read from disk.

Figure 4: Data stored using hybrid partitioning.
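The intersection of a column partition and a row partition can be modeled in Python as a dictionary keyed by (column name, TxnDate). This is a sketch under stated assumptions: the rows and the `TxnNo` column are hypothetical, and daily row partitions are chosen to mirror the "05-29-2011" example.

```python
from collections import defaultdict
from datetime import date

COLUMNS = ("TxnNo", "TxnDate", "ItemNo", "Quantity")

# Hypothetical rows; the first three share a TxnDate of 05-29-2011.
rows = [
    (1, date(2011, 5, 29), 7001, 2),
    (2, date(2011, 5, 29), 7002, 1),
    (3, date(2011, 5, 29), 7001, 5),
    (4, date(2011, 5, 30), 7003, 4),
    (5, date(2011, 5, 30), 7002, 6),
]


def build_hybrid(rows):
    """Store each value in the cell for its (column, TxnDate) intersection."""
    cells = defaultdict(list)
    for row in rows:
        txn_date = row[1]
        for name, value in zip(COLUMNS, row):
            cells[(name, txn_date)].append(value)
    return cells


def read_column_for_date(cells, column, txn_date):
    """Touch only the single combined partition the query needs."""
    return cells.get((column, txn_date), [])


hybrid = build_hybrid(rows)
quantities = read_column_for_date(hybrid, "Quantity", date(2011, 5, 29))
print(f"Read {len(quantities)} Quantity values out of {len(rows)} rows")
```

Only one of the eight combined partitions is read: the other columns are skipped by column partitioning, and the other date is skipped by row partitioning.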

Partition elimination is even more effective with hybrid partitioning because only the combined partitions associated with the TxnDate range expressed in the query are physically accessed. Less data read leads to faster query execution times. Hybrid partitioning is particularly suitable for large tables that store sales data, sensor data, web click data, or other time-qualified events.

From my love of eating nectarines to my enjoyment of smelling tea roses, experience has taught me that hybrid solutions are often significantly better than the parent entities from which they were born. The same is true of the column-row hybrid partitioning option. It's the best of both worlds, and one of many performance-enhancing options within Vantage's Advanced SQL Engine.