Have you ever gone on an unscripted road trip? You know your desired destination, but you decide on the details of your route as you travel along. Is the scenic by-way jammed with tourists? Is there road work up ahead? No problem. You just adapt to the unexpected conditions and consider alternate ways to get to your destination.
After going on a few road trips where everything was planned in advance, I have learned the hard way that it is better to stay flexible and open to adapting to the conditions that arise along the way. Taking a decide-as-you-go trip is similar to one of the capabilities available in Teradata Vantage, the Adaptive Optimizer.
A Travel Planner Inside the Database
First, let's consider what the optimizer does. Think of the optimizer as a sort of travel planner inside the database. The optimizer constructs a plan for the most efficient way to navigate through multiple objects to satisfy a query. There may be 25 or 30 different operations that must be performed as a query executes. It is the job of the optimizer to come up with a game plan for when and how to join tables and when and how to perform aggregations or sorts. Better plans mean faster query results.
When the optimizer constructs a query plan, it first estimates the cost of each operation that must be performed. Then it considers the total execution cost of various different variations and sequences of these operations when they are combined. Finding the cheapest overall plan is the goal.
Suppose your job required you to meet with customers in five different cities, each hundreds of miles apart. The sequencing of the city visits, and the time of day you arrive and depart, will influence how much mileage you put on your car and how long you will be on the road. Only by doing your homework ahead of time can you design the fastest and cheapest trip.
In addition to building plans to access multiple tables all within the database, the optimizer is also capable of building plans that bring together data from multiple different platforms. This is like building your travel itinerary when your vacation involves domestic as well as international travel.
Traditionally, the Vantage optimizer has used a straightforward, single-itinerary approach: Gather all available demographics and other inputs, evaluate processing estimates and come up with costs of different potential routes and methods, then pick the most efficient plan.
When Complexity Grows
Over the years, queries have become more complex and incorporated more sophisticated analytics. If a query is joining 30 tables, the sequence in which those tables are joined, and how they are joined, becomes critical to performance.
As a frugal travel-planner, the optimizer makes better decisions the more input it has available. Lots of demographic input is available for the initial operations in a plan, including table statistics collected by the administrator. However, it is more difficult for the optimizer to derive reliable cost estimates for operations that take place near the end of a complex plan. There are many variables that arise when a query executes that cannot be known pre-execution, when optimization takes place. For example, the optimizer may guess that Hadoop will return hundreds of rows, and instead millions come back.
Enter the Adaptive Optimizer, which essentially creates an incremental, rather than an end-to-end plan through the database. When the Adaptive Optimizer is in charge, it breaks a query into fragments. A plan is built for the first fragment and that first fragment executes on the AMPs. AMPs are processing modules, several per node, that execute database work in parallel across the configuration. Results from that first fragment are returned to the optimizer as a sort of intermediate answer set.
Using this incremental approach, values that were calculated in the first fragment and statistical feedback collected on the first fragment's results can be plugged into the optimization of the second fragment, providing a richer set of input into plan building. The optimizer learns and adapts from the results of the previous fragment, so it can build a better next fragment. This is similar to going on a plan-as-you-go road trip where you make decisions about the next leg of your trip based on experiences you had during the first leg.
By the way, adaptive optimization is not wasted on short or simple queries. The optimizer picks and chooses, and determines eligible requests based on certain thresholds, for example expected execution time. It also considers the type of request. For example, the Adaptive Optimizer is always involved in queries that access remote data, such as data stored in a Hadoop platform.
Some users keep large stores of historical data on inexpensive servers, such as Hadoop. Vantage allows users to reach out and grab small subsets of that data, bring it into the database, and join it to relational tables that carry things such as transaction data or client data. When the optimizer is building a query plan that includes Hadoop, the first fragment always encompasses the work involved in accessing that remote data. When that first fragment completes, information about the data coming back from Hadoop, such as the number of rows being returned and the spread of values across a potential join column, are visible to the optimizer and will be used in building the next fragment. This statistical input, which cannot be known until Hadoop access is complete, allows the optimizer to make better decisions about how to join Hadoop data to relational tables and make use of database performance techniques such as table partitioning and indexed access.