Regression Analyzed, Tout de Suite: Leveraging Oracle 12c Database Real Application Testing (RAT) Suite, Part 1

    By: James Czuprynski on Dec 19, 2016


    By Jim Czuprynski  ◾  Nick Marcovecchio, Editor


    This article — the first in a two-part series — discusses the new features of RAT in 12cR1. It then illustrates how to capture an actual application workload from a pre-12c Oracle Database for eventual replay and analysis against an Oracle 12c Database.

    The Real Application Testing (RAT) suite of performance tuning tools was first offered as part of Oracle Database 11gR1. Oracle Database 12cR1 added some significant features to the capabilities of database capture and replay that make it possible for any Oracle DBA to quickly capture a database application workload and replay it within an Oracle 12c environment to determine potential vectors of performance improvement or regression.

    Determining the ultimate root causes for an application’s unexpected performance regression is probably one of the most difficult situations any Oracle DBA can encounter. When it happens, it’s very much like a seasoned pilot who’s struggling to regain control of a rapidly descending aircraft that just seconds ago had been cruising comfortably at its optimal speed and altitude: hours of boredom punctuated by moments of sheer terror.

    When I ask Oracle DBAs what they believe is the most common cause of application performance regression, they almost always shout in unison: “Poorly written application code!” While it’s true that poorly written SQL statements that haven’t been adequately tested in conditions as close to production as possible are probably responsible for unexpectedly poor performance 90 percent of the time, we should remember that when we point a finger at our application developers, we’re pointing three fingers back at ourselves. Oracle DBAs can unwittingly cause unexpectedly poor application performance through improper tuning methods, neglect of proven implementation checklists and even simple human error.

    One of the classic cases I’ve encountered involved a junior DBA who changed the setting for CURSOR_SHARING from its default value of EXACT to FORCE because the first answer revealed in his Google search for a similar issue suggested that change … except that he changed the value using ALTER SYSTEM instead of ALTER SESSION, and then neglected to tell anyone he’d done that. Not until the performance of many other queries regressed dramatically did someone ask the crucial question: “OK, who changed anything recently?” Of course, that question wasn’t asked until several hours after our shop had initiated a massive and painful MVFP (multi-vendor finger-pointing) exercise.

    While this seems to be a relatively obvious case of how making even a minor change to an Oracle Database environment will impact application performance, here are just a few examples of much more significant causes for potential application performance regression: 

    Upgrading to a new database version: Upgrading a database to a new major release — say, from 11.2.0.4 to 12.1.0.2 — almost always introduces the possibility of performance regression for at least some statements. However, remember that the Oracle query optimizer may also operate slightly differently after upgrading to a minor release, or even after just applying a quarterly PSU.

    Hardware upgrades: Because average commodity hardware typically lasts about three years, it’s not unusual for organizations to face performance differences when upgrading to the latest computing hardware, SAN storage or networking components.

    O/S upgrades: Almost every operating system requires occasional patching for security vulnerabilities, improved hardware and firmware interfaces, and revised device drivers and codecs.

    Migrating database files to different file formats or media: Regardless of the original file system used — GFS, EXT3/4 or NTFS — moving an Oracle Database’s files to ASM, ACFS or NFS is a possible vector for application performance regression. Of course, moving these same files from spinning disk to SSDs or NVMe will almost certainly improve physical I/O performance, but it’s important to be able to measure the expected versus actual performance improvement as well.

    Migrating to different operating systems and platforms: Migrating from a little-endian system like Linux or Windows to a big-endian system like Solaris, HP-UX or AIX may also have unexpected consequences for application performance.

    Migrating to enterprise systems: Finally, migrating from a “home-grown” hardware environment to an integrated enterprise system like Oracle Exadata Database Machine or Oracle Database Appliance (ODA) will hopefully improve application performance. However, while Exadata features like Smart Scan, Smart Flash Cache and storage indexes usually help the performance of table scans, it’s also possible that the optimizer may choose an execution plan favoring a table scan when an index scan would still be optimal.

    What DBAs Typically Miss: Testing to Destruction

    One of the reasons that commercial air travel is so much safer today than 50 years ago is that Boeing, Airbus and other manufacturers don’t just test their aircraft within expected performance thresholds; experienced test pilots take them well beyond those thresholds. Likewise, aeronautical engineers regularly test the aluminum, steel and composite materials that comprise modern airliners well beyond their known strengths to ensure they hold up under the most extreme conditions of temperature, stress and pressure. Yet most Oracle DBAs almost never perform a similar test to destruction on the systems that support their production application workloads.

    The solution that most IT organizations adopt to avoid unexpected performance regression is to test a reasonable representation of expected application workloads against the next version of the system, usually in a tightly controlled QA environment. But even this approach ignores what I like to call the possibility of impossible extremes. For example, what if an overwhelming number of application users unexpectedly connect to my database simultaneously? Or what if two or more application workloads that have always happened within different timeframes now occur simultaneously? In other words, how can an Oracle DBA stress-test a current or future database environment beyond the point of no return?

    In these situations, it may be extremely difficult to identify all of the different vectors that are causing performance regression. Because many Oracle database environments are quite complex, there are often multiple interactions between components that may be triggering regression. Again, this unfortunately tends to degrade to finger-pointing between the various IT groups tasked to discover the true root cause(s) of the regression.

    Predicting Regression with Real Application Testing (RAT)

    The good news is that there’s already a valuable Oracle tool chest that can truly mitigate performance regression and even perform realistic testing to destruction. The Oracle Real Application Testing Suite — often unfortunately referred to as the RAT Suite — has been available since Oracle 11gR1, but it benefited from some significant enhancements in 12cR1. RAT comprises three toolsets: SQL Performance Analyzer, Workload Intelligence, and Database Capture and Replay. While the rest of this article series focuses on Database Capture and Replay, the features of the other two tool sets are also worth noting.

    SQL Performance Analyzer (SPA) analyzes the SQL statements that comprise application workloads, or subsets thereof, and is able to detect changes in statement execution plans caused by different initialization parameter settings, different database or optimizer versions, or different hardware, storage or operating systems. Once detected, regressed SQL statements can be analyzed for better performance via SQL Tuning Advisor. However, SPA is primarily aimed at improving the performance of queries, not DML statements.
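    As a rough sketch of the SPA workflow, a before-and-after comparison driven from an existing SQL Tuning Set might look like the following; the STS, task and execution names here are hypothetical:

```sql
-- Sketch only: compare SQL performance before and after a change.
-- MY_STS, SPA_TASK_1 and the execution names are illustrative.
DECLARE
  v_task VARCHAR2(64);
BEGIN
  -- Create an analysis task from an existing SQL Tuning Set
  v_task := DBMS_SQLPA.CREATE_ANALYSIS_TASK(
              sqlset_name => 'MY_STS',
              task_name   => 'SPA_TASK_1');

  -- Execute the statements before making the change
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name      => v_task,
    execution_type => 'TEST EXECUTE',
    execution_name => 'before_change');

  -- ... apply the change under test (parameter, patch, upgrade) ...

  -- Execute the statements again after the change
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name      => v_task,
    execution_type => 'TEST EXECUTE',
    execution_name => 'after_change');

  -- Compare the two executions and flag regressed statements
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name      => v_task,
    execution_type => 'COMPARE PERFORMANCE',
    execution_name => 'compare_runs');
END;
/
```

    The comparison results can then be rendered in text or HTML form via DBMS_SQLPA.REPORT_ANALYSIS_TASK.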

    Workload Intelligence is a new tool set first available as part of Oracle 12cR1. It’s designed to analyze a captured workload for execution patterns that aren’t obvious to human brains. The output of a Workload Intelligence analysis is one or more sets of SQL statements that comprise detected workload patterns. Workload Intelligence is a Java-based program, and it does require the creation of a separate schema for capturing results of its analysis.

    As its name implies, Database Workload Capture captures an entire application workload — both DML as well as SELECT statements — in the precise order in which they occur so that they can be played back in that same order (or, as I’ll discuss shortly, in a completely different order). Captured workloads can be filtered based on several attributes of the application workload’s sessions, including database service, module, action, program, executing user and even (in the case of RAC) instance. The basic concepts of Database Workload Capture are shown in Figure 1.

    Figure 1: Database Workload Capture

    A captured workload is composed of two sets of files: a set of shadow capture files that encompass each application session’s workload, and a shadow capture control file that records the order in which each session began and ended so that sessions can be played back in the same order as they occurred. In addition, when an application workload capture ends, AWR data can be captured automatically for later comparison during future database replays.

    Database Workload Replay

    Once all desired workloads have been captured, it’s a simple matter to replay them against a representative test system. Figure 2 illustrates the basic elements of Database Workload Replay. Captured workloads can be replayed either in whole or in part against the target test system, and once a replay is complete, AWR data can be captured automatically for later comparison to the original captured workload and even between future database replays.

    Figure 2: Database Workload Replay



    For best results, the database against which the workload is being captured should be duplicated to a testing environment as near as possible to the point in time at which the recorded workload(s) began. One of the simplest ways to handle this requirement is to leverage an Oracle Data Guard snapshot standby database, which can be rewound back to the starting timestamp of the application workload recording on its corresponding primary database. However, any other method that effectively resets the target database to the appropriate point in time can be used, including Data Pump Export and Import, or even reloading programmatically via database links or the Oracle SQL*Loader utility.
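    For instance, the snapshot standby approach can be sketched as follows, assuming an existing physical standby managed without Data Guard Broker (with the Broker, DGMGRL’s CONVERT DATABASE command performs the equivalent steps):

```sql
-- On the physical standby: stop redo apply and convert it to a
-- snapshot standby that can be opened read/write for replay testing
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
ALTER DATABASE OPEN;

-- ... replay the captured workload against the snapshot standby ...

-- Discard all replay changes by converting back; this rewinds the
-- database to the guaranteed restore point created at conversion time
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
```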

    During Workload Replay, all unfiltered statements are replayed against the test system, including both queries and DML. By default, the unfiltered workload is replayed in precisely the same order as it was recorded, but 12cR1 adds several options: workloads can be selectively ramped up by increasing the number of users, sessions and requests for the same data, and it’s also possible to replay workloads simultaneously in unexpected combinations. These options therefore enable true testing to destruction; in other words, they push a prospective system beyond its current capacity to execute the workload.

    Consolidated Replay and Workload Scale-Up

    Oracle 12cR1 offers the Consolidated Replay feature, which permits the combination of different workloads captured from different databases at different times into one consolidated workload. For example, an Oracle DBA can now combine:

    • An identical workload that was generated against completely different 10gR1, 11gR1, and 11gR2 database environments; or
    • An identical workload generated against the same database, but during two or more different time periods; or even
    • Completely different workloads generated against the same database, but captured during wildly diverse conditions.

    Another new 12cR1 Database Replay feature is Workload Scale-up, which enables the combination of dissimilar workloads for simultaneous replay — even if they happened at completely different time periods. The beauty of this new feature set is the potential for a true “test to destruction” of a database system. It includes the following features:

    • Schema Remapping allows replaying a single workload originally recorded from a single schema into one or more schemas simultaneously.
    • Time Shifting permits lining up different workloads so that their peak activity periods occur simultaneously.
    • Finally, Workload Folding separates a single workload captured over a longer duration into different sub-workloads, thus allowing those workloads to be played back simultaneously.
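    A consolidated replay along these lines might be set up roughly as follows; the directory object CONS_DIR, the schedule name and the capture subdirectory names are all hypothetical, and the exact DBMS_WORKLOAD_REPLAY signatures should be confirmed against the Oracle Database Testing Guide:

```sql
-- Sketch only: schedule two previously captured workloads (copied into
-- subdirectories CAP_OLTP and CAP_DSS under CONS_DIR's path) so they
-- replay together as one consolidated workload.
DECLARE
  v_oltp NUMBER;
  v_dss  NUMBER;
BEGIN
  DBMS_WORKLOAD_REPLAY.SET_REPLAY_DIRECTORY(replay_dir => 'CONS_DIR');
  DBMS_WORKLOAD_REPLAY.BEGIN_REPLAY_SCHEDULE(schedule_name => 'PEAK_MIX');

  -- Both captures start immediately, so their peaks now overlap;
  -- a nonzero start_delay_seconds would time-shift a workload instead
  v_oltp := DBMS_WORKLOAD_REPLAY.ADD_CAPTURE(capture_dir_name => 'CAP_OLTP');
  v_dss  := DBMS_WORKLOAD_REPLAY.ADD_CAPTURE(
              capture_dir_name    => 'CAP_DSS',
              start_delay_seconds => 0);

  DBMS_WORKLOAD_REPLAY.END_REPLAY_SCHEDULE;

  -- Bind the schedule to a named consolidated replay
  DBMS_WORKLOAD_REPLAY.INITIALIZE_CONSOLIDATED_REPLAY(
    replay_name   => 'PEAK_MIX_REPLAY',
    schedule_name => 'PEAK_MIX');
END;
/
```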

    I’ll demonstrate some of these features in the next article in this series, but for now, let’s consider how best to leverage these features in a realistic situation: the migration of an Oracle database from release 11gR2 to release 12cR1.

    RAT to the Rescue: A Realistic Scenario

    For a practical demonstration of the power of 12cR1 Real Application Testing, consider the following scenario — one that might not be far from your IT organization’s recent requirements:

    • The current database environment consists of an Oracle 11gR2 (11.2.0.4) database running in a VirtualBox environment under Oracle Enterprise Linux 5.6. It currently supports both an OLTP and a DSS workload. However, there is virtually no contention between these application workloads, because the peak workload period of typically 60 OLTP user sessions occurs between 10 and 11 a.m., while the peak workload period of 80 DSS user sessions occurs much later in the day, usually between 7 and 9 p.m.
    • The CIO and CFO have at last agreed to secure funding for a significant upgrade to the current production environment. The Oracle DBA has decided to upgrade to the latest available release of Oracle Database 12c in a virtualized VMware ESX environment running Oracle Enterprise Linux 7.2. The new environment will have an additional 50 percent virtual memory available, and the CFO is willing to pay the additional licensing costs for the Database In-Memory option if the DBA can prove there are significant improvements to DSS application workload performance. However, there’s just one catch: Marketing executives want to be able to run the current DSS application workloads during the normal OLTP workload window so that they can conduct in-depth analyses of sales performance in nearly real time.
    • The CFO and CIO insist upon knowing that the new database environment is sufficient to support this new requirement.

    Performing Database Workload Capture

    To satisfy the requirements of this project, I’ll first need to capture representative workloads for both the OLTP and DSS applications. Fortunately, capturing an application workload is incredibly simple, as the code in Listing 1 shows. Here’s a breakdown of what this code actually does:

    • First, I’ll create a separate folder on the source database’s file system to contain the shadow capture files for the workload capture operation. I’ll also need to create a directory object so that I can point the shadow capture processes at this folder.
    • I want to isolate my workload capture operation to a subset of all database activity, so I’ll use procedure DBMS_WORKLOAD_CAPTURE.ADD_FILTER to specify the list of database services that I’m interested in. Note that even though I’ve captured activity for six database services, I can decide to filter that even further during future replay operations.
    • Next, I’ll invoke procedure DBMS_WORKLOAD_CAPTURE.START_CAPTURE to capture the workload.

    This invocation of DBMS_WORKLOAD_CAPTURE applies the following rules to the application workload capture:

    • The workload captured will be named DBR_CAPTURE_100, and all shadow capture files will be written to directories within the file system path defined by directory object DBRCONTROL.
    • All activity will be captured until the Oracle DBA terminates capture by invoking procedure DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE.
    • Workload capture filters are treated as inclusive; in other words, all other workloads will be excluded.
    • The AUTO_UNRESTRICT parameter’s setting of TRUE means that if the source database instance had been started in STARTUP RESTRICT mode, it will automatically be placed back into unrestricted (NORMAL) mode once workload capture begins.
    • A SQL Tuning Set will be automatically constructed upon termination of capture, and the library cache will be evaluated every 5 seconds for any new SQL statements.

    Using a homegrown workload generator, I then generated a random application workload consisting of 80 percent OLTP activity and 20 percent DSS activity against the AP schema of my 11.2.0.4 database. Once the workload completed, I terminated workload capture using the code in Listing 2.

    Approximately 90 seconds after issuing this command, the workload capture terminated. This gave any pending uncommitted DML statements sufficient time to be committed rather than rolled back, allowing for a cleaner cutoff of transaction activity.

    I then copied the contents of the directory path corresponding to the DBRCONTROL object to another location, cleared the contents of the path specified by DBRCONTROL, reset my 11.2.0.4 database to its original state by regenerating all the data in the AP schema and then generated another random application workload — this time generating 20 percent OLTP activity and 80 percent DSS activity — and finally terminated that workload. 

    I used the SQL in Listing 3 to summarize the results of these two workload capture operations, and Listing 4 shows the resulting output from that query.

    Next Steps

    In the second and final part of this article series, I’ll illustrate how to use the latest features of RAT Suite in Oracle Database 12cR1 to replay the workloads I’ve already captured to determine if there will be any significant variance in application performance when replayed against my new database environment, as well as how to detect those variances and determine possible solutions.

    References

    Oracle Database Testing Guide 12c Release 1 (E55028-04), May 2015



    Listing 1: Database Workload Capture: Setup and Execution
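    A minimal sketch of that setup follows. The capture name DBR_CAPTURE_100, the directory object DBRCONTROL and the five-second STS interval come from the article; the file system path and service names are assumptions, and only two of the six service filters are shown:

```sql
-- Directory object for the shadow capture files (path is illustrative)
CREATE OR REPLACE DIRECTORY dbrcontrol AS '/u01/app/oracle/dbr_capture';

BEGIN
  -- Inclusion filters: restrict capture to specific database services
  -- (service names are hypothetical; the article filters six services)
  DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
    fname      => 'CAP_OLTP_SVC',
    fattribute => 'SERVICE',
    fvalue     => 'oltp_svc');
  DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
    fname      => 'CAP_DSS_SVC',
    fattribute => 'SERVICE',
    fvalue     => 'dss_svc');

  -- Start the capture; it runs until FINISH_CAPTURE is invoked
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
    name             => 'DBR_CAPTURE_100',
    dir              => 'DBRCONTROL',
    default_action   => 'EXCLUDE',  -- treat the filters above as inclusive
    auto_unrestrict  => TRUE,
    capture_sts      => TRUE,       -- build a SQL Tuning Set automatically
    sts_cap_interval => 5);         -- poll the library cache every 5 seconds
END;
/
```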

    Listing 2: Terminating OLTP Workload Capture 
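    The termination call can be sketched as below; mapping the timeout value to the roughly 90-second window described above is my assumption:

```sql
BEGIN
  -- End the capture, giving in-flight transactions time to commit
  DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE(timeout => 90);
END;
/
```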

    Listing 3: Query for Displaying Workload Capture Results
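    One way to summarize capture results (the article’s actual query may differ) is to interrogate the DBA_WORKLOAD_CAPTURES view:

```sql
-- Each row describes one workload capture operation
SELECT id, name, status,
       start_time, end_time,
       duration_secs, user_calls
  FROM dba_workload_captures
 ORDER BY id;
```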

    Listing 4: Workload Capture Results 



    Released: December 19, 2016, 4:08 pm | Updated: January 25, 2017, 7:24 am
    Keywords: Feature | 12c | Jim Czuprynski

    Independent Oracle Users Group