DBA 201: Testing Your DB Upgrades & Balancing Risk

    By: Gary Gordhamer on Dec 27, 2016

    DBA 201

    By Gary Gordhamer  ◾  Simon Pane, Editor

    Database upgrades are often a pain. They are impactful, scary, hard to discuss and inevitable.

    Just as you finish one upgrade, patch or application change, you will begin discussing the next one. This article will not be a definitive plan for how to test, or exact guidance on what you need to test, but rather guidance on how to think about database testing with respect to DB upgrades.

    Testing is all about mitigating risk. Our goal as technical professionals is to reduce the risk of change affecting our business negatively. If the change implications are positive then the business does better or is safer, but if the change effects are negative, that can be a career-limiting move. Truthfully, the results are often based not just on the technical testing, but also how the testing is portrayed and how the conversation between IT and the business happens. When we say that an upgrade is well tested, we are basically saying the risk of doing the upgrade is low.

    To help with the framing of risk, let’s discuss what goes into a change. First is the technical pieces from Oracle; the second is the application that uses the database; and finally, there is how the business uses the application.

    Patches

    What can you expect when you patch a database? We need to start with understanding what upgrades or patches are. I’m going to bucket database upgrades into four categories:

    1. One-off patch: A fix that is intended to resolve one issue, or maybe a small number of issues. Realistically, the patch represents a single area of functionality being touched.
    2. Bundle patch: This is a combination of one-off patches. This includes most quarterly “Patch Set Updates” (PSUs), merge patches and many other patches. This is where multiple (many) one-off fixes are combined to make it easier to apply them, due to dependencies, or to help make sure high priority issues get fixed.
    3. Minor version upgrades, a “Dot patch or release”: Upgrades generally represented by a third or fourth position numerical change in the product version, such as 11.2.0.3 to 11.2.0.4 or 12.1.0.1 to 12.1.0.2 upgrades. In general, these contain many fixes, and might contain some minor functionality differences in the product (including query optimization and performance). Occasionally, new functional features are released in these versions, but not often. Most third-party applications certify to databases at this level.
    4. Major version upgrades: These are the upgrades that affect the main product version number or sometimes the second digit, such as 11.1.0.7 to 11.2.0.3 or 12.1.0.2 to 12.2.0.1, or 11.2.0.4 to 12.1.0.2. These represent the most change to the product, and almost always will include functionality changes and new features.

    From this list, it is important to note that any patch can change any part of the database. Just because something is a one-off patch, does not mean that it does not change the query optimizer. , Likewise, just because something is a bundle patch doesn’t mean that it definitely will touch the query optimizer. It is critical that you become familiar with how to read patch notes, and get a feel for what will or will not be affected. Your mental model of change needs to balance the size of the change and the way the change is packaged by Oracle.

    Applications

    The other major variable in our equation are databases, which are used by applications. Applications can either be incredibly complex or simple. All good DBAs will begin to understand every application they support (even if they support hundreds or thousands of them). This does not mean knowing every technical nook and cranny of the application (though it could). Generally, I think of applications in a few categories, much like the DB upgrades:

    • Light applications use just tables and indexes in the database. There are many of these, and they mainly see the DB as a guaranteed data store. They may rely heavily or lightly on the multi-user concurrency of the DB engine.
    • Medium applications use some of the features of the DB that are specific. They may use Java in the database, XML data types, PL/SQL packages, etc. They are still mainly just table and index consumers, but have some code in the database for performance, unique application functionality, or to handle large volumes of users or data.
    • Heavy applications use just about every feature of the database and more. These are the most complex systems such as Oracle E-Business suite.

    In these three groups, I’m including both commercial off-the-shelf applications (COTS) and in-house written custom applications. One minor difference here is where you go for support. For COTS applications, you reach out to an external vendor for certification and support. For in-house built applications, you may have your own team (or you may not, based on the age of the application). Again, these are things you will have to add to your own mental model.

    The second factor to the application is how the business uses the application. This is somewhat irrespective of the weight of the application. For example, one company might have a simple light mobile application that every sales person uses to place every order; not complex, but very critical. On the other hand, a very heavy ERP system might only be used for financial reporting, which would be critical only a few days a month or a few weeks a year. I’m not going to provide much guidance here since this is all unique to your company. You might be a company of 100 people, and a 10-person  sales team is very important. Or you may be a company of 100,000 users and apps with thousands of users, which may or may not be as important.

    Overall, this process is entirely up to you as the DBA to decide in conjunction with your business partners. Your company may even have a formal group such as a change advisory board (CAB) that helps evaluate the risk. How can we use this information? For starters, let’s apply a simple mathematical formula to some of these items and see how it feels:

    P = DB Patch or Upgrade (1 = one-off, 2 = bundle, 3 = minor, 4 = major)

    W = Application Weight (1 = light, 2 = medium, 3 = heavy)

    C = Business Criticality (1 = low, 5 = super critical)

    R = Risk

    P*W*C = R

    Let’s walk through a simple example:

    We need to apply a one-off patch to the Oracle Cluster Ready Services (CRS) software to prevent RAC node evictions. The application is a mobile sales application used in the U.S. that is in-house built, and has about 500 users. We will call the application “medium” and the business criticality as a 3 (right in the middle).

    1 * 2 * 3 = 6

    Ultimately, what this means is that the formula has a range of values for results from 1 to 60, an even split would say 1-20 is low risk, 21-40 is medium risk, and 41-60 is high risk. Though it’s really up to you to decide. Again, this is not an absolute, but a way to help describe or think about risk.

    The important part is not just to do the math, but rather how you would be able to describe it to management and users. Now let’s try a non-mathematical way to describe this. I’ll preface each line with the risk level based on that logical point:

    • Low: Assume the risk is low and try to prove it is not (you can start at either end).
    • Low: This is a one off patch, it only changes a small item or single feature.
    • Low: This patch is for a known bug; a node eviction would be very impactful (the risk of not applying may be higher than the risk of applying).
    • Low: This patch affects the CRS, and the application code never talks directly to the CRS.
    • Low: The application is used during U.S. daytime mostly, so getting an outage after U.S. hours should be easy, and there should be ample time to remediate any problems.

    Now you can see that through this method we logically came up with the same answer and no math.

    Let’s do one more quick example. We want to apply the July 2016 PSU to a stand-alone database for an application that is based on XML DB and Oracle Application Express (APEX), used by everyone in a global organization to enter their time cards. From a quick math formula, we get 16 (2*2*4), which would be low (though this is probably not true). Walk through the items again:

    • Low: A bundle of security patches, meant to fix known issues.
    • Low: If there are small functional changes to the DB, they are to remove undocumented or unintentional functionality (our application should not be based on these).
    • Medium: This patch affects PL/SQL and database engine components that our application is directly related to.
    • Medium: The application is used globally and there are many users that would be upset if they could not track their time.
    • Medium: Since the application is global, it will be hard to find a good timeslot that allows for proper testing, and back out time without interruption to users.

    The straight math does not provide as clear a picture. Here we have also added the concept of cost around downtime. As you mature in your mental model on upgrades and patches, you will need to add in multiple aspects about how the business uses the application. For example, if the application is used to manufacture product, then downtime could mean lost manufacturing and directly cost the business either revenue, or lost opportunity. Hopefully, you have a solid understanding of how to describe the risk of database patching.

    Testing to mitigate risk

    Now that you understand the risk, it is now time to mitigate it — by testing.

    Applications are expected to take specific inputs (data), do actions and then probably do some form of output (reports, actions, processes). Also note that there are many roles in testing; from a DBA standpoint, you can probably only be responsible for showing that the database is healthy, where functional users or developers are probably need to test that the application still functions as intended. Also, I’m focusing on the functional aspects of the application and testing, and will not get into performance testing. Let’s introduce some generalized ideas:

    • Full functional testing, walking through all features of the application, verifying every data input, every process, and every output.
    • Some functional testing, testing the critical functions. For example, on a ticket system this might be create a ticket, close a ticket, route a ticket. For an order system, this might be to make an order, approve the order. We are not testing every possible feature.
    • Minimal testing, just opening the application, maybe query a record or two.
    • No testing, startup up the application and make sure login works.

    In this list, I focused on the application side of the testing, or showing that the database did not break the application. I also generalized on the term “functional testing;” within your organization, there may be many terms used to describe testing the applications ability to work and many teams involved in this process.

    You should always do some basic database health checks during every patch or upgrade. In fact, you may also want to proactively perform some database health checks weekly or monthly, since many things can happen outside of DB patching.

    We have discussed risk in a mathematical, logical manner. Now let’s show it more visually and integrate testing into the discussion.

     DB change, application type, & testing methodology 

    With this chart, we now have three variables: DB change, application type, and testing methodology (for which we have added in the color coding). We could say:

    • Green – No testing or minimal testing.
    • Yellow – Minimal testing or some functional testing.
    • Red – Some functional testing or full functional testing.

    Again, you will have to come up with your own language for this relationship that matches your needs. In fact, you could have a different set of rules based on the application, data classification or many other factors.

    Rewards

    The last part of any conversation on risk is the relation to reward. Again, I’m going to simplify rewards for DB patching into a few categories:

    • Fixing bugs, or removing unintended behavior. This is the sweet spot for almost all upgrades. All one-off patches, bundle patches, security patches and most minor version updates are included this area. Trying to keep a system running smoothly, closing out security holes and maintaining a system to a supported level.
    • Introducing new features, such as In-Memory processing, multitenant option, PL/SQL performance enhancements, new data types and more, can be very compelling depending on your goals.
    • Introducing new bugs. Although this is not an intended reason to upgrade, I want to at least acknowledge that this can happen. In most cases, this is the risk we are concerned about and, generally, where most push back will come from.
    • Maintaining vendor support. Either at the application level or the database level, often patching is necessary to ensure that the system remains supported by Oracle and other application vendors.

    Any patch could include all four of these items or any one. Your goal should be to frame the conversation around any given upgrade to focus on the critical components. For example, if you are applying a PSU patch to close security holes, your reward is closing the security holes. The PSU could possibly contain a new feature or a new bug, but those are not your rewards in this specific request.

    Why do we care about the reward? The bigger the reward, maybe the bigger the risk we are willing to accept. Say we have a critical financial application for our company, and a patch comes out for a known security bug that would allow someone to gain access to the database without a password. The risk for this patch might be medium, and would suggest some functional testing. Due to the risk level / reward (remove the possible attack vector), we do only minimal testing and rush the patch into production. In this case, the high reward value offsets the medium risk level.

    Perhaps you are beginning to notice a pattern: There is no specific formula that tells you what you need to test. As a database professional, you have to come up with your own internal mental model that represents many factors, from the technical and business aspects. Testing is not the removal of issues or the removal of risk; it is just a safeguard or way to reduce the risk. As a professional DBA, you will be looked to as the expert in many areas, so being able to speak well and communicate clearly and in a confident manner is critical. This all seems very non-technical in nature. Let me leave you with some technical items to satisfy your itch. These are all Oracle-provided scripts and technical notes on things you can check in your database to show it is healthy. Again, this does not guarantee that the application that uses the database is healthy:

    • FAQ: Database Upgrade and Migration (Doc ID 1352987.1)
    • ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)
    • Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql) (Doc ID 556610.1)
    • Master Note: Troubleshooting Oracle Data Dictionary (Doc ID 1506140.1)
    • hcheck.sql ­ Script to Check for Known Problems in Oracle8i, Oracle9i, Oracle10g, Oracle 11g and Oracle 12c (Doc ID 136697.1)
    • Best Practices for running catalog, catproc and utlrp script (Doc ID 863312.1)
    • Debug and Validate Invalid Objects (Doc ID 300056.1)
    • Where Can I Find the Parallel Version of Utlrp.sql? (Doc ID 230136.1)
    • Overview of Refreshing Materialized Views (Doc ID 549874.1)
    • How to Monitor the Progress of a Materialized View Refresh (MVIEW) (Doc ID 258021.1)
    • Script to Check the Status or State of the JVM within the Database (Doc ID 456949.1).
    Released: December 27, 2016, 12:47 pm | Updated: June 2, 2017, 9:19 am
    Keywords: Department | DBA 201 | Gary Gordhamer


    Copyright © 2017 Communication Center. All Rights Reserved
    All material, files, logos and trademarks within this site are properties of their respective organizations.
    Terms of Service - Privacy Policy - Contact

    Independent Oracle Users Group
    330 N. Wabash Ave., Suite 2000, Chicago, IL 60611
    phone: 312-245-1579 | email: ioug@ioug.org

    IOUG Logo

    Copyright © 1993-2017 by the Independent Oracle Users Group
    Terms of Use | Privacy Policy