Upgrading Your Big Data Clusters Can Be Challenging

    By: Ian Abramson on Dec 15, 2016

    The promise of Big Data is to make data available, unencumbered, whenever it is needed. The key to providing a reliable and secure environment is to keep your software current and up to date. Depending on how you installed your Hadoop cluster and which distribution you use, this can be either easy or very difficult. Commercial distributions like MapR, Cloudera, and Hortonworks offer a guided, supported upgrade path. If you are running plain Apache Hadoop, you will find the challenge significant.
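
    Before planning an upgrade, it helps to have an exact inventory of what you are running. Here is a minimal sketch in Python (assuming Python 3.7+ and that the hadoop and spark-submit command-line tools are on the PATH of the node where it runs); adapt the command list to your own stack.

        import subprocess

        def first_version_line(cmd):
            """Run a version command and return its first non-empty output line."""
            try:
                result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            except FileNotFoundError:
                return "not installed"
            # Some tools (spark-submit, for one) print their version banner to stderr.
            output = result.stdout or result.stderr
            lines = [line for line in output.splitlines() if line.strip()]
            return lines[0] if lines else "unknown"

        if __name__ == "__main__":
            for name, cmd in [("Hadoop", ["hadoop", "version"]),
                              ("Spark", ["spark-submit", "--version"])]:
                print(name + ": " + first_version_line(cmd))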

    As an experienced Oracle professional, I have watched database upgrades improve over the years to the point where they are almost simple to perform. The task is well documented, and the problems you encounter tend to be ones others have seen before, for which Oracle can provide either a workaround or a patch. With Hadoop, that is not always an option; often you must develop fixes on your own or wait for the community to deliver them.

    From a core computing perspective, the upgrade can be done with a reasonable amount of confidence. You can install the new version of the software and then copy or re-assign your cluster to the new version of Hadoop. If you are running Hadoop in the cloud, this works quite effectively with two separate clusters: stand up the new version alongside the old, migrate, and cut over. If you are upgrading in place, back everything up first if at all possible. Just as with a database upgrade, you must be careful, but with the right planning and testing the foundation should come through fine.
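
    If you do upgrade in place, HDFS snapshots are one way to give yourself a recovery point. The sketch below assumes snapshots are permitted on the directories involved and that you have HDFS admin rights; the directory paths are hypothetical placeholders for your own critical data.

        import subprocess
        from datetime import date

        # Hypothetical directories; substitute the paths that matter to you.
        CRITICAL_DIRS = ["/user/etl", "/data/warehouse"]

        snap = "pre-upgrade-" + date.today().isoformat()
        for path in CRITICAL_DIRS:
            # Mark the directory as snapshottable (an HDFS admin operation).
            subprocess.run(["hdfs", "dfsadmin", "-allowSnapshot", path], check=True)
            # Create a named snapshot to roll back to if the upgrade fails.
            subprocess.run(["hdfs", "dfs", "-createSnapshot", path, snap], check=True)
            print("Created snapshot " + snap + " for " + path)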

    The biggest problems I have seen come from the secondary products you install on top of Hadoop and HDFS. Upgrades to products like Spark have caused backward-compatibility issues: because of changes in libraries, we have had to retest all of our code to confirm that the new software version still supports it. In the Big Data ecosystem, the pace of change often introduces problems that did not previously exist, or a function you relied on is no longer supported or behaves differently. This is where you need to focus your upgrade energy. We are seeing significant change in processing, security, and presentation, and if you want to take advantage of the new functionality, expect to rework some of your code and retest the rest, as in the sketch below. As Hadoop matures, you can expect these problems to diminish and backward compatibility to improve.
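
    One practical defense is a regression suite that you re-run against every candidate version before promoting it. Below is a minimal PySpark sketch along those lines, runnable under pytest; the job and its expected output are hypothetical stand-ins for your own workloads.

        from pyspark.sql import SparkSession

        def test_word_count_still_works():
            # Pin the behavior of a known workload so that library or API
            # changes in a new Spark version surface as a test failure,
            # not a production outage.
            spark = SparkSession.builder.appName("upgrade-smoke-test").getOrCreate()
            try:
                df = spark.createDataFrame([("big",), ("data",), ("big",)], ["word"])
                counts = {row["word"]: row["count"]
                          for row in df.groupBy("word").count().collect()}
                assert counts == {"big": 2, "data": 1}
            finally:
                spark.stop()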

    Ultimately, you must keep your Hadoop cluster current, just as you do your relational databases, through a carefully planned and executed upgrade.
