CDH 6.3.2: detailed steps for upgrading Impala 3.2 to Impala 3.4

Posted by astarmathsandphysics on Tue, 23 Nov 2021 06:57:06 +0100

Compiling Impala 3.4 to replace Impala 3.2 on CDH 6.3.2

Local environment

Hardware requirements

  • The CPU must support at least SSSE3

  • Minimum memory: 16 GB (64 GB recommended by the community)

  • Disk space: 120 GB (for test data)

Operating systems (Linux only)

  • Ubuntu 14.04, 16.04, 18.04

  • CentOS 7

Build environment

  • A CDH 6.3.2 cluster deployed on three CentOS machines
  • A CentOS machine with the same environment, used to compile Apache Impala 3.4

Impala and other component version support


Compile Impala

The Impala shipped with CDH 6.3.2 is based on Apache Impala 3.2, plus many additional patches. The Impala web UI shows its version number as 3.2.0-cdh6.3.2.

Apache Impala is released as source code, so it has to be compiled on the target platform. Find a machine whose environment matches the cluster.

Compile Impala by following the chapter "Building Impala without Test Data (for testing Impala)" in the Impala build documentation.
The difference is that we need version 3.4 rather than the latest master branch, so the version is selected when cloning:

git clone --single-branch --branch 3.4.0 impala-3.4
cd impala-3.4

Because the Cloudera Maven repository URL has changed, pom.xml must be modified for the build to succeed (IMPALA-9815). We cherry-pick the IMPALA-9815 commit:

git fetch origin 481ea4ab0d476a4aa491f99c2a4e376faddc0b03
git cherry-pick 481ea4ab0d476a4aa491f99c2a4e376faddc0b03

Install the build dependencies with the bin/ script:

export IMPALA_HOME=`pwd`
$IMPALA_HOME/bin/

Summary of bin/ script problems

Before re-running the script, clean up what the previous run installed:
rm -rf /var/lib/pgsql/*
rm -rf /usr/local/bin/ant
Apache Ant download issues
# Change the Ant download URL
vim $IMPALA_HOME/bin/

245 redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403  apache-ant-1.10.12-bin.tar.gz'
246 redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
247 redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
Ivy dependency download problem
# Modify the corresponding configuration files
vim /root/hadoop-lzo/build.xml
96   <property name="ivy_repo_url" value="${ivy.version}/ivy-${ivy.version}.jar"/>
vim /root/hadoop-lzo/ivy/ivysettings.xml
15     value=""

If Impala has been compiled on this machine before, you can skip the step above.

Set the environment variables

source $IMPALA_HOME/bin/

Start the build

$IMPALA_HOME/ -noclean -notests -release
  • Note: for testing purposes you can drop -release. The resulting debug build of impalad reports more information when it hits a bug; for example, a DCHECK can catch the problem earlier, which makes it easier to locate.

Summary of compilation problems

logredactor download problem

Checking the build result

The build downloads many dependencies, so it is much faster through a VPN; here it took me about an hour. After the build, you can find the impalad executable and the Impala frontend jar:

$ ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
-rwxrwxr-x 1 root root 460M 6 November 20:30 be/build/latest/service/impalad*
-rw-rw-r-- 1 root root 7.5M 6 November 20:33 fe/target/impala-frontend-0.1-SNAPSHOT.jar
$ strings be/build/latest/service/impalad | grep 3.4.0

The last command should find strings such as 3.4.0-RELEASE in the impalad executable. The compiled impalad is over 400 MB because it contains a lot of symbol information; you can reduce its size with strip --strip-debug impalad.
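The two checks above can be wrapped in a small bash sketch. The helper names has_version and stripped_copy are hypothetical, not part of Impala's tooling. It uses grep -a instead of strings so it works on the binary directly, and strip -o (GNU binutils) so the stripped result goes to a separate file and the original impalad is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Impala's build scripts.

# has_version FILE VERSION
# Succeeds if FILE contains the literal string VERSION.
# -a treats the binary as text; -F stops the dots being read as regex.
has_version() {
  grep -a -q -F -- "$2" "$1"
}

# stripped_copy FILE
# Writes FILE.stripped with debug sections removed (GNU binutils strip),
# leaving FILE itself unchanged.
stripped_copy() {
  strip --strip-debug -o "$1.stripped" "$1"
}
```

Usage: has_version be/build/latest/service/impalad 3.4.0-RELEASE && stripped_copy be/build/latest/service/impalad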

Impala is statically linked by default, but it still has some dynamic dependencies. Check them with ldd:

# ldd be/build/latest/service/impalad
be/build/latest/service/impalad: version `CXXABI_1.3.8' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: version `GLIBCXX_3.4.20' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: version `GLIBCXX_3.4.20' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/
be/build/latest/service/impalad: version `CXXABI_1.3.8' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/
	=>  (0x00007ffcd11d0000)
	=> /usr/lib/jvm/java-1.8.0-openjdk- (0x00007f987c937000)
	=> (0x00007f987c71b000)
	=> (0x00007f987c4fe000)
	=> not found
	=> /root/impala-3.4/toolchain/kudu-4ed0dbbd1/ (0x00007f987bd7f000)
	=> (0x00007f987bb77000)
	=> (0x00007f987b973000)
	=> (0x00007f987b701000)
	=> (0x00007f987b29e000)
	=> (0x00007f987afb5000)
	=> (0x00007f987ad68000)
	=> (0x00007f987aa60000)
	=> (0x00007f987a75e000)
	=> (0x00007f987a548000)
	=> (0x00007f987a17a000)
	/lib64/ (0x00007f987cb3b000)
	=> (0x00007f9879f60000)
	=> (0x00007f9879d29000)
	=> (0x00007f9879af6000)
	=> (0x00007f98798f2000)
	=> (0x00007f98796e2000)
	=> (0x00007f98794cc000)
	=> (0x00007f98792c8000)
	=> (0x00007f98790c5000)
	=> (0x00007f9878e9e000)
	=> (0x00007f9878c3c000)

Most of these .so files are either bundled or already installed on the system. We only need to copy the ones tied to the Impala version; the others do not need to be copied.

  • Note: libkudu_client.so.0 has to be replaced because Impala 3.4 uses functions that the Kudu client of Impala 3.2 does not support. Impala 3.3 added support for column comments on Kudu tables and upgraded the Kudu client it depends on; see IMPALA-5351 for details. Reportedly, if you do not use this feature, libkudu_client.so.0 does not need to be replaced (I have not verified this).
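To pick out the libraries that matter in ldd output like the one above, a minimal awk-based sketch can list the unresolved entries and the ones resolved from the Impala toolchain directory. It assumes the usual glibc ldd line format (name => path (address)); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read ldd output on stdin.
# A resolved dependency line looks like:  libfoo.so.1 => /path/libfoo.so.1 (0x...)
# An unresolved one looks like:           libfoo.so.1 => not found

# ldd_not_found: print the names of libraries ldd could not resolve.
ldd_not_found() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# ldd_from_dir DIR: print "name path" for libraries resolved under DIR,
# e.g. the Kudu client coming from $IMPALA_HOME/toolchain.
ldd_from_dir() {
  awk -v dir="$1" '$2 == "=>" && index($3, dir) == 1 { print $1, $3 }'
}
```

Usage: ldd be/build/latest/service/impalad | ldd_from_dir "$IMPALA_HOME/toolchain". The "version ... not found" lines do not have "=>" as their second field, so both helpers ignore them.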

Deployment

Inspect the CDH cluster executables

Log in to a cluster machine and check the executables used by impalad, statestored and catalogd. You can find their startup commands with a command like "ps aux | grep catalogd":

/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/catalogd --flagfile=/var/run/cloudera-scm-agent/process/39-impala-CATALOGSERVER/impala-conf/catalogserver_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/statestored --flagfile=/var/run/cloudera-scm-agent/process/38-impala-STATESTORE/impala-conf/state_store_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/37-impala-IMPALAD/impala-conf/impalad_flags
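The ps lookup above can be narrowed down to just the binary path and the flagfile of each Impala daemon. This is a sketch assuming the command lines look like the ones shown (binary path first, then --flagfile=...); impala_binaries is a hypothetical name.

```shell
#!/usr/bin/env bash
# impala_binaries: read full command lines (one per line) on stdin and print
# "<binary> <flagfile>" for impalad, catalogd and statestored processes.
# Lines such as "grep catalogd" are skipped because the first field must
# be a path ending in one of the daemon names.
impala_binaries() {
  awk '$1 ~ /\/(impalad|catalogd|statestored)$/ {
    flagfile = ""
    for (i = 2; i <= NF; i++)
      if ($i ~ /^--flagfile=/) flagfile = substr($i, length("--flagfile=") + 1)
    print $1, flagfile
  }'
}
```

Usage: ps -eo args | impala_binaries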

Old Impala directory

The flagfile is generated by CM, so we don't need to touch it; we only need to make CM use the new executables. We create a directory with the same structure as /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala, and then point CM at it through the IMPALA_HOME environment variable.
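Since CM only needs a directory with the same layout, a quick way to confirm that the new directory mirrors the old one is to compare their top-level entries. A minimal bash sketch; same_layout is a hypothetical helper, and it compares entry names only, not contents.

```shell
#!/usr/bin/env bash
# same_layout OLD NEW
# Succeeds if both directories contain the same set of top-level entry
# names; otherwise prints the differing entries in diff format.
same_layout() {
  diff <(ls -1A "$1" | sort) <(ls -1A "$2" | sort)
}
```

Usage: same_layout /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4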

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala
drwxr-xr-x 2 root root   113 11 September 2019 bin
drwxr-xr-x 2 root root    36 11 September 2019 cloudera
drwxr-xr-x 2 root root 16384 11 September 2019 lib
lrwxrwxrwx 1 root root    11 11 September 2019 sbin -> sbin-retail
drwxr-xr-x 2 root root    56 11 September 2019 sbin-debug
drwxr-xr-x 2 root root    56 11 September 2019 sbin-retail
drwxr-xr-x 7 root root   158 11 September 2019 toolchain
drwxr-xr-x 7 root root  4096 11 September 2019 www

bin directory

The bin directory mainly contains diagnostic scripts and can be left as-is.

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/bin/
-rwxr-xr-x 1 root root 25189 11 September 2019
-rwxr-xr-x 1 root root  9183 11 September 2019
-rwxr-xr-x 1 root root  2013 11 September 2019
-rw-r--r-- 1 root root     0 11 September 2019

cloudera directory

It contains version information and can be left as-is:

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/cloudera/
-rw-r--r-- 1 root root 515 11 September 2019

The lib directory contains symlinks to jar packages and the dependent .so files, which need to be copied.
The sbin directory is a symlink to sbin-retail, which mainly contains the impalad executable; catalogd and statestored are symlinks to it and do not need to be copied.

toolchain directory

Everything in these directories is statically linked into the impalad executable. I'm not sure why they are shipped here, but they can be left as-is:

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/toolchain/
drwxr-xr-x 3 root root 19 11 September 2019 breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2
drwxr-xr-x 3 root root 17 11 September 2019 cmake-3.8.2-p1
drwxr-xr-x 3 root root 21 11 September 2019 llvm-5.0.1-asserts-p1
drwxr-xr-x 3 root root 21 11 September 2019 llvm-5.0.1-p1
drwxr-xr-x 3 root root 19 11 September 2019 orc-1.5.5-p1

www directory

This directory holds the web UI pages and needs to be updated.

Dependent dynamic libraries

Finally, confirm the dependent dynamic libraries with ldd:

# ldd be/build/latest/service/impalad
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/ version `GLIBCXX_3.4.20' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/ version `CXXABI_1.3.8' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
	=>  (0x00007ffd191ae000)
	=> not found
	=> /lib64/ (0x00007f7405613000)
	=> not found
	=> not found
	=> /lib64/ (0x00007f740540a000)
	=> /lib64/ (0x00007f7405206000)
	=> /lib64/ (0x00007f7404fb9000)
	=> /lib64/ (0x00007f7404d46000)
	=> /lib64/ (0x00007f74048e5000)
	=> /lib64/ (0x00007f74045fd000)
	=> /lib64/ (0x00007f74042f4000)
	=> /lib64/ (0x00007f7403ff2000)
	=> /lib64/ (0x00007f7403ddc000)
	=> /lib64/ (0x00007f7403bbf000)
	=> /lib64/ (0x00007f74037fc000)
	/lib64/ (0x0000560d3b6df000)
	=> /lib64/ (0x00007f74035e2000)
	=> /lib64/ (0x00007f74033aa000)
	=> /lib64/ (0x00007f7403177000)
	=> /lib64/ (0x00007f7402f73000)
	=> /lib64/ (0x00007f7402d64000)
	=> /lib64/ (0x00007f7402b60000)
	=> /lib64/ (0x00007f740294a000)
	=> /lib64/ (0x00007f7402746000)
	=> /lib64/ (0x00007f740251f000)
	=> /lib64/ (0x00007f74022bc000)

Three of them cannot be found because some environment variables are not set. For example, some of them are provided by Java, and CM can find them once the Java-related environment variables are configured. Most of the .so files are available on the system, so we only need to copy libkudu_client.so.0, which the new version depends on.
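When checking by hand, the Java-provided libraries can be resolved by putting the JVM library directories on LD_LIBRARY_PATH before running ldd. This is a sketch assuming the OpenJDK 8 layout on x86_64 (libjvm.so under jre/lib/amd64/server); java_ld_path is hypothetical, and CM's own mechanism may differ.

```shell
#!/usr/bin/env bash
# java_ld_path JAVA_HOME
# Prints an LD_LIBRARY_PATH value covering the native library directories
# of a JDK 8 installation on x86_64.
# Assumption: JDK 8 layout; newer JDKs keep libjvm.so in $JAVA_HOME/lib/server.
java_ld_path() {
  local jh="${1%/}"   # drop a trailing slash, if any
  printf '%s\n' "$jh/jre/lib/amd64/server:$jh/jre/lib/amd64"
}
```

Usage: LD_LIBRARY_PATH="$(java_ld_path "$JAVA_HOME")" ldd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad | grep 'not found'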

Generate the new Impala directory

First make a copy of the original directory, then modify the copy.

# cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
# cp -r impala apache-impala-3.4
# cd apache-impala-3.4

lib directory

Delete all the jar packages in the lib directory, keeping the .so files:

# rm lib/*.jar
# ll lib
-rw-r--r-- 1 root root   89864 Jun 19 22:37
lrwxrwxrwx 1 root root      36 Jun 19 22:37 -> ../../hadoop/lib/native/
lrwxrwxrwx 1 root root      42 Jun 19 22:37 -> ../../hadoop/lib/native/
-rw-r--r-- 1 root root 6638528 Jun 19 22:37
-rw-r--r-- 1 root root 6638528 Jun 19 22:37
-rw-r--r-- 1 root root 1003416 Jun 19 22:37
-rw-r--r-- 1 root root 1003424 Jun 19 22:37

libkudu_client.so.0 is replaced by the one used when compiling Impala 3.4; as the earlier ldd output shows, it is located in $IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib/. The other .so files can be left alone.
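The Kudu client replacement can be scripted as below. A sketch; replace_kudu_client is a hypothetical helper, and the source directory is the toolchain lib directory reported by ldd. It uses cp -L so that, if the toolchain file is a symlink, the real library content is copied rather than a dangling link. No backup is made because the original CDH directory is not modified.

```shell
#!/usr/bin/env bash
# replace_kudu_client SRC_LIB_DIR DEST_LIB_DIR
# Copies libkudu_client.so.0 from the Impala 3.4 toolchain into the new
# lib directory, dereferencing symlinks so the real file is copied.
replace_kudu_client() {
  local src="$1/libkudu_client.so.0" dest="$2/libkudu_client.so.0"
  [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
  cp -L "$src" "$dest"
}
```

Usage: replace_kudu_client "$IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib" /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4/lib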

impala-3.4 dependent jars

Copy the jar packages that impala-3.4 depends on into the lib directory as well. They are in the build directory, under $IMPALA_HOME/fe/target/dependency/


Put the impala-frontend-0.1-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. Its path in the build directory is fe/target/impala-frontend-0.1-SNAPSHOT.jar


Put the impala-data-source-api-1.0-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. Its path in the build directory is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar. Alternatively, you can copy the one from the original CDH 6.3.2 directory, because the external data source has not been updated for several years.
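The three jar copies above can be done in one step. A minimal bash sketch, assuming the build-tree paths given in the text; copy_fe_jars is a hypothetical helper, and it expects lib/*.jar to have been removed already. It checks that every jar exists before copying anything, so a partial build is not silently installed.

```shell
#!/usr/bin/env bash
# copy_fe_jars IMPALA_HOME DEST_LIB_DIR
# Copies the frontend's dependency jars, the frontend jar and the external
# data source API jar from an Impala 3.4 build tree into DEST_LIB_DIR.
copy_fe_jars() {
  local home="$1" dest="$2" jar
  local jars=(
    "$home"/fe/target/dependency/*.jar
    "$home/fe/target/impala-frontend-0.1-SNAPSHOT.jar"
    "$home/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar"
  )
  for jar in "${jars[@]}"; do
    # An unmatched glob stays literal, so an empty dependency dir fails here too.
    [ -f "$jar" ] || { echo "missing $jar" >&2; return 1; }
  done
  cp -p "${jars[@]}" "$dest/"
}
```

Usage: copy_fe_jars "$IMPALA_HOME" /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4/lib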

sbin-retail directory

Replace the impalad in it with the impalad compiled from Apache Impala 3.4; its path in the build directory is be/build/latest/service/impalad.
Check that the catalogd and statestored symlinks point to impalad:

# ll sbin-retail
lrwxrwxrwx 1 root root         7 Jun 19 22:37 catalogd -> impalad*
-rwxr-xr-x 1 root root 481420800 Jun 20 00:06 impalad*
lrwxrwxrwx 1 root root         7 Jun 19 22:37 statestored -> impalad*
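Replacing impalad and checking the two symlinks can be combined. A sketch; install_impalad is hypothetical. Overwriting the file in place keeps catalogd and statestored valid because, as the listing shows, they point at the name impalad.

```shell
#!/usr/bin/env bash
# install_impalad NEW_IMPALAD SBIN_RETAIL_DIR
# Overwrites SBIN_RETAIL_DIR/impalad with NEW_IMPALAD, then verifies that
# catalogd and statestored are symlinks whose target is "impalad".
install_impalad() {
  local new="$1" sbin="$2" link
  cp -p "$new" "$sbin/impalad" || return 1
  chmod 755 "$sbin/impalad"
  for link in catalogd statestored; do
    if [ "$(readlink "$sbin/$link")" != impalad ]; then
      echo "$sbin/$link does not point to impalad" >&2
      return 1
    fi
  done
}
```

Usage: install_impalad "$IMPALA_HOME/be/build/latest/service/impalad" /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4/sbin-retail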

www directory

This directory is used by the web UI. Delete the old version and copy in the new one.
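A sketch of the www swap; it copies the new pages into a temporary sibling directory first, so a failed copy does not leave www half-replaced. replace_www is hypothetical, and the new pages are assumed to come from the www directory of the Impala 3.4 source tree.

```shell
#!/usr/bin/env bash
# replace_www NEW_WWW_DIR IMPALA_DIR
# Replaces IMPALA_DIR/www with a copy of NEW_WWW_DIR. The copy is made to
# IMPALA_DIR/www.new first and only swapped in once it has succeeded.
replace_www() {
  local src="$1" dest="$2/www"
  [ -d "$src" ] || { echo "missing $src" >&2; return 1; }
  rm -rf "$dest.new"
  cp -r "$src" "$dest.new" || return 1
  rm -rf "$dest"
  mv "$dest.new" "$dest"
}
```

Usage: replace_www "$IMPALA_HOME/www" /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4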

New Impala dependency summary


Change the CM configuration and restart

Put the new Impala directory on all machines and make sure it is identical everywhere. Then, in CM, go to Impala > Configuration > env and add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4
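One way to make sure the directory is identical on all machines is to compare a digest of its files on each host. A sketch using GNU find, sort, xargs and md5sum; tree_digest is hypothetical, and it only hashes regular files, so symlinks such as catalogd and statestored are not covered.

```shell
#!/usr/bin/env bash
# tree_digest DIR
# Prints a single MD5 digest over the relative paths and contents of all
# regular files under DIR. Paths are sorted so the result is stable.
tree_digest() {
  [ -d "$1" ] || return 1
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum) \
    | md5sum | cut -d' ' -f1
}
```

Usage, with hypothetical host names for the three cluster machines: for h in node1 node2 node3; do ssh "$h" "$(declare -f tree_digest); tree_digest /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4"; done. All hosts should print the same value.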

Then restart the entire Impala cluster. After a successful restart, the web pages of catalogd, statestored and impalad show the new UI, and the version number is 3.4.
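The version can also be checked from the command line. A sketch assuming the default Impala web UI ports (impalad 25000, statestored 25010, catalogd 25020), which CM may override; the helper names are hypothetical, and the check simply greps each daemon's root page for the expected version string. A daemon that does not run on the given host reports FAIL.

```shell
#!/usr/bin/env bash
# web_ui_urls HOST: print the default web UI URLs of impalad,
# statestored and catalogd on HOST.
web_ui_urls() {
  local port
  for port in 25000 25010 25020; do
    printf 'http://%s:%s/\n' "$1" "$port"
  done
}

# check_version HOST VERSION: report, per daemon, whether its root page
# contains VERSION (requires curl).
check_version() {
  local url
  for url in $(web_ui_urls "$1"); do
    if curl -fsS "$url" | grep -q -F -- "$2"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Usage, with a hypothetical host name: check_version node1 3.4.0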

Verification and rollback

Finally, verify that the cluster works normally, including some of the new features of Impala 3.4. If anything is incompatible, roll back the CM configuration and restart the Impala cluster; this is safe because nothing in the old version has been changed.


In summary, Impala can be upgraded on its own within CDH with the following steps:

  • Create a new Impala directory under /opt/cloudera/parcels/CDH/lib

  • Copy the following parts of the new version to the corresponding locations in that directory: impalad, impala-frontend-0.1-SNAPSHOT.jar, all jar packages the new FE depends on, the www directory, and the libkudu_client.so.0 the new version depends on

  • In CM, set the IMPALA_HOME environment variable in the Impala Service Environment Advanced Configuration Snippet (Safety Valve) to point to the new directory

  • Restart the cluster

