CDH 6.3.2: detailed steps for upgrading Impala 3.2 to Impala 3.4

Posted by astarmathsandphysics on Tue, 23 Nov 2021 06:57:06 +0100

CDH 6.3.2: compiling and deploying Impala 3.4 to replace Impala 3.2

Local environment

Hardware requirements

  • The CPU must support at least SSSE3

  • Minimum memory: 16 GB (the community recommends 64 GB)

  • Hard disk space: 120 GB (for test data)

Operating systems (Linux only)

  • Ubuntu 14.04, 16.04, 18.04

  • CentOS 7
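The requirements above can be checked before starting a long build. The following is a minimal sketch; the function names are hypothetical, and the 15 GiB memory threshold is an assumption explained in the comments (a nominal 16 GB host reports slightly less than 16 GiB).

```bash
# Preflight check for an Impala build host (a sketch, not part of Impala).
# Each helper takes the file or directory to inspect; on a real host use
# /proc/cpuinfo, /proc/meminfo, the build directory and /etc/os-release.

# Succeeds if the CPU flags include SSSE3.
has_ssse3() {
  grep -qw 'ssse3' "${1:-/proc/cpuinfo}"
}

# Prints MemTotal in whole GiB (meminfo reports kB).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Prints the free space, in whole GiB, of the filesystem holding a directory.
free_disk_gb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Prints every failed requirement; returns non-zero if any check failed.
# MemTotal excludes memory reserved by the kernel, so a nominal 16 GB host
# reports a little less; 15 GiB is used as the threshold for that reason.
check_build_host() {
  local cpuinfo="${1:-/proc/cpuinfo}" meminfo="${2:-/proc/meminfo}"
  local dir="${3:-.}" osrel="${4:-/etc/os-release}" rc=0
  has_ssse3 "$cpuinfo" || { echo "CPU does not support SSSE3"; rc=1; }
  [ "$(mem_total_gb "$meminfo")" -ge 15 ] || { echo "less than 16 GB of memory"; rc=1; }
  [ "$(free_disk_gb "$dir")" -ge 120 ] || { echo "less than 120 GB of free disk space"; rc=1; }
  grep -Eq '^ID="?(centos|ubuntu)"?$' "$osrel" 2>/dev/null \
    || { echo "not CentOS or Ubuntu"; rc=1; }
  return "$rc"
}
```

On the build machine, run `check_build_host /proc/cpuinfo /proc/meminfo /root` (with the directory you will clone into) and fix anything it prints before continuing.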

Build environment

  • A CDH 6.3.2 cluster deployed on three CentOS machines
  • A separate CentOS machine with the same environment, used to compile Apache Impala 3.4

Impala version support for other components (Y = supported, X = not supported)

Component   Impala 3.4   Impala 4
Hudi        Y            X
Iceberg     X            Y
Hive 2      Y            X

Compile Impala

The Impala shipped with CDH 6.3.2 is based on Apache Impala 3.2, with many additional patches. The Impala web UI reports its version as 3.2.0-cdh6.3.2.

Apache Impala is released as source code, so it has to be compiled for the target platform. Use a machine whose environment matches the cluster.

Compile Impala by following the section "Building Impala without Test Data (for testing Impala)" of the build guide:
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
The difference is that we need version 3.4 rather than the latest master branch, so select the version when cloning:

git clone --single-branch --branch 3.4.0 https://github.com/apache/impala.git impala-3.4
cd impala-3.4

Because the Cloudera Maven repository URL has changed, pom.xml must be modified for the build to succeed (IMPALA-9815). Cherry-pick the IMPALA-9815 commit: https://github.com/apache/impala/commit/481ea4ab0d476a4aa491f99c2a4e376faddc0b03

git fetch origin 481ea4ab0d476a4aa491f99c2a4e376faddc0b03
git cherry-pick 481ea4ab0d476a4aa491f99c2a4e376faddc0b03

Install the build dependencies with the bin/bootstrap_system.sh script:

export IMPALA_HOME=`pwd`
$IMPALA_HOME/bin/bootstrap_system.sh

Summary of bootstrap_system.sh problems

Before re-running bootstrap_system.sh, remove the state left behind by the previous run:
rm -rf /var/lib/pgsql/*
rm -rf /usr/local/bin/ant
Apache Ant download problem: the Ant download address in the script no longer works, so point it to Ant 1.10.12
# Modify the Ant download address in bootstrap_system.sh (lines 244-247 of the script)
vim $IMPALA_HOME/bin/bootstrap_system.sh

  https://downloads.apache.org/ant/binaries/apache-ant-1.10.12-bin.tar.gz
redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403  apache-ant-1.10.12-bin.tar.gz'
redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
Ivy dependency download problem (hadoop-lzo)
# Point the Maven URLs in the hadoop-lzo configuration at the HTTPS Maven Central repository
vim /root/hadoop-lzo/build.xml
# line 96:
<property name="ivy_repo_url" value="https://repo.maven.apache.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"/>

vim /root/hadoop-lzo/ivy/ivysettings.xml
# line 15:
    value="https://repo.maven.apache.org/maven2/"
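The two hand edits above can also be scripted. This is a sketch under an assumption: that the files still reference Maven Central over plain http:// (for example repo1.maven.org or repo2.maven.org; check with grep first), which Maven Central no longer serves. The function name is hypothetical, and a .bak copy of each file is kept.

```bash
# Rewrites plain-HTTP Maven Central URLs to https://repo.maven.apache.org
# in each given file. sed -i.bak keeps the original as <file>.bak.
# Assumption: the old URLs look like http://repo1.maven.org/maven2/,
# http://repo2.maven.org/maven2/ or http://repo.maven.apache.org/maven2/.
fix_maven_urls() {
  local f
  for f in "$@"; do
    sed -i.bak -E \
      's#http://(repo[0-9]*\.maven\.org|repo\.maven\.apache\.org)/maven2/#https://repo.maven.apache.org/maven2/#g' \
      "$f" || return 1
  done
}
```

Usage: `fix_maven_urls /root/hadoop-lzo/build.xml /root/hadoop-lzo/ivy/ivysettings.xml`, then re-run bootstrap_system.sh.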

If Impala has been compiled on this machine before, you can skip the bootstrap_system.sh step.

Set environment variables

source $IMPALA_HOME/bin/impala-config.sh

Start the build

$IMPALA_HOME/buildall.sh -noclean -notests -release
  • Note: for testing purposes you can drop -release to get a debug build. The resulting impalad reports more information when it hits a bug; for example, DCHECKs catch problems earlier, which makes bugs easier to locate.
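The build command and the note on -release can be wrapped in a small helper. A sketch only: the function names and the release/debug argument are hypothetical, and it assumes IMPALA_HOME has already been exported as shown earlier.

```bash
# Maps a build type to the buildall.sh flags used in this article.
buildall_args() {
  case "$1" in
    release) echo "-noclean -notests -release" ;;   # what the cluster should run
    debug)   echo "-noclean -notests" ;;            # DCHECKs enabled, easier to debug
    *)       echo "unknown build type: $1" >&2; return 1 ;;
  esac
}

# Sources the Impala environment and runs buildall.sh (default: release).
build_impala() {
  local build_type="${1:-release}" args
  args=$(buildall_args "$build_type") || return 1
  # impala-config.sh exports the toolchain and Java settings buildall.sh expects.
  source "$IMPALA_HOME/bin/impala-config.sh"
  # $args is intentionally unquoted: it holds several separate flags.
  "$IMPALA_HOME/buildall.sh" $args
}
```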

Summary of buildall.sh compilation problems

logredactor download problem: if the logredactor artifact cannot be downloaded during the build, fetch it manually:
wget  https://repository.cloudera.com/artifactory/cloudera-repos/org/cloudera/logredactor/logredactor/2.0.7/logredactor-2.0.7.jar
wget  https://repository.cloudera.com/artifactory/cloudera-repos/org/cloudera/logredactor/logredactor/2.0.7/logredactor-2.0.7.pom
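The downloaded files still have to reach Maven. Assuming the build failed because Maven could not fetch them, and that the build uses the default local repository ~/.m2/repository, one option is to place them under the standard groupId/artifactId/version layout. The helper below is a hypothetical sketch; `mvn install:install-file -Dfile=logredactor-2.0.7.jar -DpomFile=logredactor-2.0.7.pom` achieves the same through Maven itself.

```bash
# Copies <artifact>-<version>.jar and .pom from src_dir into a Maven local
# repository: <repo>/<groupId with dots as slashes>/<artifact>/<version>/.
# Prints the destination directory on success.
install_to_local_repo() {
  local group="$1" artifact="$2" version="$3" src_dir="$4"
  local repo="${5:-$HOME/.m2/repository}"
  local dest="$repo/${group//.//}/$artifact/$version"
  mkdir -p "$dest" || return 1
  cp "$src_dir/$artifact-$version.jar" "$src_dir/$artifact-$version.pom" "$dest/" || return 1
  echo "$dest"
}
```

Usage, from the directory holding the two downloads: `install_to_local_repo org.cloudera.logredactor logredactor 2.0.7 .`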

Check the build result

Many dependencies are downloaded during the build, so it is much faster through a VPN. The build took me about an hour. Afterwards, you can find the impalad executable and the Impala frontend jar:

$ ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
-rwxrwxr-x 1 root root 460M 6 November 20:30 be/build/latest/service/impalad*
-rw-rw-r-- 1 root root 7.5M 6 November 20:33 fe/target/impala-frontend-0.1-SNAPSHOT.jar
$ strings be/build/latest/service/impalad | grep 3.4.0
3.4.0-RELEASE

The last command should find strings such as 3.4.0-RELEASE in the impalad executable. The compiled impalad is over 400 MB because it contains a lot of debug symbol information; you can shrink it with strip --strip-debug impalad.
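Both checks can be kept as small helpers. A sketch with hypothetical function names: `has_version` uses `grep -a` so it works without binutils' `strings`, and `strip_copy` writes a stripped copy instead of modifying the build output in place (it does need binutils' `strip`).

```bash
# Succeeds if the binary embeds the given version string.
has_version() {
  grep -aqF "$2" "$1"
}

# Writes a copy of a binary without debug symbols and prints both sizes,
# leaving the original untouched.
strip_copy() {
  local src="$1" dst="$2"
  strip --strip-debug -o "$dst" "$src" || return 1
  du -h "$src" "$dst"
}
```

Usage: `has_version be/build/latest/service/impalad 3.4.0-RELEASE && strip_copy be/build/latest/service/impalad /tmp/impalad`.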

Impala is statically linked by default, but some dynamic dependencies remain. Check them with ldd:

# ldd be/build/latest/service/impalad
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
	linux-vdso.so.1 =>  (0x00007ffcd11d0000)
	libjsig.so => /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64/jre/lib/amd64/libjsig.so (0x00007f987c937000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f987c71b000)
	libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f987c4fe000)
	libjvm.so => not found
	libkudu_client.so.0 => /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0 (0x00007f987bd7f000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f987bb77000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f987b973000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f987b701000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f987b29e000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f987afb5000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f987ad68000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f987aa60000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f987a75e000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f987a548000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f987a17a000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f987cb3b000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f9879f60000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f9879d29000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f9879af6000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f98798f2000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f98796e2000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f98794cc000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f98792c8000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007f98790c5000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f9878e9e000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9878c3c000)

Most of these .so files come with the system or with Java. We only need to copy the ones tied to the Impala version, such as libkudu_client.so.0; the others do not need to be copied.

  • Note: libkudu_client.so.0 has to be replaced because Impala 3.4 uses functions that the Kudu client shipped with Impala 3.2 does not support. Impala 3.3 added support for column comments on Kudu tables and upgraded the Kudu client it depends on; see IMPALA-5351 for details. Reportedly, if you do not use this feature, libkudu_client.so.0 does not need to be replaced (I have not verified this).
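When comparing ldd output across machines, it helps to filter out just the problems. A sketch with hypothetical helper names: one lists libraries reported as `=> not found`, the other lists missing symbol versions such as GLIBCXX_3.4.20. ldd prints the symbol-version errors on stderr, so merge it into the pipe.

```bash
# Prints each library ldd could not resolve ("libfoo.so => not found").
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Prints each missing symbol version (e.g. GLIBCXX_3.4.20) once, sorted.
missing_symbol_versions() {
  grep -o "version \`[^']*' not found" \
    | sed -E "s/version \`([^']*)' not found/\1/" \
    | LC_ALL=C sort -u
}
```

Usage: `ldd be/build/latest/service/impalad 2>&1 | missing_libs` and `ldd be/build/latest/service/impalad 2>&1 | missing_symbol_versions`.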

Studying the existing deployment

Inspect the CDH cluster's executables

Log in to a cluster machine and check which executables impalad, statestored and catalogd run. You can find their startup commands with something like "ps aux | grep catalogd":

/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/catalogd --flagfile=/var/run/cloudera-scm-agent/process/39-impala-CATALOGSERVER/impala-conf/catalogserver_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/statestored --flagfile=/var/run/cloudera-scm-agent/process/38-impala-STATESTORE/impala-conf/state_store_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/37-impala-IMPALAD/impala-conf/impalad_flags
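To collect the executable and flagfile of every daemon in one go, the ps output can be filtered. A sketch with hypothetical function names; `ps -e -ww -o args=` prints full, untruncated command lines without a header, and the filter skips unrelated processes such as the grep itself.

```bash
# Reads command lines on stdin and prints "<executable> <flagfile>" for
# each impalad, catalogd or statestored process.
daemon_paths() {
  awk '$1 ~ /\/(impalad|catalogd|statestored)$/ {
         flagfile = ""
         for (i = 2; i <= NF; i++)
           if ($i ~ /^--flagfile=/) flagfile = substr($i, length("--flagfile=") + 1)
         print $1, flagfile
       }'
}

# Lists the Impala daemons running on this host.
impala_daemons() {
  ps -e -ww -o args= | daemon_paths
}
```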

Old Impala directory

The flagfiles are generated by CM, so we do not need to manage them; we only need CM to launch a new executable. We create a directory with the same structure as /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala, and then make CM use it by setting the IMPALA_HOME environment variable.

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala
drwxr-xr-x 2 root root   113 11 September 2019 bin
drwxr-xr-x 2 root root    36 11 September 2019 cloudera
drwxr-xr-x 2 root root 16384 11 September 2019 lib
lrwxrwxrwx 1 root root    11 11 September 2019 sbin -> sbin-retail
drwxr-xr-x 2 root root    56 11 September 2019 sbin-debug
drwxr-xr-x 2 root root    56 11 September 2019 sbin-retail
drwxr-xr-x 7 root root   158 11 September 2019 toolchain
drwxr-xr-x 7 root root  4096 11 September 2019 www

bin directory

The bin directory mostly holds diagnostic scripts and can be left as-is:

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/bin/
-rwxr-xr-x 1 root root 25189 11 September 2019 collect_diagnostics.py
-rwxr-xr-x 1 root root  9183 11 September 2019 collect_minidumps.py
-rwxr-xr-x 1 root root  2013 11 September 2019 collect_shared_libs.sh
-rw-r--r-- 1 root root     0 11 September 2019 __init__.py

cloudera directory

It only contains version information and can be left as-is:

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/cloudera/
-rw-r--r-- 1 root root 515 11 September 2019 cdh_version.properties

The lib directory contains symlinks to jar packages plus the .so files impalad depends on; it must be carried over into the new directory and updated.
sbin is a symlink to sbin-retail, which mainly contains the impalad executable. catalogd and statestored are symlinks to impalad, so they do not need to be copied.

toolchain directory

The libraries in these directories are statically linked into the impalad executable. I don't know why they are shipped alongside it, but they can be left as-is:

# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/toolchain/
drwxr-xr-x 3 root root 19 11 September 2019 breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2
drwxr-xr-x 3 root root 17 11 September 2019 cmake-3.8.2-p1
drwxr-xr-x 3 root root 21 11 September 2019 llvm-5.0.1-asserts-p1
drwxr-xr-x 3 root root 21 11 September 2019 llvm-5.0.1-p1
drwxr-xr-x 3 root root 19 11 September 2019 orc-1.5.5-p1

www directory

This directory holds the web UI pages and needs to be updated.

Dependent shared libraries

Finally, check the shared-library dependencies of the CDH impalad with ldd:

# ldd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
        linux-vdso.so.1 =>  (0x00007ffd191ae000)
        libjsig.so => not found
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f7405613000)
        libjvm.so => not found
        libkudu_client.so.0 => not found
        librt.so.1 => /lib64/librt.so.1 (0x00007f740540a000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f7405206000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f7404fb9000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007f7404d46000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f74048e5000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f74045fd000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f74042f4000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7403ff2000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7403ddc000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7403bbf000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f74037fc000)
        /lib64/ld-linux-x86-64.so.2 (0x0000560d3b6df000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f74035e2000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f74033aa000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f7403177000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f7402f73000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f7402d64000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f7402b60000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f740294a000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f7402746000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f740251f000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f74022bc000)

Three libraries are not found because some environment variables are not set in this shell. For example, libjsig.so and libjvm.so are provided by Java, and CM locates them through its Java-related environment settings. Most of the other .so files are provided by the system, so we only need to copy the libkudu_client.so.0 that the new version depends on.

Generate the new Impala directory

First make a copy of the original directory, then modify the copy.

# cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
# cp -r impala apache-impala-3.4
# cd apache-impala-3.4

lib directory

Delete all the jar packages in the lib directory, keeping the .so files:

# rm lib/*.jar
# ll lib
-rw-r--r-- 1 root root   89864 Jun 19 22:37 libgcc_s.so.1
lrwxrwxrwx 1 root root      36 Jun 19 22:37 libhadoop.so -> ../../hadoop/lib/native/libhadoop.so
lrwxrwxrwx 1 root root      42 Jun 19 22:37 libhadoop.so.1.0.0 -> ../../hadoop/lib/native/libhadoop.so.1.0.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0.1.0
-rw-r--r-- 1 root root 1003416 Jun 19 22:37 libstdc++.so.6
-rw-r--r-- 1 root root 1003424 Jun 19 22:37 libstdc++.so.6.0.20
libkudu_client.so.0

Replace libkudu_client.so.0 with the one used when compiling Impala 3.4. According to the earlier ldd output of the compiled impalad, it is $IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0. The other .so files can be left alone.

impala-3.4 dependent jars

Copy the jar packages that impala-3.4 depends on into the lib directory as well. They are in the build directory under $IMPALA_HOME/fe/target/dependency/.

impala-frontend

Put the impala-frontend-0.1-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. The path in the compilation directory is fe/target/impala-frontend-0.1-SNAPSHOT.jar

impala-data-source-api

Put the impala-data-source-api-1.0-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. The path in the compilation directory is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar. Alternatively, you can reuse the one from CDH 6.3.2, because the external data source API has not been updated in several years.

sbin-retail directory

Replace the impalad in it with the one compiled from Apache Impala 3.4; the path in the compilation directory is be/build/latest/service/impalad.
Then check that the catalogd and statestored symlinks still point to impalad:

# ll sbin-retail
lrwxrwxrwx 1 root root         7 Jun 19 22:37 catalogd -> impalad*
-rwxr-xr-x 1 root root 481420800 Jun 20 00:06 impalad*
lrwxrwxrwx 1 root root         7 Jun 19 22:37 statestored -> impalad*
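The symlink check can be automated. A minimal sketch with a hypothetical function name; because the links point to the name impalad, replacing the impalad file keeps them valid.

```bash
# Checks that catalogd and statestored in the given sbin-retail directory
# are symlinks to impalad. Prints each problem; returns non-zero if any.
check_daemon_links() {
  local dir="$1" name target rc=0
  for name in catalogd statestored; do
    if ! target=$(readlink "$dir/$name"); then
      echo "$name is not a symlink"; rc=1; continue
    fi
    [ "$target" = "impalad" ] || { echo "$name -> $target (expected impalad)"; rc=1; }
  done
  return "$rc"
}
```

Usage: `check_daemon_links /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4/sbin-retail`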

www directory

This directory is used by the web UI. Delete the old version and copy in the new one.

Summary of new Impala files

$IMPALA_HOME/fe/target/dependency/
$IMPALA_HOME/www/
$IMPALA_HOME/be/build/latest/service/impalad
$IMPALA_HOME/fe/target/impala-frontend-0.1-SNAPSHOT.jar
$IMPALA_HOME/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar
$IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0
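The steps of this section can be collected into one function. This is a sketch under stated assumptions: the build tree has the layout listed above (the kudu-4ed0dbbd1 toolchain directory is specific to this build, hence the optional fourth argument), both the CDH parcel and the build outputs are reachable from the machine running it, and the function name is hypothetical. It refuses to overwrite an existing target and never touches the original CDH directory.

```bash
# Creates the new Impala directory from a copy of the CDH one, with the
# Impala 3.4 build outputs swapped in.
#   $1 CDH Impala directory     $2 Impala 3.4 build tree ($IMPALA_HOME)
#   $3 new directory to create  $4 kudu toolchain dir name (optional)
assemble_impala_dir() {
  local cdh_dir="$1" build="$2" dest="$3" kudu_dir="${4:-kudu-4ed0dbbd1}"

  if [ -e "$dest" ]; then
    echo "$dest already exists" >&2
    return 1
  fi
  # -a keeps permissions and existing symlinks (sbin, catalogd, libhadoop.so).
  cp -a "$cdh_dir" "$dest" || return 1

  # lib/: drop the old jars, keep the .so files, add the new jars.
  rm -f "$dest"/lib/*.jar
  cp "$build"/fe/target/dependency/*.jar "$dest/lib/" || return 1
  cp "$build/fe/target/impala-frontend-0.1-SNAPSHOT.jar" "$dest/lib/" || return 1
  cp "$build/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar" \
     "$dest/lib/" || return 1

  # Replace libkudu_client.so.0. Removing it first avoids writing through a
  # symlink; -L copies the real file if the toolchain path is a symlink.
  rm -f "$dest/lib/libkudu_client.so.0"
  cp -L "$build/toolchain/$kudu_dir/debug/lib64/libkudu_client.so.0" \
     "$dest/lib/libkudu_client.so.0" || return 1

  # sbin-retail/: catalogd and statestored link to impalad, so only it changes.
  rm -f "$dest/sbin-retail/impalad"
  cp "$build/be/build/latest/service/impalad" "$dest/sbin-retail/impalad" || return 1

  # www/: replace the web UI files wholesale.
  rm -rf "$dest/www"
  cp -a "$build/www" "$dest/www" || return 1
}
```

Usage: `assemble_impala_dir /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala /root/impala-3.4 /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4`. If a step fails, the partially built target is left in place; delete it before retrying.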

Change CM configuration and restart

Copy the new Impala directory to all machines and make sure the copies are identical. Then in CM go to Impala > Configuration, search for "env", and in Impala Service Environment Advanced Configuration Snippet (Safety Valve) add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4
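One way to confirm that the copies are identical is to compare a single digest per host. A sketch with a hypothetical function name: the digest covers the relative path and contents of every regular file plus every symlink and its target, with a fixed sort order so hosts are comparable (it relies on GNU find and xargs).

```bash
# Prints one SHA-256 digest summarizing a directory tree. Identical trees
# give identical digests.
tree_digest() {
  (
    cd "$1" || exit 1
    {
      find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum
      find . -type l -printf '%p -> %l\n' | LC_ALL=C sort
    } | sha256sum | awk '{ print $1 }'
  )
}
```

Run it on every host against /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4, for example through ssh with `"$(declare -f tree_digest); tree_digest <dir>"` as the remote command, and check that all hosts print the same value.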

Then restart the whole Impala service. Once the restart succeeds, the web UIs of catalogd, statestored and impalad show the new pages, and the version number is 3.4.
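The version check on the web UIs can be scripted per endpoint. A sketch under assumptions: the daemons use Impala's default web UI ports (25000 for impalad, 25010 for statestored, 25020 for catalogd; use the ports configured in CM if they differ), the UIs are reachable over plain HTTP without Kerberos or TLS, and the host names in the usage line are placeholders. The function names are hypothetical.

```bash
# Succeeds if the text on stdin contains the expected version string.
page_has_version() {
  grep -qF "$1"
}

# Fetches the root page of each host:port endpoint and checks the version.
# Prints one line per endpoint; returns non-zero if any check failed.
check_web_ui_versions() {
  local version="$1" endpoint rc=0
  shift
  for endpoint in "$@"; do
    if curl -fsS --max-time 10 "http://$endpoint/" | page_has_version "$version"; then
      echo "$endpoint OK"
    else
      echo "$endpoint does not report $version"; rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_web_ui_versions 3.4.0 node1:25000 node1:25010 node1:25020 node2:25000 node3:25000`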

Verification and rollback

Finally, verify that the cluster works normally, including some of the new Impala 3.4 features. If anything is incompatible, roll back the CM configuration (remove the IMPALA_HOME setting) and restart the Impala service; this is safe because nothing in the old installation was changed.

Summary

To upgrade Impala on its own within CDH:

  • Generate a new Impala directory in /opt/cloudera/parcels/CDH/lib

  • Copy the following items from the new version into the corresponding locations of that directory: impalad, impala-frontend-0.1-SNAPSHOT.jar, all jar packages the new FE depends on, the www directory, and the libkudu_client.so.0 the new version depends on

  • In CM's Impala Service Environment Advanced Configuration Snippet (Safety Valve), set the IMPALA_HOME environment variable to point to the new directory

  • Restart the Impala cluster

