CDH 6.3.2: Impala 3.2 to Impala 3.4 compilation process
Local environment
Hardware requirements
- The CPU must support at least SSSE3
- Minimum memory: 16GB (64GB recommended by the community)
- Hard disk space: 120GB (for test data)
Supported operating systems (Linux only)
- Ubuntu 14.04, 16.04, 18.04
- CentOS 7
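A candidate build host can be checked against the requirements above with a few small helpers. This is only a sketch: the function names `has_ssse3`, `mem_total_gb` and `free_disk_gb` are made up for illustration and are not part of Impala, and they assume a Linux host that exposes /proc/cpuinfo and /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Impala) for checking a build host.

# Succeeds if the CPU flags in the given cpuinfo file include ssse3.
has_ssse3() {
  grep -qw ssse3 "${1:-/proc/cpuinfo}"
}

# Prints total memory in whole GiB, read from the given meminfo file.
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Prints free space in whole GiB on the filesystem holding the given path.
free_disk_gb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Example:
#   has_ssse3 || echo "CPU does not report SSSE3"
#   echo "memory: $(mem_total_gb) GiB, free disk: $(free_disk_gb /root) GiB"
```

MemTotal excludes memory reserved by the kernel, so a machine sold as 16GB usually reports 15 here; treat the printed value as approximate.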
Compiling environment
- A CDH 6.3.2 cluster is deployed on three CentOS machines
- A fourth CentOS machine with the same environment is used to compile Apache Impala 3.4
impala and other component versions support
Component | Impala 3.4 | Impala 4 |
---|---|---|
Hudi | Y | X |
Iceberg | X | Y |
Hive 2 | Y | X |
Compile impala
The Impala shipped with CDH 6.3.2 is based on Apache Impala 3.2, with many additional patches. The Impala web page reports the version number as 3.2.0-cdh6.3.2.
Apache Impala is released as source code, so it has to be compiled on the target platform. Use a machine whose environment is consistent with the cluster.
Compile Impala according to the chapter "Building Impala without Test Data (for testing Impala)" in the document:
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
The difference is that we need to compile version 3.4 instead of the latest master branch, so we need to select the version when cloning:
git clone --single-branch --branch 3.4.0 https://github.com/apache/impala.git impala-3.4
cd impala-3.4
Because the Cloudera Maven repository URL changed, pom.xml must be modified for the build to succeed (IMPALA-9815). We cherry-pick the commit for IMPALA-9815: https://github.com/apache/impala/commit/481ea4ab0d476a4aa491f99c2a4e376faddc0b03
git fetch origin 481ea4ab0d476a4aa491f99c2a4e376faddc0b03
git cherry-pick 481ea4ab0d476a4aa491f99c2a4e376faddc0b03
Run the bin/bootstrap_system.sh script to install the build dependencies:
export IMPALA_HOME=`pwd`
$IMPALA_HOME/bin/bootstrap_system.sh
bootstrap_system.sh script problem summary
Re-running the bootstrap_system.sh script
Before running the script again, remove what the previous run left behind:
rm -rf /var/lib/pgsql/*
rm -rf /usr/local/bin/ant
Apache Ant issues
# Modify the Ant download address in bootstrap_system.sh
vim $IMPALA_HOME/bin/bootstrap_system.sh
244 https://downloads.apache.org/ant/binaries/apache-ant-1.10.12-bin.tar.gz
245 redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403 apache-ant-1.10.12-bin.tar.gz'
246 redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
247 redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
ivy dependency download problem
# Modify the corresponding configuration files
vim /root/hadoop-lzo/build.xml
96 <property name="ivy_repo_url" value="https://repo.maven.apache.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"/>
vim /root/hadoop-lzo/ivy/ivysettings.xml
15 value="https://repo.maven.apache.org/maven2/"
If you have previously compiled Impala on this machine, you can skip the bootstrap_system.sh step above.
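The single-line edits made with vim above can also be applied non-interactively. The sketch below is a hypothetical helper, `replace_line`, that overwrites one numbered line and keeps a backup. It assumes the line numbers quoted in this article match your checkout, which is fragile: check the current content of the line first.

```shell
#!/bin/sh
# Hypothetical helper: replace line LINENO of FILE with TEXT.
# The original file is kept next to it as FILE.bak.
# TEXT is passed through the environment so that awk prints it literally.
replace_line() {
  file=$1 lineno=$2
  cp -p "$file" "$file.bak" &&
    NEW_TEXT=$3 awk -v n="$lineno" \
      'NR == n { print ENVIRON["NEW_TEXT"]; next } { print }' \
      "$file.bak" > "$file"
}

# Example, using the ivy_repo_url line quoted above:
#   sed -n 96p /root/hadoop-lzo/build.xml    # inspect the line first
#   replace_line /root/hadoop-lzo/build.xml 96 \
#     '<property name="ivy_repo_url" value="https://repo.maven.apache.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"/>'
```

Restoring the `.bak` copy undoes the change if the wrong line was replaced.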
Setting environment variables
source $IMPALA_HOME/bin/impala-config.sh
Start compilation
$IMPALA_HOME/buildall.sh -noclean -notests -release
- Note: for testing purposes you can drop -release, so that the compiled impalad prints more information when it hits a bug. For example, a DCHECK can catch the bug early, which makes it easier to locate.
Summary of buildall.sh compilation problems
logredactor download problem
wget https://repository.cloudera.com/artifactory/cloudera-repos/org/cloudera/logredactor/logredactor/2.0.7/logredactor-2.0.7.jar
wget https://repository.cloudera.com/artifactory/cloudera-repos/org/cloudera/logredactor/logredactor/2.0.7/logredactor-2.0.7.pom
Checking the build result
The build downloads many dependencies, so a VPN speeds it up considerably. Compilation took about an hour here. Afterwards, the impalad executable and the Impala frontend jar package can be found:
$ ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
-rwxrwxr-x 1 root root 460M Nov 6 20:30 be/build/latest/service/impalad*
-rw-rw-r-- 1 root root 7.5M Nov 6 20:33 fe/target/impala-frontend-0.1-SNAPSHOT.jar
$ strings be/build/latest/service/impalad | grep 3.4.0
3.4.0-RELEASE
The last command above should find strings such as 3.4.0-RELEASE in the impalad executable. The compiled impalad is more than 400MB because it contains a lot of symbol information. You can run strip --strip-debug impalad to reduce its size.
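The version check can be made slightly stricter than `grep 3.4.0`, where each dot matches any character. The sketch below uses a hypothetical helper, `impalad_release_string`, that prints the first X.Y.Z-RELEASE string embedded in a binary; it relies only on grep and head and makes no claim about where in the file Impala stores the string.

```shell
#!/bin/sh
# Hypothetical helper: print the first X.Y.Z-RELEASE string found in a binary.
# grep -a treats the binary as text; -o prints only the matching part.
impalad_release_string() {
  grep -a -o -E '[0-9]+\.[0-9]+\.[0-9]+-RELEASE' "$1" | head -n 1
}

# Example:
#   impalad_release_string be/build/latest/service/impalad
#
# To strip debug symbols into a separate copy and keep the original:
#   strip --strip-debug -o impalad.stripped be/build/latest/service/impalad
```

Keeping the unstripped binary around is useful if a crash later has to be analysed with symbols.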
Impala is statically linked by default, but a few dynamic dependencies remain. Check them with ldd:
# ldd be/build/latest/service/impalad
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by be/build/latest/service/impalad)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
be/build/latest/service/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0)
linux-vdso.so.1 => (0x00007ffcd11d0000)
libjsig.so => /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64/jre/lib/amd64/libjsig.so (0x00007f987c937000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f987c71b000)
libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f987c4fe000)
libjvm.so => not found
libkudu_client.so.0 => /root/impala-3.4/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0 (0x00007f987bd7f000)
librt.so.1 => /lib64/librt.so.1 (0x00007f987bb77000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f987b973000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f987b701000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f987b29e000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f987afb5000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f987ad68000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f987aa60000)
libm.so.6 => /lib64/libm.so.6 (0x00007f987a75e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f987a548000)
libc.so.6 => /lib64/libc.so.6 (0x00007f987a17a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f987cb3b000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f9879f60000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f9879d29000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f9879af6000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f98798f2000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f98796e2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f98794cc000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f98792c8000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007f98790c5000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f9878e9e000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9878c3c000)
Most of these .so files are provided by the system or already installed on it. We only need to copy the ones tied to the Impala version, such as libkudu_client.so.0; the others do not need to be copied.
- Note: libkudu_client.so.0 is replaced because Impala 3.4 uses functions that the Kudu client shipped with Impala 3.2 does not support. Impala 3.3 added support for column comments on Kudu tables and upgraded the Kudu client it depends on; see IMPALA-5351 for details. Reportedly, libkudu_client.so.0 does not need to be replaced if this feature is not used (I have not verified this).
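With this many lines of ldd output it is easy to overlook an unresolved library. A minimal sketch of a filter follows; `missing_libs` is a hypothetical name, and the function only assumes the usual `name => not found` line format that ldd prints.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print each library
# that the dynamic loader could not resolve, one per line.
# "version ... not found" lines are ignored because their second field
# is not "=>".
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example:
#   ldd be/build/latest/service/impalad | missing_libs
```

An empty result means every shared library was resolved in the current environment; it says nothing about symbol version mismatches such as the GLIBCXX lines.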
Deployment investigation
Inspect the CDH cluster executables
Log in to a cluster machine and check the executables used by impalad, statestored and catalogd. You can find their startup commands with a command like "ps aux | grep catalogd":
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/catalogd --flagfile=/var/run/cloudera-scm-agent/process/39-impala-CATALOGSERVER/impala-conf/catalogserver_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/statestored --flagfile=/var/run/cloudera-scm-agent/process/38-impala-STATESTORE/impala-conf/state_store_flags
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/37-impala-IMPALAD/impala-conf/impalad_flags
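To pull just the executable path and the flagfile out of command lines like these, a small filter is enough. The sketch below is illustrative: `daemon_flagfiles` is a hypothetical name, and it assumes each command line starts with the executable path and carries a `--flagfile=` option, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read command lines on stdin and print
# "<executable> <flagfile>" for each one that has a --flagfile option.
daemon_flagfiles() {
  sed -n 's/^\([^ ]*\) .*--flagfile=\([^ ]*\).*$/\1 \2/p'
}

# Example:
#   ps -eo args | grep -E 'sbin-retail/(impalad|catalogd|statestored)' | daemon_flagfiles
```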
Old Impala directory
The flagfile used here is generated by CM, so we do not need to manage it. We only need to make CM use the new executables. To do that, create a directory with the same structure as /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala, and then point CM at it with the IMPALA_HOME environment variable.
# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala
drwxr-xr-x 2 root root 113 Sep 11 2019 bin
drwxr-xr-x 2 root root 36 Sep 11 2019 cloudera
drwxr-xr-x 2 root root 16384 Sep 11 2019 lib
lrwxrwxrwx 1 root root 11 Sep 11 2019 sbin -> sbin-retail
drwxr-xr-x 2 root root 56 Sep 11 2019 sbin-debug
drwxr-xr-x 2 root root 56 Sep 11 2019 sbin-retail
drwxr-xr-x 7 root root 158 Sep 11 2019 toolchain
drwxr-xr-x 7 root root 4096 Sep 11 2019 www
bin directory
The bin directory mainly holds diagnostic scripts and can be ignored:
# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/bin/
-rwxr-xr-x 1 root root 25189 Sep 11 2019 collect_diagnostics.py
-rwxr-xr-x 1 root root 9183 Sep 11 2019 collect_minidumps.py
-rwxr-xr-x 1 root root 2013 Sep 11 2019 collect_shared_libs.sh
-rw-r--r-- 1 root root 0 Sep 11 2019 __init__.py
cloudera directory
This directory only holds version information and can be ignored:
# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/cloudera/
-rw-r--r-- 1 root root 515 Sep 11 2019 cdh_version.properties
The lib directory contains symlinks to jar packages and the dependent .so files, which need to be copied over.
The sbin directory points to sbin-retail, which mainly contains the impalad executable. catalogd and statestored are symlinks and do not need to be copied.
toolchain directory
Everything in this directory is statically linked into the impalad executable. I do not know why it is still shipped; it can be ignored:
# ll /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/toolchain/
drwxr-xr-x 3 root root 19 Sep 11 2019 breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2
drwxr-xr-x 3 root root 17 Sep 11 2019 cmake-3.8.2-p1
drwxr-xr-x 3 root root 21 Sep 11 2019 llvm-5.0.1-asserts-p1
drwxr-xr-x 3 root root 21 Sep 11 2019 llvm-5.0.1-p1
drwxr-xr-x 3 root root 19 Sep 11 2019 orc-1.5.5-p1
www directory
The directory is for web pages and needs to be updated.
Dynamic library dependencies
Finally, confirm the shared libraries that the existing impalad depends on with ldd:
# ldd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/impala/sbin-retail/impalad)
linux-vdso.so.1 => (0x00007ffd191ae000)
libjsig.so => not found
libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f7405613000)
libjvm.so => not found
libkudu_client.so.0 => not found
librt.so.1 => /lib64/librt.so.1 (0x00007f740540a000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f7405206000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f7404fb9000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f7404d46000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f74048e5000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f74045fd000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f74042f4000)
libm.so.6 => /lib64/libm.so.6 (0x00007f7403ff2000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7403ddc000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7403bbf000)
libc.so.6 => /lib64/libc.so.6 (0x00007f74037fc000)
/lib64/ld-linux-x86-64.so.2 (0x0000560d3b6df000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f74035e2000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f74033aa000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f7403177000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f7402f73000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f7402d64000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f7402b60000)
libz.so.1 => /lib64/libz.so.1 (0x00007f740294a000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007f7402746000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f740251f000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f74022bc000)
Three of them cannot be found because some environment variables are not configured in this shell. For example, libjsig.so and libjvm.so are provided by Java, and CM makes them findable by setting the Java-related environment variables. Most .so files are available on the system, so we only need to copy the libkudu_client.so.0 that the new version depends on.
Generate new Impala directory
First make a copy of the original directory, then modify the copy.
# cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
# cp -r impala apache-impala-3.4
# cd apache-impala-3.4
lib directory
Delete all the jar packages in the lib directory, keeping the .so files:
# rm lib/*.jar
# ll lib
-rw-r--r-- 1 root root 89864 Jun 19 22:37 libgcc_s.so.1
lrwxrwxrwx 1 root root 36 Jun 19 22:37 libhadoop.so -> ../../hadoop/lib/native/libhadoop.so
lrwxrwxrwx 1 root root 42 Jun 19 22:37 libhadoop.so.1.0.0 -> ../../hadoop/lib/native/libhadoop.so.1.0.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0
-rw-r--r-- 1 root root 6638528 Jun 19 22:37 libkudu_client.so.0.1.0
-rw-r--r-- 1 root root 1003416 Jun 19 22:37 libstdc++.so.6
-rw-r--r-- 1 root root 1003424 Jun 19 22:37 libstdc++.so.6.0.20
libkudu_client.so.0
Replace libkudu_client.so.0 with the one used when compiling Impala 3.4. The earlier ldd output shows it at $IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0. The other .so files can be left alone.
impala-3.4 dependent jars
The jar packages that impala-3.4 depends on must also be copied into the lib directory. They are in the build tree under $IMPALA_HOME/fe/target/dependency/
impala-frontend
Put the impala-frontend-0.1-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. The path in the compilation directory is fe/target/impala-frontend-0.1-SNAPSHOT.jar
impala-data-source-api
Put the impala-data-source-api-1.0-SNAPSHOT.jar compiled from impala-3.4 into the lib directory. Its path in the build tree is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar. You can also keep the one from the original CDH 6.3.2, because the external data source API has not been updated for several years.
sbin-retail directory
Replace the impalad inside with the impalad compiled by apache impala 3.4. The path in the compilation directory is be/build/latest/service/impalad
Check that the catalogd and statestored symlinks point to impalad:
# ll sbin-retail
lrwxrwxrwx 1 root root 7 Jun 19 22:37 catalogd -> impalad*
-rwxr-xr-x 1 root root 481420800 Jun 20 00:06 impalad*
lrwxrwxrwx 1 root root 7 Jun 19 22:37 statestored -> impalad*
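The same check can be scripted so that it fails loudly when a symlink was accidentally replaced by a plain copy. This is a sketch with a hypothetical helper name, `check_daemon_links`; it assumes the layout shown in the listing, where both symlinks sit next to impalad.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if catalogd and statestored in DIR
# are symlinks whose target is named impalad.
check_daemon_links() {
  dir=$1
  for name in catalogd statestored; do
    [ -L "$dir/$name" ] &&
      [ "$(basename "$(readlink "$dir/$name")")" = "impalad" ] || {
        echo "$dir/$name does not point to impalad" >&2
        return 1
      }
  done
}

# Example:
#   check_daemon_links sbin-retail && echo "symlinks OK"
```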
www directory
This directory is used by WebUI. Delete the old version and copy the new version.
New impala dependency summary
$IMPALA_HOME/fe/target/dependency/
$IMPALA_HOME/www/
$IMPALA_HOME/be/build/latest/service/impalad
$IMPALA_HOME/fe/target/impala-frontend-0.1-SNAPSHOT.jar
$IMPALA_HOME/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar
$IMPALA_HOME/toolchain/kudu-4ed0dbbd1/debug/lib64/libkudu_client.so.0
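The manual steps of this section can be collected into one function. The following is a sketch under stated assumptions, not a tested deployment tool: `stage_impala` and its argument order are invented for illustration, the Kudu library path is passed in explicitly (use the path that ldd reported on your build machine), and the function refuses to run if the target directory already exists.

```shell
#!/bin/sh
# Sketch only: stage_impala SRC_HOME OLD_DIR NEW_DIR KUDU_LIB
#   SRC_HOME  Impala 3.4 build tree ($IMPALA_HOME on the build machine)
#   OLD_DIR   existing CDH lib/impala directory (never modified)
#   NEW_DIR   directory to create, e.g. .../lib/apache-impala-3.4
#   KUDU_LIB  path to the libkudu_client.so.0 used by the build
stage_impala() {
  src=$1 old=$2 new=$3 kudu=$4
  [ ! -e "$new" ] || { echo "refusing to overwrite $new" >&2; return 1; }
  # Start from a full copy of the old directory, symlinks preserved.
  cp -a "$old" "$new" &&
    # lib: drop the old jars, keep the .so files, add the new jars.
    rm -f "$new"/lib/*.jar &&
    cp "$src"/fe/target/dependency/*.jar "$new/lib/" &&
    cp "$src/fe/target/impala-frontend-0.1-SNAPSHOT.jar" "$new/lib/" &&
    cp "$src/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar" "$new/lib/" &&
    # Kudu client: -L copies the real file even if KUDU_LIB is a symlink.
    rm -f "$new/lib/libkudu_client.so.0" &&
    cp -L "$kudu" "$new/lib/libkudu_client.so.0" &&
    # sbin-retail: catalogd and statestored stay symlinks to impalad.
    cp "$src/be/build/latest/service/impalad" "$new/sbin-retail/impalad" &&
    # www: replace the old web UI files wholesale.
    rm -rf "$new/www" &&
    cp -R "$src/www" "$new/www"
}

# Example:
#   CDH_LIB=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib
#   stage_impala "$IMPALA_HOME" "$CDH_LIB/impala" "$CDH_LIB/apache-impala-3.4" "$KUDU_LIB"
```

Because the old directory is only read, the rollback described later keeps working even if staging is interrupted halfway.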
Change CM configuration and restart
Copy the new Impala directory to every machine and make sure the copies are identical. Then go to Impala -> Configuration -> env in CM and add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4
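One way to confirm that the copies on all machines are identical is to reduce each directory to a single digest and compare the values. The sketch below is an illustration only: `tree_digest` is a hypothetical helper, and it assumes GNU coreutils `sha256sum` is installed on every node.

```shell
#!/bin/sh
# Hypothetical helper: print one SHA-256 digest that summarises the path
# and content of every regular file under DIR. Symlinks are not hashed.
tree_digest() {
  ( cd "$1" &&
    find . -type f -exec sha256sum {} + |
    LC_ALL=C sort -k 2 |
    sha256sum |
    cut -d ' ' -f 1 )
}

# Example, run on each node (for instance over ssh) and compare the output:
#   tree_digest /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/apache-impala-3.4
```

The file list is sorted before hashing, so the digest does not depend on the order in which find happens to walk the directory.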
Then restart the entire Impala cluster. After a successful restart, the catalogd, statestored and impalad web pages show the new UI, and the version number is 3.4.
Verification and rollback
Finally, verify that the cluster works normally, including some of the new features in Impala 3.4. If anything is incompatible, revert the CM configuration and restart the Impala cluster. Rollback is this simple because nothing in the old version was touched.
Summary
Upgrade Impala on its own within CDH through the following steps:
- Create a new impala directory under /opt/cloudera/parcels/CDH/lib
- Copy the following from the new version into the corresponding locations in that directory: impalad, impala-frontend-0.1-SNAPSHOT.jar, all jar packages the new FE depends on, the www directory, and the libkudu_client.so.0 the new version depends on
- In CM, set the IMPALA_HOME environment variable in the Impala Service Environment Advanced Configuration Snippet (Safety Valve) so that it points to the new directory
- Restart the cluster
Reference documents
https://www.icode9.com/content-4-883156.html
https://blog.csdn.net/huang_quanlong/article/details/106868826