Compiling and installing Tez for CDH 5.16.2

Posted by ju8ular1 on Mon, 07 Mar 2022 04:56:29 +0100

Environment preparation

CentOS 7

apache-maven-3.6.3

hadoop-2.6.0-cdh5.16.2

protobuf-2.5.0 Download: https://github.com/protocolbuffers/protobuf/releases?after=v3.0.0-alpha-4.1

apache-tez-0.9.2-src.tar.gz Download: https://dlcdn.apache.org/tez/0.9.2/

Note: to compile on Windows you also need Git installed. On Windows, download protoc-2.5.0-win32.zip instead of the protobuf-2.5.0 source package; it is on the same release page, further down. Compiling on Linux requires no additional software.

Protobuf installation configuration

Protobuf is required to compile Tez and must be installed.

Installation in Linux Environment

[root@basecoalmine source]# tar -zxvf protobuf-2.5.0.tar.gz -C ../software/
[root@basecoalmine source]# cd ../software/
[root@basecoalmine software]# cd protobuf-2.5.0/
[root@basecoalmine protobuf-2.5.0]# ./configure
[root@basecoalmine protobuf-2.5.0]# make && make install
# Verify the installation
[root@basecoalmine protobuf-2.5.0]# protoc --version
libprotoc 2.5.0
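
Since the steps above install protoc 2.5.0 specifically, it can be convenient to check the version in a script before starting the build. The following is a minimal sketch; the function name is_protoc_250 is a hypothetical helper introduced here, not part of protobuf or Tez.

```shell
# Hypothetical helper: succeeds only when the given `protoc --version`
# output reports version 2.5.0.
is_protoc_250() {
    [ "$1" = "libprotoc 2.5.0" ]
}

# Report whether a suitable protoc is on the PATH.
if command -v protoc >/dev/null 2>&1 && is_protoc_250 "$(protoc --version)"; then
    echo "protoc 2.5.0 found"
else
    echo "protoc 2.5.0 not found on PATH" >&2
fi
```

The check compares the whole version string, so any other protoc release is reported as unsuitable.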

Installation in Windows Environment

Extract protoc-2.5.0-win32.zip into any directory, then configure the environment variable.

Environment variable (add this directory to PATH):
D:\software\protoc-2.5.0-win32

Verify the installation:
C:\Users\King>protoc --version
libprotoc 2.5.0

Modification and compilation

Modify the pom file

Unpack the apache-tez-0.9.2-src.tar.gz source package and modify the pom.xml file in its root directory.

(1) First modification (line 40):

    <!-- Change the Hadoop version -->
    <hadoop.version>2.6.0-cdh5.16.2</hadoop.version>

(2) Second modification (line 94):

    <!-- Add the CDH repository -->
    <repository>
      <id>cloudera</id>
      <name>cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>

(3) Third modification (line 117):

    <!-- Add the CDH plugin repository -->
    <pluginRepository>
      <id>cloudera</id>
      <name>cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </pluginRepository>

(4) Fourth modification (line 779):

These two modules are not needed here. Building them requires additional tooling to be set up, so it is simpler to exclude them from the build.

    <!-- Comment out the following modules -->
    <!-- <module>tez-ext-service-tests</module>
    <module>tez-ui</module> -->
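
Before starting a long Maven build, the pom edits can be checked with a quick grep. This is a minimal sketch under the assumption that the version and repository URL are exactly the ones shown above; pom_has_cdh_edits is a hypothetical helper name.

```shell
# Hypothetical helper: check that a pom.xml contains the CDH-specific edits.
# $1 is the path of the pom.xml to inspect.
pom_has_cdh_edits() {
    pom="$1"
    # -F treats the patterns as fixed strings, -q suppresses output.
    grep -qF '<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>' "$pom" &&
    grep -qF 'https://repository.cloudera.com/artifactory/cloudera-repos/' "$pom"
}

# Run from the root of the unpacked Tez source tree.
if [ -f pom.xml ]; then
    if pom_has_cdh_edits pom.xml; then
        echo "pom.xml contains the CDH edits"
    else
        echo "pom.xml is missing the CDH edits" >&2
    fi
fi
```

The sketch only confirms that the Hadoop version and the Cloudera repository URL are present; it does not verify that the two modules were commented out.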

Modify the code

File to modify: D:\apache-tez-0.9.2-src\tez-mapreduce\src\main\java\org\apache\tez\mapreduce\hadoop\mapreduce\JobContextImpl.java

Add the following method at the end of the class:

Initially the @Override annotation was left in place, but the build then reported an error, so the annotation is commented out.

  // @Override
  public boolean userClassesTakesPrecedence() {
    return getJobConf().getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false);
  } 

Add the import at the top of the file:

import org.apache.tez.mapreduce.hadoop.MRJobConfig;

Compilation and packaging

Enter the root directory of the Tez source code and start compiling. On Windows, run the following command in Git Bash:

mvn clean package -DskipTests=true -Dmaven.test.skip=true -Dmaven.javadoc.skip=true

The compiled packages are in the apache-tez-0.9.2-src/tez-dist/target directory.
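
A quick way to confirm that the build produced both tarballs used in the next section is a small file check. This is a sketch only; check_tez_artifacts is a hypothetical helper, and the file names are taken from the installation steps in this article.

```shell
# Hypothetical helper: verify that both Tez tarballs exist in a directory.
# $1 is the tez-dist/target directory, $2 the Tez version (default 0.9.2).
check_tez_artifacts() {
    target_dir="$1"
    version="${2:-0.9.2}"
    [ -f "$target_dir/tez-$version.tar.gz" ] &&
    [ -f "$target_dir/tez-$version-minimal.tar.gz" ]
}

# Run from the root of the Tez source tree after the Maven build.
if check_tez_artifacts tez-dist/target; then
    echo "both tarballs were built"
else
    echo "tarballs not found under tez-dist/target" >&2
fi
```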

Software installation

The compiled tez-0.9.2.tar.gz and tez-0.9.2-minimal.tar.gz packages are used in the following steps.

(1) Upload tez-0.9.2.tar.gz to the HDFS directory /user/tez/.

[root@basecoalmine software]# hadoop fs -mkdir -p /user/tez/
[root@basecoalmine software]# hadoop fs -put /opt/source/tez-0.9.2.tar.gz /user/tez/

(2) Unpack tez-0.9.2-minimal.tar.gz locally; its jars are used as dependencies.

[root@basecoalmine software]# mkdir tez-0.9.2
[root@basecoalmine software]# mv tez-0.9.2-minimal.tar.gz ./tez-0.9.2
[root@basecoalmine software]# cd tez-0.9.2
# Unpack the archive, then delete it
[root@basecoalmine tez-0.9.2]# tar -zxvf tez-0.9.2-minimal.tar.gz
[root@basecoalmine tez-0.9.2]# rm -rf tez-0.9.2-minimal.tar.gz
# Create a symbolic link
[root@basecoalmine tez-0.9.2]# cd /opt/app
[root@basecoalmine app]# ln -s /opt/software/tez-0.9.2 tez
# Create the configuration directory
[root@basecoalmine app]# mkdir /opt/app/tez/conf

Configuration modification

Tez can be configured in two ways:

Configured in Hadoop: every MapReduce job submitted to this Hadoop cluster runs in Tez mode, and Hive runs on Tez by default with no further configuration.
Configured in Hive: only Hive programs can switch the execution engine dynamically, while other MapReduce programs keep running as plain MapReduce on YARN.

(2) Hadoop-integrated Tez mode

On Hadoop's master node, create a tez-site.xml file under $HADOOP_HOME/etc/hadoop/ with the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
    </property>
    <!-- Let the Tez runtime use the cluster's Hadoop jars -->
    <property>
         <name>tez.use.cluster.hadoop-libs</name>
         <value>true</value>
    </property>
</configuration>

Modify the mapred-site.xml configuration so that MapReduce jobs are submitted through Tez:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
    </property>

Modify the yarn-env.sh configuration: append the following at the end to put the Tez jars on the classpath, so that the Tez dependencies are loaded when YARN starts:

export TEZ_HOME=/opt/app/tez
for jar in `ls $TEZ_HOME |grep jar`; do
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_HOME/lib/$jar
done
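
The two loops above parse the output of ls. A glob-based variant is sketched below as an alternative, not as the method used in this article; tez_classpath is a hypothetical helper name. Note one difference: it collects only files ending in .jar, whereas the second loop above adds every entry under lib.

```shell
# Hypothetical helper: print a colon-separated list of every .jar file
# directly under $1 and under $1/lib.
tez_classpath() {
    tez_home="$1"
    result=""
    for jar in "$tez_home"/*.jar "$tez_home"/lib/*.jar; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$jar" ] || continue
        # Insert the separator only between entries.
        result="${result:+$result:}$jar"
    done
    printf '%s\n' "$result"
}

# Possible use in yarn-env.sh, with TEZ_HOME defined as above:
#   export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$(tez_classpath "$TEZ_HOME")"
```

Using globs avoids word-splitting problems with unusual file names and keeps the classpath free of empty entries.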

Restart Hadoop.

Functional testing

Create a text file called test.txt, write any content into it, and upload it to the /tmp directory on HDFS.

Create the result output directory /tmp/out, then run the following command to test.

[root@basecoalmine tmp]# hadoop jar $TEZ_HOME/tez-examples-0.9.2.jar orderedwordcount /tmp/test.txt /tmp/out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hadoop-2.6.0-cdh5.16.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/tez-0.9.2/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim27, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.6.0-cdh5.16.2, majorVersion=2, minorVersion=6
22/03/04 00:36:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/03/04 00:36:52 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.2, revision=${buildNumber}, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2022-03-03T09:23:19Z ]
22/03/04 00:36:53 INFO client.RMProxy: Connecting to ResourceManager at basecoalmine/192.168.111.56:8032
22/03/04 00:36:54 INFO examples.OrderedWordCount: Running OrderedWordCount
22/03/04 00:36:54 INFO client.TezClient: Submitting DAG application with id: application_1646372190034_0001
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://basecoalmine:9000/user/tez/tez-0.9.2.tar.gz
22/03/04 00:36:54 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
22/03/04 00:36:54 INFO client.TezClient: Tez system stage directory hdfs://basecoalmine:9000/tmp/root/tez/staging/.tez/application_1646372190034_0001 doesn't exist and is created
22/03/04 00:36:55 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1646372190034_0001, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
22/03/04 00:36:56 INFO impl.YarnClientImpl: Submitted application application_1646372190034_0001
22/03/04 00:36:56 INFO client.TezClient: The url to track the Tez AM: http://basecoalmine:8088/proxy/application_1646372190034_0001/
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
22/03/04 00:37:05 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:05 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 66.67% TotalTasks: 3 Succeeded: 2 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
22/03/04 00:37:10 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
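
The input preparation for this test can be scripted as well. The sketch below is illustrative only: make_sample_input and the sample words are hypothetical, and the HDFS step runs only where the hadoop command is available.

```shell
# Hypothetical helper: write a small sample input file for the word count test.
# $1 is the local path of the file to create.
make_sample_input() {
    printf 'hello tez\nhello hadoop\nhello hive\n' > "$1"
}

if command -v hadoop >/dev/null 2>&1; then
    sample_dir=$(mktemp -d)
    make_sample_input "$sample_dir/test.txt"
    # Upload the file to /tmp on HDFS, overwriting an earlier copy if present.
    hadoop fs -put -f "$sample_dir/test.txt" /tmp/
fi

# After the DAG reports SUCCEEDED, the word counts can be read back with:
#   hadoop fs -cat '/tmp/out/part-*'
```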

(3) Hive-integrated Tez mode

Create a tez-site.xml file in the /opt/app/tez/conf directory with the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <!-- Points to the tez-0.9.2.tar.gz package on HDFS -->
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/user/tez/tez-0.9.2.tar.gz</value>
    </property>
    <!-- Let the Tez runtime use the cluster's Hadoop jars -->
    <property>
         <name>tez.use.cluster.hadoop-libs</name>
         <value>true</value>
    </property>
</configuration>

Modify the hive-env.sh configuration file: add the following so that the Tez dependencies are loaded when Hive starts:

export TEZ_HOME=/opt/app/tez
# Directory containing tez-site.xml
export TEZ_CONF_DIR=/opt/app/tez/conf
# Collect the Tez jars for the Hive environment
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
   export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
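
Before restarting Hive, the exported values can be sanity-checked. The following is a rough sketch; check_tez_env is a hypothetical helper, and it assumes that a tez-api jar is among the jars unpacked from the minimal tarball.

```shell
# Hypothetical helper: sanity-check the Tez settings exported in hive-env.sh.
# $1 is the TEZ_CONF_DIR value, $2 the TEZ_JARS value.
check_tez_env() {
    conf_dir="$1"
    jars="$2"
    if [ ! -f "$conf_dir/tez-site.xml" ]; then
        echo "missing $conf_dir/tez-site.xml" >&2
        return 1
    fi
    case "$jars" in
        *tez-api*) return 0 ;;
        *) echo "TEZ_JARS does not contain a tez-api jar" >&2; return 1 ;;
    esac
}

# Falls back to the paths used in this article when the variables are unset.
if check_tez_env "${TEZ_CONF_DIR:-/opt/app/tez/conf}" "${TEZ_JARS:-}"; then
    echo "Tez environment looks complete"
fi
```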

Restart Hive.

Functional testing

-- Increase the Tez container size (in MB)
set hive.tez.container.size=3220;
-- Use the Tez engine
set hive.execution.engine=tez;
-- Create a table
create table student(id int, name string);
-- Insert data into the table
insert into student values(1,"zhangsan");
insert into student values(2,"lisi");
-- If both queries run without errors, the setup works
select * from student;
select count(*) from student;

 
