Flink: compile the parcel package and integrate it into CDH 6.3.2

Posted by Gruessle on Tue, 01 Feb 2022 06:48:48 +0100

Flink 1.12 parcel compilation

This article mainly covers integrating Flink into CDH 6.3.2.

1 compilation environment preparation

1.1 install JDK 1.8

Omitted here; one option is sketched below.
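For completeness, a minimal sketch (an assumption, not part of the original steps) using OpenJDK 1.8 from the OS repositories:

# Assumption: CentOS 7 with the default repositories; any JDK 1.8 works
yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
java -version
# If Maven later complains about JAVA_HOME, export it, e.g.:
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk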

1.2 installing git

yum install git -y

1.3 installing maven

  • Install maven
# If the network is blocked, you can download and upload manually
wget http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
tar xzvf apache-maven-3.6.3-bin.tar.gz -C /usr/local/
mv /usr/local/apache-maven-3.6.3/ /usr/local/maven
vim /etc/profile
export M2_HOME=/usr/local/maven
export PATH=$PATH:$M2_HOME/bin
source /etc/profile
mvn -v

  • Configure storage path and mvn Library
vim /usr/local/maven/conf/settings.xml
# Modify this configuration
<localRepository>/home/hadoop-wd@gridsum.com/guobin/flink-parcel/maven_repo</localRepository>
# Add configuration
<mirrors>
    <mirror>
      <id>alimaven</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
    </mirror>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>*,!cloudera</mirrorOf>
    </mirror>
    <mirror>
      <id>central</id>
      <name>Maven Repository Switchboard</name>
      <url>http://repo1.maven.org/maven2/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>repo2</id>
      <mirrorOf>central</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://repo2.maven.org/maven2/</url>
    </mirror>
    <mirror>
      <id>ibiblio</id>
      <mirrorOf>central</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
    </mirror>
    <mirror>
      <id>jboss-public-repository-group</id>
      <mirrorOf>central</mirrorOf>
      <name>JBoss Public Repository Group</name>
      <url>http://repository.jboss.org/nexus/content/groups/public</url>
    </mirror>
    <mirror>
      <id>google-maven-central</id>
      <name>Google Maven Central</name>
      <url>https://maven-central.storage.googleapis.com</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <!-- Image of central warehouse in China -->
    <mirror>
      <id>maven.net.cn</id>
      <name>oneof the central mirrors in china</name>
      <url>http://maven.net.cn/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
</mirrors>

1.4 installing Node

Node is needed to compile the flink-runtime-web module.

wget https://nodejs.org/dist/v12.19.0/node-v12.19.0-linux-x64.tar.xz
tar -xvf node-v12.19.0-linux-x64.tar.xz

ln -s /opt/build_flink/node-v12.19.0-linux-x64/bin/npm   /usr/local/bin/ 
ln -s /opt/build_flink/node-v12.19.0-linux-x64/bin/node   /usr/local/bin/
node -v
npm -v

2. Flink source code compilation

2.1 flink-shaded

2.1.1 download the flink-shaded package

release-10.0 is chosen here: release-9.0 lacks the common packages needed when integrating with Flink 1.11 and later, and the flink-shaded-hadoop-2 module was removed in release-11.0 (FLINK-17685).

wget https://github.com/apache/flink-shaded/archive/release-10.0.zip
unzip release-10.0.zip
cd flink-shaded-release-10.0/

2.1.2 modify the pom

Modify the pom dependencies:

vim pom.xml # Add the following profile inside the <profiles> section
<profile>
    <id>vendor-repos</id>
    <activation>
        <property>
            <name>vendor-repos</name>
        </property>
    </activation>
    <!-- Add vendor maven repositories -->
    <repositories>
        <!-- Cloudera -->
        <repository>
            <id>cloudera-releases</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <!-- Hortonworks -->
        <repository>
            <id>HDPReleases</id>
            <name>HDP Releases</name>
            <url>https://repo.hortonworks.com/content/repositories/releases/</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
        <repository>
            <id>HortonworksJettyHadoop</id>
            <name>HDP Jetty</name>
            <url>https://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
        <!-- MapR -->
        <repository>
            <id>mapr-releases</id>
            <url>https://repository.mapr.com/maven/</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
    </repositories>
</profile>

Execute compilation

mvn -T4C clean install -DskipTests -Pvendor-repos -Dhadoop.version=3.0.0-cdh6.3.2 -Dscala-2.11 -Drat.skip=true

The generated jar is under flink-shaded-hadoop-2-parent/flink-shaded-hadoop-2-uber/target/ in the current directory.
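As a quick sanity check, the uber jar can be listed; the file name below is the same one referenced later in step 2.2.4:

ls flink-shaded-hadoop-2-parent/flink-shaded-hadoop-2-uber/target/
# Should include flink-shaded-hadoop-2-uber-3.0.0-cdh6.3.2-10.0.jar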

2.2 flink

2.2.1 download the flink package

Download the corresponding version of Flink (it can also be downloaded manually and uploaded):

wget https://github.com/apache/flink/archive/release-1.12.4.zip

Then decompress

unzip release-1.12.4.zip

2.2.2 modify the pom

Modify Flink's pom.xml:

cd flink-release-1.12.4/
vim pom.xml # Add the following configuration
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>

Next modify the pom.xml of flink-runtime-web. In the plugin block whose groupId is com.github.eirslett (the frontend-maven-plugin), change the values to those shown below:

cd flink-runtime-web
vim pom.xml 
<groupId>com.github.eirslett</groupId>
<artifactId>frontend-maven-plugin</artifactId>
<version>1.6</version>
<executions>
    <execution>
        <id>install node and npm</id>
        <goals>
            <goal>install-node-and-npm</goal>
        </goals>
        <configuration>
            <nodeDownloadRoot>http://npm.taobao.org/mirrors/node/</nodeDownloadRoot>
            <npmDownloadRoot>https://registry.npmjs.org/npm/-/</npmDownloadRoot>
            <nodeVersion>v10.9.0</nodeVersion>
        </configuration>
    </execution>
    <execution>
        <id>npm install</id>
        <goals>
            <goal>npm</goal>
        </goals>
        <configuration>
            <arguments>install -registry=https://registry.npm.taobao.org --cache-max=0 --no-save</arguments>
            <environmentVariables>
                <HUSKY_SKIP_INSTALL>true</HUSKY_SKIP_INSTALL>
            </environmentVariables>
        </configuration>
    </execution>
</executions>
<!-- the rest of the plugin block stays unchanged -->


2.2.3 start compilation

cd ../ # Back to flink-release-1.12.4
mvn clean install -DskipTests -Dfast -Drat.skip=true -Dhadoop.version=3.0.0-cdh6.3.2 -Pvendor-repos -Dinclude-hadoop -Dscala-2.11 -T4C

2.2.4 adding dependent packages

# The following dependency jars need to be added to the compiled distribution, under $PATH/flink-release-1.12.4/flink-dist/target/flink-1.12.4-bin/flink-1.12.4/lib
# libfb303-0.9.3.jar (prepare it yourself), hive-exec-2.1.1-cdh6.3.2.jar (prepare it yourself)
# flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.12.4.jar (in the flink-release-1.12.4 directory)
# flink-shaded-hadoop-2-uber-3.0.0-cdh6.3.2-10.0.jar (built in the previous step)
# flink-libraries/flink-state-processing-api/target/flink-state-processor-api_2.11-1.12.4.jar (in the flink-release-1.12.4 directory)
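A minimal copy sketch, assuming the commands run from the flink-release-1.12.4 source root, that libfb303-0.9.3.jar and hive-exec-2.1.1-cdh6.3.2.jar were downloaded into the current directory, and that flink-shaded was compiled in a sibling directory (adjust paths as needed):

# All paths below are assumptions based on the layout used in this article
LIB=flink-dist/target/flink-1.12.4-bin/flink-1.12.4/lib
cp libfb303-0.9.3.jar hive-exec-2.1.1-cdh6.3.2.jar $LIB/
cp flink-connectors/flink-connector-hive/target/flink-connector-hive_2.11-1.12.4.jar $LIB/
cp flink-libraries/flink-state-processing-api/target/flink-state-processor-api_2.11-1.12.4.jar $LIB/
cp ../flink-shaded-release-10.0/flink-shaded-hadoop-2-parent/flink-shaded-hadoop-2-uber/target/flink-shaded-hadoop-2-uber-3.0.0-cdh6.3.2-10.0.jar $LIB/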

2.2.5 packaging

Package the compiled Flink distribution; it will be used later when making the Flink parcel:

cd $PATH/flink-release-1.12.4/flink-dist/target/flink-1.12.4-bin/
tar zvcf flink-1.12.4-bin_scala-2.11.tgz flink-1.12.4/

3. Making the parcel

3.1 preparation

  • Download the Flink parcel packaging tool
git clone https://github.com/EvenGui/flink-parcel-master
cd flink-parcel-master # Enter this directory

  • Modify configuration

Before that, copy the Flink package produced by the packaging step in 2.2.5 into the flink-parcel-master directory, for example as shown below.
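A minimal sketch of that copy, assuming the tarball from 2.2.5 is still under the flink-dist target directory and the two source trees sit side by side (adjust the source path otherwise):

# Assumed path; adjust to where the tgz from 2.2.5 actually lives
cp ../flink-release-1.12.4/flink-dist/target/flink-1.12.4-bin/flink-1.12.4-bin_scala-2.11.tgz ./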

vim flink-parcel.properties
# Flink download address (a local path is used here)
FLINK_URL=./flink-1.12.4-bin_scala-2.11.tgz

# flink version number
FLINK_VERSION=1.12.4

# Extended version number
EXTENS_VERSION=BIN_SCALA-2.11

# Operating system version, taking CentOS as an example
OS_VERSION=7

# CDH full version range
CDH_MIN_FULL=5.16.2
CDH_MAX_FULL=6.4

# CDH major version range
CDH_MIN=5
CDH_MAX=6

3.2 start making Flink parcel

sh build.sh parcel

When the script finishes, the parcel has been built successfully; the output lands in the FLINK-1.12.4-BIN_SCALA-2.11_build directory (used again in 3.4 below).
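The output can be checked before moving on; the individual file names are what this packaging tool typically produces and may differ slightly:

ls FLINK-1.12.4-BIN_SCALA-2.11_build/
# Typically the .parcel file, its .sha checksum, and manifest.json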

3.3 compiling csd

sh build.sh csd

This produces the CSD jar (FLINK_ON_YARN-1.12.4.jar), which lets the service be installed and used in the CDH cluster.

3.4 package everything into one zip

mkdir -p flink-parcel-1.12.4
mv FLINK-1.12.4-BIN_SCALA-2.11_build/* flink-parcel-1.12.4/
mv FLINK_ON_YARN-1.12.4.jar flink-parcel-1.12.4/
zip flink-parcel-1.12.4.zip -r flink-parcel-1.12.4/

4 installation

If this is an upgrade, first trigger a savepoint for each relevant job (for example as sketched below), uninstall the old version from the cluster (omitted), and then add the new version.
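A hedged sketch of triggering such a savepoint with the Flink CLI; the job ID, YARN application ID, and target directory are placeholders, not values from this article:

# <jobId> and <yarnAppId> are placeholders; the HDFS path is only an example
./bin/flink savepoint <jobId> hdfs:///flink/savepoints -yid <yarnAppId>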

4.1 upload the corresponding parcel package to the cluster

Upload the zip from 3.4 to /opt/cloudera/parcel-repo on the CM server node of the CDH cluster (back up the original manifest.json first), and then unzip it:

unzip flink-parcel-1.12.4.zip

Move the extracted Flink parcel files into /opt/cloudera/parcel-repo and move FLINK_ON_YARN-1.12.4.jar into /opt/cloudera/csd (a sketch follows), then fix the permissions:
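A minimal sketch of the moves, assuming the zip was extracted into ./flink-parcel-1.12.4 exactly as packaged in 3.4:

# Move the CSD jar out first, then the remaining files (parcel, checksum, manifest.json) to parcel-repo
mv flink-parcel-1.12.4/FLINK_ON_YARN-1.12.4.jar /opt/cloudera/csd/
mv flink-parcel-1.12.4/* /opt/cloudera/parcel-repo/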

chown cloudera-scm:cloudera-scm -R /opt/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm -R /opt/cloudera/csd

4.2 activate parcel package

Operate in the CM web UI:

  • Check for new parcels

  • Distribute and activate

4.3 new services

  • The CM server needs to be restarted before adding the service

Execute on CM server machine:

systemctl restart cloudera-scm-server

  • Add service

Then select Flink and click Continue. Hive will be configured later, so the non-Hive variant is chosen here.

Customize the role assignments:

Assign the Dashboard role to one machine and the Gateway role to all machines, then click Continue.

Review the changes and click continue

Then click start to finish.

4.4 configuration adjustment

  • Configure environment variables

The component's environment variables need to be configured:

Return to the CM main page -> select the flink-yarn component -> Configuration -> search for "environment", and add the following configuration:

HADOOP_CONF_DIR=/etc/hadoop/conf
HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*
HADOOP_USER_NAME=flink

  • Configure statebackend

    (screenshot omitted)

  • Keytab

If Kerberos authentication is not enabled, clear the value of this configuration.

  • Resource allocation

The default configuration is used here; adjust it to the actual situation if necessary. The specific resources for each job are set at submission time, as shown below.
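As a hedged example of per-job resource settings at submission time (the memory sizes, slot count, and parallelism below are illustrative placeholders, not values from this article):

# Placeholder values; tune to the actual job
./bin/flink run -m yarn-cluster \
  -Djobmanager.memory.process.size=1024m \
  -Dtaskmanager.memory.process.size=2048m \
  -Dtaskmanager.numberOfTaskSlots=2 \
  -Dparallelism.default=2 \
  examples/batch/WordCount.jar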

4.4.1 configure hive

5 test

  • Configure execution environment (all machines)
vi /etc/profile

# Add the following
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/jars/*
export HADOOP_USER_NAME=flink
 # After adding, execute the following command to take effect
 source /etc/profile

  • test

Execute a wordcount:

cd /opt/cloudera/parcels/FLINK/lib/flink/

# Execute word count test

./bin/flink run -Dclassloader.check-leaked-classloader=false examples/batch/WordCount.jar 

After submission, the word count result is printed to the console (output screenshot omitted).

If a classloader-related exception appears at the end of the test (screenshot omitted), set classloader.check-leaked-classloader to false, either on the command line as above or in the Flink configuration, as sketched below.
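A hedged sketch of making the setting permanent instead of passing it on every run; the config file location under the parcel is assumed from the install path used above:

# Assumed config location under the parcel layout shown above
vim /opt/cloudera/parcels/FLINK/lib/flink/conf/flink-conf.yaml
# Add the following line:
classloader.check-leaked-classloader: false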