Hadoop, ZooKeeper, and Spark Installation

Posted by Skepsis on Thu, 21 Oct 2021 15:31:14 +0200

Create two folders: one for the compressed packages and one for the software installation

Unless a step says otherwise, every operation below is performed on the Master host.

# Create the folder for compressed packages (-p also creates missing parent directories)
mkdir -p /usr/tar
# Create the software installation folder
mkdir -p /usr/apps

Install the upload and syntax-highlighting tools [requires network access]

# Install lrzsz, which provides the rz/sz upload commands
yum install -y lrzsz
# Install vim, a vi with syntax highlighting
yum install -y vim

Note: upload files either with the lrzsz upload command or with XFTP

Edit the hosts file and add the IP addresses of your own three machines

vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
<master IP> master
<slave1 IP> slave1
<slave2 IP> slave2

Save and quit: press Esc, type :wq, then press Enter
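If you would rather script this edit than make it by hand, the following is a minimal sketch. The helper name add_host_entry and the 192.0.2.x addresses are placeholders of mine, not values from this article; substitute the real IP addresses of your three machines.

```shell
# Append "IP name" to a hosts file unless that name is already mapped.
add_host_entry() {
  file="$1"; ip="$2"; name="$3"
  if ! grep -qw -- "$name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example (placeholder addresses, replace with your own):
# add_host_entry /etc/hosts 192.0.2.10 master
# add_host_entry /etc/hosts 192.0.2.11 slave1
# add_host_entry /etc/hosts 192.0.2.12 slave2
```

Because the helper skips names that are already present, running it twice does not duplicate entries.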

Configure passwordless SSH login

Send the configured /etc/hosts file to the other two hosts

scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
# Type yes and then the password when prompted

All three hosts generate key files:

# Run on Master, Slave1, and Slave2
ssh-keygen -t rsa
# Press Enter at every prompt to accept the defaults

Copy each host's public key to a single host; this article collects them on the Master host

# Run on Master, Slave1, and Slave2
ssh-copy-id master

Send the Master host's authorized_keys file, which now holds all three public keys, to the other two hosts

scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/

Test that it works

# Ideally, run these checks on all three hosts
# Connect to the master host and check that no password is requested
ssh master
# Log out again
exit
# Connect to the slave1 host and check that no password is requested
ssh slave1
# Log out again
exit
# Connect to the slave2 host and check that no password is requested
ssh slave2
# Log out again
exit
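The manual check above can also be scripted. This is only a sketch: check_passwordless is a name I made up, and it relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
# Print one line per host saying whether key-based login works.
check_passwordless() {
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: password still required or host unreachable"
    fi
  done
}

# Usage: check_passwordless master slave1 slave2
```

Any host that is not reported as ok needs its key copied again before continuing.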

Upload Software Installation Package (Compressed Package)

# Enter the tar directory
cd /usr/tar/
# Upload the required installation packages

The packages uploaded for this article are listed below. Choose the builds that match your own setup: the Spark package here is slightly mismatched, because it was built for Hadoop 2.6 while I installed Hadoop 2.7.

After the upload completes, check the directory:

[root@master tar]# ls
hadoop-2.7.4.tar.gz  jdk-8u161-linux-x64.tar.gz  spark-2.4.7-bin-hadoop2.6.tgz  zookeeper-3.4.10.tar.gz

Extract to the installation directory:

# Extract the JDK; -C specifies the target directory
tar -zxf jdk-8u161-linux-x64.tar.gz -C /usr/apps/
# Unzip Hadoop to the specified folder
tar -zxf hadoop-2.7.4.tar.gz -C /usr/apps/
# Unzip zookeeper to specified folder
tar -zxf zookeeper-3.4.10.tar.gz -C /usr/apps/
# Unzip Spark to specified folder
tar -zxf spark-2.4.7-bin-hadoop2.6.tgz -C /usr/apps/

Enter the software installation directory created above

cd /usr/apps/


[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0_161
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10

Rename [optional, just easier to remember]

# Rename jdk1.8.0_161 to jdk1.8.0
mv jdk1.8.0_161/ jdk1.8.0
# Rename spark-2.4.7-bin-hadoop2.6 to spark-2.4.7
mv spark-2.4.7-bin-hadoop2.6/ spark-2.4.7
[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10

Configuring environment variables

First check the directory where each package is installed

Run pwd inside each package's directory to see its absolute path

[root@master apps]# cd hadoop-2.7.4/
[root@master hadoop-2.7.4]# pwd
/usr/apps/hadoop-2.7.4
[root@master hadoop-2.7.4]# cd ..
[root@master apps]# cd jdk1.8.0/
[root@master jdk1.8.0]# pwd
/usr/apps/jdk1.8.0
[root@master jdk1.8.0]# cd ..
[root@master apps]# cd zookeeper-3.4.10/
[root@master zookeeper-3.4.10]# pwd
/usr/apps/zookeeper-3.4.10
[root@master zookeeper-3.4.10]# cd ..
[root@master apps]# cd spark-2.4.7/
[root@master spark-2.4.7]# pwd
/usr/apps/spark-2.4.7

Once you have the absolute path of each package, configure the environment variables

vim is used below for its syntax highlighting. Plain vi works just as well, only without color

vim /etc/profile

Press Shift+G to jump to the end of the file

Append the following:

export JAVA_HOME=/usr/apps/jdk1.8.0
export PATH=$JAVA_HOME/bin:$PATH
# CLASSPATH was required before JDK 1.5 and is optional from 1.5 onward; setting it anyway is the safest choice.
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export ZOOKEEPER_HOME=/usr/apps/zookeeper-3.4.10

export HADOOP_HOME=/usr/apps/hadoop-2.7.4

export SPARK_HOME=/usr/apps/spark-2.4.7
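
Later steps call hdfs, start-dfs.sh, start-yarn.sh, and zkServer.sh by bare name, which only works when the matching bin and sbin directories are on the PATH. The listing above shows only the *_HOME variables, so the lines below are my assumption about what else /etc/profile needs, not a copy of the original file. Spark's sbin directory is deliberately left out, because its start-all.sh would clash with Hadoop's script of the same name.

```shell
# Assumed additions to /etc/profile (not shown in the listing above)
export ZOOKEEPER_HOME=/usr/apps/zookeeper-3.4.10
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export PATH="$PATH:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```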

Configure Hadoop

Enter the Hadoop configuration directory

[root@master hadoop-2.7.4]# cd /usr/apps/hadoop-2.7.4/etc/hadoop/
  1. hadoop-env.sh: configures the environment variables Hadoop needs to run
    Edit line 25 (type :set nu in vim to show line numbers)
vim hadoop-env.sh

Modify as follows:

 21 # set JAVA_HOME in this file, so that it is correctly defined on
 22 # remote nodes.
 24 # The java implementation to use.
 25 export JAVA_HOME=/usr/apps/jdk1.8.0
 27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
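
The same edit can be made without opening an editor. This is a sketch only: set_java_home is a hypothetical helper, and it assumes the file contains a single line starting with "export JAVA_HOME=".

```shell
# Rewrite the "export JAVA_HOME=..." line of an env script in place.
set_java_home() {
  file="$1"; java_home="$2"
  sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$file" > "$file.tmp" &&
    cat "$file.tmp" > "$file" &&
    rm -f "$file.tmp"
}

# Example:
# set_java_home /usr/apps/hadoop-2.7.4/etc/hadoop/hadoop-env.sh /usr/apps/jdk1.8.0
```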
  1. core-site.xml: core configuration file
vim core-site.xml

Add the following:

<!-- Specify the address of the HDFS NameNode -->
<!-- Specify the directory where Hadoop stores files generated at runtime -->
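
Only the comments of this file are shown above, so here is a sketch of what a matching core-site.xml could look like. The property names fs.defaultFS and hadoop.tmp.dir are standard Hadoop settings; port 9000 and the tmp path are assumptions of mine. Run it from the Hadoop configuration directory, and note that it overwrites core-site.xml.

```shell
# Writes core-site.xml in the current directory (overwrites any existing file).
cat > core-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Address of the HDFS NameNode (port 9000 is an assumption) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Directory for files Hadoop generates at runtime (path is an assumption) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/apps/hadoop-2.7.4/tmp</value>
  </property>
</configuration>
EOF
```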
  1. hdfs-site.xml: HDFS configuration file, inherits from core-site.xml
vim hdfs-site.xml

Add the following:

<!-- Specify the number of replicas kept for each file -->
<!-- Specify the host and port of the SecondaryNameNode -->
<!-- SecondaryNameNode: helper node that assists the primary NameNode -->
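
A possible hdfs-site.xml matching those comments is sketched below. The startup log later in this article shows the SecondaryNameNode on slave1; the replication factor of 3 and port 50090 (the Hadoop 2.x default) are assumptions. Run it from the Hadoop configuration directory; it overwrites hdfs-site.xml.

```shell
# Writes hdfs-site.xml in the current directory (overwrites any existing file).
cat > hdfs-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Number of replicas per file (3 is an assumption) -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Host and port of the SecondaryNameNode (50090 is the Hadoop 2.x default) -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
</configuration>
EOF
```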
  1. mapred-site.xml: MapReduce configuration file, inherits from core-site.xml

Only mapred-site.xml.template exists at first. It is a template file, so copy it to mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml

Edit the file:

vim mapred-site.xml

The modifications are as follows:

<!-- Specify the framework MapReduce runs on; set to YARN here, the default is local -->
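
The comment above corresponds to the standard mapreduce.framework.name property. A sketch of the whole file, assuming nothing else needs to be set, could be written like this from the Hadoop configuration directory (it overwrites mapred-site.xml):

```shell
# Writes mapred-site.xml in the current directory (overwrites any existing file).
cat > mapred-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce on YARN instead of the default local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```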
  1. yarn-site.xml: YARN configuration file, inherits from core-site.xml
    YARN is the distributed resource scheduling system
vim yarn-site.xml

The modifications are as follows:


<!-- Site specific YARN configuration properties -->
<!-- The YARN primary node (ResourceManager) runs on the master host -->
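
A sketch of a yarn-site.xml consistent with that comment follows. yarn.resourcemanager.hostname places the ResourceManager on master; the mapreduce_shuffle auxiliary service is the usual companion setting for MapReduce on YARN, and including it here is my assumption. Run it from the Hadoop configuration directory; it overwrites yarn-site.xml.

```shell
# Writes yarn-site.xml in the current directory (overwrites any existing file).
cat > yarn-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- The ResourceManager runs on the master host -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Shuffle service needed by MapReduce jobs (assumed, not shown in the article) -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```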

Edit the slaves file to list the worker hosts

vim slaves

The modifications are as follows:
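
The file contents are not shown here, but the start-dfs.sh output later in this article starts a DataNode on master, slave1, and slave2, which suggests one hostname per line as in this sketch. Run it from the Hadoop configuration directory; it overwrites the slaves file.

```shell
# One worker hostname per line; inferred from the DataNodes in the startup log.
printf '%s\n' master slave1 slave2 > slaves
cat slaves
```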


At this point, the distributed Hadoop setup is configured [non-HA]

ZooKeeper configuration

# Enter the ZooKeeper installation directory
cd /usr/apps/zookeeper-3.4.10/

Create ZooKeeper's data and log folders

mkdir zkdata zklog

Run ll to check that they were created

[root@master zookeeper-3.4.10]# ll
total 1580
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 bin
-rw-rw-r--.  1 1001 1001   84725 Mar 23  2017 build.xml
drwxr-xr-x.  2 1001 1001      74 Mar 23  2017 conf
drwxr-xr-x. 10 1001 1001    4096 Mar 23  2017 contrib
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 dist-maven
drwxr-xr-x.  6 1001 1001    4096 Mar 23  2017 docs
-rw-rw-r--.  1 1001 1001    1709 Mar 23  2017 ivysettings.xml
-rw-rw-r--.  1 1001 1001    5691 Mar 23  2017 ivy.xml
drwxr-xr-x.  4 1001 1001    4096 Mar 23  2017 lib
-rw-rw-r--.  1 1001 1001   11938 Mar 23  2017 LICENSE.txt
-rw-rw-r--.  1 1001 1001    3132 Mar 23  2017 NOTICE.txt
-rw-rw-r--.  1 1001 1001    1770 Mar 23  2017 README_packaging.txt
-rw-rw-r--.  1 1001 1001    1585 Mar 23  2017 README.txt
drwxr-xr-x.  5 1001 1001      44 Mar 23  2017 recipes
drwxr-xr-x.  8 1001 1001    4096 Mar 23  2017 src
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zkdata
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zklog
-rw-rw-r--.  1 1001 1001 1456729 Mar 23  2017 zookeeper-3.4.10.jar
-rw-rw-r--.  1 1001 1001     819 Mar 23  2017 zookeeper-3.4.10.jar.asc
-rw-rw-r--.  1 1001 1001      33 Mar 23  2017 zookeeper-3.4.10.jar.md5
-rw-rw-r--.  1 1001 1001      41 Mar 23  2017 zookeeper-3.4.10.jar.sha1

Check the absolute paths of zkdata and zklog

[root@master zookeeper-3.4.10]# cd zkdata/
[root@master zkdata]# pwd
/usr/apps/zookeeper-3.4.10/zkdata
# Only zkdata is checked here; the absolute path of zklog follows the same pattern

Enter the ZooKeeper configuration directory

cd /usr/apps/zookeeper-3.4.10/conf/
[root@master conf]# ll
total 12
-rw-rw-r--. 1 1001 1001  535 Mar 23  2017 configuration.xsl
-rw-rw-r--. 1 1001 1001 2161 Mar 23  2017 log4j.properties
-rw-rw-r--. 1 1001 1001  922 Mar 23  2017 zoo_sample.cfg
# Copy the template file to become a configuration file for editing
cp zoo_sample.cfg zoo.cfg

Edit the ZooKeeper configuration file

vim zoo.cfg

The finished configuration looks like this:

# The number of milliseconds of each tick
# The number of ticks that the initial 
# synchronization phase can take
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# the port at which the clients will connect
# the maximum number of client connections.
# increase this if you need to handle more clients
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
# Purge task interval in hours
# Set to "0" to disable auto purge feature
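
Only the comments of zoo.cfg appear above, so the values in this sketch are partly assumptions. tickTime, initLimit, syncLimit, and clientPort are the values shipped in zoo_sample.cfg; dataDir and dataLogDir point at the zkdata and zklog folders created earlier; the server.N lines use the conventional 2888:3888 ports and match the myid values 1, 2, and 3 used in this article. Run it from the ZooKeeper conf directory; it overwrites zoo.cfg.

```shell
# Writes zoo.cfg in the current directory (overwrites any existing file).
cat > zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/apps/zookeeper-3.4.10/zkdata
dataLogDir=/usr/apps/zookeeper-3.4.10/zklog
clientPort=2181
# server.<myid>=<host>:<peer port>:<election port> (ports are assumptions)
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF
```

The number after "server." must equal the content of the myid file on that host.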

Set ZooKeeper's myid file

Enter ZooKeeper's zkdata directory

Note: this file (myid) must be changed on Slave1 and Slave2 afterwards; that step comes later in this article

cd /usr/apps/zookeeper-3.4.10/zkdata/

Write "1" to the myid file

echo 1 > myid
# View myid file
cat myid 

Configure Spark

Enter the Spark configuration directory

cd /usr/apps/spark-2.4.7/conf/

Copy the Spark environment template file:

cp spark-env.sh.template spark-env.sh

Edit the configuration file:

vim spark-env.sh

Append the following at the end.
Put simply, these are the same paths that were set in /etc/profile; refer to that file for the details

export JAVA_HOME=/usr/apps/jdk1.8.0
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.4/etc/hadoop
export SPARK_MASTER_IP=master
# The three SPARK_WORKER_* settings described below are optional and may be omitted

What each setting means:

JAVA_HOME: Java installation directory
SCALA_HOME: Scala installation directory
HADOOP_HOME: Hadoop installation directory
HADOOP_CONF_DIR: Directory of the Hadoop cluster's configuration files
SPARK_MASTER_IP: IP address of the Spark cluster's Master node
SPARK_WORKER_MEMORY: Maximum memory that each worker node can allocate to executors
SPARK_WORKER_CORES: Number of CPU cores per worker node
SPARK_WORKER_INSTANCES: Number of worker instances started on each machine
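
For illustration, the optional worker settings could be appended to spark-env.sh as in the sketch below. The values are placeholders I picked, not the ones used in this article, so size them to your own machines. Spark 2.x documents SPARK_MASTER_HOST as the name for the master's address, so the sketch sets it as well; treat that line as an assumption on my part.

```shell
# Placeholder values; adjust to the memory and cores of your hosts.
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
# Newer name for the master address in Spark 2.x (assumed addition)
export SPARK_MASTER_HOST=master
```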

Edit the slaves file

vim slaves

Contents: [the two slaves]
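
The contents themselves are not shown, but since the file lists the two slaves and the master runs only the Master process, it presumably looks like the output of this sketch. Run it from the Spark conf directory; it overwrites the slaves file.

```shell
# One Spark worker hostname per line.
printf '%s\n' slave1 slave2 > slaves
cat slaves
```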


Distribute the configuration files and environment variables

Use -r to copy directories recursively

# Distribute all software and its configuration to the two slaves
scp -r /usr/apps/ root@slave1:/usr/
scp -r /usr/apps/ root@slave2:/usr/
# Send the /etc/profile environment variable file to the two slaves
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/

Change ZooKeeper's myid on the two slaves

# slave1
[root@slave1 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@slave1 zkdata]# echo 2 > myid 
[root@slave1 zkdata]# cat myid 
# slave2
[root@salve2 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@salve2 zkdata]# echo 3 > myid 
[root@salve2 zkdata]# cat myid 

Stop the firewall or open the ports

Stop the firewall on all three hosts:

# Run this command on all three hosts to stop the firewall
systemctl stop firewalld
# Afterwards, run the following command to confirm that it stopped
systemctl status firewalld

Alternatively, open the required ports. The firewall must be reloaded after a port is opened:

# Open the specified port permanently (replace <port> with the port number)
firewall-cmd --add-port=<port>/tcp --permanent
# Reload the firewall so the new rule takes effect (the firewalld counterpart of service iptables restart)
firewall-cmd --reload

Reload the environment variables [all three hosts]

source /etc/profile

Start Hadoop

Format the NameNode file system [run on the master host]

hdfs namenode -format

Start Hadoop's HDFS services

start-dfs.sh

The output is as follows:

[root@master conf]# start-dfs.sh 
Starting namenodes on [master]
master: starting namenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-salve2.out
master: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-secondarynamenode-slave1.out

Start Hadoop's YARN services

start-yarn.sh

The output is as follows:

[root@master apps]# start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-salve2.out
master: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out

Start ZooKeeper [run the following command on all three hosts]

zkServer.sh start

Start Spark

Spark's startup script has the same name as Hadoop's (start-all.sh), and the environment variables are already configured for Hadoop, so run Spark's script from its own directory:

Enter Spark's sbin directory

cd /usr/apps/spark-2.4.7/sbin/

# Execute start-all.sh in the current directory
./start-all.sh

Check with the jps command:
On the master, the Spark process is named Master
On the other two hosts, it is named Worker
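
To avoid scanning the jps output by eye, a small helper can report what is missing. This is a sketch: missing_daemons is a hypothetical name, and the expected process lists in the usage comments are inferred from the startup logs above plus QuorumPeerMain, the process name under which ZooKeeper appears in jps.

```shell
# Print "missing: <name>" for every expected daemon that jps does not list.
missing_daemons() {
  running="$(jps)"
  for name in "$@"; do
    echo "$running" |
      awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
      echo "missing: $name"
  done
}

# On master:
# missing_daemons NameNode DataNode ResourceManager NodeManager QuorumPeerMain Master
# On slave1:
# missing_daemons DataNode SecondaryNameNode NodeManager QuorumPeerMain Worker
```

The helper compares the second column of jps exactly, so SecondaryNameNode is not mistaken for NameNode.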

Hadoop's HDFS web UI:
Hadoop's YARN web UI:
Spark's web UI:
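
If none of the web ports were changed, the three interfaces should be reachable on the default ports of these versions: 50070 for the HDFS NameNode in Hadoop 2.x, 8088 for the YARN ResourceManager, and 8080 for the Spark standalone master. The sketch below just prints the URLs under that assumption; print_ui_urls is a made-up helper name.

```shell
# Print the default web UI addresses for a given master hostname.
print_ui_urls() {
  host="$1"
  echo "HDFS NameNode UI:        http://$host:50070"
  echo "YARN ResourceManager UI: http://$host:8088"
  echo "Spark master UI:         http://$host:8080"
}

print_ui_urls master
```

Open these from a machine that can resolve the master hostname, or substitute the master's IP address.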

Topics: Hadoop Spark ssh