Hadoop, ZooKeeper, Spark Installation

Posted by Skepsis on Thu, 21 Oct 2021 15:31:14 +0200

Create new folders: one for the compressed packages and one for the software installation directory

Unless a host is explicitly indicated, all of the following operations are performed on the Master host

# Recursively create the compressed package folder
mkdir -p /usr/tar
# Recursively create the software installation directory folder
mkdir -p /usr/apps

Install the upload tool and the syntax-highlighting editor [requires network access]

# Install lrzsz, which provides the rz/sz upload/download commands
yum install -y lrzsz
# Install vim, an editor with syntax highlighting
yum install -y vim

Note: upload files with the rz command from lrzsz, or with XFTP

Edit the hosts file and add the IP addresses of your three machines

vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.38.170 master
192.168.38.171 slave1
192.168.38.172 slave2

Save and exit: press Esc, type :wq, then press Enter

Configure passwordless SSH login

Send the configured /etc/hosts file to the other two hosts

scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
# Type yes and enter the password whenever you are prompted

Generate a key pair on all three hosts:

# Run on Master, Slave1 and Slave2
ssh-keygen -t rsa
# Press Enter at every prompt to accept the defaults

Copy the public key to a single host; this article sends it to the Master host

# Run on Master, Slave1 and Slave2
ssh-copy-id master

Send the Master host's authorized_keys file, which now contains all three keys, to the other two hosts

scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/

Test that it works

# Run these tests on all three hosts if possible
# Connect to master host to see if a password is required
ssh master
# Exit command
exit
# Connect to slave1 host to see if a password is required
ssh slave1
# Exit command
exit
# Connect to slave2 host to see if a password is required
ssh slave2
# Exit command
exit
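
If you would rather not test each connection by hand, the loop below probes all three hosts at once. This is only a convenience sketch, assuming the hostnames above; BatchMode makes ssh fail instead of prompting when passwordless login is not yet working.

# Probe each host non-interactively
for h in master slave1 slave2; do
  ssh -o BatchMode=yes "$h" hostname && echo "$h OK"
done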

Upload the Software Installation Packages (compressed archives)

# Enter the tar directory
cd /usr/tar/
# Upload required installation packages
rz

The files uploaded in this article are listed below. Make sure you pick the right installation files: the Spark build used here is a slight mismatch, since it was built against Hadoop 2.6 while I used Hadoop 2.7.
jdk-8u161-linux-x64.tar.gz
hadoop-2.7.4.tar.gz
spark-2.4.7-bin-hadoop2.6.tgz
zookeeper-3.4.10.tar.gz

After the upload completes:

[root@master tar]# ls
hadoop-2.7.4.tar.gz  jdk-8u161-linux-x64.tar.gz  spark-2.4.7-bin-hadoop2.6.tgz  zookeeper-3.4.10.tar.gz

Unzip to the specified directory:

# Extract the JDK to the target folder; -C specifies the destination
tar -zxf jdk-8u161-linux-x64.tar.gz -C /usr/apps/
# Extract Hadoop to the target folder
tar -zxf hadoop-2.7.4.tar.gz -C /usr/apps/
# Extract zookeeper to the target folder
tar -zxf zookeeper-3.4.10.tar.gz -C /usr/apps/
# Extract Spark to the target folder
tar -zxf spark-2.4.7-bin-hadoop2.6.tgz -C /usr/apps/

Enter the software installation directory (created above)

cd /usr/apps/

See:

[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0_161
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10

Rename [easier to remember; optional]

# Rename jdk1.8.0_161 to jdk1.8.0
mv jdk1.8.0_161/ jdk1.8.0
# Rename spark-2.4.7-bin-hadoop2.6 to spark-2.4.7
mv spark-2.4.7-bin-hadoop2.6/ spark-2.4.7
[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10

Configuring environment variables

First check the directory where each piece of software lives

Run the pwd command inside each software directory to see its absolute path

[root@master apps]# cd hadoop-2.7.4/
[root@master hadoop-2.7.4]# pwd
/usr/apps/hadoop-2.7.4
[root@master hadoop-2.7.4]# cd ..
[root@master apps]# cd jdk1.8.0/
[root@master jdk1.8.0]# pwd
/usr/apps/jdk1.8.0
[root@master jdk1.8.0]# cd ..
[root@master apps]# cd zookeeper-3.4.10/
[root@master zookeeper-3.4.10]# pwd
/usr/apps/zookeeper-3.4.10
[root@master zookeeper-3.4.10]# cd ..
[root@master apps]# cd spark-2.4.7/
[root@master spark-2.4.7]# pwd
/usr/apps/spark-2.4.7

Once you have the absolute path of each package, start configuring the environment variables

vim (with syntax highlighting) is used below; plain vi works just as well, only without color

vim /etc/profile

Press Shift+G to jump to the end of the file

Append the following configuration:

#JAVA_HOME
export JAVA_HOME=/usr/apps/jdk1.8.0
export PATH=$JAVA_HOME/bin:$PATH
# CLASSPATH had to be set before JDK 1.5; from 1.5 on it is optional, but setting it is the safest choice
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/usr/apps/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH

#HADOOP_HOME
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

#SPARK_HOME
export SPARK_HOME=/usr/apps/spark-2.4.7
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

Configure Hadoop

Enter the Hadoop configuration directory

[root@master hadoop-2.7.4]# cd /usr/apps/hadoop-2.7.4/etc/hadoop/
  1. hadoop-env.sh: configures the environment variables Hadoop needs to run
    The change is on line 25; use :set nu in vim to show line numbers
vim hadoop-env.sh

Modify as follows:

# set JAVA_HOME in this file, so that it is correctly defined on
 22 # remote nodes.
 23 
 24 # The java implementation to use.
 25 export JAVA_HOME=/usr/apps/jdk1.8.0
 26 
 27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
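
If you prefer not to open an editor, the same change can be made with a single sed substitution. This is just an optional shortcut; it assumes line 25 still holds the default export JAVA_HOME line as shown above.

# Rewrite the JAVA_HOME export in hadoop-env.sh in place
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/apps/jdk1.8.0|' hadoop-env.sh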
  2. core-site.xml: the core configuration file
vim core-site.xml 

Add the following:

<configuration>
  <!-- Specify the address of the HDFS NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Specify the directory where Hadoop stores files generated at runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/apps/hadoop/tmp</value>
  </property>
</configuration>
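
Note that /usr/apps/hadoop/tmp does not exist yet. Hadoop normally creates hadoop.tmp.dir itself when the NameNode is formatted, but pre-creating it on each host is a harmless, optional precaution:

# Pre-create the Hadoop runtime directory configured above
mkdir -p /usr/apps/hadoop/tmp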
  3. hdfs-site.xml: the HDFS configuration file; inherits from core-site.xml
vim hdfs-site.xml

Add the following modifications:

<configuration>
  <property>
    <!-- Specify the number of copies kept of each file -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Specify the SecondaryNameNode host and port -->
  <!-- The SecondaryNameNode assists the primary NameNode -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
</configuration>
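
Once the environment variables are loaded (source /etc/profile, covered below), Hadoop itself can confirm the value it resolved; an optional sanity check:

# Ask Hadoop which replication factor it reads from the config
hdfs getconf -confKey dfs.replication
# Expected output: 3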
  4. mapred-site.xml: the MapReduce configuration file; inherits from core-site.xml

mapred-site.xml.template is only a template file; copy it to mapred-site.xml before editing

cp mapred-site.xml.template mapred-site.xml

Modify file:

vim mapred-site.xml

The modifications are as follows:

<configuration>
  <!-- Specify the framework MapReduce runs on; here it is YARN (the default is local) -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  5. yarn-site.xml: the YARN configuration file; inherits from core-site.xml
    YARN is the distributed resource scheduling system
vim yarn-site.xml

The modifications are as follows:

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
    <!-- The YARN ResourceManager runs on the master host -->
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Modify the slaves file: configure the worker nodes

vim slaves 

The modifications are as follows:

master
slave1
slave2

At this point the Hadoop Distributed File System is fully configured [non-HA]

zookeeper configuration

# Enter the zookeeper installation directory
cd /usr/apps/zookeeper-3.4.10/

Create zookeeper's log and data folders

mkdir zkdata zklog

Use the ll command to check that they were created

[root@master zookeeper-3.4.10]# ll
total 1580
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 bin
-rw-rw-r--.  1 1001 1001   84725 Mar 23  2017 build.xml
drwxr-xr-x.  2 1001 1001      74 Mar 23  2017 conf
drwxr-xr-x. 10 1001 1001    4096 Mar 23  2017 contrib
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 dist-maven
drwxr-xr-x.  6 1001 1001    4096 Mar 23  2017 docs
-rw-rw-r--.  1 1001 1001    1709 Mar 23  2017 ivysettings.xml
-rw-rw-r--.  1 1001 1001    5691 Mar 23  2017 ivy.xml
drwxr-xr-x.  4 1001 1001    4096 Mar 23  2017 lib
-rw-rw-r--.  1 1001 1001   11938 Mar 23  2017 LICENSE.txt
-rw-rw-r--.  1 1001 1001    3132 Mar 23  2017 NOTICE.txt
-rw-rw-r--.  1 1001 1001    1770 Mar 23  2017 README_packaging.txt
-rw-rw-r--.  1 1001 1001    1585 Mar 23  2017 README.txt
drwxr-xr-x.  5 1001 1001      44 Mar 23  2017 recipes
drwxr-xr-x.  8 1001 1001    4096 Mar 23  2017 src
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zkdata
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zklog
-rw-rw-r--.  1 1001 1001 1456729 Mar 23  2017 zookeeper-3.4.10.jar
-rw-rw-r--.  1 1001 1001     819 Mar 23  2017 zookeeper-3.4.10.jar.asc
-rw-rw-r--.  1 1001 1001      33 Mar 23  2017 zookeeper-3.4.10.jar.md5
-rw-rw-r--.  1 1001 1001      41 Mar 23  2017 zookeeper-3.4.10.jar.sha1

View the absolute paths to zkdata and zklog

[root@master zkdata]# pwd
/usr/apps/zookeeper-3.4.10/zkdata
# Only zkdata is shown here; zklog's absolute path follows the same pattern

Enter the zookeeper configuration directory

cd /usr/apps/zookeeper-3.4.10/conf/
[root@master conf]# ll
total 12
-rw-rw-r--. 1 1001 1001  535 Mar 23  2017 configuration.xsl
-rw-rw-r--. 1 1001 1001 2161 Mar 23  2017 log4j.properties
-rw-rw-r--. 1 1001 1001  922 Mar 23  2017 zoo_sample.cfg
# Copy the template file to produce an editable configuration file
cp zoo_sample.cfg zoo.cfg

Edit the zookeeper configuration file

vim zoo.cfg

The completed configuration looks like this:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/usr/apps/zookeeper-3.4.10/zkdata
dataLogDir=/usr/apps/zookeeper-3.4.10/zklog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

Set zookeeper's myid file

Enter zookeeper's zkdata directory

Be careful!!! This file (myid) must also be changed on Slave1 and Slave2; see further below

cd /usr/apps/zookeeper-3.4.10/zkdata/

Output "1" to myid file

echo 1 > myid
# View myid file
cat myid 
1

Configure Spark

Enter the Spark configuration directory

cd /usr/apps/spark-2.4.7/conf/

Copy Spark's environment template file:

cp spark-env.sh.template spark-env.sh

Edit the configuration file:

vim spark-env.sh

Insert the following at the end.
Put plainly, these are the same paths already set as environment variables; refer to the /etc/profile file above for details.

export JAVA_HOME=/usr/apps/jdk1.8.0
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.4/etc/hadoop
export SPARK_MASTER_IP=master
# The following three lines are optional
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1

Configuration details:

JAVA_HOME: Java installation directory
SCALA_HOME: Scala installation directory
HADOOP_HOME: Hadoop installation directory
HADOOP_CONF_DIR: directory holding the Hadoop cluster's configuration files
SPARK_MASTER_IP: IP address of the Spark cluster's Master node
SPARK_WORKER_MEMORY: maximum memory each worker node can allocate to executors
SPARK_WORKER_CORES: number of CPU cores per worker node
SPARK_WORKER_INSTANCES: number of worker instances started on each machine

Modify the slaves file

vim slaves

Its contents: [the two slaves]

slave1
slave2

Distribute the configured software and environment variables

The -r flag copies directories recursively

# Distribute all configured software to the two slaves
scp -r /usr/apps/ root@slave1:/usr/
scp -r /usr/apps/ root@slave2:/usr/
# Send the /etc/profile environment variable file to the other two hosts
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
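
A quick optional check that everything arrived, assuming passwordless SSH is already working:

# Verify the software and the profile landed on each slave
for h in slave1 slave2; do
  ssh "$h" "ls /usr/apps && tail -n 5 /etc/profile"
done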

Modify zookeeper's myid on the two slaves

# slave1
[root@slave1 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@slave1 zkdata]# echo 2 > myid 
[root@slave1 zkdata]# cat myid 
2
# slave2
[root@slave2 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@slave2 zkdata]# echo 3 > myid 
[root@slave2 zkdata]# cat myid 
3

Disable the firewall or open the required ports

Shut down the firewall on all three hosts:

# Run this command on all three hosts to stop the firewall
systemctl stop firewalld
# Afterwards, check whether it stopped successfully
systemctl status firewalld

Alternatively, open the required ports: the firewall must be reloaded after a port is released

# Open (release) a specific port through the firewall
firewall-cmd --add-port=<port>/tcp --permanent
# Reload the firewall to apply the change
firewall-cmd --reload
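
For reference, a sketch that releases every port this guide touches instead of disabling the firewall entirely; 7077 is Spark's default standalone master port, an assumption since this article never sets it explicitly:

# HDFS (9000, 50070, 50090), YARN (8088),
# ZooKeeper (2181, 2888, 3888), Spark (7077, 8080)
for p in 9000 50070 50090 8088 2181 2888 3888 7077 8080; do
  firewall-cmd --add-port=${p}/tcp --permanent
done
firewall-cmd --reload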

Refresh the Environment Variables [all three hosts]

source /etc/profile
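
After sourcing, a few version checks confirm every tool is actually on the PATH; each command should print a version rather than "command not found":

java -version
hadoop version
spark-submit --version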

Start Hadoop

Format the NameNode file system [run it on the master host only]

hdfs namenode -format

Start Hadoop's DFS daemons

start-dfs.sh

The results are as follows:

[root@master conf]# start-dfs.sh 
Starting namenodes on [master]
master: starting namenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-salve2.out
master: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-secondarynamenode-slave1.out
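
To confirm the DataNodes registered, ask HDFS for a report; with the slaves file above it should show three live datanodes:

# Summarize HDFS capacity and the datanodes that reported in
hdfs dfsadmin -report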

Start Hadoop's YARN service

start-yarn.sh

The output is as follows:

[root@master apps]# start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-salve2.out
master: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out
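
YARN can likewise report its registered NodeManagers; three RUNNING nodes are expected with the configuration above:

# List the NodeManagers registered with the ResourceManager
yarn node -list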

Start zookeeper [the following command must be executed on all three hosts]

zkServer.sh start
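
Once it is started on all three hosts, each node can report its role in the ensemble; one host should report leader and the other two follower:

zkServer.sh status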

Start Spark

Because Spark's startup script has the same name as Hadoop's (start-all.sh) and both are on the PATH, run it from inside Spark's sbin directory:

Enter Spark's sbin directory

cd /usr/apps/spark-2.4.7/sbin/

Execution:

# Execute start-all.sh in the current directory
./start-all.sh

Check with the jps command:
On the master, the Spark process is named Master
On the other two hosts it is named Worker
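
As a final smoke test, you can submit Spark's bundled SparkPi example to the standalone master. The jar path below follows the usual Spark 2.4.7 layout but should be checked against your installation, and spark://master:7077 assumes the default standalone port:

# Submit the bundled SparkPi example to the cluster
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  /usr/apps/spark-2.4.7/examples/jars/spark-examples_2.11-2.4.7.jar 100
# Look for a line like "Pi is roughly 3.14..." in the output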

Hadoop HDFS UI: IP:50070
Hadoop YARN UI: IP:8088
Spark UI: IP:8080

Topics: Hadoop Spark ssh