Create folders: one for the compressed packages, one for the software installation directory
Unless a specific host is indicated, the following operations are all performed on the Master host
# Recursively create the compressed package folder
mkdir -p /usr/tar
# Recursively create the software installation directory folder
mkdir -p /usr/apps
Install the upload command and the color editor [requires networking]
# Upload command installation
yum install -y lrzsz
# Color editor (vim) installation
yum install -y vim
Note: upload files with the lrzsz rz command or with XFTP
Modify the hosts file to add the IP addresses of your own three machines
vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.38.170 master
192.168.38.171 slave1
192.168.38.172 slave2
Save and exit: Esc, then :wq, then Enter
Configure Passwordless SSH Login
Send the configured /etc/hosts file to the other two hosts
scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
# Enter yes and the corresponding password when prompted
Generate key files on all three hosts:
# Run on Master, Slave1, and Slave2
ssh-keygen -t rsa
# Press Enter at every prompt
Copy the public keys to one host; this article sends them to the Master host
# Run on Master, Slave1, and Slave2
ssh-copy-id master
Send the Master host's authorized_keys file (which now aggregates all three keys) to the other two hosts
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
Test Passwordless Login
# Run on all three hosts where possible
# Connect to the master host to see if a password is required
ssh master
# Exit command
exit
# Connect to the slave1 host to see if a password is required
ssh slave1
# Exit command
exit
# Connect to the slave2 host to see if a password is required
ssh slave2
# Exit command
exit
Upload the Software Installation Packages (Compressed Packages)
# Enter the tar directory
cd /usr/tar/
# Upload the required installation packages
rz
The files uploaded in this article are listed below. Please select the correct installation files for your own setup; the Spark package used here is a slight mismatch, since it was built for Hadoop 2.6 while Hadoop 2.7 is actually used:
jdk-8u161-linux-x64.tar.gz
hadoop-2.7.4.tar.gz
spark-2.4.7-bin-hadoop2.6.tgz
zookeeper-3.4.10.tar.gz
View after the upload completes:
[root@master tar]# ls
hadoop-2.7.4.tar.gz  jdk-8u161-linux-x64.tar.gz  spark-2.4.7-bin-hadoop2.6.tgz  zookeeper-3.4.10.tar.gz
Unzip to the specified directory:
# Unzip jdk to the specified folder; -C specifies the target directory
tar -zxf jdk-8u161-linux-x64.tar.gz -C /usr/apps/
# Unzip Hadoop to the specified folder
tar -zxf hadoop-2.7.4.tar.gz -C /usr/apps/
# Unzip zookeeper to the specified folder
tar -zxf zookeeper-3.4.10.tar.gz -C /usr/apps/
# Unzip Spark to the specified folder
tar -zxf spark-2.4.7-bin-hadoop2.6.tgz -C /usr/apps/
Enter the software installation directory (the one created above)
cd /usr/apps/
See:
[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0_161
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7-bin-hadoop2.6
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10
Rename [easier to remember; optional]
# Rename jdk1.8.0_161 to jdk1.8.0
mv jdk1.8.0_161/ jdk1.8.0
# Rename spark-2.4.7-bin-hadoop2.6 to spark-2.4.7
mv spark-2.4.7-bin-hadoop2.6/ spark-2.4.7
[root@master apps]# ll
total 16
drwxr-xr-x. 10 20415  101 4096 Aug  1  2017 hadoop-2.7.4
drwxr-xr-x.  8    10  143 4096 Dec 20  2017 jdk1.8.0
drwxr-xr-x. 13  1000 1000 4096 Sep  8  2020 spark-2.4.7
drwxr-xr-x. 10  1001 1001 4096 Mar 23  2017 zookeeper-3.4.10
Configuring environment variables
First, look up the directory where each piece of software is located
Run the pwd command inside each software directory to see its absolute path
[root@master apps]# cd hadoop-2.7.4/
[root@master hadoop-2.7.4]# pwd
/usr/apps/hadoop-2.7.4
[root@master hadoop-2.7.4]# cd ..
[root@master apps]# cd jdk1.8.0/
[root@master jdk1.8.0]# pwd
/usr/apps/jdk1.8.0
[root@master jdk1.8.0]# cd ..
[root@master apps]# cd zookeeper-3.4.10/
[root@master zookeeper-3.4.10]# pwd
/usr/apps/zookeeper-3.4.10
[root@master zookeeper-3.4.10]# cd ..
[root@master apps]# cd spark-2.4.7/
[root@master spark-2.4.7]# pwd
/usr/apps/spark-2.4.7
Once you have the absolute path of each piece of software, start configuring the environment variables
The color editor vim is used below; the vi command works just as well, only without color
vim /etc/profile
Shift+G to quickly navigate to the end of the document
The configuration file is as follows:
# JAVA_HOME
export JAVA_HOME=/usr/apps/jdk1.8.0
export PATH=$JAVA_HOME/bin:$PATH
# CLASSPATH had to be configured before JDK 1.5; from 1.5 on it is optional, but setting it is the safest choice
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/usr/apps/zookeeper-3.4.10
export PATH=$ZOOKEEPER_HOME/bin:$PATH
# HADOOP_HOME
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# SPARK_HOME
export SPARK_HOME=/usr/apps/spark-2.4.7
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
Configure Hadoop
Enter the Hadoop configuration file directory
[root@master hadoop-2.7.4]# cd /usr/apps/hadoop-2.7.4/etc/hadoop/
- hadoop-env.sh: Configure the environment variables required for Hadoop to run
JAVA_HOME is set on line 25 (use :set nu in vim to show line numbers)
vim hadoop-env.sh
Modify as follows:
21 # set JAVA_HOME in this file, so that it is correctly defined on
22 # remote nodes.
23
24 # The java implementation to use.
25 export JAVA_HOME=/usr/apps/jdk1.8.0
26
27 # The jsvc implementation to use. Jsvc is required to run secure datanodes
- core-site.xml: core configuration file
vim core-site.xml
Add it as follows:
<configuration>
    <!-- Specify the address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <!-- Specify the storage directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/apps/hadoop/tmp</value>
    </property>
</configuration>
- hdfs-site.xml: HDFS configuration file; inherits from core-site.xml
vim hdfs-site.xml
Add the following modifications:
<configuration>
    <!-- Specify the number of file replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Specify the secondary NameNode host and port -->
    <!-- The SecondaryNameNode assists the NameNode (the primary node) -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
    </property>
</configuration>
- mapred-site.xml: MapReduce configuration file, inherits core-site.xml configuration file
mapred-site.xml.template is a template file; copy it to mapred-site.xml first
cp mapred-site.xml.template mapred-site.xml
Modify file:
vim mapred-site.xml
The modifications are as follows:
<configuration>
    <!-- Specify the framework MapReduce runs on; here it is Yarn (the default is local) -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- yarn-site.xml: Yarn configuration file; inherits from core-site.xml
Yarn is the distributed resource scheduling system
vim yarn-site.xml
The modifications are as follows:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <!-- The Yarn primary node (ResourceManager) runs on the master host -->
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Modify the slaves file: configure the slave nodes
vim slaves
The modifications are as follows:
master
slave1
slave2
At this point, the Hadoop distributed file system is configured [non-HA]
zookeeper configuration
# Go to the zookeeper directory
cd /usr/apps/zookeeper-3.4.10/
Create zookeeper's log and data folders
mkdir zkdata zklog
Run the ll command to check that the folders were created successfully
[root@master zookeeper-3.4.10]# ll
total 1580
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 bin
-rw-rw-r--.  1 1001 1001   84725 Mar 23  2017 build.xml
drwxr-xr-x.  2 1001 1001      74 Mar 23  2017 conf
drwxr-xr-x. 10 1001 1001    4096 Mar 23  2017 contrib
drwxr-xr-x.  2 1001 1001    4096 Mar 23  2017 dist-maven
drwxr-xr-x.  6 1001 1001    4096 Mar 23  2017 docs
-rw-rw-r--.  1 1001 1001    1709 Mar 23  2017 ivysettings.xml
-rw-rw-r--.  1 1001 1001    5691 Mar 23  2017 ivy.xml
drwxr-xr-x.  4 1001 1001    4096 Mar 23  2017 lib
-rw-rw-r--.  1 1001 1001   11938 Mar 23  2017 LICENSE.txt
-rw-rw-r--.  1 1001 1001    3132 Mar 23  2017 NOTICE.txt
-rw-rw-r--.  1 1001 1001    1770 Mar 23  2017 README_packaging.txt
-rw-rw-r--.  1 1001 1001    1585 Mar 23  2017 README.txt
drwxr-xr-x.  5 1001 1001      44 Mar 23  2017 recipes
drwxr-xr-x.  8 1001 1001    4096 Mar 23  2017 src
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zkdata
drwxr-xr-x.  2 root root       6 Oct 22 04:54 zklog
-rw-rw-r--.  1 1001 1001 1456729 Mar 23  2017 zookeeper-3.4.10.jar
-rw-rw-r--.  1 1001 1001     819 Mar 23  2017 zookeeper-3.4.10.jar.asc
-rw-rw-r--.  1 1001 1001      33 Mar 23  2017 zookeeper-3.4.10.jar.md5
-rw-rw-r--.  1 1001 1001      41 Mar 23  2017 zookeeper-3.4.10.jar.sha1
View the absolute paths to zkdata and zklog
[root@master zkdata]# pwd
/usr/apps/zookeeper-3.4.10/zkdata
# Only zkdata is checked here; the absolute path of zklog follows the same pattern
Enter the zookeeper configuration file directory
cd /usr/apps/zookeeper-3.4.10/conf/
[root@master conf]# ll
total 12
-rw-rw-r--. 1 1001 1001  535 Mar 23  2017 configuration.xsl
-rw-rw-r--. 1 1001 1001 2161 Mar 23  2017 log4j.properties
-rw-rw-r--. 1 1001 1001  922 Mar 23  2017 zoo_sample.cfg
# Copy the template file to create the configuration file for editing
cp zoo_sample.cfg zoo.cfg
Edit the zookeeper configuration file
vim zoo.cfg
Configuration is complete as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/apps/zookeeper-3.4.10/zkdata
dataLogDir=/usr/apps/zookeeper-3.4.10/zklog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Setting zookeeper's myid file
Enter the zkdata directory of zookeeper
Be careful: the myid file needs a different value on Slave1 and Slave2; see further below
cd /usr/apps/zookeeper-3.4.10/zkdata/
Write "1" to the myid file
echo 1 > myid
# View the myid file
cat myid
1
Configure Spark
Enter the Spark configuration directory
cd /usr/apps/spark-2.4.7/conf/
Copy the Spark environment template file (spark-env.sh.template):
cp spark-env.sh.template spark-env.sh
Edit the configuration file:
vim spark-env.sh
Insert at the end:
Put plainly, these are the same paths configured earlier; refer to the /etc/profile file above for the details
export JAVA_HOME=/usr/apps/jdk1.8.0
export HADOOP_HOME=/usr/apps/hadoop-2.7.4
export HADOOP_CONF_DIR=/usr/apps/hadoop-2.7.4/etc/hadoop
export SPARK_MASTER_IP=master
# The following three lines are optional
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
Configuration details:
JAVA_HOME: Java installation directory
SCALA_HOME: Scala installation directory
HADOOP_HOME: Hadoop installation directory
HADOOP_CONF_DIR: directory of the Hadoop cluster's configuration files
SPARK_MASTER_IP: IP address of the Spark cluster's Master node
SPARK_WORKER_MEMORY: maximum memory each worker node can allocate to executors
SPARK_WORKER_CORES: number of CPU cores per worker node
SPARK_WORKER_INSTANCES: number of worker instances started on each machine
Modify the slaves file
vim slaves
As follows [two slaves]:
slave1
slave2
Distribute the Configuration Files and Environment Variables
Use -r to send directories recursively
# Distribute all software and configuration files to the two slaves
scp -r /usr/apps/ root@slave1:/usr/
scp -r /usr/apps/ root@slave2:/usr/
# Send the /etc/profile environment variable file to the other two hosts
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
Modify zookeeper's myid on the two slaves
# slave1
[root@slave1 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@slave1 zkdata]# echo 2 > myid
[root@slave1 zkdata]# cat myid
2
# slave2
[root@salve2 ~]# cd /usr/apps/zookeeper-3.4.10/zkdata/
[root@salve2 zkdata]# echo 3 > myid
[root@salve2 zkdata]# cat myid
3
Close the Firewall or Open the Required Ports
Shut down the firewall on all three hosts:
# Execute this command on all three hosts to stop the firewall
systemctl stop firewalld
# Afterwards, run the following command to check that the firewall stopped successfully
systemctl status firewalld
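If the firewall should also stay disabled after a reboot, an optional extra step (not required by this guide) is to disable the service:

# Optional: stop firewalld from starting again at boot
systemctl disable firewalld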
Or open the required ports instead; the firewall must be reloaded after releasing a port
# Open the specified port on the firewall (release the port)
firewall-cmd --add-port=<port number>/tcp --permanent
# Reload the firewall (on older systems using iptables: service iptables restart)
firewall-cmd --reload
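For reference, a sketch of releasing just the ports this guide configures (TCP assumed; Hadoop and Spark also use additional internal ports, which is why simply stopping the firewall is the easier option):

# HDFS NameNode RPC and web UI
firewall-cmd --add-port=9000/tcp --permanent
firewall-cmd --add-port=50070/tcp --permanent
# SecondaryNameNode web UI (on slave1)
firewall-cmd --add-port=50090/tcp --permanent
# Yarn ResourceManager web UI
firewall-cmd --add-port=8088/tcp --permanent
# zookeeper client and quorum ports
firewall-cmd --add-port=2181/tcp --permanent
firewall-cmd --add-port=2888/tcp --permanent
firewall-cmd --add-port=3888/tcp --permanent
# Spark web UI
firewall-cmd --add-port=8080/tcp --permanent
# Apply the new rules
firewall-cmd --reload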
Refresh Environment Variables [Three Hosts]
source /etc/profile
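An optional quick check on each host that the variables took effect:

# Verify the environment variables
java -version
hadoop version
echo $SPARK_HOME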
Start Hadoop
Format the namenode file system [executed on the master host only]
hdfs namenode -format
Start Hadoop's dfs system
start-dfs.sh
The results are as follows:
[root@master conf]# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-salve2.out
master: starting datanode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /usr/apps/hadoop-2.7.4/logs/hadoop-root-secondarynamenode-slave1.out
Start Hadoop's Yarn service
start-yarn.sh
The following:
[root@master apps]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-salve2.out
master: starting nodemanager, logging to /usr/apps/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out
Start zookeeper [the following command must be executed on all three hosts]
zkServer.sh start
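To confirm zookeeper started correctly, its status subcommand can be run on each host; one node should report leader and the other two follower:

# Check the zookeeper status on each of the three hosts
zkServer.sh status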
Start Spark
Because Spark's start script has the same name as Hadoop's (start-all.sh) and Hadoop is already on the PATH via the environment variables, run Spark's script from its own sbin directory:
Enter Spark's sbin directory
cd /usr/apps/spark-2.4.7/sbin/
Execution:
# Execute start-all.sh from the current directory
./start-all.sh
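If preferred, a sketch using Spark's separate scripts in the same sbin directory; they start the master and the workers listed in conf/slaves individually:

# Start only the Spark master on this host
./start-master.sh
# Start the workers on slave1 and slave2
./start-slaves.sh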
Check with the jps command:
The Spark process on the master is named Master
The other two hosts run Worker processes; a rough sketch of the expected output follows
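Given the configuration above, jps should report roughly the following on each host (the PIDs are illustrative):

[root@master ~]# jps
2321 NameNode
2453 DataNode
2761 ResourceManager
2874 NodeManager
3102 QuorumPeerMain
3288 Master
3350 Jps

[root@slave1 ~]# jps
2101 DataNode
2209 SecondaryNameNode
2345 NodeManager
2502 QuorumPeerMain
2633 Worker
2701 Jps

[root@slave2 ~]# jps
2098 DataNode
2254 NodeManager
2410 QuorumPeerMain
2533 Worker
2600 Jps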
Hadoop's HDFS UI:
IP:50070
Hadoop's Yarn UI:
IP:8088
Spark's UI:
IP:8080
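Using the master IP from the hosts file above, the addresses look like this (substitute your own IP):

# Hadoop HDFS UI
http://192.168.38.170:50070
# Hadoop Yarn UI
http://192.168.38.170:8088
# Spark UI
http://192.168.38.170:8080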