Hadoop pseudo-distributed cluster installation and deployment

Posted by MasterACE14 on Sat, 19 Feb 2022 21:33:02 +0100

Deployment steps

  • Download installation package
  • Upload and unzip the installation package
  • Configure environment variables
  • Modify configuration files
  • Format HDFS
  • Modify script files
  • Start and verify
  • Stop cluster
    Note:
    1. This tutorial assumes the JDK is already installed and configured
    2. This tutorial assumes the CentOS 7 Linux environment is already installed and configured
    3. The current host name is assumed to have been changed to bigdata01
    4. Pseudo-distributed clusters are generally used in the learning phase (a sketch for checking these prerequisites follows this list)
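
    A quick way to check the prerequisites above on a fresh machine (a sketch; the JDK is assumed to already be on the PATH, and the IP address is a placeholder for your own):
	# Verify the JDK installation
	java -version
	# Set the host name to bigdata01
	hostnamectl set-hostname bigdata01
	# Map the host name to this server's IP address (replace with your real IP)
	echo "192.168.xxx.xxx bigdata01" >> /etc/hosts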

Download installation package

  • Download address
    https://hadoop.apache.org/

  • Click the "Download" button
    - The latest version is displayed; to download an earlier version, click the link below it
    - This tutorial uses version 3.2.0 as an example
    - Click the installation package to download it
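
    If you prefer to download directly on the server instead of uploading, earlier releases are also kept in the Apache archive (a sketch; the archive URL is an assumption and may change over time):
	# Download Hadoop 3.2.0 straight to the server
	wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz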

Upload and unzip the installation package

  • Connect to the CentOS 7 server via SSH (for example with Xshell) and create a soft directory
	# Create directory
	mkdir soft
  • Upload installation package
	# Upload file
	rz -y
  • Unzip the installation package to the specified directory
	# -C specifies the decompression location
	tar -zxvf hadoop-3.2.0.tar.gz -C /usr/local
  • Enter the hadoop-3.2.0 directory and view its contents
	# Enter hadoop directory
	cd /usr/local/hadoop-3.2.0
	# View the contents of the directory
	ll
  • Overview of important directories
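    A brief summary of the key directories in the standard Hadoop 3.2.0 layout:
	bin/         # hadoop, hdfs, yarn and other client commands
	sbin/        # start-dfs.sh, start-yarn.sh and other cluster start/stop scripts
	etc/hadoop/  # configuration files such as core-site.xml and hdfs-site.xml
	share/       # Hadoop jars, dependencies and documentation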

Configure environment variables

  • Open the global profile
	# Global profile
	vi /etc/profile
  • Add the following contents at the end of the configuration file
	export HADOOP_HOME=/usr/local/hadoop-3.2.0
	export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  • Reload the profile so the changes take effect
	source /etc/profile
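
    To confirm the environment variables are in effect, the hadoop command should now resolve from any directory:
	# Should print "Hadoop 3.2.0" plus build information
	hadoop version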

Modify configuration files

  • Enter the configuration file directory
	cd /usr/local/hadoop-3.2.0/etc/hadoop
  • Modify the hadoop-env.sh configuration file
	# Open file
	vi hadoop-env.sh
	# Add the following at the end of the file; the log directory can be any path you like and need not match this article
	export JAVA_HOME=/usr/local/java_64
	export HADOOP_LOG_DIR=/usr/local/hadoop-3.2.0/hadoop_repo/logs/hadoop

Note:
/usr/local/java_64 is the JDK directory
/usr/local/hadoop-3.2.0/hadoop_repo/logs/hadoop is the log directory
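
A quick sanity check that the two paths above exist (a sketch; adjust the paths if yours differ):
	# The JDK home must contain bin/java
	ls /usr/local/java_64/bin/java
	# Create the log directory ahead of time (Hadoop will also create it on first start)
	mkdir -p /usr/local/hadoop-3.2.0/hadoop_repo/logs/hadoop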

  • Modify the core-site.xml configuration file
# Open file
vi core-site.xml
# Replace the content of the <configuration></configuration> block in the file with the following
# bigdata01 is the current CentOS7 host name
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://bigdata01:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/local/hadoop-3.2.0/hadoop_repo</value>
        </property>
</configuration>

Note:
bigdata01 is the current CentOS 7 host name
/usr/local/hadoop-3.2.0/hadoop_repo is the Hadoop temporary directory
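
Because fs.defaultFS refers to the host by name, bigdata01 must resolve to this server's IP address; a quick check (the IP in /etc/hosts is a placeholder for your own):
# /etc/hosts should contain a line such as: 192.168.xxx.xxx bigdata01
grep bigdata01 /etc/hosts
ping -c 1 bigdata01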

  • Modify the hdfs-site.xml configuration file
# Open file
vi hdfs-site.xml
# Replace the content of the <configuration></configuration> block in the file with the following
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

Note: since there is only one machine here, the replication factor can be temporarily set to 1

  • Modify the mapred-site.xml configuration file
# Open file
vi mapred-site.xml
# Replace the content of the <configuration></configuration> block in the file with the following
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Note: this configuration specifies YARN as the framework that runs MapReduce jobs

  • Modify the yarn-site.xml configuration file
# Open file
vi yarn-site.xml
# Replace the content of the <configuration></configuration> block in the file with the following
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.env-whitelist</name>
                <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
</configuration>

Note:
1. The first property enables the mapreduce_shuffle auxiliary service on the NodeManager
2. The second property specifies the whitelist of environment variables that containers may inherit

  • Modify the workers configuration file
# Open file
vi workers
# Change localhost to bigdata01
bigdata01

Note: since there is only one machine, it acts as both the master node and the worker node; bigdata01 is its host name
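
The standard start scripts log in to each host listed in workers over SSH, so passwordless SSH to bigdata01 is needed even on a single machine. A minimal sketch, assuming the root user is used throughout (as in the script changes later in this article):
# Generate a key pair (press Enter at every prompt)
ssh-keygen -t rsa
# Authorize the key for login to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify that no password is requested
ssh bigdata01 date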

Format HDFS

  • Enter the hadoop root directory for formatting
# Enter the root directory
cd /usr/local/hadoop-3.2.0
# Format with hdfs command
hdfs namenode -format
  • Observe the output; a message stating that the storage directory has been successfully formatted indicates that formatting succeeded
    Note:
    1. If formatting fails, fix the configuration according to the error message and then re-run the format
    2. Once formatting succeeds, do not format again; repeated formatting will cause problems
    3. If you really need to format again, first delete the contents of the /usr/local/hadoop-3.2.0/hadoop_repo temporary directory (see the sketch below)
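
    If a second format is unavoidable, the cleanup from note 3 can be done like this (a sketch; stop the cluster first if it is running):
# Remove the previous HDFS data, metadata and logs, then format again
rm -rf /usr/local/hadoop-3.2.0/hadoop_repo/*
hdfs namenode -format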

Modify script files

  • Enter the script file directory
cd /usr/local/hadoop-3.2.0/sbin
  • Modify the start-dfs.sh script file
# Open file
vi start-dfs.sh
# Add the following before the line "# Start hadoop dfs daemons"
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
  • Modify the stop-dfs.sh script file
# Open file
vi stop-dfs.sh
# Add the following before the line "# Stop hadoop dfs daemons"
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
  • Modify the start-yarn.sh script file
# Open file
vi start-yarn.sh
# Add the following before the line "## @description usage info"
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
  • Modify the stop-yarn.sh script file
# Open file
vi stop-yarn.sh
# Add the following before the line "## @description usage info"
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Start and verify

  • Start Hadoop
# Execute start command
start-all.sh
  • Observe the output of the startup process
    - Use jps to view process information (a typical result is sketched below)

    Note:
    1. If a process is missing, carefully recheck the configuration
    2. Log location: the log file for each process is under /usr/local/hadoop-3.2.0/hadoop_repo/logs/hadoop
    3. If the configuration is modified, delete the contents of the /usr/local/hadoop-3.2.0/hadoop_repo temporary directory and format HDFS again
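
    On a healthy pseudo-distributed cluster, jps typically reports one of each HDFS and YARN daemon (process IDs will differ):
jps
# Expected processes, in any order:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   Jps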

  • View the HDFS web UI in the browser
    Access address: http://192.168.xxx.xxx:9870

  • View the YARN web UI in the browser
    Access address: http://192.168.xxx.xxx:8088
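
    If the pages cannot be opened from another machine, the CentOS 7 firewall may be blocking these ports; one option during the learning phase (an assumption about your environment, not a step from the original setup):
# Either open the two web UI ports...
firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --reload
# ...or simply disable the firewall on a throwaway learning machine
systemctl stop firewalld
systemctl disable firewalld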

Stop cluster

  • Execute stop script
# Execute the following script
stop-all.sh
  • Run the jps command to verify that the daemons have stopped
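    After the stop script finishes, jps should report only the Jps process itself:
jps
# Only "Jps" should be listed; all Hadoop daemons should be gone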

Topics: Hadoop Distribution hdfs