Using Alibaba Cloud servers to set up this Hadoop cluster turned out to be a questionable idea; it cost me nearly a week.
1. Preparation
Prepare three Alibaba Cloud servers
| Node | Public IP |
|---|---|
| Namenode | 121.196.224.191 |
| Datanode1 | 121.196.226.12 |
| Datanode2 | 47.96.42.166 |
Port settings (a pitfall described later)
Namenode
Open port 9000 manually through the security group rules
Note the private IP address of the Namenode from the console
Datanode1 and Datanode2
Open port 50010 through the security group rules
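A quick way to sanity-check the security group rules, assuming nc (netcat) is available on the servers, is to probe the ports from another node:

```bash
# from a datanode: is the Namenode's HDFS port reachable?
nc -zv 121.196.224.191 9000

# from the Namenode: is each datanode's data-transfer port reachable?
nc -zv 121.196.226.12 50010
nc -zv 47.96.42.166 50010
```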
2. Environment configuration
We only need to configure the environment on the Namenode; the datanodes can then be set up by copying everything over from the master.
Configuring the Namenode environment
- Download and unzip the required packages
```bash
ssh 121.196.224.191    # connect to the Namenode
mkdir /home/hadoop
cd /home/hadoop/
wget http://denglab.org/cloudcomputing/download/hadoop.tar.gz
tar -zxvf hadoop.tar.gz
mkdir /home/jdk
cd /home/jdk/
wget http://denglab.org/cloudcomputing/download/jdk.tar.gz
tar -zxvf jdk.tar.gz
```
- Set bash profile
vi ~/.bash_profile
Replace the original configuration with the following code
```bash
#export PATH
export JAVA_HOME=/home/jdk/jdk1.7.0_75
export JAVA_JRE=/home/jdk/jdk1.7.0_75/jre
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0

# path
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH=$JAVA_HOME/bin:$PATH
```
source ~/.bash_profile    # load the new configuration
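To confirm that the environment took effect, a quick check:

```bash
java -version        # should report jdk1.7.0_75
hadoop version       # should report Hadoop 2.6.0
echo $HADOOP_HOME    # should print /home/hadoop/hadoop-2.6.0
```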
- Modify the Hadoop configuration files (replace the original configuration with the snippets below)
cd $HADOOP_HOME
mkdir namenode
mkdir datanode
cd etc/hadoop/
vi core-site.xml
```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.6.0/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
```
vi hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop-2.6.0/namenode/name_1, /home/hadoop/hadoop-2.6.0/namenode/name_2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop-2.6.0/datanode/data_1, /home/hadoop/hadoop-2.6.0/datanode/data_2</value>
  </property>
</configuration>
```
vi mapred-site.xml
```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Namenode:9001</value>
  </property>
</configuration>
```
vi hadoop-env.sh
export JAVA_HOME=/home/jdk/jdk1.7.0_75
vi slaves
Set the datanodes
Datanode1
Datanode2
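After saving the config files, the effective values can be double-checked with the standard hdfs getconf utility:

```bash
hdfs getconf -confKey fs.default.name    # expect hdfs://Namenode:9000
hdfs getconf -confKey dfs.replication    # expect 2
```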
- Set up password-free SSH access between the 3 nodes
Change the host names
ssh 121.196.224.191
vi /etc/hostname
Namenode
vi /etc/hosts
121.196.224.191 Namenode
121.196.226.12 Datanode1
47.96.42.166 Datanode2
ssh 121.196.226.12
vi /etc/hostname
Datanode1
ssh 47.96.42.166
vi /etc/hostname
Datanode2
ssh Namenode
vi /etc/hosts
Change the Namenode entry to its private IP address
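For reference, the Namenode's /etc/hosts then looks something like this (172.16.0.10 is only a placeholder; use the private IP shown in your console):

```text
172.16.0.10     Namenode      # private IP of this machine (placeholder)
121.196.226.12  Datanode1
47.96.42.166    Datanode2
```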
Copy the Namenode configuration to the datanodes
```bash
scp /etc/hosts root@121.196.226.12:/etc/hosts
scp /etc/hosts root@47.96.42.166:/etc/hosts
scp ~/.bash_profile root@121.196.226.12:~/.bash_profile
scp ~/.bash_profile root@47.96.42.166:~/.bash_profile
```
Key generation
ssh-keygen -t rsa    # press Enter three times
ssh Datanode1
ssh-keygen -t rsa    # press Enter three times
scp /root/.ssh/id_rsa.pub root@Namenode:/root/.ssh/id_rsa.pub.Datanode1
ssh Datanode2
ssh-keygen -t rsa    # press Enter three times
scp /root/.ssh/id_rsa.pub root@Namenode:/root/.ssh/id_rsa.pub.Datanode2
Key exchange
ssh Namenode
cd /root/.ssh
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub.Datanode1 >> authorized_keys
cat id_rsa.pub.Datanode2 >> authorized_keys
chmod 644 authorized_keys
scp ~/.ssh/authorized_keys root@Datanode1:/root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys root@Datanode2:/root/.ssh/authorized_keys
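As an aside, the same key distribution can often be done with ssh-copy-id, which appends the public key to authorized_keys and sets the permissions in one step; this is only an alternative sketch, not part of the original setup:

```bash
# run on each node (after ssh-keygen), once per remote node
ssh-copy-id root@Namenode
ssh-copy-id root@Datanode1
ssh-copy-id root@Datanode2
```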
Test connection
ssh Datanode1
ssh Datanode2
ssh Namenode
- Start and test hadoop
Copy the Hadoop and JDK files and configuration from the Namenode to the datanodes
```bash
scp -r /home/hadoop/ root@Datanode1:/home/hadoop
scp -r /home/hadoop/ root@Datanode2:/home/hadoop
scp -r /home/jdk/ root@Datanode1:/home/jdk
scp -r /home/jdk/ root@Datanode2:/home/jdk
```
```bash
cd $HADOOP_HOME
cd etc/hadoop
hdfs namenode -format    # do not format HDFS more than once, or the generated cluster IDs may stop matching
start-all.sh
```
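If HDFS ever has to be re-formatted (for example after a failed first attempt), a common workaround, sketched here under the directory layout configured above, is to stop the cluster and clear the old storage directories on every node first so that the namenode and datanodes regenerate matching IDs:

```bash
stop-all.sh
# on every node: remove old metadata and temporary data (paths from hdfs-site.xml / core-site.xml above)
rm -rf /home/hadoop/hadoop-2.6.0/namenode/* \
       /home/hadoop/hadoop-2.6.0/datanode/* \
       /home/hadoop/hadoop-2.6.0/tmp/*
hdfs namenode -format
start-all.sh
```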
View the HDFS file system status
hdfs dfsadmin -report
jps
Test HDFS file system
hadoop fs -ls /
vi aaa.txt
hadoop fs -put aaa.txt /aaa.txt
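If the upload succeeds, reading the file back is a simple end-to-end check:

```bash
hadoop fs -ls /            # aaa.txt should be listed
hadoop fs -cat /aaa.txt    # should print the file contents
```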
Congratulations on making it this far. Next, let's look at the more painful pitfalls.
- The namenode does not start: jps cannot find the NameNode process, and running hdfs dfsadmin -report gives the following error
Retrying connect to server: hadoop/121.196.224.191:9000. Already tried 0 time(s);
Reason: after copying the configuration to the datanodes, the Namenode's IP address in the /etc/hosts file on the Namenode itself was not changed to the private IP address.
See this article for details: Blog
- The datanode does not start. Checking the datanode log file shows the following error
Retrying connect to server: Namenode/121.196.224.191:9000. Already tried 0 time(s);
Reason: port 9000 of the Namenode was not opened in the security group rules.
- hadoop fs -put aaa.txt /aaa.txt hangs for a long time and then reports the following error
INFO hdfs.DFSClient: Excluding datanode 47.96.42.166:50010
Reason: port 50010 of Datanode2 (47.96.42.166) was not open; it can be opened through the security group rule settings.
- If the datanode is still not started at this point, it can be started manually with hadoop-daemon.sh start datanode, as shown below.
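Run it on the affected datanode (hadoop-daemon.sh ships with Hadoop 2.6 under sbin):

```bash
hadoop-daemon.sh start datanode
jps    # the DataNode process should now be listed
```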