Setting up a Hadoop cluster environment on a single machine

Posted by pmzq on Thu, 16 May 2019 05:04:16 +0200

1. Preparation

First, create a folder with the following structure:

weim@weim:~/myopt$ ls
ubuntu1  ubuntu2  ubuntu3

Then extract the downloaded JDK (version: 8u172) and Hadoop (version: hadoop-2.9.1) into each of the three folders, as follows:

weim@weim:~/myopt$ ls ubuntu1
hadoop  jdk
weim@weim:~/myopt$ ls ubuntu2
hadoop  jdk
weim@weim:~/myopt$ ls ubuntu3
hadoop  jdk
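
The preparation step can be scripted. The following is a minimal sketch, not the exact commands I ran: prepare_dirs is a name made up for this example, and the archive file names in the usage line are assumptions, so substitute the files you actually downloaded. It assumes each archive has a single top-level folder, which --strip-components=1 removes so that the contents land directly in jdk/ and hadoop/.

```shell
# prepare_dirs BASE JDK_ARCHIVE HADOOP_ARCHIVE
# Hypothetical helper: creates BASE/ubuntu1..3 and unpacks both archives
# into the jdk/ and hadoop/ subfolders of each one.
prepare_dirs() {
  base="$1"
  jdk_tgz="$2"
  hadoop_tgz="$3"
  for n in 1 2 3; do
    dir="$base/ubuntu$n"
    mkdir -p "$dir/jdk" "$dir/hadoop" || return 1
    # Drop the archive's top-level folder while extracting.
    tar -xzf "$jdk_tgz" -C "$dir/jdk" --strip-components=1 || return 1
    tar -xzf "$hadoop_tgz" -C "$dir/hadoop" --strip-components=1 || return 1
  done
}
```

Usage would look like prepare_dirs ~/myopt jdk-8u172-linux-x64.tar.gz hadoop-2.9.1.tar.gz, with both file names adjusted to your downloads.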

2. Prepare three machines

Docker is used here to create the three machines, based on the ubuntu:16.04 image.

weim@weim:~/myopt$ docker image ls
REPOSITORY                                          TAG                 IMAGE ID            CREATED             SIZE
ubuntu                                              16.04               f975c5035748        2 months ago        112MB

Start three Ubuntu containers and mount the local directories /home/weim/myopt/ubuntu1, /home/weim/myopt/ubuntu2 and /home/weim/myopt/ubuntu3 at the /home/software path of the corresponding container.

ubuntu1

weim@weim:~/myopt$ docker run --hostname ubuntu1 --name ubuntu1 -v /home/weim/myopt/ubuntu1:/home/software -it --rm  ubuntu:16.04 bash
root@ubuntu1:/# ls /home/software/
hadoop  jdk

ubuntu2

weim@weim:~/myopt$ docker run --hostname ubuntu2 --name ubuntu2 -v /home/weim/myopt/ubuntu2:/home/software -it --rm  ubuntu:16.04 bash
root@ubuntu2:/# ls /home/software/
hadoop  jdk

ubuntu3

weim@weim:~/myopt$ docker run --hostname ubuntu3 --name ubuntu3 -v /home/weim/myopt/ubuntu3:/home/software -it --rm  ubuntu:16.04 bash
root@ubuntu3:/# ls /home/software/
hadoop  jdk
root@ubuntu3:/# 

This gives us three bare-bones machines.

View machine information:

weim@weim:~$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
b4c6de2a4326        ubuntu:16.04        "bash"              About a minute ago   Up About a minute                       ubuntu2
53d1f6389710        ubuntu:16.04        "bash"              About a minute ago   Up About a minute                       ubuntu3
0f210a01d47f        ubuntu:16.04        "bash"              About a minute ago   Up About a minute                       ubuntu1
weim@weim:~$ 
weim@weim:~$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' ubuntu1
172.17.0.2
weim@weim:~$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' ubuntu2
172.17.0.4
weim@weim:~$ docker inspect --format '{{ .NetworkSettings.IPAddress }}' ubuntu3
172.17.0.3
----------------------------------------------------------------------------------
//These are the IP addresses of the three machines
//All three machines are on the same LAN
----------------------------------------------------------------------------------

3. Install the necessary software

Install the necessary software on all three machines. First run apt-get update to refresh the Ubuntu package index.

Then install vim and openssh-server.
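
Instead of typing the commands in each container, they can be sent from the host with docker exec. This is only a sketch: install_prereqs and the DOCKER variable are names invented for this example, and it assumes the three containers are running under the names used above. Setting DOCKER=echo prints the commands instead of running them.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-docker}"

# install_prereqs CONTAINER...
# Hypothetical helper: refreshes the package index and installs vim and
# openssh-server in every container named on the command line.
install_prereqs() {
  for c in "$@"; do
    "$DOCKER" exec "$c" sh -c \
      'apt-get update && apt-get install -y vim openssh-server' || return 1
  done
}
```

Usage would be install_prereqs ubuntu1 ubuntu2 ubuntu3, run on the host.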

4. Environment configuration

a Configure the Java environment first by appending the Java path settings to the end of /etc/profile

root@ubuntu1:/home/software/jdk# vim /etc/profile
---------------------------------------------------------------
//Add the following configuration to the end of the profile file
#set jdk environment  
export JAVA_HOME=/home/software/jdk 
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH  
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
---------------------------------------------------------------

root@ubuntu1:/home/software/jdk# source /etc/profile  
root@ubuntu1:/home/software/jdk# java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
root@ubuntu1:/home/software/jdk# 
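
Since the same lines have to be added on ubuntu2 and ubuntu3 as well, the edit can be scripted rather than done in vim. A minimal sketch, assuming the same three export lines as above; append_java_env is a name made up for this example, and it skips the append when a JAVA_HOME export is already present so that running it twice is harmless.

```shell
# append_java_env PROFILE_FILE JDK_HOME
# Hypothetical helper: appends the JDK settings shown above to PROFILE_FILE
# unless the file already exports JAVA_HOME.
append_java_env() {
  profile="$1"    # /etc/profile in this walkthrough
  jdk_home="$2"   # /home/software/jdk in this walkthrough
  if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  # \$ keeps the variable references literal in the written file.
  cat >> "$profile" <<EOF
#set jdk environment
export JAVA_HOME=$jdk_home
export CLASSPATH=.:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib:\$CLASSPATH
export PATH=\$JAVA_HOME/bin:\$JAVA_HOME/jre/bin:\$PATH
EOF
}
```

Inside a container, usage would be append_java_env /etc/profile /home/software/jdk followed by source /etc/profile.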

b Set up passwordless SSH access

root@ubuntu1:/home/software/jdk# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:hSMrNTp6/1d7L/QZGKdTCPivDJspbY2tcyjke2qjpBI root@ubuntu1
The key's randomart image is:
+---[RSA 2048]----+
|          .      |
|         o .     |
|      + o o . .  |
|     o + o . o o |
|    + . S   . *  |
| E . o . .  .=.. |
|  o ..o . @..o..o|
| . .o. * @.*. o..|
|  .. .++Xo+  . o.|
+----[SHA256]-----+
root@ubuntu1:/home/software/jdk# cd ~/.ssh
root@ubuntu1:~/.ssh# ls
id_rsa  id_rsa.pub
root@ubuntu1:~/.ssh# cat id_rsa.pub >> authorized_keys
root@ubuntu1:~/.ssh# chmod 600 authorized_keys 

Once the configuration is complete, run ssh localhost to verify that the local machine can be accessed without a password. Make sure the SSH service is running first; if it is not, start it with /etc/init.d/ssh start.

root@ubuntu1:/home/software# /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd                                                                                                               [ OK ] 
root@ubuntu1:/home/software# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:chW/KhKqnlQZ8qMxDy8wgSzBIEZ08pdVycjfgJFkVSY.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-41-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@ubuntu1:~# exit
logout
Connection to localhost closed.
root@ubuntu1:/home/software# 

Copy the authorized_keys file to the ubuntu2 and ubuntu3 containers. I don't know the root password of ubuntu2, so for now I can't copy the file with scp; the shared folders are used as a workaround.

First go to the ~/.ssh directory and copy the authorized_keys file to the /home/software path.

root@ubuntu1:~/.ssh# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
root@ubuntu1:~/.ssh# cp authorized_keys /home/software/
root@ubuntu1:~/.ssh# ls /home/software/
authorized_keys  hadoop  jdk
root@ubuntu1:~/.ssh# 

Back on the host system, the file you just copied appears under ~/myopt/ubuntu1. Copy it to the ubuntu2 and ubuntu3 folders.

weim@weim:~/myopt/ubuntu1$ ls
authorized_keys  hadoop  jdk
weim@weim:~/myopt/ubuntu1$ sudo cp authorized_keys ../ubuntu2/
weim@weim:~/myopt/ubuntu1$ sudo cp authorized_keys ../ubuntu3/

Then go back to the ubuntu2 and ubuntu3 containers and copy the file into the ~/.ssh directory (ubuntu2 is shown here; ubuntu3 is the same).

root@ubuntu2:/home/software# cp authorized_keys ~/.ssh
root@ubuntu2:/home/software# ls ~/.ssh
authorized_keys  id_rsa  id_rsa.pub
root@ubuntu2:/home/software# 

Verify that ubuntu1 can reach ubuntu2 and ubuntu3 without a password, using their IP addresses (the SSH service must be running on both):

root@ubuntu1:~/.ssh# ssh root@172.17.0.3
root@ubuntu1:~/.ssh# ssh root@172.17.0.4
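
The shared-folder copy works, but the key can also be piped straight from one container to another from the host, which needs neither scp nor a root password. The following is an alternative sketch, not what I did above: share_pubkey and the DOCKER variable are invented names, and it assumes the containers are running under the names used earlier and that the key pair on the source container is in /root/.ssh as generated above.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-docker}"

# share_pubkey SOURCE TARGET...
# Hypothetical helper: appends SOURCE's public key to root's authorized_keys
# in every TARGET container.
share_pubkey() {
  src="$1"
  shift
  for c in "$@"; do
    # -i keeps stdin open so the key can be piped into the target container.
    "$DOCKER" exec "$src" cat /root/.ssh/id_rsa.pub |
      "$DOCKER" exec -i "$c" sh -c 'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys' || return 1
  done
}
```

Usage would be share_pubkey ubuntu1 ubuntu2 ubuntu3, run on the host.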

5. Hadoop environment configuration

ubuntu1 is used as the example; ubuntu2 and ubuntu3 are configured in the same way.

First, create the data directories for Hadoop.

root@ubuntu1:/home/software/hadoop# mkdir data
root@ubuntu1:/home/software/hadoop# cd data/
root@ubuntu1:/home/software/hadoop/data# mkdir tmp
root@ubuntu1:/home/software/hadoop/data# mkdir data
root@ubuntu1:/home/software/hadoop/data# mkdir checkpoint
root@ubuntu1:/home/software/hadoop/data# mkdir name

Go to the /home/software/hadoop/etc/hadoop directory.

Modify the hadoop-env.sh file to set JAVA_HOME:

export JAVA_HOME=/home/software/jdk

Configure core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.17.0.2:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/software/hadoop/data/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
</configuration>

Configure hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/software/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>67108864</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/software/hadoop/data/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/home/software/hadoop/data/checkpoint</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>172.17.0.2:9000</value>
  </property>
</configuration>

Configure mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Configure yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>172.17.0.2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Configure slaves

172.17.0.2
172.17.0.3
172.17.0.4

6. Startup

In ubuntu1, go to the /home/software/hadoop/bin directory and run hdfs namenode -format to initialize HDFS.

root@ubuntu1:/home/software/hadoop/bin# ./hdfs namenode -format

In ubuntu1, go to the /home/software/hadoop/sbin directory and run start-all.sh.

root@ubuntu1:/home/software/hadoop/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [ubuntu1]
The authenticity of host 'ubuntu1 (172.17.0.2)' can't be established.
ECDSA key fingerprint is SHA256:chW/KhKqnlQZ8qMxDy8wgSzBIEZ08pdVycjfgJFkVSY.
Are you sure you want to continue connecting (yes/no)? yes
ubuntu1: Warning: Permanently added 'ubuntu1,172.17.0.2' (ECDSA) to the list of known hosts.
ubuntu1: starting namenode, logging to /home/software/hadoop/logs/hadoop-root-namenode-ubuntu1.out
172.17.0.2: starting datanode, logging to /home/software/hadoop/logs/hadoop-root-datanode-ubuntu1.out
172.17.0.4: starting datanode, logging to /home/software/hadoop/logs/hadoop-root-datanode-ubuntu2.out
172.17.0.3: starting datanode, logging to /home/software/hadoop/logs/hadoop-root-datanode-ubuntu3.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:chW/KhKqnlQZ8qMxDy8wgSzBIEZ08pdVycjfgJFkVSY.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/software/hadoop/logs/hadoop-root-secondarynamenode-ubuntu1.out
starting yarn daemons
starting resourcemanager, logging to /home/software/hadoop/logs/yarn--resourcemanager-ubuntu1.out
172.17.0.2: starting nodemanager, logging to /home/software/hadoop/logs/yarn-root-nodemanager-ubuntu1.out
172.17.0.3: starting nodemanager, logging to /home/software/hadoop/logs/yarn-root-nodemanager-ubuntu3.out
172.17.0.4: starting nodemanager, logging to /home/software/hadoop/logs/yarn-root-nodemanager-ubuntu2.out
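
The first line of the output says that start-all.sh is deprecated and names the two scripts to use instead. A minimal sketch of starting the cluster that way; start_cluster is a name made up for this example, and YARN is only started if HDFS came up without an error.

```shell
# start_cluster HADOOP_HOME
# Hypothetical helper: runs the two scripts that the deprecation notice
# recommends, HDFS first and then YARN.
start_cluster() {
  hadoop_home="$1"   # /home/software/hadoop in this walkthrough
  "$hadoop_home/sbin/start-dfs.sh" && "$hadoop_home/sbin/start-yarn.sh"
}
```

On ubuntu1, usage would be start_cluster /home/software/hadoop.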

Check the startup status

ubuntu1

root@ubuntu1:/home/software/hadoop/sbin# jps
3827 SecondaryNameNode
3686 DataNode
4007 ResourceManager
4108 NodeManager
4158 Jps

ubuntu2

root@ubuntu2:/home/software/hadoop/sbin# jps
3586 Jps
3477 DataNode
3545 NodeManager

ubuntu3

root@ubuntu3:/home/software/hadoop/sbin# jps
3472 DataNode
3540 NodeManager
3582 Jps
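
Reading three jps listings by eye is error-prone, so the check can be scripted. A sketch under assumptions: missing_daemons is a name invented for this example, it reads jps output (one "PID Name" pair per line) from standard input, and it prints every expected daemon name that does not appear.

```shell
# missing_daemons NAME...
# Hypothetical helper: reads `jps` output from stdin and prints each
# expected daemon NAME that is not listed.
missing_daemons() {
  running=$(awk '{print $2}')
  for d in "$@"; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$running" | grep -qxF "$d" || echo "$d"
  done
}
```

On a worker node, usage would be jps | missing_daemons DataNode NodeManager, which prints nothing when both are running. On ubuntu1 the expected list would be NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager. Applied to the ubuntu1 listing above it prints NameNode, because that listing has no NameNode line; if that happens, the namenode log file named in the startup output is the place to look.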

Next, visit http://172.17.0.2:50070 (the HDFS NameNode web UI) and http://172.17.0.2:8088 (the YARN ResourceManager web UI) to see the state of the cluster.
