preface
Recently a course experiment required the environment described in the title, so the author took the opportunity to use docker to build a reusable container experiment environment for it, and to record some of the errors encountered along the way.
1, Environment description
->Local environment description
Docker version 20.10.11, build dea9396
Ubuntu version 20.04.2
->Container internal environment description
Ubuntu version 20.04.2
jdk-8u301-linux-x64
->Component Version Description:
Hadoop 2.7.1
Hbase 1.1.5
Hive 1.2.1
MySQL 8.0.27
Sqoop 1.4.6
2, Construction steps
1. docker startup
First log in to docker and enter the initial ubuntu container
The code is as follows:
hadoop@peryol-ThinkPad-T540p:~$ docker login
If the operation results are as follows, continue:
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in xxxxxxxx(password file address)
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Check the local image list:
hadoop@peryol-ThinkPad-T540p:~$ docker images
REPOSITORY                             TAG       IMAGE ID       CREATED        SIZE
ubuntu/all_enable                      latest    05cf8bf14058   14 hours ago   3GB
ubuntu/hadoop_hbase_hive_enable        latest    b20fe17a3ca8   15 hours ago   2.99GB
ubuntu/hadoop_hbase_enable             latest    23ec3c409a19   23 hours ago   2.79GB
ubuntu/hadoop_hbase_hive_sqoop_mysql   latest    a4ab808bd4d3   24 hours ago   2.42GB
ubuntu/mysql                           latest    bfd8141ba845   25 hours ago   1.52GB
ubuntu/master                          latest    5d47bb6b07a4   2 months ago   2.19GB
ubuntu/slave02                         latest    cd345c90bfdb   2 months ago   2.19GB
ubuntu/slave01                         latest    f27e5e2a8f80   2 months ago   2.19GB
ubuntu/hadoopinstalled                 latest    1d02675e3776   2 months ago   2.19GB
ubuntu/jdkinstalled                    latest    c4887df4b631   2 months ago   907MB
ubuntu                                 latest    fb52e22af1b0   2 months ago   72.8MB
hello-world                            latest    d1165f221234   8 months ago   13.3kB
Start an interactive container from the ubuntu image and set up the shared folder build at the same time
docker run -it -v /home/xxxx/build/:/root/build ubuntu
Parameter Description:
-i means interactive
-t means to start a tty, which can be understood as starting a console
-it means interacting with the image ubuntu in the current terminal
-v indicates the specified shared folder directory
-v localhost_path:container_path
root@7622623f361b:/# uname -a
Linux 7622623f361b 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@7622623f361b:/#
This gets us inside the container, and the environment build can officially begin
2. Shared file settings
Before the formal build, we also need to copy the required component packages into the local shared folder build mentioned in the process above
After the copy succeeds, the required packages can be seen by looking at the build directory inside the container
As follows:
root@586063e32312:/# cd ~
root@586063e32312:~# ls
build
root@586063e32312:~# cd build
root@586063e32312:~/build# ls
apache-hive-1.2.1-bin.tar.gz  hbase-1.1.5-bin.tar.gz      mysql-connector-java_8.0.27-1ubuntu20.04_all.deb
hadoop-2.7.1.tar.gz           jdk-8u301-linux-x64.tar.gz  sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
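For reference, the host-side copy is nothing more than ordinary cp commands into the shared folder. A minimal sketch (the ~/Downloads source path is only an assumption about where the packages were downloaded):
# on the host, copy the downloaded packages into the shared folder
cp ~/Downloads/jdk-8u301-linux-x64.tar.gz /home/xxxx/build/
cp ~/Downloads/hadoop-2.7.1.tar.gz /home/xxxx/build/
# ...and likewise for the hbase, hive, sqoop and mysql-connector packages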
3. jdk installation
Unzip the jdk package to the specified directory
root@586063e32312:~/build# tar -zxvf jdk-8u301-linux-x64.tar.gz -C /usr/lib
Configure ~/.bashrc
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
After saving and exiting, run source to make the changes take effect
source ~/.bashrc
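As a quick sanity check (optional, assuming the variables above were sourced correctly), confirm the JDK is picked up:
java -version    # should report java version "1.8.0_301"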
4. hadoop 2.7.1 pseudo-distributed construction
Building hadoop mainly comes down to configuring the two familiar configuration files, core-site.xml and hdfs-site.xml. First we unzip the hadoop package from build to the path we want
tar -zxvf hadoop-2.7.1.tar.gz -C xx/xxx(Specify path)
After decompression, enter the configuration directory and start editing the configuration files
The following example extracts it to the /usr/local directory and renames it hadoop
(full operation):
tar -zxvf hadoop-2.7.1.tar.gz -C /usr/local
cd /usr/local
mv hadoop-2.7.1 hadoop
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
Finally, for convenience, write hadoop's paths into the ~/.bashrc file as well, as shown below
export PATH=$PATH:${JAVA_PATH}:/usr/local/hadoop/sbin:/usr/local/hadoop/bin
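After re-sourcing ~/.bashrc, a quick check (optional) confirms that the hadoop binaries are on the PATH:
source ~/.bashrc
hadoop version    # should report Hadoop 2.7.1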
At this point, the preliminary configuration has been completed. Next, initialize hadoop:
(note that hadoop's path variables from the previous step must be configured before the following operations will work)
hadoop namenode -format
After the command runs, the hadoop configuration will be initialized. Answer yes to any prompts along the way
There may be some warnings at the end, but that does not matter. Check whether "successfully formatted" appears
21/11/20 03:31:04 INFO util.GSet: capacity      = 2^15 = 32768 entries
21/11/20 03:31:04 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1796055780-172.17.0.3-1637379064494
21/11/20 03:31:04 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
21/11/20 03:31:04 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/11/20 03:31:04 INFO util.ExitUtil: Exiting with status 0
21/11/20 03:31:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 586063e32312/172.17.0.3
************************************************************/
You can see "successfully formatted" a few lines from the end, which indicates that the initialization succeeded. The hadoop configuration ends here; all that remains is to start the service
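If you want to confirm right away that the service comes up (the same jps check appears again in the hbase section below), start hdfs and look at the Java processes:
start-dfs.sh
jps    # NameNode, DataNode and SecondaryNameNode should all be listed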
5. hbase 1.1.5 construction
As before, it starts with three steps: decompress, rename, and write the path
tar -zxvf hbase-1.1.5-bin.tar.gz -C /usr/local/
cd /usr/local
mv hbase-1.1.5 hbase
Finally, write the path in ~/.bashrc
/usr/local/hbase/sbin:/usr/local/hbase/bin
This completes the initial configuration. Next, configure the hbase-env.sh and hbase-site.xml files in the hbase/conf directory
hbase-env.sh
export JAVA_HOME=/usr/lib/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
export HBASE_CLASSPATH=/usr/local/hadoop/conf
export HBASE_MANAGES_ZK=true
hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>
With the configuration done, the setup is basically complete, but you must actually run hbase once to catch any small mistakes. Note that the startup order matters: start hdfs first, then hbase
Since the path has been configured, you can directly enter the following command
start-dfs.sh
start-hbase.sh
Here we must check whether the three hbase processes start successfully and stay alive
Enter jps command to view:
root@2ecbf78ed0ba:/usr/local/hbase/conf# jps
3124 HQuorumPeer
709 SecondaryNameNode
443 NameNode
3211 HMaster
556 DataNode
3612 Jps
3325 HRegionServer
3124 (HQuorumPeer), 3211 (HMaster) and 3325 (HRegionServer) are the hbase processes. HMaster in particular is the most important: once it dies, there is a problem with the configuration. It is worth mentioning that hbase is the step most prone to problems, and all kinds of strange errors are common. When that happens, learn to check hbase's log files, which are generally stored in the hbase/logs directory (here /usr/local/hbase/logs) and can be viewed with vim
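For example, a quick way to scan the recent master log for errors (just a sketch; the exact log file names depend on the user and hostname, so adjust the pattern as needed):
cd /usr/local/hbase/logs
ls
# grep the master log for errors instead of paging through it in vim
grep -iE "error|exception" hbase-root-master-*.log | tail -n 20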
The author also ran into problems during this configuration. The initial setup used jdk11 downloaded from the official website, which is too new and not compatible with Hadoop 2.7.1 and hbase 1.1.5. It could just about manage when starting hdfs, but starting hbase produced frequent warnings, and although hbase started, the HMaster and HRegionServer processes would hang up immediately and could not be used normally. At first I did not suspect the JDK; I thought it was a wrong port number or storage location. Only after checking the log file did I find that it was indeed a JDK problem. So my recommendation is: if you still get errors after following the steps above, the first thing to do is check the log files, which is the most efficient approach.
6. hive 1.2.1 construction
Again the three familiar steps: decompress, rename, and write the path
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /usr/local/
cd /usr/local
mv apache-hive-1.2.1-bin hive
Write the path in ~/.bashrc
/usr/local/hive/bin
Next, configure hive-default.xml
Here we need to rename the file first
mv hive-default.xml.template hive-default.xml
vim hive-default.xml
hive-default.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore datastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
Other parameters in the file are automatically configured the first time hive is started
In addition, MySQL is needed when building hive. The commonly recommended version is the tried-and-true 5.7, but the ubuntu version in the image is a bit too new for that, and reinstalling an older mysql is more trouble than it is worth. Besides, the author had already been using MySQL 8.0.27 for some time, so MySQL was installed directly from apt
sudo apt-get update                  # update the package sources
sudo apt-get install mysql-server    # install mysql
After a brief download, you can view the MySQL status and start mysql
service mysql status
service mysql start
An error may be encountered during startup, as shown below:
su: warning: cannot change directory to /nonexistent: No such file or directory
The solution is as follows
service mysql stop
usermod -d /var/lib/mysql/ mysql
service mysql start
After the problem is solved, mysql runs normally without the warning. Incidentally, if the shared folder is set to /tmp, a more serious error occurs and mysql cannot be started at all.
After entering the mysql shell, create the hive database and grant privileges to the hive user at the same time
mysql> create database hive;
mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'%';
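As an optional check (assuming the password 'hive' set above), you can reconnect from the container shell as the new user and list the databases:
mysql -u hive -phive -e "show databases;"    # the hive database should appear in the list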
After the operation, download the JDBC driver corresponding to the version from the mysql official website. The website is as follows:
https://dev.mysql.com/downloads/connector/j/
The package on the official website is in deb format, so the dpkg command is used to install it
dpkg -i mysql-connector-java_8.0.27-1ubuntu20.04_all.deb
After installation, use dpkg -L to find the directory the files were placed in, and copy the .jar into hive's lib directory
dpkg -L mysql-connector-java
cd /usr/share/java    # this is the installation directory found with dpkg -L
cp mysql-connector-java-8.0.27.jar /usr/local/hive/lib
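To double-check that the driver landed where hive expects it (purely an optional sanity check), list hive's lib directory:
ls /usr/local/hive/lib | grep mysql-connector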
At this point, the preparation of hive is complete. Once hdfs is started, you can enter hive on the command line to start hive directly
start-dfs.sh
hive
However, it should be noted that the first time you start hive, you will wait a very long time. Please wait patiently to avoid unknown errors
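Once hive finally starts, a minimal smoke test (not part of the original steps, just an illustrative check of the metastore connection) is to run a simple statement, for example from the shell:
hive -e "show databases;"    # should at least list the default database without errors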
7. sqoop 1.4.6 construction
The same first three steps: unzip, rename, and write the path
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/
cd /usr/local/
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
Write path
/usr/local/sqoop/bin
Next, modify the configuration file
cd /usr/local/sqoop/conf/
cat sqoop-env-template.sh >> sqoop-env.sh    # copy the template
vim sqoop-env.sh
Add content to the configuration file
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hive
zookeeper is not configured above, so it is not set
Next, add the JDBC driver to the sqoop directory, just as when setting up hive
cp /usr/share/java/mysql-connector-java-8.0.27.jar /usr/local/sqoop/lib
Now that the preparations are complete, start mysql and test sqoop's connectivity to it
service mysql start
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username hive -P
You will be prompted for a password
The details are as follows:
root@2ecbf78ed0ba:/usr/local/sqoop/bin# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username hive -P
Warning: /usr/local/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/11/19 20:51:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
21/11/19 20:51:27 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
mysql
information_schema
performance_schema
sys
hive
If the mysql databases are listed at the end, the installation is successful
3, Image export
After the above setup, the environment is ready. Next, start exporting the image
First, open another terminal and use the commit command to save the image
hadoop@peryol-ThinkPad-T540p:~$ docker ps    # view the running containers
CONTAINER ID   IMAGE                        COMMAND   CREATED             STATUS             PORTS     NAMES
2ecbf78ed0ba   ubuntu/hadoop_hbase_enable   "bash"    About an hour ago   Up About an hour             mystifying_perlman
hadoop@peryol-ThinkPad-T540p:~$ docker commit 2ecbf78ed0ba ubuntu/all_enable    # save the container as a local image
sha256:05cf8bf14058863bb5d08fadb80aa1f02c7b927722098deec1e9c4dca458d83e
Use the images command to view the local image
root@peryol-ThinkPad-T540p:/var/lib/docker/containers# docker images
REPOSITORY                             TAG       IMAGE ID       CREATED             SIZE
ubuntu/all_enable                      latest    05cf8bf14058   20 minutes ago      3GB
ubuntu/hadoop_hbase_hive_enable        latest    b20fe17a3ca8   About an hour ago   2.99GB
ubuntu/hadoop_hbase_enable             latest    23ec3c409a19   10 hours ago        2.79GB
ubuntu/hadoop_hbase_hive_sqoop_mysql   latest    a4ab808bd4d3   11 hours ago        2.42GB
ubuntu/mysql                           latest    bfd8141ba845   12 hours ago        1.52GB
ubuntu/master                          latest    5d47bb6b07a4   2 months ago        2.19GB
ubuntu/slave02                         latest    cd345c90bfdb   2 months ago        2.19GB
ubuntu/slave01                         latest    f27e5e2a8f80   2 months ago        2.19GB
ubuntu/hadoopinstalled                 latest    1d02675e3776   2 months ago        2.19GB
ubuntu/jdkinstalled                    latest    c4887df4b631   2 months ago        907MB
ubuntu                                 latest    fb52e22af1b0   2 months ago        72.8MB
hello-world                            latest    d1165f221234   8 months ago        13.3kB
Use the save command to export the local image
docker save -o ubuntu_allEnable.tar 05cf8bf14058
Finally, the image compressed package is successfully exported. The export process will be slow. Please wait patiently
Copy the exported tar package to other devices, then use the load command to decompress and load the image and run it in docker.
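On the target machine, the load step might look like the following sketch. Note that an image saved by its ID, as above, may come back untagged, in which case docker tag can give it a name again:
docker load -i ubuntu_allEnable.tar
docker tag 05cf8bf14058 ubuntu/all_enable    # re-attach a name if the image loads untagged
docker run -it ubuntu/all_enable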
summary
The above is the construction process of the whole environment. The original intention was to reduce the time cost of the experiment: although the components of the hadoop ecosystem are classics, they often throw errors that are time-consuming and laborious to resolve, so I finally switched from running locally to container technology. If there are any mistakes, please point them out. If any reader needs the image tarball, just leave an email. Thank you for reading