preface
Recently a course experiment required the environment described in the title, so the author took the opportunity to use docker to build a reusable container experiment environment for it, and to record some of the errors encountered along the way.
1, Environment description
->Local environment description
Docker version 20.10.11, build dea9396
Ubuntu version 20.04.2
->Container internal environment description
Ubuntu version 20.04.2
jdk-8u301-linux-x64
->Component Version Description:
Hadoop 2.7.1
Hbase 1.1.5
Hive 1.2.1
MySQL 8.0.27
Sqoop 1.4.6
2, Construction steps
1. docker startup
First log in to docker and enter the initial ubuntu container
The code is as follows:
hadoop@peryol-ThinkPad-T540p:~$ docker login
If the operation results are as follows, continue:
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in xxxxxxxx(password file address)
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Check the local image list:
hadoop@peryol-ThinkPad-T540p:~$ docker images
REPOSITORY                             TAG       IMAGE ID       CREATED        SIZE
ubuntu/all_enable                      latest    05cf8bf14058   14 hours ago   3GB
ubuntu/hadoop_hbase_hive_enable        latest    b20fe17a3ca8   15 hours ago   2.99GB
ubuntu/hadoop_hbase_enable             latest    23ec3c409a19   23 hours ago   2.79GB
ubuntu/hadoop_hbase_hive_sqoop_mysql   latest    a4ab808bd4d3   24 hours ago   2.42GB
ubuntu/mysql                           latest    bfd8141ba845   25 hours ago   1.52GB
ubuntu/master                          latest    5d47bb6b07a4   2 months ago   2.19GB
ubuntu/slave02                         latest    cd345c90bfdb   2 months ago   2.19GB
ubuntu/slave01                         latest    f27e5e2a8f80   2 months ago   2.19GB
ubuntu/hadoopinstalled                 latest    1d02675e3776   2 months ago   2.19GB
ubuntu/jdkinstalled                    latest    c4887df4b631   2 months ago   907MB
ubuntu                                 latest    fb52e22af1b0   2 months ago   72.8MB
hello-world                            latest    d1165f221234   8 months ago   13.3kB
Start an interactive container from the ubuntu image and set up the shared folder build at the same time
docker run -it -v /home/xxxx/build/:/root/build ubuntu
Parameter Description:
-i means interactive
-t means to start a tty, which can be understood as starting a console
-it means interacting with the image ubuntu in the current terminal
-v indicates the specified shared folder directory
-v localhost_path:container_path
root@7622623f361b:/# uname -a
Linux 7622623f361b 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@7622623f361b:/#
This gets us inside the container, and the environment build can officially begin
2. Shared file settings
Before the formal build, we also need to copy the required component packages into the local shared folder build mentioned in the process above
After the copy succeeds, the required packages can be seen by looking at the build directory inside the container
As follows:
root@586063e32312:/# cd ~
root@586063e32312:~# ls
build
root@586063e32312:~# cd build
root@586063e32312:~/build# ls
apache-hive-1.2.1-bin.tar.gz  hbase-1.1.5-bin.tar.gz      mysql-connector-java_8.0.27-1ubuntu20.04_all.deb
hadoop-2.7.1.tar.gz           jdk-8u301-linux-x64.tar.gz  sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
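For reference, the host-side copy is nothing more than ordinary cp commands into the shared folder. A minimal sketch (the ~/Downloads source path is only an assumption about where the packages were downloaded):
# on the host, copy the downloaded packages into the shared folder
cp ~/Downloads/jdk-8u301-linux-x64.tar.gz /home/xxxx/build/
cp ~/Downloads/hadoop-2.7.1.tar.gz /home/xxxx/build/
# ...and likewise for the hbase, hive, sqoop and mysql-connector packages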
3. jdk installation
Unzip the jdk package to the specified directory
root@586063e32312:~/build# tar -zxvf jdk-8u301-linux-x64.tar.gz -C /usr/lib
Configure ~/.bashrc
vim ~/.bashrc
export JAVA_HOME=/usr/lib/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
After saving and exiting, run source to make the changes take effect
source ~/.bashrc
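As a quick sanity check (optional, assuming the variables above were sourced correctly), confirm the JDK is picked up:
java -version    # should report java version "1.8.0_301"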
4. hadoop 2.7.1 pseudo-distributed construction
Building hadoop mainly comes down to configuring the two familiar configuration files, core-site.xml and hdfs-site.xml. First we unzip the hadoop package from build to the path we want
tar -zxvf hadoop-2.7.1.tar.gz -C xx/xxx(Specify path)
After decompression, enter the configuration directory and start editing the configuration files
The following example extracts it to the /usr/local directory and renames it hadoop
(full operation):
tar -zxvf hadoop-2.7.1.tar.gz -C /usr/local
cd /usr/local
mv hadoop-2.7.1 hadoop
core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
Finally, for convenience, write hadoop's paths into the ~/.bashrc file as well, as shown below
export PATH=$PATH:${JAVA_PATH}:/usr/local/hadoop/sbin:/usr/local/hadoop/bin
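After re-sourcing ~/.bashrc, a quick check (optional) confirms that the hadoop binaries are on the PATH:
source ~/.bashrc
hadoop version    # should report Hadoop 2.7.1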
At this point, the preliminary configuration has been completed. Next, initialize hadoop:
(note that hadoop's path variables from the previous step must be configured before the following operations will work)
hadoop namenode -format
After the command runs, the hadoop configuration will be initialized. Answer yes to any prompts along the way
There may be some warnings at the end, but that does not matter. Check whether "successfully formatted" appears
21/11/20 03:31:04 INFO util.GSet: capacity      = 2^15 = 32768 entries
21/11/20 03:31:04 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1796055780-172.17.0.3-1637379064494
21/11/20 03:31:04 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
21/11/20 03:31:04 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/11/20 03:31:04 INFO util.ExitUtil: Exiting with status 0
21/11/20 03:31:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 586063e32312/172.17.0.3
************************************************************/
You can see "successfully formatted" a few lines from the end, which indicates that the initialization succeeded. The hadoop configuration ends here; all that remains is to start the service
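If you want to confirm right away that the service comes up (the same jps check appears again in the hbase section below), start hdfs and look at the Java processes:
start-dfs.sh
jps    # NameNode, DataNode and SecondaryNameNode should all be listed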
5. hbase 1.1.5 construction
As before, it starts with three steps: decompress, rename, and write the path
tar -zxvf hbase-1.1.5-bin.tar.gz -C /usr/local/
cd /usr/local
mv hbase-1.1.5 hbase
Finally, write the path in ~/.bashrc
/usr/local/hbase/sbin:/usr/local/hbase/bin
This completes the initial configuration. Next, configure the hbase-env.sh and hbase-site.xml files in the hbase/conf directory
hbase-env.sh
export JAVA_HOME=/usr/lib/jdk1.8.0_301
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
export HBASE_CLASSPATH=/usr/local/hadoop/conf
export HBASE_MANAGES_ZK=true
hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>
With the configuration done, the setup is basically complete, but you must actually run hbase once to catch any small mistakes. Note that the startup order matters: start hdfs first, then hbase
Since the path has been configured, you can directly enter the following command
start-dfs.sh
start-hbase.sh
Here we must check whether the three hbase processes start successfully and stay alive
Enter jps command to view:
root@2ecbf78ed0ba:/usr/local/hbase/conf# jps
3124 HQuorumPeer
709 SecondaryNameNode
443 NameNode
3211 HMaster
556 DataNode
3612 Jps
3325 HRegionServer
3124 (HQuorumPeer), 3211 (HMaster) and 3325 (HRegionServer) are the hbase processes. HMaster in particular is the most important: once it dies, there is a problem with the configuration. It is worth mentioning that hbase is the step most prone to problems, and all kinds of strange errors are common. When that happens, learn to check hbase's log files, which are generally stored in the hbase/logs directory (here /usr/local/hbase/logs) and can be viewed with vim
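For example, a quick way to scan the recent master log for errors (just a sketch; the exact log file names depend on the user and hostname, so adjust the pattern as needed):
cd /usr/local/hbase/logs
ls
# grep the master log for errors instead of paging through it in vim
grep -iE "error|exception" hbase-root-master-*.log | tail -n 20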
The author also ran into problems during this configuration. The initial setup used jdk11 downloaded from the official website, which is too new and not compatible with Hadoop 2.7.1 and hbase 1.1.5. It could just about manage when starting hdfs, but starting hbase produced frequent warnings, and although hbase started, the HMaster and HRegionServer processes would hang up immediately and could not be used normally. At first I did not suspect the JDK; I thought it was a wrong port number or storage location. Only after checking the log file did I find that it was indeed a JDK problem. So my recommendation is: if you still get errors after following the steps above, the first thing to do is check the log files, which is the most efficient approach.
6. hive 1.2.1 construction
Again the three familiar steps: decompress, rename, and write the path
tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /usr/local/
cd /usr/local
mv apache-hive-1.2.1-bin hive
Write the path in ~/.bashrc
/usr/local/hive/bin
Next, configure hive-default.xml
Here we need to rename the file first
mv hive-default.xml.template hive-default.xml
vim hive-default.xml
hive-default.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore datastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
Other parameters in the file are automatically configured the first time hive is started
In addition, MySQL is needed when building hive. The commonly recommended version is the tried-and-true 5.7, but the ubuntu version in the image is a bit too new for that, and reinstalling an older mysql is more trouble than it is worth. Besides, the author had already been using MySQL 8.0.27 for some time, so MySQL was installed directly from apt
sudo apt-get update                  # update the package sources
sudo apt-get install mysql-server    # install mysql
After a brief download, you can view the MySQL status and start mysql
service mysql status
service mysql start
An error may be encountered during startup, as shown below:
su: warning: cannot change directory to /nonexistent: No such file or directory
The solution is as follows
service mysql stop
usermod -d /var/lib/mysql/ mysql
service mysql start
After the problem is solved, mysql runs normally without the warning. Incidentally, if the shared folder is set to /tmp, a more serious error occurs and mysql cannot be started at all.
After entering the mysql shell, create the hive database and grant privileges to the hive user at the same time
mysql> create database hive;
mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'%';
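As an optional check (assuming the password 'hive' set above), you can reconnect from the container shell as the new user and list the databases:
mysql -u hive -phive -e "show databases;"    # the hive database should appear in the list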
After the operation, download the JDBC driver corresponding to the version from the mysql official website. The website is as follows:
https://dev.mysql.com/downloads/connector/j/
The package on the official website is in deb format, so the dpkg command is used to install it
dpkg -i mysql-connector-java_8.0.27-1ubuntu20.04_all.deb
After installation, use dpkg -L to find the directory the files were placed in, and copy the .jar into hive's lib directory
dpkg -L mysql-connector-java
cd /usr/share/java    # this is the installation directory found with dpkg -L
cp mysql-connector-java-8.0.27.jar /usr/local/hive/lib
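To double-check that the driver landed where hive expects it (purely an optional sanity check), list hive's lib directory:
ls /usr/local/hive/lib | grep mysql-connector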
At this point, the preparation of hive is complete. Once hdfs is started, you can enter hive on the command line to start hive directly
start-dfs.sh
hive
However, it should be noted that the first time you start hive, you will wait a very long time. Please wait patiently to avoid unknown errors
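Once hive finally starts, a minimal smoke test (not part of the original steps, just an illustrative check of the metastore connection) is to run a simple statement, for example from the shell:
hive -e "show databases;"    # should at least list the default database without errors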
7. sqoop 1.4.6 construction
The same first three steps: unzip, rename, and write the path
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/
cd /usr/local/
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
Write path
/usr/local/sqoop/bin
Next, modify the configuration file
cd /usr/local/sqoop/conf/
cat sqoop-env-template.sh >> sqoop-env.sh    # copy the template
vim sqoop-env.sh
Add content to the configuration file
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hive
zookeeper is not configured above, so it is not set
Next, add the JDBC driver to the sqoop directory, just as when setting up hive
cp /usr/share/java/mysql-connector-java-8.0.27.jar /usr/local/sqoop/lib
Now that the preparations are complete, start mysql and test sqoop's connectivity to it
service mysql start
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username hive -P
You will be prompted for a password
The details are as follows:
root@2ecbf78ed0ba:/usr/local/sqoop/bin# sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username hive -P
Warning: /usr/local/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/sqoop/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/11/19 20:51:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
21/11/19 20:51:27 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
mysql
information_schema
performance_schema
sys
hive
If the mysql databases are listed at the end, the installation is successful
3, Image export
After the above setup, the environment is ready. Next, start exporting the image
First, open another terminal and use the commit command to save the image
hadoop@peryol-ThinkPad-T540p:~$ docker ps    # view the running containers
CONTAINER ID   IMAGE                        COMMAND   CREATED             STATUS             PORTS     NAMES
2ecbf78ed0ba   ubuntu/hadoop_hbase_enable   "bash"    About an hour ago   Up About an hour             mystifying_perlman
hadoop@peryol-ThinkPad-T540p:~$ docker commit 2ecbf78ed0ba ubuntu/all_enable    # save the container as a local image
sha256:05cf8bf14058863bb5d08fadb80aa1f02c7b927722098deec1e9c4dca458d83e
Use the images command to view the local image
root@peryol-ThinkPad-T540p:/var/lib/docker/containers# docker images
REPOSITORY                             TAG       IMAGE ID       CREATED             SIZE
ubuntu/all_enable                      latest    05cf8bf14058   20 minutes ago      3GB
ubuntu/hadoop_hbase_hive_enable        latest    b20fe17a3ca8   About an hour ago   2.99GB
ubuntu/hadoop_hbase_enable             latest    23ec3c409a19   10 hours ago        2.79GB
ubuntu/hadoop_hbase_hive_sqoop_mysql   latest    a4ab808bd4d3   11 hours ago        2.42GB
ubuntu/mysql                           latest    bfd8141ba845   12 hours ago        1.52GB
ubuntu/master                          latest    5d47bb6b07a4   2 months ago        2.19GB
ubuntu/slave02                         latest    cd345c90bfdb   2 months ago        2.19GB
ubuntu/slave01                         latest    f27e5e2a8f80   2 months ago        2.19GB
ubuntu/hadoopinstalled                 latest    1d02675e3776   2 months ago        2.19GB
ubuntu/jdkinstalled                    latest    c4887df4b631   2 months ago        907MB
ubuntu                                 latest    fb52e22af1b0   2 months ago        72.8MB
hello-world                            latest    d1165f221234   8 months ago        13.3kB
Use the save command to export the local image
docker save -o ubuntu_allEnable.tar 05cf8bf14058
Finally, the image compressed package is successfully exported. The export process will be slow. Please wait patiently
Copy the exported tar package to other devices, then use the load command to decompress and load the image and run it in docker.
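On the target machine, the load step might look like the following sketch. Note that an image saved by its ID, as above, may come back untagged, in which case docker tag can give it a name again:
docker load -i ubuntu_allEnable.tar
docker tag 05cf8bf14058 ubuntu/all_enable    # re-attach a name if the image loads untagged
docker run -it ubuntu/all_enable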
summary
The above is the construction process of the whole environment. The original intention was to reduce the time cost of the experiment: although the components of the hadoop ecosystem are classics, they often throw errors that are time-consuming and laborious to resolve, so I finally switched from running locally to container technology. If there are any mistakes, please point them out. If any reader needs the image tarball, just leave an email. Thank you for reading