About Hive
Hive is a data warehouse framework built on Hadoop. It maps structured data files to database tables and provides a SQL-like query language; queries are converted into MapReduce jobs for execution, with HDFS providing the underlying storage. Hive was originally developed at Facebook and was later donated to the Apache Software Foundation, where it continues as an Apache open source project.
Because Hive depends on Hadoop and stores its data in HDFS, prepare a single node and deploy Hive on it as follows.
Installing the Java environment
Taking a binary installation of OpenJDK 8 as an example, create the OpenJDK installation directory:
mkdir /opt/openjdk
Download OpenJDK:
wget https://mirrors.tuna.tsinghua.edu.cn/AdoptOpenJDK/8/jdk/x64/linux/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz
Extract the archive into the installation directory:
tar -zxvf OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz -C /opt/openjdk --strip=1
Configure environment variables
cat > /etc/profile.d/openjdk.sh <<'EOF'
export JAVA_HOME=/opt/openjdk
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile
Confirm successful installation
java -version
Installing Hadoop
Taking a single-machine pseudo-distributed Hadoop installation as an example, create the Hadoop installation directory:
mkdir -p /opt/hadoop
Download the Hadoop binaries:
wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
Extract the Hadoop archive:
tar -zxvf hadoop-3.3.0.tar.gz -C /opt/hadoop --strip=1
Configure environment variables
cat > /etc/profile.d/hadoop.sh <<'EOF'
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile
View the Hadoop version:
hadoop version
Configure passwordless SSH:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
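A quick check that passwordless login works (assuming sshd is running on the local node; the message only prints if ssh succeeds without a password prompt):
ssh -o StrictHostKeyChecking=no localhost true && echo "passwordless SSH OK"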
Modify hadoop-env.sh: set JAVA_HOME to its absolute path and run the HDFS daemons as root:
cat >> /opt/hadoop/etc/hadoop/hadoop-env.sh <<EOF
export JAVA_HOME=$JAVA_HOME
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
EOF
Modify yarn-env.sh to run the YARN daemons as root:
cat >> /opt/hadoop/etc/hadoop/yarn-env.sh <<EOF
export YARN_REGISTRYDNS_SECURE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
Modify the core-site.xml configuration file:
cat > /opt/hadoop/etc/hadoop/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>
EOF
Modify the hdfs-site.xml configuration file:
cat > /opt/hadoop/etc/hadoop/hdfs-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
Next, modify the YARN-related configuration files.
Modify the mapred-site.xml configuration file:
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
EOF
Modify the yarn-site.xml configuration file:
cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
EOF
Format the HDFS filesystem:
hdfs namenode -format
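As a quick sanity check, the NameNode metadata directory should now exist under hadoop.tmp.dir as configured above (dfs/name/current is the default layout when dfs.namenode.name.dir is not set explicitly):
ls /opt/hadoop/tmp/dfs/name/current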
Managing Hadoop services with systemd
cat > /usr/lib/systemd/system/hadoop.service <<EOF
[Unit]
Description=hadoop
After=syslog.target network.target

[Service]
User=root
Group=root
Type=oneshot
ExecStart=/opt/hadoop/sbin/start-all.sh
ExecStop=/opt/hadoop/sbin/stop-all.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
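Depending on the systemd version, a daemon reload may be needed before the new unit is visible; the same applies to the Hive unit files created later:
systemctl daemon-reload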
Enable the Hadoop service to start at boot, and start it now:
systemctl enable --now hadoop
View the running status of the Hadoop service:
[root@master ~]# systemctl status hadoop
● hadoop.service - hadoop
   Loaded: loaded (/usr/lib/systemd/system/hadoop.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2021-06-23 11:31:50 CST; 1h 17min ago
  Process: 309739 ExecStop=/opt/hadoop/sbin/stop-all.sh (code=exited, status=0/SUCCESS)
  Process: 318250 ExecStart=/opt/hadoop/sbin/start-all.sh (code=exited, status=0/SUCCESS)
 Main PID: 318250 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 49791)
   Memory: 0B
   CGroup: /system.slice/hadoop.service

Jun 23 11:31:39 master start-all.sh[318250]: Starting resourcemanager
Jun 23 11:31:39 master su[319099]: (to root) root on none
Jun 23 11:31:39 master su[319099]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:39 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:23 CST 2021
Jun 23 11:31:41 master su[319099]: pam_unix(su-l:session): session closed for user root
Jun 23 11:31:41 master start-all.sh[318250]: Starting nodemanagers
Jun 23 11:31:42 master su[319186]: (to root) root on none
Jun 23 11:31:42 master su[319186]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:42 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:39 CST 2021
Jun 23 11:31:50 master systemd[1]: Started hadoop.
View the running Java processes:
# jps
3711498 NameNode
3712428 Jps
3711661 DataNode
3712002 SecondaryNameNode
Browse the NameNode web interface:
http://localhost:9870/
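From the shell, a simple check that the web UI responds:
curl -sf http://localhost:9870/ >/dev/null && echo "NameNode web UI is up"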
Install MySQL
Hive needs a relational database for its metastore; here MySQL runs in a Docker container:
docker run -d --name mysql \
  --restart always \
  -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=123456 \
  -v mysql:/var/lib/mysql \
  mysql
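To verify the container accepts connections (using the root password set above; the server may take a few seconds to initialize on first start):
docker exec mysql mysql -uroot -p123456 -e 'SELECT VERSION();'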
Install Hive
Create the Hive installation directory:
mkdir -p /opt/hive
Download the Hive binaries from the Apache archive:
wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Extract the Hive archive:
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/hive --strip=1
Configure environment variables
cat >> /etc/profile.d/hive.sh <<'EOF'
export HIVE_HOME=/opt/hive
export PATH=$HIVE_HOME/bin:$PATH
EOF
source /etc/profile
Modify the Hive configuration files
Modify the hive-env.sh file:
cp /opt/hive/conf/{hive-env.sh.template,hive-env.sh}
cat > /opt/hive/conf/hive-env.sh <<EOF
export JAVA_HOME=$JAVA_HOME
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
EOF
Create hive-site.xml from the template:
cp /opt/hive/conf/{hive-default.xml.template,hive-site.xml}
The template contains an illegal character reference (for&#8;) in one property description; replace for&# with for to prevent an encoding error during schema initialization:
sed -i 's/for&#/for/g' /opt/hive/conf/hive-site.xml
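To confirm the replacement took effect, the pattern should no longer match:
grep 'for&#' /opt/hive/conf/hive-site.xml || echo "no invalid sequence remains"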
Modify the relevant parameter values in hive-site.xml. Note that the JDBC URL below points to host dbserver; adjust it to wherever your MySQL instance runs (localhost for the Docker setup above):
cat >/opt/hive/conf/hive-site.xml<<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbserver:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/${hive.session.id}_resources</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/operation_logs</value>
  </property>
</configuration>
EOF
Reference: https://github.com/apache/hive/blob/master/data/conf/hive-site.xml
Launch and verify Hive
1. Before starting Hive, download the matching MySQL JDBC driver and put it in the /opt/hive/lib directory.
MySQL Connector/J download page: https://dev.mysql.com/downloads/connector/j/
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-8.0.25.tar.gz
tar -zxvf mysql-connector-java-8.0.25.tar.gz
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar $HIVE_HOME/lib
Create the Hive data storage directories in HDFS.
hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
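Verify the directories and their permissions in HDFS:
hadoop fs -ls -d /tmp /user/hive/warehouse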
Create Hive log directory.
mkdir -p /opt/hive/log/
touch /opt/hive/log/hiveserver.log
touch /opt/hive/log/hiveserver.err
2. Initialize Hive. First work around the known Guava version conflict: Hive 3.1.2 ships guava-19.0.jar, which is incompatible with Hadoop 3.3.0, so replace it with Hadoop's copy:
mv /opt/hive/lib/guava-19.0.jar{,.bak}
cp /opt/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar /opt/hive/lib/
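After the swap, only the 27.0 Guava jar should be active in Hive's lib directory (the 19.0 jar remains only as a .bak file):
ls /opt/hive/lib | grep guava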
Perform the initialization:
schematool -dbType mysql -initSchema
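To verify that the schema was created, schematool's -info option prints the metastore schema version:
schematool -dbType mysql -info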
3. Start the Hive metastore.
nohup hive --service metastore -p 9083 &
To manage the metastore with systemd instead (stop the nohup instance first), create a unit file:
cat > /etc/systemd/system/hive-meta.service <<EOF
[Unit]
Description=Hive metastore
After=network.target

[Service]
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service metastore

[Install]
WantedBy=multi-user.target
EOF
Enable the metastore to start at boot, and start it now:
systemctl enable --now hive-meta
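The metastore listens on port 9083 by default; confirm it is listening:
netstat -anp | grep 9083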
4. Start HiveServer2.
nohup hiveserver2 1>/opt/hive/log/hiveserver.log 2>/opt/hive/log/hiveserver.err &
Watch the startup log:
# tail -f /opt/hive/log/hiveserver.err
nohup: ignoring input
which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
2021-01-18 11:32:22: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
OK
Check that HiveServer2 is listening on port 10000:
netstat -anp|grep 10000
The startup is successful as shown below.
tcp6 0 0 :::10000 :::* LISTEN 27800/java
Similarly, to manage HiveServer2 with systemd (stop the nohup instance first), create a unit file:
cat > /etc/systemd/system/hive-server2.service <<EOF
[Unit]
Description=hive-server2
After=network.target

[Service]
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service hiveserver2

[Install]
WantedBy=multi-user.target
EOF
Enable HiveServer2 to start at boot, and start it now:
systemctl enable --now hive-server2
Connect with Beeline on server1; the output is as follows:
[root@server1 ~]# beeline -u jdbc:hive2://server1:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://server1:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://server1:10000>
5. View the existing databases to verify that Hive is working:
0: jdbc:hive2://server1:10000> show databases;
INFO : Compiling command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047): show databases
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047); Time taken: 2.032 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047); Time taken: 0.067 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (3.1 seconds)
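As an optional end-to-end smoke test (the table name here is illustrative, not part of the original setup), create a table, insert a row, and query it through Beeline; the INSERT statement is compiled into a MapReduce job and executed on YARN:
beeline -u jdbc:hive2://server1:10000 -e "
CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
INSERT INTO smoke_test VALUES (1, 'hello');
SELECT * FROM smoke_test;
DROP TABLE smoke_test;"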
6. Exit the Hive interface.
quit;