Apache Hive 3.x deployment

Posted by srhino on Mon, 24 Jan 2022 11:52:35 +0100

About Hive

Hive is a data warehouse framework built on Hadoop. It maps structured data files to database tables and provides a SQL-like query language; queries are translated into MapReduce jobs, and HDFS provides the underlying storage. Hive was originally developed by Facebook and was later donated to the Apache Software Foundation as an open source project.


Hive depends on Hadoop and uses HDFS to store its data. Prepare a node and deploy Hive on it.

Installing the Java environment

Taking a binary installation of OpenJDK 8 as an example, create the OpenJDK installation directory:

mkdir /opt/openjdk

Download OpenJDK:

wget https://mirrors.tuna.tsinghua.edu.cn/AdoptOpenJDK/8/jdk/x64/linux/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz

Extract the archive into the installation directory:

tar -zxvf OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz -C /opt/openjdk --strip=1

Configure environment variables

cat > /etc/profile.d/openjdk.sh <<'EOF'
export JAVA_HOME=/opt/openjdk
export PATH=$PATH:$JAVA_HOME/bin
EOF

source /etc/profile

Confirm the installation was successful:

java -version

Installing Hadoop

Taking a single-node pseudo-distributed Hadoop installation as an example, create the Hadoop installation directory:

mkdir -p /opt/hadoop

Download the Hadoop binaries:

wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Extract Hadoop:

tar -zxvf hadoop-3.3.0.tar.gz -C /opt/hadoop --strip=1

Configure environment variables

cat > /etc/profile.d/hadoop.sh <<'EOF'
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

source /etc/profile

Check the Hadoop version:

hadoop version

Configure passwordless SSH

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
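
Optionally confirm that passwordless login to the local node works; the StrictHostKeyChecking option below is only a convenience that skips the interactive host-key prompt on the first connection.

# should print the hostname without prompting for a password
ssh -o StrictHostKeyChecking=no localhost hostname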

Modify hadoop-env.sh: set JAVA_HOME to its absolute path and run the HDFS daemons as root:

cat >> /opt/hadoop/etc/hadoop/hadoop-env.sh <<EOF
export JAVA_HOME=$JAVA_HOME
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
EOF

Modify yarn-env.sh to run the YARN daemons as root:

cat >> /opt/hadoop/etc/hadoop/yarn-env.sh <<EOF
export YARN_REGISTRYDNS_SECURE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF

Modify the core-site.xml configuration file:

cat > /opt/hadoop/etc/hadoop/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
EOF

Modify the hdfs-site.xml configuration file:

cat > /opt/hadoop/etc/hadoop/hdfs-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF

Modify the YARN-related configuration files. First, the mapred-site.xml configuration file:

cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
EOF

Then the yarn-site.xml configuration file:

cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
EOF

Format the HDFS filesystem:

hdfs namenode -format

Managing the Hadoop service with systemd

cat > /usr/lib/systemd/system/hadoop.service <<EOF  
[Unit]  
Description=hadoop  
After=syslog.target network.target  
  
[Service]  
User=root  
Group=root  
Type=oneshot  
ExecStart=/opt/hadoop/sbin/start-all.sh  
ExecStop=/opt/hadoop/sbin/stop-all.sh  
RemainAfterExit=yes  
  
[Install]  
WantedBy=multi-user.target  
EOF

Start the Hadoop service and enable it at boot:

systemctl enable --now hadoop

Check the status of the Hadoop service:

[root@master ~]# systemctl status hadoop
● hadoop.service - hadoop
   Loaded: loaded (/usr/lib/systemd/system/hadoop.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2021-06-23 11:31:50 CST; 1h 17min ago
  Process: 309739 ExecStop=/opt/hadoop/sbin/stop-all.sh (code=exited, status=0/SUCCESS)
  Process: 318250 ExecStart=/opt/hadoop/sbin/start-all.sh (code=exited, status=0/SUCCESS)
 Main PID: 318250 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 49791)
   Memory: 0B
   CGroup: /system.slice/hadoop.service

Jun 23 11:31:39 master start-all.sh[318250]: Starting resourcemanager
Jun 23 11:31:39 master su[319099]: (to root) root on none
Jun 23 11:31:39 master su[319099]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:39 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:23 CST 2021
Jun 23 11:31:41 master su[319099]: pam_unix(su-l:session): session closed for user root
Jun 23 11:31:41 master start-all.sh[318250]: Starting nodemanagers
Jun 23 11:31:42 master su[319186]: (to root) root on none
Jun 23 11:31:42 master su[319186]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:42 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:39 CST 2021
Jun 23 11:31:50 master systemd[1]: Started hadoop.

Check the running processes with jps:

# jps
3711498 NameNode
3712428 Jps
3711661 DataNode
3712002 SecondaryNameNode

Browse to the NameNode web interface:

http://localhost:9870/
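
If no browser is available on the node, the health of HDFS can also be checked from the command line:

# summarizes cluster capacity and the live datanodes
hdfs dfsadmin -report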

Install MySQL

docker run -d --name mysql \
    --restart always \
    -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=123456 \
    -v mysql:/var/lib/mysql \
    mysql
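
Once the container is up, a quick connectivity check with the root password set above might look like this:

# run the mysql client inside the container and print the server version
docker exec mysql mysql -uroot -p123456 -e 'SELECT VERSION();'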

Install Hive

Create the Hive installation directory:

mkdir -p /opt/hive

Download the Hive binaries:

wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Extract Hive:

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/hive --strip=1

Configure environment variables

cat >> /etc/profile.d/hive.sh <<'EOF'
export HIVE_HOME=/opt/hive
export PATH=$HIVE_HOME/bin:$PATH
EOF

source /etc/profile

Modify the Hive configuration files

Modify the hive-env.sh file:

cp /opt/hive/conf/{hive-env.sh.template,hive-env.sh}

cat > /opt/hive/conf/hive-env.sh <<EOF
export JAVA_HOME=$JAVA_HOME
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
EOF

Copy the hive-site.xml file:

cp /opt/hive/conf/{hive-default.xml.template,hive-site.xml}

The copied template contains an invalid character reference (for&#8;); use the following command to replace for&# with for and prevent encoding errors during initialization:

sed -i 's/for&#/for/g' /opt/hive/conf/hive-site.xml

Modify the relevant parameter values in hive-site.xml:

cat >/opt/hive/conf/hive-site.xml<<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbserver:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/${hive.session.id}_resources</value>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive</value>
  </property>

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/operation_logs</value>
  </property>
</configuration>
EOF
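
Note that javax.jdo.option.ConnectionURL above points at a host named dbserver. If MySQL is the local Docker container started earlier, one option (an assumption about this setup; adjust the hostname to your environment) is to map that name to the local address:

# make the hostname "dbserver" resolve to this node
echo "127.0.0.1 dbserver" >> /etc/hosts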

Reference: https://github.com/apache/hive/blob/master/data/conf/hive-site.xml

Launch and verify Hive

1. Before starting Hive, download the MySQL JDBC driver and put it in the /opt/hive/lib directory.

MySQL Connector/J download page: https://dev.mysql.com/downloads/connector/j/

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-8.0.25.tar.gz
tar -zxvf mysql-connector-java-8.0.25.tar.gz
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar $HIVE_HOME/lib

Create the Hive data storage directories on HDFS:

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse

Create the Hive log directory:

mkdir -p /opt/hive/log/
touch /opt/hive/log/hiveserver.log
touch /opt/hive/log/hiveserver.err

2. Initialize Hive. First work around the Guava version conflict between Hive 3.1.2 and Hadoop 3.3.0 by replacing the bundled guava-19.0.jar with Hadoop's newer copy:

mv /opt/hive/lib/guava-19.0.jar{,.bak}
cp /opt/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar /opt/hive/lib/

Then initialize the metastore schema:

schematool -dbType mysql -initSchema
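
If initialization succeeds, the schema version can be read back from the metastore database:

# prints the metastore connection URL and the current schema version
schematool -dbType mysql -info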

3. Start the Hive metastore.

nohup hive --service metastore -p 9083 &
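
The metastore listens on port 9083; a simple way to confirm it is up:

netstat -anp | grep 9083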

Manage the metastore with systemd:

cat > /etc/systemd/system/hive-meta.service <<EOF
[Unit] 
Description=Hive metastore 
After=network.target 
 
[Service] 
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service metastore 
 
[Install] 
WantedBy=multi-user.target
EOF

Enable the metastore service to start at boot:

systemctl enable --now hive-meta

4. Start HiveServer2.

nohup hiveserver2 1>/opt/hive/log/hiveserver.log 2>/opt/hive/log/hiveserver.err &

Watch the startup progress:

# tail -f /opt/hive/log/hiveserver.err
nohup: ignoring input
 which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
 2021-01-18 11:32:22: Starting HiveServer2
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
 SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
 Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
 Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
 Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
 Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
 OK

Check the listening port:

netstat -anp|grep 10000

The startup succeeded if output like the following appears:

tcp6 0 0 :::10000 :::* LISTEN 27800/java

Manage HiveServer2 with systemd:

cat > /etc/systemd/system/hive-server2.service <<EOF
[Unit] 
Description=hive-server2
After=network.target 
 
[Service] 
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service hiveserver2
 
[Install] 
WantedBy=multi-user.target
EOF

Enable the HiveServer2 service to start at boot:

systemctl enable --now hive-server2

Connect with Beeline from server1; the output is as follows:

[root@server1 ~]# beeline -u jdbc:hive2://server1:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://server1:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://server1:10000>

5. List the databases to verify that everything works:

0: jdbc:hive2://server1:10000> show databases;
INFO  : Compiling command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047); Time taken: 2.032 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20210615105459_7420549e-49ea-40ae-a2d2-3fa263a80047); Time taken: 0.067 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (3.1 seconds)
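
Optionally, run a quick end-to-end smoke test from the shell; the database and table names below are only illustrative:

# create a test table, insert a row (this launches a MapReduce job), and read it back
beeline -u jdbc:hive2://server1:10000 \
  -e "CREATE DATABASE IF NOT EXISTS demo" \
  -e "CREATE TABLE IF NOT EXISTS demo.t1 (id INT, name STRING)" \
  -e "INSERT INTO demo.t1 VALUES (1, 'hello')" \
  -e "SELECT * FROM demo.t1"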

6. Exit the Hive interface.

quit;

Topics: Hadoop