Tutorial: Installing Hive 3.1.2 on Ubuntu

Posted by shiggins on Mon, 22 Nov 2021 18:08:39 +0100

Preface

Hive runs on top of Hadoop (as HBase does), so Hadoop must be installed first. For standalone and pseudo-distributed Hadoop installations, see the author's two other blog posts.

The Hadoop and Java environments used in this post follow the requirements of Lin Ziyu's "Principles and Applications of Big Data Technology (3rd Edition)": Java 1.8.0_301 and Hadoop 3.2.2. My operating system is Ubuntu 20.04; this installation method also applies to earlier Ubuntu versions.

1. Install Hive

1. Download the package and unzip it

The official download directory is: https://dlcdn.apache.org/hive/

Download apache-hive-3.1.2-bin.tar.gz, change to the download directory, and extract the archive into /usr/local:

cd ~/Downloads
sudo tar -zxvf ./apache-hive-3.1.2-bin.tar.gz -C /usr/local 

2. Rename the directory and set permissions

Change to the /usr/local directory, rename the extracted folder to hive, and give the hadoop user ownership of it:

cd /usr/local
sudo mv apache-hive-3.1.2-bin hive     
sudo chown -R hadoop ./hive

3. Configuring environment variables

Edit the ~/.bashrc file:

vim ~/.bashrc

Add the following lines:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_HOME=/usr/local/hadoop

Run the following command to make the configuration take effect immediately:

source ~/.bashrc
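
Editing ~/.bashrc by hand more than once can leave duplicate export lines behind. As a minimal sketch, the same lines can be appended from the shell only when they are not already present. The helper name append_once is hypothetical and is not part of Hive or Hadoop; the paths are the ones assumed in this tutorial.

```shell
# append_once FILE LINE: append LINE to FILE only if an identical line
# is not already there (hypothetical helper, not part of Hive).
append_once() {
  touch "$1"
  # -x matches the whole line, -F treats the pattern as a fixed string
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage, with the paths assumed in this tutorial:
#   append_once ~/.bashrc 'export HIVE_HOME=/usr/local/hive'
#   append_once ~/.bashrc 'export PATH=$PATH:$HIVE_HOME/bin'
#   append_once ~/.bashrc 'export HADOOP_HOME=/usr/local/hadoop'
#   source ~/.bashrc
```

Running the helper a second time with the same line leaves the file unchanged.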

4. Modify the configuration file

Rename the template file to create hive-default.xml:

cd /usr/local/hive/conf
sudo mv hive-default.xml.template hive-default.xml

Create a new hive-site.xml file:

sudo vim hive-site.xml

Write the following to the file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>True</value>
  </property>
</configuration>

Press Esc, then type :wq to save and exit.
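
A typo in a property name is easy to miss in a hand-written XML file. The sketch below only checks that the four metastore connection properties from the file above are present. It does not validate the XML or the values, and the function name check_hive_site is hypothetical.

```shell
# check_hive_site FILE: report any metastore connection property that
# is missing from FILE; returns non-zero if at least one is missing.
check_hive_site() {
  missing=0
  for name in \
    javax.jdo.option.ConnectionURL \
    javax.jdo.option.ConnectionDriverName \
    javax.jdo.option.ConnectionUserName \
    javax.jdo.option.ConnectionPassword
  do
    if ! grep -qF "<name>$name</name>" "$1"; then
      echo "missing property: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_hive_site /usr/local/hive/conf/hive-site.xml && echo "hive-site.xml looks complete"
```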

2. MySQL Installation and Configuration

1. Install MySQL

To install the latest version of MySQL, refer to my blog post: Ubuntu 20.04 Install MySQL and configure MySQL workbench

2. Install MySQL jdbc package

The MySQL version installed in the blog post above is 8.0.27, so download the matching version of the MySQL JDBC driver package: mysql-connector-java-8.0.27.tar.gz

After downloading, extract the archive and copy the driver jar, mysql-connector-java-8.0.27.jar, into Hive's lib directory:

cd ~/Downloads
tar -zxvf mysql-connector-java-8.0.27.tar.gz   # extract the archive
cp mysql-connector-java-8.0.27/mysql-connector-java-8.0.27.jar /usr/local/hive/lib
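
The jar file name inside the archive varies between Connector/J releases (some older releases use a -bin suffix), so a hard-coded name in the cp command may not match. As a sketch, the jar can be located by pattern instead. The function name find_connector_jar is hypothetical, and the directory in the usage line is the one assumed in this tutorial.

```shell
# find_connector_jar DIR: print the path of the first MySQL Connector/J
# jar found under DIR (prints nothing if there is none).
find_connector_jar() {
  find "$1" -type f -name 'mysql-connector-*.jar' | sort | head -n 1
}

# Usage:
#   jar=$(find_connector_jar ~/Downloads/mysql-connector-java-8.0.27)
#   [ -n "$jar" ] && cp "$jar" /usr/local/hive/lib/
```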

3. Create a MySQL account for Hive

Start the MySQL service and log in to the shell:

sudo service mysql start  # start the MySQL service
sudo mysql -u root -p  # log in to the MySQL shell

Create the hive database:

mysql> CREATE DATABASE hive;
Query OK, 1 row affected (0.02 sec)

Create a user named hive and set its password (hive here, matching the configuration file) so that Hive can connect to the hive database:

mysql> create user 'hive'@'%' identified by  'hive';
Query OK, 0 rows affected (0.03 sec)

mysql> grant all privileges on hive.* to 'hive'@'%' with grant option;
Query OK, 0 rows affected (0.02 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)

mysql>

If MySQL rejects the password because it does not satisfy the password policy, run the following commands first, then repeat the commands above:

mysql> set global validate_password.policy=LOW;
Query OK, 0 rows affected (0.01 sec)

mysql> set global validate_password.length=4;
Query OK, 0 rows affected (0.00 sec)

mysql>
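
The database and account setup above can also be scripted rather than typed at the mysql> prompt. The sketch below only prints the SQL statements; the function name metastore_sql is hypothetical, and the IF NOT EXISTS clauses are an addition so that the script can be re-run. It does no quoting of its arguments, so it is only suitable for simple names and passwords like the ones in this tutorial.

```shell
# metastore_sql DB USER PASSWORD: print the SQL that creates the
# metastore database and a user with full privileges on it.
metastore_sql() {
  db=$1 user=$2 pass=$3
  cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
CREATE USER IF NOT EXISTS '$user'@'%' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF
}

# Usage (values match hive-site.xml in this tutorial):
#   metastore_sql hive hive hive | sudo mysql -u root -p
```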

3. Verify Hive installation and error handling

1. Start Hadoop

cd /usr/local/hadoop
sbin/start-dfs.sh
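
Before starting Hive, it can help to confirm that HDFS actually came up. The sketch below assumes a pseudo-distributed setup, in which start-dfs.sh starts the NameNode, DataNode and SecondaryNameNode daemons, and checks for them in the output of the JDK's jps tool. The function name check_hdfs_daemons is hypothetical.

```shell
# check_hdfs_daemons: read `jps` output on stdin and verify that the
# three HDFS daemons are listed; returns non-zero if any is missing.
check_hdfs_daemons() {
  jps_output=$(cat)
  status=0
  for daemon in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $daemon" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   jps | check_hdfs_daemons && echo "HDFS daemons are running"
```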

2. Start hive

cd /usr/local/hive
./bin/schematool -dbType mysql -initSchema
bin/hive

On a successful start, the interactive prompt appears:

hive>

If startup fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

Check out this blog: Hive startup error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
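
A commonly reported cause of this error is a mismatch between the Guava jar bundled with Hive and the newer one bundled with Hadoop. The sketch below assumes that cause and the directory layout used in this tutorial. It moves Hive's Guava jar aside as a backup and copies Hadoop's jar in its place. The function name replace_hive_guava is hypothetical.

```shell
# replace_hive_guava HIVE_LIB HADOOP_LIB: back up the guava jar(s) in
# HIVE_LIB and copy in the guava jar found in HADOOP_LIB.
replace_hive_guava() {
  hive_lib=$1
  hadoop_lib=$2
  # pick the Guava jar shipped with Hadoop (normally there is only one)
  new_jar=$(find "$hadoop_lib" -maxdepth 1 -type f -name 'guava-*.jar' | sort | tail -n 1)
  if [ -z "$new_jar" ]; then
    echo "no guava jar found in $hadoop_lib" >&2
    return 1
  fi
  # rename Hive's jars instead of deleting them, so the change can be undone
  for old_jar in "$hive_lib"/guava-*.jar; do
    [ -e "$old_jar" ] || continue
    mv "$old_jar" "$old_jar.bak"
  done
  cp "$new_jar" "$hive_lib"/
}

# Usage, with the paths assumed in this tutorial:
#   replace_hive_guava /usr/local/hive/lib /usr/local/hadoop/share/hadoop/common/lib
```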

3. Run Hive Instance

At the Hive prompt, run the following commands:

hive> create database if not exists hive; # create a database
OK
Time taken: 0.59 seconds

hive> show databases; # list the databases in Hive
OK
datazq
default
hive
Time taken: 0.148 seconds, Fetched: 3 row(s)

hive> show databases like 'h.*'; # list the databases whose names start with h
OK
hive
Time taken: 0.04 seconds, Fetched: 1 row(s)

hive>
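
The same statements can be run without the interactive prompt by putting them in a script file and passing it to hive -f. This is a sketch: the scratch file created with mktemp is only an illustration, and any file path would work.

```shell
# Write the HiveQL statements from the session above to a scratch file.
hql_file=$(mktemp)
cat > "$hql_file" <<'EOF'
-- same statements as the interactive session
CREATE DATABASE IF NOT EXISTS hive;
SHOW DATABASES;
SHOW DATABASES LIKE 'h.*';
EOF

# Run the script only if the hive command is available on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f "$hql_file"
else
  echo "hive not found on PATH; has ~/.bashrc been sourced?" >&2
fi
```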

Summary

The biggest advantage of Hive is that non-programmers do not have to learn to write Java MapReduce code. They only need to learn HiveQL, which is easy to pick up for anyone who already knows SQL.

Topics: MySQL hive Ubuntu