Preface
Hive must be installed on top of Hadoop (just like HBase). For standalone and pseudo-distributed installations of Hadoop, see the author's two other blog posts:
- Detailed tutorial on single installation of Hadoop under Ubuntu (with required installation package downloads)
- Detailed tutorial on pseudo-distributed installation of Hadoop under Ubuntu
The Java and Hadoop environments used in this article follow the requirements of Lin Ziyu's "Principles and Applications of Big Data Technology (3rd Edition)": Java 1.8.0_301 and Hadoop 3.2.2. My operating system is Ubuntu 20.04; this installation method also applies to lower versions.
1. Install Hive
1. Download the package and unzip it
The download directory for the official website is as follows: https://dlcdn.apache.org/hive/
The file to download is apache-hive-3.1.2-bin.tar.gz. Go to the download directory and extract the package:
cd ~/Downloads
sudo tar -zxvf ./apache-hive-3.1.2-bin.tar.gz -C /usr/local
2. Install Hive
Enter the /usr/local directory, rename the folder, and grant the hadoop user ownership:
cd /usr/local
sudo mv apache-hive-3.1.2-bin hive
sudo chown -R hadoop ./hive
3. Configuring environment variables
Modify ~/.bashrc file:
vim ~/.bashrc
Add the following lines:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
Run the following command to make the configuration take effect immediately:
source ~/.bashrc
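To confirm the new variables are visible in the current shell (paths as configured above), you can echo them:

```shell
# The variables set in ~/.bashrc should now expand in this session
echo "HIVE_HOME=$HIVE_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
# hive should now resolve from the updated PATH
which hive
```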
4. Modify the configuration file
Rename the default configuration template to hive-default.xml:
cd /usr/local/hive/conf
sudo mv hive-default.xml.template hive-default.xml
Create a new hive-site.xml file:
sudo vim hive-site.xml
Write the following to the file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
</configuration>
Press ESC, then type :wq to save and exit.
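Because a single stray character makes the whole file unparseable, it is worth checking that the XML is well-formed before moving on. This uses xmllint, which on Ubuntu ships in the libxml2-utils package (an extra install, not part of the tutorial's required software):

```shell
# Install xmllint if it is not already present
sudo apt-get install -y libxml2-utils
# A well-formed hive-site.xml produces no output and exits 0
xmllint --noout /usr/local/hive/conf/hive-site.xml && echo "hive-site.xml OK"
```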
2. MySQL Installation and Configuration
1. Install MySQL
Refer to my blog to install the latest version of MySQL: Ubuntu 20.04 Install MySQL and configure MySQL workbench
2. Install MySQL jdbc package
The MySQL version installed in the blog above is 8.0.27, so we need to download the matching version of the MySQL JDBC connector: mysql-connector-java-8.0.27.tar.gz
After downloading, extract the archive and copy the mysql-connector-java-8.0.27.jar package to the specified path:

cd ~/Downloads
tar -zxvf mysql-connector-java-8.0.27.tar.gz  # decompress
cp mysql-connector-java-8.0.27/mysql-connector-java-8.0.27.jar /usr/local/hive/lib
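Hive picks up the driver simply because the jar sits in its lib directory, so a quick listing confirms the copy landed where the metastore will look for it:

```shell
# The connector jar must be present in Hive's lib directory,
# otherwise schematool will later fail to load the JDBC driver
ls -l /usr/local/hive/lib/mysql-connector-java-*.jar
```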
3. Create a MySQL account for Hive
Start the MySQL service and log in to the shell:
sudo service mysql start  # start the MySQL service
sudo mysql -u root -p     # log in to the shell interface
Create a new hive database:
mysql> CREATE DATABASE hive;
Query OK, 1 row affected (0.02 sec)
Create a user hive and set its password (hive, matching the configuration file above) so that it can connect to the hive database:
mysql> create user 'hive'@'%' identified by 'hive';
Query OK, 0 rows affected (0.03 sec)

mysql> grant all privileges on hive.* to 'hive'@'%' with grant option;
Query OK, 0 rows affected (0.02 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)
If the password is rejected by MySQL's password validation policy, execute the following commands before the ones above:
mysql> set global validate_password.policy=LOW;
Query OK, 0 rows affected (0.01 sec)

mysql> set global validate_password.length=4;
Query OK, 0 rows affected (0.00 sec)
3. Verify Hive installation and error handling
1. Start Hadoop
cd /usr/local/hadoop
sbin/start-dfs.sh
2. Start hive
cd /usr/local/hive
./bin/schematool -dbType mysql -initSchema  # initialize the metastore schema (first run only)
bin/hive
On normal startup, an interactive interface appears as follows:
hive>
If startup fails with the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
Check out this blog: Hive startup error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
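This error usually comes from a Guava version conflict between Hadoop and Hive, and the common fix is to replace Hive's bundled guava jar with the newer one shipped by Hadoop. The version numbers below are examples from a typical Hadoop 3.2.2 / Hive 3.1.2 pairing, so list your own lib directories first and substitute what you actually see:

```shell
# Find out which guava versions each distribution ships
ls /usr/local/hive/lib/guava-*.jar
ls /usr/local/hadoop/share/hadoop/common/lib/guava-*.jar
# Remove Hive's older guava and copy in Hadoop's newer one
# (jar versions below are examples; substitute the ones listed above)
rm /usr/local/hive/lib/guava-19.0.jar
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hive/lib/
```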
3. Run Hive Instance
Under the hive interactive interface, run the following command:
hive> create database if not exists hive;   -- create a database
OK
Time taken: 0.59 seconds
hive> show databases;                       -- list the databases in Hive
OK
datazq
default
hive
Time taken: 0.148 seconds, Fetched: 3 row(s)
hive> show databases like 'h.*';            -- list databases starting with h
OK
hive
Time taken: 0.04 seconds, Fetched: 1 row(s)
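The same statements can also be run non-interactively, which is handy for scripting once the setup is verified: hive -e takes a statement string and hive -f runs a script file (the file path below is just an example):

```shell
# Run a single HiveQL statement without entering the interactive shell
hive -e "show databases like 'h.*';"
# Or put statements in a script file and run it in batch mode
echo "show databases;" > /tmp/demo.hql
hive -f /tmp/demo.hql
```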
Summary
The biggest advantage of Hive is that non-programmers do not have to learn to write Java MapReduce code; users only need to learn HiveQL, which is very easy for anyone already familiar with SQL.