Installation and Use of Hive

Posted by simenss on Thu, 27 Jan 2022 12:27:34 +0100

1. Install Hive

1.1 Install Java

Java must be installed on the system before Hive can be installed. Run java -version to check; if Java is already installed, the command prints the installed version.

If it is not installed, download Java from the official Java website and follow the installer.
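The check can be sketched as a small shell snippet. This assumes a POSIX shell; the variable name java_status is only illustrative.

```shell
# Check whether a java executable is on PATH before installing Hive.
if command -v java >/dev/null 2>&1; then
    java_status="installed"
    # java prints its version banner to stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
else
    java_status="missing"
    echo "java not found on PATH; install a JDK first"
fi
echo "Java status: $java_status"
```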

1.2 Install Hadoop

Download Hadoop
Unpack it into a directory of your choice and configure the environment variables as follows (shown here for macOS).

Run vim ~/.bash_profile to edit the environment variables (on Linux, edit ~/.bashrc instead) and add the following lines

export HADOOP_HOME=/usr/local/hadoop 
export HADOOP_MAPRED_HOME=$HADOOP_HOME 
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Run source ~/.bash_profile to apply the changes to the current shell session.

Verify the installation by running hadoop version, which prints the installed Hadoop version.
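If the hadoop command is not found, the usual cause is that the PATH entries were not picked up. A minimal sketch of a check, assuming the /usr/local/hadoop location used above as a fallback; path_contains is a hypothetical helper, not part of Hadoop.

```shell
# Return success if directory $1 is one of the entries in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fall back to the install location used above if HADOOP_HOME is unset.
hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
for dir in "$hadoop_home/bin" "$hadoop_home/sbin"; do
    if path_contains "$dir"; then
        echo "ok: $dir is on PATH"
    else
        echo "missing: $dir is not on PATH"
    fi
done
```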

Configure Hadoop
Change to the Hadoop configuration directory, $HADOOP_HOME/etc/hadoop.

To develop Hadoop programs in Java, the JAVA_HOME variable in hadoop-env.sh must be set to the Java installation path on your system.

First, run /usr/libexec/java_home -V to find the Java installation path.

Then edit hadoop-env.sh and set JAVA_HOME to that path.
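The JAVA_HOME line for hadoop-env.sh could be generated roughly like this. The sketch assumes /usr/libexec/java_home is available, which is only the case on macOS; elsewhere it falls back to a placeholder path that must be replaced by hand.

```shell
# Build the JAVA_HOME line to add to hadoop-env.sh.
if [ -x /usr/libexec/java_home ] && jdk_path="$(/usr/libexec/java_home 2>/dev/null)"; then
    java_home_line="export JAVA_HOME=$jdk_path"
else
    # Placeholder: replace with the real JDK path on non-macOS systems.
    java_home_line="export JAVA_HOME=/path/to/your/jdk"
fi
echo "$java_home_line"
# To apply it: echo "$java_home_line" >> "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
```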
Edit core-site.xml. This file holds settings such as the port number used by the Hadoop instance, the memory allocated to the file system, the memory limit for storing data, and the size of the read/write buffers.

Open core-site.xml and add the following property between the <configuration> and </configuration> tags
```
<configuration>

<property> 
  <name>fs.default.name</name> 
  <value>hdfs://localhost:9000</value> 
</property>

</configuration>
```
Edit hdfs-site.xml. This file holds settings such as the replication factor, the NameNode path, and the DataNode path on the local file system.
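As an illustration of those three settings, the snippet below writes a sample hdfs-site.xml into a scratch directory for review. The property names are the standard HDFS ones; the replication value of 1 (typical for a single-node setup) and the two directory paths are assumptions, not values from this post.

```shell
# Write a sample hdfs-site.xml into a scratch directory for review.
conf_dir="$(mktemp -d)"
cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/datanode</value>
  </property>
</configuration>
EOF
echo "Sample written to $conf_dir/hdfs-site.xml"
```

After adjusting the paths, the file would be copied to $HADOOP_HOME/etc/hadoop/hdfs-site.xml.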

For more configuration details, see the reference: hive installation

Verify the Hadoop installation

Verify HDFS by starting its daemons
```
start-dfs.sh
```
Pitfall: "connection refused" error
References: 1. Guidelines for building Hadoop development environment on Mac OS X
2. Permission denied (publickey,password,keyboard-interactive).
Verify YARN by starting its daemons
```
start-yarn.sh
```
The default port of the Hadoop web interface is 50070 in Hadoop 2 and 9870 in Hadoop 3. Open the following URL in a browser to reach the Hadoop web interface
```
http://localhost:9870/
```
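The version-to-port rule can be written down as a small helper. This is only a sketch: namenode_ui_port is a hypothetical function name, and it covers just the two defaults mentioned above (a custom port set in hdfs-site.xml would override either).

```shell
# Print the default NameNode web UI port for a given Hadoop version string.
namenode_ui_port() {
    case "$1" in
        3.*) echo 9870 ;;   # Hadoop 3 default
        *)   echo 50070 ;;  # earlier default
    esac
}

hadoop_version="3.3.1"
echo "http://localhost:$(namenode_ui_port "$hadoop_version")/"
```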

1.3 Install Hive

Download Hive
Download Hive 2.3.9, as shown in the figure below

Configure the Hive environment variables
Run vim ~/.bash_profile and add the following lines

export HIVE_HOME=/Applications/apache-hive-2.3.9-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/Applications/hadoop-3.3.1/lib/*:.
export CLASSPATH=$CLASSPATH:/Applications/apache-hive-2.3.9-bin/lib/*:.

Run source ~/.bash_profile to apply the changes

Configure Hive

cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh

Edit hive-env.sh and add the following line

export HADOOP_HOME=/Applications/hadoop-3.3.1

1.4 Download and install Apache Derby

With the steps above, Hive itself is installed. Hive still needs an external database server for the Metastore; here we use the Apache Derby database.
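With Derby as the Metastore database, Hive 2.x typically needs the Metastore schema initialized once before first use. A minimal sketch using the schematool utility from Hive's bin directory; the fallback HIVE_HOME is the path assumed earlier in this post, and the step is skipped if the tool is not found there.

```shell
# Initialize the Derby-backed Metastore schema (one-time step).
hive_home="${HIVE_HOME:-/Applications/apache-hive-2.3.9-bin}"
if [ -x "$hive_home/bin/schematool" ]; then
    "$hive_home/bin/schematool" -dbType derby -initSchema
    metastore_step="attempted"
else
    echo "schematool not found under $hive_home/bin; check HIVE_HOME"
    metastore_step="skipped"
fi
echo "Metastore initialization: $metastore_step"
```

Note that schematool creates the Derby files (a metastore_db directory) in the directory it is run from, so it is worth running it from the directory where Hive will later be started.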

2. Using Hive

Create database

CREATE DATABASE [IF NOT EXISTS] <database name>;

List databases
```
SHOW DATABASES;
```

Delete database

DROP DATABASE [IF EXISTS] <database name>;

Create table

CREATE TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type, col_name2 data_type2, ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]

For example, create an employee table
CREATE TABLE IF NOT EXISTS employee ( id int, name String, salary String, destination String);
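The optional clauses in the syntax above can be illustrated with a second, tab-delimited variant of the table. The snippet writes the statement to a scratch .hql file; the table name employee_tsv, the comment text, and the file name are hypothetical and chosen only for this example.

```shell
# Write a CREATE TABLE statement that uses COMMENT, ROW FORMAT and STORED AS.
hql_dir="$(mktemp -d)"
cat > "$hql_dir/create_employee_tsv.hql" <<'EOF'
CREATE TABLE IF NOT EXISTS employee_tsv (
  id INT,
  name STRING,
  salary STRING,
  destination STRING
)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
EOF
cat "$hql_dir/create_employee_tsv.hql"
# Run it with: hive -f "$hql_dir/create_employee_tsv.hql"
```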

Insert data
In standard SQL, you add rows to a table with the INSERT statement. In Hive, you can load data from a file with the LOAD DATA statement
```
LOAD DATA [LOCAL] INPATH <filepath> [OVERWRITE] INTO TABLE <tablename>
```
LOCAL indicates that the path is on the local file system rather than HDFS; OVERWRITE replaces the data already in the table.
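A small worked example, assuming the employee table created above without a ROW FORMAT clause, in which case Hive's default field delimiter is the Ctrl-A character (\001). The two rows are made up for illustration.

```shell
# Create a sample data file whose fields are separated by \001 (Ctrl-A),
# Hive's default delimiter for a table created without ROW FORMAT.
data_file="$(mktemp)"
printf '1\001Alice\00150000\001Engineer\n' >  "$data_file"
printf '2\001Bob\00142000\001Analyst\n'    >> "$data_file"

# Build the LOAD DATA statement; the file path must be quoted in HiveQL.
load_stmt="LOAD DATA LOCAL INPATH '$data_file' OVERWRITE INTO TABLE employee;"
echo "$load_stmt"
# Run it with: hive -e "$load_stmt"
```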

Change the name of the table

ALTER TABLE <name> RENAME TO <new_name>

Add a column

ALTER TABLE <name> ADD COLUMNS (col_spec[, col_spec ...])

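Delete a column

Hive has no DROP COLUMN clause; to remove a column, use REPLACE COLUMNS and list only the columns you want to keep

ALTER TABLE <name> REPLACE COLUMNS (col_spec[, col_spec ...])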

Change the name or type of a column

ALTER TABLE <name> CHANGE [COLUMN] <column_name> <new_name> <new_type>
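The ALTER TABLE forms can be tried in sequence on the employee table from earlier. The snippet below only writes the statements to a scratch file; the new names emp, ename, and dept are hypothetical. Because Hive removes columns through REPLACE COLUMNS rather than a DROP clause, the last statement keeps only the first two columns.

```shell
# Write a sequence of ALTER TABLE statements to a scratch .hql file.
alter_dir="$(mktemp -d)"
cat > "$alter_dir/alter_employee.hql" <<'EOF'
-- Rename the table
ALTER TABLE employee RENAME TO emp;
-- Add a column at the end
ALTER TABLE emp ADD COLUMNS (dept STRING COMMENT 'Department name');
-- Rename a column, keeping its type
ALTER TABLE emp CHANGE name ename STRING;
-- Remove columns by listing only the ones to keep
ALTER TABLE emp REPLACE COLUMNS (id INT, ename STRING);
EOF
cat "$alter_dir/alter_employee.hql"
# Run it with: hive -f "$alter_dir/alter_employee.hql"
```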

Delete table
```
DROP TABLE [IF EXISTS] table_name;
```

Query with SELECT

SELECT [ALL | DISTINCT] <select_expr>, <select_expr>, ... 
FROM <table_name>
[WHERE <condition>] 
[GROUP BY col_list] 
[HAVING having_condition] 
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]] 
[LIMIT number];

Query and sort

SELECT [ALL | DISTINCT] <select_expr1>, <select_expr2>, ... 
FROM <table_name> 
[WHERE where_condition] 
[GROUP BY col_list] 
[HAVING having_condition] 
ORDER BY <column_list>
[LIMIT number];
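Two concrete queries against the employee table may make the clauses easier to compare. This is a sketch: the alias headcount and the thresholds are arbitrary. ORDER BY produces one globally sorted result, while DISTRIBUTE BY with SORT BY only sorts rows within each reducer.

```shell
# Write two example queries to a scratch .hql file.
query_dir="$(mktemp -d)"
cat > "$query_dir/employee_queries.hql" <<'EOF'
-- Aggregate, filter groups, and sort the whole result
SELECT destination, COUNT(*) AS headcount
FROM employee
GROUP BY destination
HAVING COUNT(*) > 1
ORDER BY headcount DESC
LIMIT 10;

-- Send rows with the same destination to the same reducer, sorted per reducer
SELECT id, name, destination
FROM employee
DISTRIBUTE BY destination
SORT BY destination, name;
EOF
cat "$query_dir/employee_queries.hql"
# Run it with: hive -f "$query_dir/employee_queries.hql"
```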

As the syntax above shows, HiveQL is very similar to the SQL dialect of MySQL.

References

1. Yibai tutorial: Hive
