Hive commands

Posted by mohabitar on Fri, 18 Feb 2022 15:01:55 +0100

Chapter 1 Start the Hadoop cluster for Hive

Start HDFS HA

(check that the HDFS web UI on port 50070 is up)

  1. Start the zookeeper service on each machine
zkServer.sh start
  2. Start the HDFS service
sbin/start-dfs.sh

Start YARN HA

(check that the YARN web UI on port 8088 is up)

  1. Start YARN
sbin/start-yarn.sh
  2. Start the ResourceManager on the other HA node
sbin/yarn-daemon.sh start resourcemanager

Chapter 2 Accessing Hive via JDBC

Start the hiveserver2 service

bin/hiveserver2

Start beeline

bin/beeline

Connect to the hiveserver2 service

beeline> !connect jdbc:hive2://server1:10000
Connecting to jdbc:hive2://server1:10000
Enter username for jdbc:hive2://server1:10000: server (press Enter)
Enter password for jdbc:hive2://server1:10000: (press Enter)
Connected to: Apache Hive (version 2.3.8)
Driver: Hive JDBC (version 2.3.8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://server1:10000>
0: jdbc:hive2://server1:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

Chapter 3 Local access (bin/hive)

1 - help

bin/hive -help
usage: hive
 -d,--define <key=value>          Variable substitution to apply to Hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to Hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)

2 - execute SQL statements from the command line

bin/hive -e 'show databases;'
database_name
default

3 - view hdfs file system

hive (default)> dfs -ls /;

4 - view local file system

hive (default)> ! ls /opt/module/;

5 - view hive's historical commands

[server@server1 hive]$ cat /home/server/.hivehistory

6 - view all configuration information

hive (default)> set;

Chapter 4 Database operations

1 - create database syntax

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

2 - create the database and specify the path of the database on HDFS

hive (default)> create database db_hive2 location '/db_hive2.db';

3 - display database information

desc database default;

4 - display database details

desc database extended db_hive;

5 - switch database

hive (default)> use db_hive;

6 - delete database

hive>drop database db_hive;

It is safer to check whether the database exists first:

hive>drop database if exists db_hive;

If the database is not empty, add CASCADE to force deletion:

drop database db_hive cascade;

Chapter 5 Table operations

1 - create table syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format]
[STORED AS file_format] 
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

2 - create a normal table

create table if not exists student( id int, name string);

3 - create external tables (data in HDFS)

When a managed (internal) table is dropped, both its metadata and its data are deleted; dropping an external table deletes only the metadata and leaves the data in place.

create external table stu_external( id int,name string)
row format delimited fields terminated by '\t' location '/student';

4 - create a table based on query results

create table if not exists student3 as select id, name from student;

5 - create a table based on the existing table structure

create table if not exists student4 like student;

6 - query table type

desc formatted student;

The output looks like this:

OK
col_name data_type comment

col_name data_type comment

id int
name string

Detailed Table Information

Database: db_hive
Owner: server
CreateTime: Wed May 05 11:27:47 CST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/db_hive.db/student
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1620185267

Storage Information

SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.129 seconds, Fetched: 31 row(s)

7 - view table formatted data

desc formatted dept;

8 - delete external table

drop table stu_external;

9 - convert between internal and external tables

alter table student2 set tblproperties('EXTERNAL'='TRUE');
alter table student2 set tblproperties('EXTERNAL'='FALSE');

Note: 'EXTERNAL'='TRUE' and 'EXTERNAL'='FALSE' are fixed, case-sensitive quoted strings.

Chapter 6 HDFS operations

1 - upload data to HDFS

dfs -mkdir /student;
dfs -put /opt/module/datas/student.txt /student;

Chapter 7 Partitioned tables

1 - partitioned table layout (data managed by partition)

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

2 - create partition table

hive (default)> create table dept_partition( deptno int, dname string, loc string)
partitioned by (month string);

3 - load data into partition table

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table
default.dept_partition partition(month='201707');

Note: a partition must be specified when loading data into a partitioned table

4 - query partition table data

select * from dept_partition where month='201709';

5 - query multiple partitions (union)

select * from dept_partition where month='201709' union
select * from dept_partition where month='201708' union
select * from dept_partition where month='201707';

6 - add partition

Add a single partition

alter table dept_partition add partition(month='201706') ;

Add multiple partitions

alter table dept_partition add partition(month='201705') partition(month='201704');

7 - delete partition

Delete a single partition

alter table dept_partition drop partition (month='201704');

Delete multiple partitions

alter table dept_partition drop partition (month='201705'), partition (month='201706');

8 - view partition

show partitions dept_partition;

9 - view partition table structure

desc formatted dept_partition;

10 - create secondary partition table

create table dept_partition2( deptno int, dname string, loc string)
partitioned by (month string, day string);

11 - load data into the secondary partition table

load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition2 partition(month='201709', day='13');

12 - query secondary partition table data

select * from dept_partition2 where month='201709' and day='13';

13 - associate HDFS data with partition table

  1. Upload the data, then run the repair command
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
hive (default)> msck repair table dept_partition2;
  2. Upload the data, then add the partition
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
hive (default)> alter table dept_partition2 add partition(month='201709', day='11');
  3. Create the folder, then load data into the partition
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');

Chapter 8 Altering tables (add, change, replace columns)

1 - syntax

Update column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Add and replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

2 - add column

alter table dept_partition add columns(deptdesc string);

3 - update column

alter table dept_partition change column deptdesc desc int;

4 - replace column

alter table dept_partition replace columns(deptno string, dname string, loc string);

Chapter 9 Loading data

1 - syntax

hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,...)];

Chapter 10 insert data

1 - basic insert

hive (default)> insert into table student partition(month='201709')
values(1,'wangwu'), (2,'zhaoliu');

2 - overwrite insert (will overwrite existing data)

hive (default)> insert overwrite table student partition(month='201708') select id, name from student where month='201709';

3 - multi table and multi partition insertion

hive (default)> from student
insert overwrite table student partition(month='201707') select id, name where month='201709'
insert overwrite table student partition(month='201706')
select id, name where month='201709';

Chapter 11 data export and import

1 - import statement

Note: the data must be exported with export table before it can be imported

hive (default)> import table student2 partition(month='201709') from '/user/hive/warehouse/export/student';

2 - export query results to local

hive (default)> insert overwrite local directory '/opt/module/datas/export/student'
select * from student;

3 - format and export the query results to local

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;

4 - export the query results to HDFS (without the local keyword)

hive (default)> insert overwrite directory '/user/atguigu/student2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;

5 - Hadoop command export to local

hive (default)> dfs -get /user/hive/warehouse/student/month=201709/000000_0
/opt/module/datas/export/student3.txt;

6 - Hive shell command export

bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt

7 - export statement

hive (default)> export table default.student to '/user/hive/warehouse/export/student';

8 - Sqoop export (Sqoop can export table data from HDFS to a relational database; not covered here)

9 - clear table data

Note: truncate works only on managed tables; it cannot delete data in external tables

hive (default)> truncate table student;

Chapter 12 Queries

1 - syntax

Notes:

(1) SQL is case insensitive.

(2) SQL can be written on one line or across several lines.

(3) Keywords cannot be abbreviated or split across lines.

(4) Clauses are usually written on separate lines.

(5) Use indentation to improve readability.

[WITH CommonTableExpression (, CommonTableExpression)*]	(Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference [WHERE where_condition] [GROUP BY col_list] [ORDER BY col_list] [CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]

2 - arithmetic operators

Operator   Description
A+B        A plus B
A-B        A minus B
A*B        A times B
A/B        A divided by B
A%B        Remainder of A divided by B
A&B        Bitwise AND of A and B
A|B        Bitwise OR of A and B
A^B        Bitwise XOR of A and B
~A         Bitwise NOT of A
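These operators behave like ordinary integer arithmetic and bitwise operations. A quick sketch in Python with arbitrary sample values (for negative operands Hive's % follows Java semantics, which differ from Python's, so the sketch sticks to nonnegative values):

```python
# Hive's arithmetic/bitwise operators illustrated with A=7 (0b111), B=3 (0b011).
A, B = 7, 3

print(A + B)   # 10  addition
print(A - B)   # 4   subtraction
print(A * B)   # 21  multiplication
print(A % B)   # 1   remainder of A divided by B
print(A & B)   # 3   bitwise AND: 0b111 & 0b011 = 0b011
print(A | B)   # 7   bitwise OR:  0b111 | 0b011 = 0b111
print(A ^ B)   # 4   bitwise XOR: 0b111 ^ 0b011 = 0b100
print(~A)      # -8  bitwise NOT: ~A == -(A + 1) in two's complement
```

Division is the one to watch: Hive's / on two integers returns a double (7/3 gives 2.333...), which matches Python 3's / but not integer division in Java or C.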

3 - common functions

  1. Find the total number of rows (count)

    hive (default)> select count(*) cnt from emp;

  2. Maximum salary (max)

    hive (default)> select max(sal) max_sal from emp;

  3. Minimum salary (min)

    hive (default)> select min(sal) min_sal from emp;

  4. Sum of salaries (sum)

    hive (default)> select sum(sal) sum_sal from emp;

  5. Average salary (avg)

    hive (default)> select avg(sal) avg_sal from emp;

4 - limit

A typical query can return many rows; the LIMIT clause restricts how many are returned.

hive (default)> select * from emp limit 5;

5 - where statement

Note: the WHERE clause filters out rows that do not meet the condition. It follows the FROM clause. Column aliases cannot be used in the WHERE clause.

Case practice:

Find all employees whose salary is greater than 1000

hive (default)> select * from emp where sal >1000;

6 - comparison operators (Between / In / Is Null)

Operator: A=B (primitive types)
Returns TRUE if A equals B, otherwise FALSE.

Operator: A<=>B (primitive types)
Returns TRUE if both A and B are NULL, FALSE if only one of them is NULL; otherwise behaves like =. Never returns NULL.

Operator: A<>B, A!=B (primitive types)
Returns NULL if A or B is NULL; TRUE if A is not equal to B, otherwise FALSE.

Operator: A<B (primitive types)
Returns NULL if A or B is NULL; TRUE if A is less than B, otherwise FALSE.

Operator: A<=B (primitive types)
Returns NULL if A or B is NULL; TRUE if A is less than or equal to B, otherwise FALSE.

Operator: A>B (primitive types)
Returns NULL if A or B is NULL; TRUE if A is greater than B, otherwise FALSE.

Operator: A>=B (primitive types)
Returns NULL if A or B is NULL; TRUE if A is greater than or equal to B, otherwise FALSE.

Operator: A [NOT] BETWEEN B AND C (primitive types)
Returns NULL if any of A, B, or C is NULL; TRUE if A is greater than or equal to B and less than or equal to C, otherwise FALSE. NOT reverses the result.

Operator: A IS NULL (all types)
Returns TRUE if A is NULL, otherwise FALSE.

Operator: A IS NOT NULL (all types)
Returns TRUE if A is not NULL, otherwise FALSE.

Operator: A IN (value1, value2, ...) (all types)
Returns TRUE if A equals any value in the list.

Operator: A [NOT] LIKE B (strings)
B is a SQL wildcard pattern. Returns TRUE if A matches B, otherwise FALSE: 'x%' matches strings starting with 'x', '%x' matches strings ending with 'x', and '%x%' matches strings containing 'x' anywhere. NOT reverses the result.

Operator: A RLIKE B, A REGEXP B (strings)
B is a Java regular expression. Returns TRUE if any substring of A matches B, otherwise FALSE. Matching is done by the JDK regular expression engine.
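The NULL handling above is three-valued logic. A small Python model (None stands in for NULL; the helper names are ours, not Hive's) contrasts = with the NULL-safe <=>:

```python
def eq(a, b):
    """Model of Hive's '=': the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def nullsafe_eq(a, b):
    """Model of Hive's '<=>': TRUE if both sides are NULL, FALSE if only one is."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

print(eq(5000, 5000))           # True
print(eq(5000, None))           # None -> a WHERE clause would drop this row
print(nullsafe_eq(None, None))  # True
print(nullsafe_eq(5000, None))  # False
```

This is why `where comm = null` matches nothing: the comparison yields NULL, not TRUE, so `is null` (or `<=>`) must be used instead.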

Case:

(1) Find out all employees whose salary is equal to 5000

hive (default)> select * from emp where sal =5000;	

(2) Query employee information with salary between 500 and 1000

hive (default)> select * from emp where sal between 500 and 1000;

(3) Query all employee information with comm blank

hive (default)> select * from emp where comm is null;	

(4) Query employee information with salary of 1500 or 5000

hive (default)> select * from emp where sal IN (1500, 5000);

7-Like and RLike

1) The LIKE operator selects values matching a pattern.

2) The pattern can contain ordinary characters and two wildcards:

% represents zero or more characters (any characters).

_ represents exactly one character.

RLIKE is a Hive extension that lets you write the match condition in the more powerful Java regular expression syntax.

(1) Find employee information with salary starting with 2

hive (default)> select * from emp where sal LIKE '2%';

(2) Find employees whose salary has 2 as its second digit

hive (default)> select * from emp where sal LIKE '_2%';

(3) Find employees whose salary contains the digit 2

hive (default)> select * from emp where sal RLIKE '[2]';
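The relationship between LIKE and RLIKE can be made concrete by translating LIKE wildcards into a regular expression ('%' becomes '.*', '_' becomes '.'). A hypothetical helper in Python (Python and Java regexes agree on these simple patterns; the sample salaries are made up):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')      # %: zero or more characters
        elif ch == '_':
            parts.append('.')       # _: exactly one character
        else:
            parts.append(re.escape(ch))
    return '^' + ''.join(parts) + '$'  # LIKE must match the entire string

salaries = ['2450', '1250', '950', '3000']
print([s for s in salaries if re.match(like_to_regex('2%'), s)])   # ['2450']
print([s for s in salaries if re.match(like_to_regex('_2%'), s)])  # ['1250']
# RLIKE '[2]' matches when any substring matches, i.e. the value contains a 2:
print([s for s in salaries if re.search('[2]', s)])                # ['2450', '1250']
```

Note the asymmetry: LIKE must match the whole value, while RLIKE succeeds if the regex matches any substring.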

8 - logical operators

Operator  Meaning
AND       Logical and
OR        Logical or
NOT       Logical not

9 - grouping

9.1 Group By

The GROUP BY statement is usually used with aggregate functions: it groups the result by one or more columns and then applies the aggregation to each group.

Case:

Calculate the average salary of each department in the emp table

hive (default)> select t.deptno, avg(t.sal) avg_sal from emp t group by t.deptno;

Calculate the maximum salary for each job in each department in emp

hive (default)> select t.deptno, t.job, max(t.sal) max_sal from emp t group by t.deptno, t.job;

9.2 Having

Differences between having and where:
(1) Aggregate functions cannot be used after where, but they can be used after having.

(2) having can only be used in statements with group by.

Case:

Calculate the average salary of each department

hive (default)> select deptno, avg(sal) from emp group by deptno;

Find departments whose average salary is greater than 2000

hive (default)> select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;
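The where-then-group-then-having pipeline can be modeled in plain Python: grouping and aggregation happen first, then HAVING filters on the aggregated value (toy emp rows with made-up salaries):

```python
from collections import defaultdict

# Toy rows mirroring emp: (deptno, sal)
emp = [(10, 2450.0), (10, 5000.0), (20, 800.0),
       (20, 3000.0), (30, 1250.0), (30, 950.0)]

# GROUP BY deptno with AVG(sal): accumulate sum and count per group
acc = defaultdict(lambda: [0.0, 0])
for deptno, sal in emp:
    acc[deptno][0] += sal
    acc[deptno][1] += 1
avg_sal = {d: s / n for d, (s, n) in acc.items()}

# HAVING avg_sal > 2000: filters groups on the aggregated value,
# which a WHERE clause could not do (WHERE runs before grouping).
result = {d: a for d, a in avg_sal.items() if a > 2000}
print(result)  # {10: 3725.0}
```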

10 - join statement

Hive supports the usual SQL JOIN statements, but only equi-joins; non-equi-joins are not supported.

Case:

Join the employee and department tables on department number, and query employee number, employee name, and department name;

hive (default)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;
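An equi-join pairs rows whose join keys are equal. A minimal hash-join sketch in Python (toy emp/dept rows with made-up values), roughly the strategy behind Hive's map-side join:

```python
# Toy rows: emp as (empno, ename, deptno), dept as (deptno, dname)
emp = [(7369, 'SMITH', 20), (7499, 'ALLEN', 30), (7839, 'KING', 10)]
dept = [(10, 'ACCOUNTING'), (20, 'RESEARCH'), (30, 'SALES')]

# Build a hash table on the smaller side, then probe it with each emp row;
# rows whose deptno has no match are dropped (inner join semantics).
dept_by_no = {deptno: dname for deptno, dname in dept}
joined = [(empno, ename, deptno, dept_by_no[deptno])
          for empno, ename, deptno in emp
          if deptno in dept_by_no]
print(joined)
```

Each output row carries e.empno, e.ename, d.deptno, and d.dname, matching the columns selected in the query above.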
