Basic operations of Hive

Posted by danwatt on Mon, 01 Nov 2021 00:56:00 +0100

▌ III. query operations

▎ [3.1] query which databases exist in Hive

 show databases;

Before any other database is created, Hive provides a default database named default. When you run HiveQL without specifying a database, all operations are performed in default. Database names and table names in Hive are not case sensitive.
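
As a quick illustration (the table name t_demo is just a made-up example), statements that do not name a database land in default:

 show databases;               -- a fresh installation returns only: default
 create table t_demo(id int);  -- created in default, i.e. under /user/hive/warehouse/t_demo
 show tables;                  -- returns: t_demo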

▎ [3.2] query database information

desc database bd1814;
bd1814  hdfs://ip:9000/user/hive/warehouse/bd1814.db zxiao USER
 To see the additional attributes that were added with dbproperties, use the extended keyword:
desc database extended bd1814;
Select database
use Database name;

▌ IV. deletion

▎ [4.1] delete database

 drop database Database name;          //only an empty database (one with no tables) can be dropped
 drop database Database name cascade;  //cascade drops the database even if it still contains tables

▎ [4.2] delete table

 drop table Table name;

▌ V. loading data

▎ [5.1] local loading

 load data local inpath 'Local data path' [overwrite] into table Table name;
 The into keyword appends; overwrite into replaces (overwrites) the existing data

In essence, the file is copied into the HDFS directory that corresponds to the table.
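
A small sketch, assuming a local file /home/hadoop/user.txt and a table t_user that matches its layout:

 -- append; the local copy of the file is left untouched
 load data local inpath '/home/hadoop/user.txt' into table t_user;
 -- overwrite; the table's previous contents are replaced
 load data local inpath '/home/hadoop/user.txt' overwrite into table t_user;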

▎ [5.2] HDFS loading

 load data inpath 'HDFS path' [overwrite] into table Table name;
 As above, the into keyword appends and overwrite into overwrites
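
For example, assuming the file has already been uploaded to /data/user.txt on HDFS (see the summary in 5.5: the source file is moved, not copied):

 load data inpath '/data/user.txt' into table t_user;
 -- after the load, /data/user.txt is gone; the file now sits in the table's directory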

▎ [5.3] add data to the table through query statements

 insert into table Target table name select * from Source table name;
 into appends to the target table
 insert overwrite table Target table name select * from Source table name;
 overwrite replaces the existing contents of the target table
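
A short example, assuming a source table t_user and a target table t_user_copy with the same column layout:

 insert into table t_user_copy select * from t_user;                     -- appends rows
 insert overwrite table t_user_copy select * from t_user where uid > 1;  -- replaces all rows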

▎ [5.4] Hive add data problem

Hive uses a strict "schema on read" model.

Data integrity is not checked when data is loaded. If a value turns out not to match the expected format when it is read, NULL is returned instead.

When creating a table in Hive, if you do not specify a field delimiter, the system default delimiter is used (the non-printing character '\001', i.e. Ctrl-A).
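
A hypothetical illustration of schema on read (the file name and its contents are assumptions):

 create table t_read(id int, name string) row format delimited fields terminated by ',';
 -- suppose /home/hadoop/bad.txt contains the line:  abc,tom
 load data local inpath '/home/hadoop/bad.txt' into table t_read;  -- the load still succeeds
 select * from t_read;  -- that row's id is read back as NULL because "abc" is not an int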

▎ [5.5] summary:

Loading a file from the local filesystem is equivalent to a copy: the file is uploaded into the table's directory under the corresponding database directory in /user/hive/warehouse on HDFS.

Loading a file that is already on HDFS is equivalent to a cut (move): the file is moved into the table's directory under the corresponding database directory in /user/hive/warehouse.

▎ [5.6] copy table

  • Use the like keyword
 You can copy partitioned tables or ordinary tables
 Copy a table without its data --> copies only the table structure
 create table if not exists t_4 like t_5;
 Copy a table together with its data (by pointing location at the original table's directory)
 create table if not exists t_6 like t_5 location '/user/hive/warehouse/bd1814.db/t_5';

Because this copy uses the location keyword, no new t_6 directory appears on HDFS: t_6 shares its data files with t_5. Remember that every database or table we create in Hive is just a directory.

  • Copy using select
 Copy by query: create table ... as select * from ...
 PS: I personally highly recommend this approach; copying data this way is very flexible
 This way the new table gets its own metadata and directory, just like the source table
 create table if not exists New table name as select * from Source table name where Condition;
 PS: the where clause is optional; add it only if needed
 create table if not exists t_7 as select * from t_user where uid>1;

▌ VI. modification of table attributes

▎ [6.1] rename a table

 alter table Original table name rename to New table name

▎ [6.2] modify the name or data type of the column

 alter table Table name change column Original field name New field name Data type;
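
For example, assuming t_user has a column uname of type string that we want to rename to name:

 alter table t_user change column uname name string;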

▎ [6.3] order of modifying fields

 Use the after keyword to place the modified field after another field
 alter table Table name change column Original field name New field name Data type after Another field name;
 However, only the table metadata changes; the underlying data keeps its original column order, so values may end up parsed into the wrong columns

▎ [6.4] add field

 alter table Table name add columns (Field name data type ....);
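
For example, adding two new columns to t_user (the column names are just an illustration):

 alter table t_user add columns (age int, email string);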

▎ [6.5] delete field

 alter table Table name replace columns (Field name Data type, ...);
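
Note that replace columns keeps only the columns you list and drops everything else, which is how a field is "deleted". A sketch, assuming t_user now has the columns (uid int, name string, age int, email string) and we want to drop email:

 alter table t_user replace columns (uid int, name string, age int);
 -- as in 6.3, only the metadata changes; the data files on HDFS are left untouched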

▌ I. the four types of SQL statements

▎ [1.1] DDL: data definition language

Keywords such as create, drop and alter define the logical structure of the database, such as database, table and index

▎ [1.2] DQL: data query language

The select keyword is used for query operations

▎ [1.3] DML: data manipulation language

Keywords such as insert, update and delete operate on the data in the database

▎ [1.4] DCL: data control language

Keywords such as grant, revoke, commit and rollback manage the permissions and security levels of the database

▌ II. Create

▎ [2.1] create database

 create database bd1814 comment 'Comment text' with dbproperties('key'='value', 'key'='value');
 //both key and value must be written as quoted strings

Creating a "database" in Hive essentially creates a "folder (directory)" on HDFS

 Databases, tables and data created in Hive are stored under /user/hive/warehouse on HDFS

If you use MySQL to store Hive metadata, a hive database must be created in MySQL; the connection details are set in hive-site.xml. The metastore contains many tables, one of which is DBS; it records the metadata of every database we create.
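
A quick way to peek at it, assuming the metastore database in MySQL is named hive (column names can vary slightly between Hive versions):

 -- run in the MySQL client, not in Hive
 use hive;
 select DB_ID, NAME, DB_LOCATION_URI from DBS;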

▎ [2.2] add attribute information to an existing database

 alter database Database name set dbproperties('Attribute name'='Attribute value');
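
For example, to tag the bd1814 database created above (the attribute name and value are just an illustration):

 alter database bd1814 set dbproperties('creator'='zxiao');
 desc database extended bd1814;  -- the new property shows up in the extended description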

▎ [2.3] create table

 create [external] table if not exists Table name(Column name data type,Column name data type) location 'HDFS path';
 //location specifies where the table's directory lives on HDFS; it can also point at data that already exists at that path

The essence of creating a table in Hive is to create a directory under the corresponding warehouse path on HDFS; the directory name is the table name. The data added to the table is not stored in the metadata database, but in HDFS.
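
A small check from the Hive CLI (dfs commands can be run directly inside Hive); the database and table names are the ones used earlier in this post:

 use bd1814;
 create table if not exists t_user(uid int, uname string);
 dfs -ls /user/hive/warehouse/bd1814.db;  -- a t_user directory now exists under the database directory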

▏ in Hive, there are two types of tables:

  • MANAGED_TABLE (internal table)
    Its table directory is created under the HDFS warehouse directory (/user/hive/warehouse)
  • EXTERNAL_TABLE (external table)
    An external table is created at the path specified by the location keyword (see the example after this list)
    If no location is specified, the table is created under the same path as an internal table
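
A sketch of an external table; the path /data/ext/t_log is just an assumed HDFS directory that already holds the data files:

 create external table if not exists t_log(uid int, uname string)
 row format delimited fields terminated by '\t'
 location '/data/ext/t_log';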

▏ Hive does two things when creating tables

  • Create the corresponding table directory on HDFS
  • The metadata information of the corresponding table is stored in the metadata storage database

▏ There is essentially no difference between internal and external tables until a drop happens; then their behavior differs

  • During drop, the data directory of the internal table will be deleted, but the data directory of the external table will not be deleted
  • During drop, the metadata of the internal table or external table will be deleted

▏ conversion of internal tables and external tables

 alter table Table name set tblproperties('EXTERNAL'='TRUE/false')
 PS: TRUE must be written in uppercase; false is case-insensitive
 Internal table to external table
 alter table t_newuser set tblproperties('EXTERNAL'='TRUE')
 External table to internal table
 alter table t_newuser set tblproperties('EXTERNAL'='false');

▎ [2.4] specify the field delimiter when creating a table

 create [external] table if not exists Table name(Column name data type,Column name data type) 
row format delimited fields terminated by '\t'

Topics: Database hive SQL