HIVE installation and SQL tutorial

Posted by Teen0724 on Mon, 07 Mar 2022 23:39:51 +0100

Install brew

The following script is used for domestic brew installation:

/bin/zsh -c "$(curl -fsSL https://gitee.com/cunkai/HomebrewCN/raw/master/Homebrew.sh)"

Install hive

brew install hive

Hide execute sql statement

hive can directly convert sql statements into mapreduce code (the link below is very good)

https://geek-docs.com/hive/hive-tutorial/introduction-of-hive.html

Because the company has a configured HIVE system, it is not necessary to install it on this machine.

So go straight to the SQL tutorial

SQL tutorial

SQL is case insensitive

Query and update instructions form the DML part of SQL:

SELECT - get data from database table
UPDATE - updates the data in the database table
DELETE - deletes data from a database table
INSERT INTO - inserts data into a database table

The most important DDL statements in SQL:

CREATE DATABASE - create a new database
ALTER DATABASE - modify database
CREATE TABLE - create a new table
ALTER TABLE - change (Alter) database tables
DROP TABLE - delete table
CREATE INDEX - CREATE INDEX (search key)
DROP INDEX - DROP INDEX

SELECT

SELECT LastName,FirstName FROM Persons
#Select all columns
SELECT * FROM Persons
#Returns a unique value
SELECT DISTINCT Column name FROM Table name

Where

SELECT * FROM Persons WHERE City='Beijing'
#About the value of a value
 This is true:
SELECT * FROM Persons WHERE FirstName='Bush'
SELECT * FROM Persons WHERE Year>1965

AND and OR

#Use parentheses to use and and or together
SELECT * FROM Persons WHERE (FirstName='Thomas' OR FirstName='William')
AND LastName='Carter'

Order by

SELECT Company, OrderNumber FROM Orders ORDER BY Company DESC, OrderNumber ASC

Insert

INSERT INTO Persons (LastName, Address) VALUES ('Wilson', 'Champs-Elysees')

Update

# UPDATE table name SET column name = new value WHERE column name = a value
UPDATE Person SET Address = 'Zhongshan 23', City = 'Nanjing'
WHERE LastName = 'Wilson'

TOP

#Select the first few or the first 50%
SELECT TOP 2 * FROM Persons
SELECT TOP 50 PERCENT * FROM Persons

Like

Specify pattern

Now, we want to start from the above "Persons" Select the person who lives in "N" People in the beginning City

SELECT * FROM Persons
WHERE City LIKE 'N%'

Next, we want to start from "Persons" Select the person who lives in "g" People in the city at the end:

SELECT * FROM Persons
WHERE City LIKE '%g

IN

Specify the selected value range
SELECT * FROM Persons
WHERE LastName IN ('Adams','Carter')

BETWEEN

Specify data range
SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
 To display in alphabetical order between "Adams"((including) and "Carter"((not included), please use the following SQL: 
SELECT * FROM Persons
WHERE LastName
BETWEEN 'Adams' AND 'Carter'

ALIAS

Table alias

Suppose we have two tables:"Persons" and "Product_Orders". We assign aliases to each of them "p" and "po". 
Now, we want to list "John Adams" All orders.

SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p, Product_Orders AS po
WHERE p.LastName='Adams' AND p.FirstName='John'

Aliases are used for easy reading and to simplify sql statements

JOIN

where and join

**where**
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons, Orders
WHERE Persons.Id_P = Orders.Id_P 

**join**
SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
INNER JOIN Orders
ON Persons.Id_P = Orders.Id_P
ORDER BY Persons.LastName

inner join: understood as "effective join", the data in both tables will be displayed
left join: understood as "left display". For example, if on a.field=b.field, all data in table a and data in both a and B will be displayed. Data in a and not in B will be displayed as null
right join: understood as "right display". For example, if on a.field=b.field, all data in table B and data in a and B will be displayed, and data in B and not in a will be displayed as null
Full join: understood as "full join". All data in the two tables are displayed. In fact, it is inner + (left inner) + (right inner)

Join is equivalent to inner join is equivalent to where

Full join

Now, we want to list all the people, their orders, all the orders, and the people who ordered them.

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
FULL JOIN Orders
ON Persons.Id_P=Orders.Id_P
ORDER BY Persons.LastName

The FULL JOIN keyword returns all rows from the left table (Persons) and the right table (Orders). If the rows in "Persons" do not match in the table "Orders", or if the rows in "Orders" do not match in the table "Persons", these rows are also listed.

UNION

The UNION operator is used to combine the result sets of two or more SELECT statements.

# Only two tables with different values are returned
SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2
# Return all values of two tables
SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2

Tables and databases

database: database

CREATE DATABASE database_name

Table: table

This example demonstrates how to create a file named "Person" Table of.
The table contains five columns with the following names:"Id_P","LastName","FirstName","Address" as well as "City": 

CREATE TABLE Persons
(
Id_P int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

NULL

If a column in the table is optional, we can insert new records or update existing records without adding values to the column. This means that the field will be saved with a NULL value.
NULL values are handled differently from other values.
NULL is used as a placeholder for unknown or inapplicable values.

# How can we just select records with NULL values in the "Address" column?
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NULL

# How do we select records without NULL value in the "Address" column?
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NOT NULL

SQL function

SELECT function(column) FROM surface

The FIRST() function returns the value of the first record in the specified field.
Tip: you can use the ORDER BY statement to sort records.

SELECT FIRST(OrderPrice) AS FirstOrderPrice FROM Orders

And Max, min, sum

Group by

The last thing I understand is that there is no more than one row of data returned by group

Having

Adding HAVING clause in SQL because WHERE keyword cannot be used with aggregate function.

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value

Now we want to find that the customer "Bush" or "Adams" has a total order amount of more than 1500.
We added a common WHERE clause to the SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders
WHERE Customer='Bush' OR Customer='Adams'
GROUP BY Customer
HAVING SUM(OrderPrice)>1500

Topics: hive SQL