Alibaba cloud Tianchi Longzhu program SQL training camp - Task02 learning notes

Posted by ddubs on Thu, 03 Mar 2022 12:17:45 +0100

This note is the learning content of Alibaba cloud Tianchi Longzhu program SQL training camp. The link is: https://tianchi.aliyun.com/specials/promotion/aicampsql

Basis of select statement

  • SELECT statement

The basic SELECT statement contains two clause s, SELECT and FROM. Examples are as follows:

SELECT <Listing>,
  FROM <Table name>;

Among them, the SELECT clause lists the names of the columns you want to query FROM the table, while the FROM clause specifies the name of the table FROM which the data is selected.

  • WHERE statement

The SELECT statement specifies the conditions for querying data through the WHERE clause. In the WHERE clause, you can specify conditions such as "the value of a column is equal to this string" or "the value of a column is greater than this number". Execute the SELECT statement containing these conditions to query the records that only meet the conditions.

SELECT <Listing>, ......
  FROM <Table name>
 WHERE <Conditional expression>;
  • rule

An asterisk (*) means all columns.

Line breaks can be used in SQL at will, which does not affect the execution of statements (but empty lines cannot be inserted).

When setting Chinese aliases, you need to use double quotation marks (").

Use DISTINCT in a SELECT statement to remove duplicate rows.

Comments are the parts of SQL statements that are used to identify descriptions or precautions. There are two kinds of "/ * * /" including one line comment "--" and multi line comment.

-- When you want to query all columns, you can use asterisks representing all columns(*). 
SELECT *
  FROM <Table name>;
-- SQL Statement can be used AS Keyword sets the alias for the column (double quotation marks ("") are required when using Chinese).
SELECT product_id     AS id,
       product_name   AS name,
       purchase_price AS "Purchase unit price"
  FROM product;
-- use DISTINCT delete product_type Duplicate data in column
SELECT DISTINCT product_type
  FROM product;

 

Arithmetic and comparison operators

  • Arithmetic operator (addition, subtraction, multiplication and division)
  • Comparison operator (greater than, less than, equal to.)
  • Common rules

Constants or expressions can be used in the SELECT clause.

When using comparison operators, be sure to pay attention to the position of unequal sign and equal sign.

In principle, the data of string type shall be sorted according to the dictionary order and shall not be confused with the size order of numbers.

When you want to select NULL records, you need to use the IS NULL operator in the conditional expression. When you want to select a record that IS NOT NULL, you need to use the IS NOT NULL operator in the conditional expression.

-- SQL Operational expressions can also be used in statements
SELECT product_name, sale_price, sale_price * 2 AS "sale_price x2"
  FROM product;
-- WHERE Clause can also be used in conditional expressions
SELECT product_name, sale_price, purchase_price
  FROM product
 WHERE sale_price-purchase_price >= 500;
/* Use unequal sign on string
 First create chars and insert data
 SELECT a SELECT statement greater than '2'*/
-- DDL: Create table
CREATE TABLE chars
(chr CHAR(3)NOT NULL,
PRIMARY KEY(chr));
-- Select greater than'2'Of data SELECT sentence('2'As string)
SELECT chr
  FROM chars
 WHERE chr > '2';
-- selection NULL Record of
SELECT product_name,purchase_price
  FROM product
 WHERE purchase_price IS NULL;
-- Select not as NULL Record of
SELECT product_name,purchase_price
  FROM product
 WHERE purchase_price IS NOT NULL;
  • Logical operator

NOT operator

When you want to express "NOT...", in addition to the < > operator mentioned above, there is another operator that represents negation and is more widely used: NOT.

NOT cannot be used alone, for example:

-- Select a record with a sales unit price greater than or equal to 1000 yen
SELECT product_name, product_type, sale_price
  FROM product
 WHERE sale_price >= 1000;
 -- To code listing 2-30 Add to query criteria NOT operator
SELECT product_name, product_type, sale_price
  FROM product
 WHERE NOT sale_price >= 1000;

AND operator AND OR operator

When you want to use multiple query criteria at the same time, you can use the AND OR operator.

AND is equivalent to "AND", similar to the intersection in mathematics;

OR is equivalent to "OR", similar to the union set in mathematics.

Priority through parentheses

What should I do if I want to find such a product?

"The commodity type is office supplies" and "the registration date is September 11, 2009 or September 20, 2009"
The ideal result is "punch", but when you enter the following information, you will get the wrong result

-- If you write the query condition into the condition expression intact, you will get an error result
SELECT product_name, product_type, regist_date
  FROM product
 WHERE product_type = 'Office Supplies'
   AND regist_date = '2009-09-11'
    OR regist_date = '2009-09-20';

The reason for the error is that the AND operator takes precedence over the OR operator. If you want to perform the OR operation first, you can use parentheses:

-- By using parentheses OR Operator before AND Operator execution
SELECT product_name, product_type, regist_date
  FROM product
 WHERE product_type = 'Office Supplies'
   AND ( regist_date = '2009-09-11'
        OR regist_date = '2009-09-20');

Truth table

A TRUE value is a value that is one of TRUE or FALSE.

AND operator * *: * * returns true when the true values on both sides are true, AND false otherwise.

OR operator * *: * * returns true if one of the truth values on both sides is not false, and returns false only if the truth values on both sides are false.

NOT operator * *: * * simply converts true to false and false to true.

True value with NULL

The true value of NULL results in neither true nor false because such a value is not known.

How to express it?

At this time, the true value is the third value besides true and false - UNKNOWN. This third value does not exist in general logical operations. Languages other than SQL basically only use true and false values. In contrast to the usual logical operation called binary logic, only the logical operation in SQL is called ternary logic.

The truth table of AND and OR under ternary logic is:

Aggregate query on table

Aggregate function

The function used for summary in SQL is called aggregate function. The following five are the most commonly used aggregate functions:

  • COUNT: calculate the number of records (rows) in the table
  • SUM: calculate the total value of the data in the value column in the table
  • AVG: calculate the average value of the data in the value column in the table
  • MAX: find the maximum value of data in any column in the table
  • MIN: find the minimum value of data in any column in the table

Please follow the data in Chapter 1 and use the following functions:

-- Calculate the number of rows of all data (including NULL)
SELECT COUNT(*)
  FROM product;
-- calculation NULL Rows of data other than
SELECT COUNT(purchase_price)
  FROM product;
-- Calculate the total value of sales unit price and purchase unit price
SELECT SUM(sale_price), SUM(purchase_price) 
  FROM product;
-- Calculate the average value of sales unit price and purchase unit price
SELECT AVG(sale_price), AVG(purchase_price)
  FROM product;
-- MAX and MIN It can also be used for non numeric data
SELECT MAX(regist_date), MIN(regist_date)
  FROM product;

Use aggregate functions to remove duplicate values

-- Calculate the number of data rows after removing duplicate data
SELECT COUNT(DISTINCT product_type)
 FROM product;
 
-- Whether to use DISTINCT Movement difference during( SUM Function)
SELECT SUM(sale_price), SUM(DISTINCT sale_price)
 FROM product;

Common rules

  • The result of the COUNT function varies according to the parameters. COUNT(*) will get the number of data rows containing NULL, while COUNT(< column name >) will get the number of data rows other than NULL.
  • The aggregate function excludes NULL. The exception to COUNT(*) does not exclude NULL.
  • The MAX/MIN function is applicable to columns of almost all data types. The SUM/AVG function is only applicable to columns of numeric type.
  • When you want to calculate the type of value, you can use DISTINCT in the parameter of the COUNT function.
  • Use DISTINCT in the parameters of the aggregate function to remove duplicate data.

Group tables

5.1 GROUP BY statement

Previously, the aggregation function was used to process the data of the whole table. When you want to group and summarize (that is, summarize and count the existing data according to a column), GROUP BY can help you:

SELECT <Column name 1>,<Column name 2>, <Listing 3>, ......
  FROM <Table name>
 GROUP BY <Column name 1>, <Column name 2>, <Listing 3>, ......;

See the difference between whether to use the GROUP BY statement:

-- Statistical data lines by commodity type
SELECT product_type, COUNT(*)
  FROM product
 GROUP BY product_type;
-- Excluding GROUP BY,Will report an error
SELECT product_type, COUNT(*)
  FROM product;

5.2 when the aggregate key contains NULL

The columns specified in the GROUP BY clause are called aggregate keys or group columns.

Take the purchase_price as the aggregation key, for example:

SELECT purchase_price, COUNT(*)
  FROM product
 GROUP BY purchase_price;

+----------------+----------+
| purchase_price | COUNT(*) |
+----------------+----------+
|            500 |        1 |
|            320 |        1 |
|           2800 |        2 |
|           5000 |        1 |
|           NULL |        2 |
|            790 |        1 |
+----------------+----------+

NULL is treated as a special set of data

5.3 GROUP BY writing position

There are strict requirements for the clause writing order of GROUP BY. Failure to comply with the requirements will lead to the failure of normal SQL execution. The current clause writing * * * * order is:

1**.**SELECT → 2. FROM → 3. WHERE → 4. GROUP BY

The first three items are used to filter data, and GROUP BY processes the filtered data

5.4 use GROUP BY in WHERE clause

SELECT purchase_price, COUNT(*)
  FROM product
 WHERE product_type = 'clothes'
 GROUP BY purchase_price;

+----------------+----------+
| purchase_price | COUNT(*) |
+----------------+----------+
|            500 |        1 |
|           2800 |        1 |
+----------------+----------+

5.5 common errors

When using aggregate functions and GROUP BY clauses, common errors are:

  1. Columns other than the aggregate key are written in the SELECT clause of the aggregate function (when using aggregate functions such as COUNT, if the column name appears in the SELECT clause, it can only be the column name specified in the GROUP BY Clause (that is, the aggregate key)
  2. The alias in the by group clause can be used to specify the alias in the by group clause, but it cannot be used in the by group clause. Because in DBMS, the SELECT clause is executed after the GROUP BY clause.
  3. The reason for using aggregate function in WHERE is that the premise of using aggregate function is that the result set has been determined, and WHERE is still in the process of determining the result set, so contradictions will lead to errors. If you want to use the aggregate (order) function in the SELECT clause, you can use the aggregate (order) function below.

Specify criteria for aggregate results

6.1 get specific groups with HAVING

After grouping the table with GROUP BY, how can I get only two groups?

WHERE is not feasible here, because the WHERE clause can only specify the conditions of records (rows), not the conditions of groups (for example, "the number of data rows is 2 rows" or "the average value is 500").

You can use the HAVING clause after GROUP BY.

HAVING is similar to WHERE

6.2 HAVING features

The HAVING clause is used to filter groups, using numbers, aggregate functions, and column names (aggregate keys) specified in GROUP BY.

-- number
SELECT product_type, COUNT(*)
  FROM product
 GROUP BY product_type
HAVING COUNT(*) = 2;

+--------------+----------+
| product_type | COUNT(*) |
+--------------+----------+
| clothes         |        2 |
| Office Supplies      |        2 |
+--------------+----------+

-- Wrong form (because) product_name Not included in GROUP BY Aggregate key (in)
SELECT product_type, COUNT(*)
  FROM product
 GROUP BY product_type
HAVING product_name = 'ball pen';

ERROR 1054 (42S22): Unknown column 'product_name' in 'having clause'

Sort query results

7.1 ORDER BY

The execution results in SQL are arranged randomly. When it is necessary to sort in a specific order, the ORDER BY clause can be used.

SELECT <Column name 1>, <Column name 2>, <Listing 3>, ......
  FROM <Table name>
 ORDER BY <Sort base column 1>, <Sort base column 2>, ......

The default is ascending order and DESC is descending order

-- Descending order
SELECT product_id, product_name, sale_price, purchase_price
  FROM product
 ORDER BY sale_price DESC;
+------------+--------------+------------+----------------+
| product_id | product_name | sale_price | purchase_price |
+------------+--------------+------------+----------------+
| 0005       | pressure cooker       |       6800 |           5000 |
| 0003       | motion T Shirt      |       4000 |           2800 |
| 0004       | kitchen knife         |       3000 |           2800 |
| 0001       | T shirt        |       1000 |            500 |
| 0007       | Dish cleaning board       |        880 |            790 |
| 0002       | Punch       |        500 |            320 |
| 0006       | Fork         |        500 |           NULL |
| 0008       | ball pen       |        100 |           NULL |
+------------+--------------+------------+----------------+

-- Multiple sort keys
SELECT product_id, product_name, sale_price, purchase_price
  FROM product
 ORDER BY sale_price, product_id;
+------------+--------------+------------+----------------+
| product_id | product_name | sale_price | purchase_price |
+------------+--------------+------------+----------------+
| 0008       | ball pen       |        100 |           NULL |
| 0002       | Punch       |        500 |            320 |
| 0006       | Fork         |        500 |           NULL |
| 0007       | Dish cleaning board       |        880 |            790 |
| 0001       | T shirt        |       1000 |            500 |
| 0004       | kitchen knife         |       3000 |           2800 |
| 0003       | motion T Shirt      |       4000 |           2800 |
| 0005       | pressure cooker       |       6800 |           5000 |
+------------+--------------+------------+----------------+

-- When the column name used for sorting contains NULL When, NULL It will be summarized at the beginning or end.
SELECT product_id, product_name, sale_price, purchase_price
  FROM product
 ORDER BY purchase_price;
+------------+--------------+------------+----------------+
| product_id | product_name | sale_price | purchase_price |
+------------+--------------+------------+----------------+
| 0006       | Fork         |        500 |           NULL |
| 0008       | ball pen       |        100 |           NULL |
| 0002       | Punch       |        500 |            320 |
| 0001       | T shirt        |       1000 |            500 |
| 0007       | Dish cleaning board       |        880 |            790 |
| 0003       | motion T Shirt      |       4000 |           2800 |
| 0004       | kitchen knife         |       3000 |           2800 |
| 0005       | pressure cooker       |       6800 |           5000 |
+------------+--------------+------------+----------------+

7.2 alias can be used for column names in order by

As mentioned earlier in GROUP BY, the alias defined in the SELECT clause cannot be used in the GROUP BY clause, but the alias can be used in the ORDER BY clause. Why not in GROUP BY but in ORDER BY?

This is because when SQL uses the HAVING clause, the execution order of the SELECT statement is * * * *:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

The execution order of SELECT is after the GROUP BY clause and before the ORDER BY clause. In other words, when using an alias in ORDER BY, you already know that the alias set in SELECT exists, but you don't know that the alias exists when using an alias in GROUP BY, so you can use an alias in ORDER BY, but you can't use an alias * * * * in GROUP BY.

 

 

Exercises - Part 1

Exercise 1

Write an SQL statement and select the goods with "registration date (register after April 28, 2009)" from the product table. The query result should include two columns: product_name and register_date.

Answer:

select product_name , regist_date from product where regist_date > '2009-04-28';

Exercise 2

Please state the returned results of the following three SELECT statements on the product table.

SELECT *
  FROM product
 WHERE purchase_price = NULL;

Answer: empty

SELECT *
  FROM product
 WHERE purchase_price <> NULL;

Answer: empty

SELECT *
  FROM product
 WHERE product_name > NULL;

Answer: empty

Exercise 3

The SELECT statement in code listing 2-22 (section 2-2) can extract the goods whose "sales price is more than 500 yen higher than purchase price" from the product table. Please write two SELECT statements that can get the same result. The execution results are as follows.

product_name | sale_price | purchase_price 
-------------+------------+------------
T shirt         |   1000    | 500
 motion T Shirt       |    4000    | 2800
 pressure cooker        |    6800    | 5000

Answer:

select product_name,sale_price,purchase_price from product where sale_price-purchase_price>=500;
select product_name,sale_price,purchase_price from product where sale_price-500>=purchase_price;
select product_name,sale_price,purchase_price from product where sale_price>=500+purchase_price;

Exercise 4

Please write a SELECT statement and SELECT the records that meet the conditions of "office supplies and kitchen appliances with a profit of more than 100 yen after 10% discount of the sales unit price" from the product table. Query results should include product_name column, product_ The type column and the profit after 10% discount of the sales unit price (alias is set to profit).

Tip: a 10% discount on the sales unit price can be obtained by multiplying the value of the saleprice column by 0.9, and the profit can be obtained by subtracting purchase from this value_ The value of the price column is obtained.

Answer:

select product_name,product_type,sale_price*0.9-purchase_price as profit from product where sale_price*0.9-purchase_price >=100;

Exercises - Part 2

Exercise 5

Please point out all syntax errors in the following SELECT statement.

SELECT product_id, SUM(product_name)
--book SELECT There is an error in the statement.
  FROM product 
 GROUP BY product_type 
 WHERE regist_date > '2009-09-01';

Answer: 1 Column names other than aggregate keys appear in the select statement; 2. The where statement should precede group by; 3.product_name cannot aggregate with sum

Exercise 6

Please write a SELECT statement to find out the commodity category whose total value of sales unit price (sales_price column) is 1.5 times greater than the total value of purchase unit price (purchase_price column). The execution results are as follows.

product_type | sum  | sum 
-------------+------+------
clothes         | 5000 | 3300
 Office Supplies      |  600 | 320

Answer:

SELECT product_type, SUM(sale_price), SUM(purchase_price)
 FROM product
 GROUP BY product_type
HAVING SUM(sale_price) > SUM(purchase_price) * 1.5;

+--------------+-----------------+---------------------+
| product_type | SUM(sale_price) | SUM(purchase_price) |
+--------------+-----------------+---------------------+
| clothes         |            5000 |                3300 |
| Office Supplies     |             600 |                 320 |
+--------------+-----------------+---------------------+

Exercise 7

Previously, we used the SELECT statement to SELECT all records in the product table. At that time, we used the order by clause to specify the order, but now we can't remember how to specify it. Think about the contents of the order by clause based on the following execution results.

select * from product
ORDER BY regist_date DESC,sale_price;

+------------+--------------+--------------+------------+----------------+-------------+
| product_id | product_name | product_type | sale_price | purchase_price | regist_date |
+------------+--------------+--------------+------------+----------------+-------------+
| 0008       | ball pen       | Office Supplies     |        100 |           NULL | 2009-11-11  |
| 0006       | Fork         | kitchenware     |        500 |           NULL | 2009-09-20  |
| 0001       | T shirt        | clothes         |       1000 |            500 | 2009-09-20  |
| 0004       | kitchen knife         | kitchenware     |       3000 |           2800 | 2009-09-20  |
| 0002       | Punch       | Office Supplies     |        500 |            320 | 2009-09-11  |
| 0005       | pressure cooker       | kitchenware     |       6800 |           5000 | 2009-01-15  |
| 0007       | Dish cleaning board       | kitchenware     |        880 |            790 | 2008-04-28  |
| 0003       | motion T Shirt      | clothes         |       4000 |           2800 | NULL        |
+------------+--------------+--------------+------------+----------------+-------------+