MySQL must know and know -- Chapter 13 grouping data

Posted by asanvicente on Sat, 22 Jan 2022 18:16:20 +0100

Grouping data

This chapter describes how to group data so that you can summarize a subset of table content. This involves two new SELECT sentences, GROUP BY and HAVING clauses.

Data grouping

Last chapter( MySQL must know and know - Chapter 12 summary data )We use SQL aggregate functions to aggregate data. This allows us to count, calculate and average rows without retrieving all the data.

The calculations so far have been performed on all data in the table or on data that matches a specific WHERE clause.
For example, return the number of products provided by supplier 1003:

mysql> SELECT COUNT(*) AS num_prods
    -> FROM products
    -> WHERE vend_id = 1003;
+-----------+
| num_prods |
+-----------+
|         7 |
+-----------+
1 row in set (0.01 sec)

But what if we want to return the number of products provided by each supplier, or the products provided by suppliers that only provide single products, or the suppliers that provide more than 10 products?.

Obviously, this simple aggregation function is difficult to do. We can use grouping. Grouping allows data to be divided into multiple logical groups so that aggregation calculations can be performed on each group.

Create group

Grouping is established in the GROUP BY clause of the SELECT statement.

Returns the number of products provided by each supplier:

mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> GROUP BY vend_id;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1001 |         3 |
|    1002 |         2 |
|    1003 |         7 |
|    1005 |         2 |
+---------+-----------+
4 rows in set (0.00 sec)

This SELECT statement specifies two columns, vend_id contains the ID of the product supplier, num_prods is the calculated field. The GROUP BY clause instructs MySQL to press vent_ ID sorts and groups data. That is, COUNT(*) for each vend_id is calculated once.

The GROUP BY clause instructs MySQL to group data and then aggregate each group instead of the entire result set.

Some important provisions for using the GROUP BY clause:

  • The GROUP BY clause can contain any number of columns. This can nest packets and provide more detailed control over data grouping.
  • If groups are nested in the GROUP BY clause, the data will be summarized on the last specified group (data cannot be retrieved from individual columns).
  • Each column listed in the GROUP BY clause must be a retrieval column or a valid expression (but not an aggregate function). If you use an expression in SELECT, you must specify the same expression in the GROUP BY clause. Alias cannot be used.
  • Except for the aggregate calculation statement, each column in the SELECT statement must appear in the GROUP BY clause.
  • If there is a NULL value in the grouping column, NULL is returned as a grouping.
  • The GROUP BY clause must appear after the WHERE clause and before the ORDER BY clause.

Using the WITH ROLLUP keyword, you can get the value of each group and the summary level of each group, such as

mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> GROUP BY vend_id WITH ROLLUP;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1001 |         3 |
|    1002 |         2 |
|    1003 |         7 |
|    1005 |         2 |
|    NULL |        14 |
+---------+-----------+
5 rows in set (0.00 sec)

Filter grouping

In addition to grouping data with GROUP BY, MySQL also allows filtering groups, specifying which groups to include and which groups to exclude.

When we talk about filtering, we think of the WHERE clause( MySQL must know and know -- Chapter 6 filtering data ). But WHERE filters the specified rows, not grouping, so we need another clause, HAVING clause.

HAVING is very similar to WHERE. At present, all WHERE clauses can be replaced by HAVING. The only difference is that WHERE filters rows and HAVING filters grouping.

  • HAVING supports all WHERE operators. All the techniques and options we have learned about WHERE are applicable to HAVING.

Filter groups with two or more orders:

mysql> SELECT cust_id, COUNT(*) AS orders
    -> FROM orders
    -> GROUP BY cust_id
    -> HAVING COUNT(*) >= 2;
+---------+--------+
| cust_id | orders |
+---------+--------+
|   10001 |      2 |
+---------+--------+
1 row in set (0.01 sec)
  • The difference between HAVING and WHERE where WHERE is filtered before data grouping, and HAVING is filtered after data grouping. WHERE excluded rows are not included in the grouping, which may change the calculated value and affect the grouping filtered in the HAVING clause.

We can use both the WHERE and HAVING clauses.

List suppliers with more than 2 (including) products and more than 10 (including) products:

mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> WHERE prod_price >= 10
    -> GROUP BY vend_id
    -> HAVING COUNT(*) >= 2;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1003 |         4 |
|    1005 |         2 |
+---------+-----------+
2 rows in set (0.00 sec)

In this statement, the WHERE clause filters all prods_ A line with a price of at least 10, and then press vent_ ID to group data, and then filter packets with a count of at least 2 through the HAVING clause.

Comparison without WHERE:

mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> GROUP BY vend_id
    -> HAVING COUNT(*) >= 2;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1001 |         3 |
|    1002 |         2 |
|    1003 |         7 |
|    1005 |         2 |
+---------+-----------+
4 rows in set (0.00 sec)

Grouping and sorting

GROUP BY and ORDER BY often do the same work, but they are very different.

Difference between ORDER BY and GROUP BY:

ORDER BYGROUP BY
Sorting resultsGroup rows. But the output may not be in the order of grouping
Any column can be used (even non selected columns)Only selection columns or expression columns can be used, and selection expressions must be used
Not necessarilyIt is necessary to use columns (or expressions) with aggregate functions

We often find that the data grouped by group is output in the order of grouping, but this is not always the case. We should explicitly provide the ORDER BY clause to sort.

  • Don't forget that ORDER BY should also be given when using the GROUP BY clause. This is the only way to ensure that the data is sorted correctly.

Retrieve the order number and total order price of orders with total order price greater than or equal to 50:

mysql> SELECT order_num, SUM(quantity*item_price) AS ordertotal
    -> FROM orderitems
    -> GROUP BY order_num
    -> HAVING SUM(quantity*item_price) >= 50;
+-----------+------------+
| order_num | ordertotal |
+-----------+------------+
|     20005 |     149.87 |
|     20006 |      55.00 |
|     20007 |    1000.00 |
|     20008 |     125.00 |
+-----------+------------+
4 rows in set (0.00 sec)

We are sorting the output by order total price:

mysql> SELECT order_num, SUM(quantity*item_price) AS ordertotal
    -> FROM orderitems
    -> GROUP BY order_num
    -> HAVING SUM(quantity*item_price) >= 50
    -> ORDER BY ordertotal;
+-----------+------------+
| order_num | ordertotal |
+-----------+------------+
|     20006 |      55.00 |
|     20008 |     125.00 |
|     20005 |     149.87 |
|     20007 |    1000.00 |
+-----------+------------+
4 rows in set (0.00 sec)

SELECT clause order

Review the order of sub sentences in the current SELECT statement. The list order is as follows:

clauseexplainMust I use
SELECTThe column or expression to returnyes
FROMTable from which to retrieve dataUsed only when selecting data from a table
WHERERow level filteringno
GROUP BYGroup descriptionUsed only when grouping and calculating aggregates
HAVINGGroup level filteringno
ORDER BYOutput sort orderno
LIMITNumber of rows to retrieveno

Topics: Database MySQL SQL