MySQL window function, the stranger you are most familiar with~

Posted by playaz on Thu, 11 Nov 2021 17:07:02 +0100

_Previously, I voted for fans to find the most familiar stranger in MySQL ~~Which technical points in MySQL are you familiar with and unfamiliar with?

The top three ranks are not as bad as I expected, respectively:

  • 1. Cursors
  • 2. Window Functions
  • 3. Cluster Index

_Although these three points are rarely used, they are often asked in interviews. It's worth mentioning that many interviewers are even half-informed about the problem.

_Today I want to talk to you about window functions. MySQL supports window functions since 8.0. Perhaps your company's version of MySQL won't refresh you yet, but I suggest you try it locally. It's really fragrant!

OK, let's not talk a lot. As a rule, start with an appetizer and check out today's test sheet data.

_The test table used in this paper for demonstration is chh_baozipu, translated as emmm... Chen Haha's bun shop. Tell you quietly that Hago has sold buns this year. This is the profit of the bun shop for half a year~

mysql> SELECT * from chh_baozipu ;
+----+--------------------+-------+---------+
| id | product            | sales | month   |
+----+--------------------+-------+---------+
|  1 | Pork and scallion bun*       |   600 | 2021-11 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |
+----+--------------------+-------+---------+
12 rows in set (0.00 sec)

What do you mean? When do you come to my store, please have pasta stuffed buns.

1. What is a window function

1. How do you understand windows?

_In fact, the concept of window is very important, if you want to learn window functions, you can't just know one thing or the other. We need to figure out what a window represents before we know when to use it.

Take the test form for a simple example, and make a statistic: the baozi shop's pork and onion buns have accumulated profits from this half year to each month*

SELECT *,SUM(sales) over(ORDER BY `month`) as Accumulated Profit 
	from chh_baozipu where product='Pork and scallion bun*';
mysql> SELECT *,SUM(sales) over(ORDER BY `month`) as Accumulated Profit from chh_baozipu where product='Pork and scallion bun*';
+----+--------------------+-------+---------+--------------+
| id | product            | sales | month   | Accumulated Profit     |
+----+--------------------+-------+---------+--------------+
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |         1000 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |         2600 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |         3400 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |         4400 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |         6000 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |         6600 |
+----+--------------------+-------+---------+--------------+
6 rows in set (0.00 sec)

_From this SQL, you can see that the window with the first row id=6 is the first row, and the window with the second row id=5 is the first two rows, and so on (see figure below).

_Visible, a window is the meaning of a range, which can be understood as a collection of records (rows); A window function is a special function that performs calculations on a set of records that meet certain criteria.

_For each record, the function is executed in this window. Some functions have fixed window size with different records, which is a static window. Some functions, on the contrary, correspond to different windows for different records. This dynamic window is called a sliding window. Read this article and come back to this sentence I believe will understand it more thoroughly [Manual Dog].

2. What is a window function

Window functions, also known as OLAP functions (Online Anallytical Processing), allow real-time analysis and processing of data.

What scenarios are window functions used in? There are two main types:

  • Ranking issues, such as: the monthly ranking of the profit of the bakery;
  • TOPN problems, for example: check the two months with the highest profit per bun;

There are some common window and aggregate functions:

  • Private window functions: rank(), dense_rank(), row_number()
  • Aggregation functions: max(), min(), count(), sum(), avg()

_Because aggregate functions can also be used in window functions, it is easy to confuse window functions with general aggregate functions, which are distinguished as follows:

  1. Aggregation functions aggregate multiple records into one; The window function is executed for each record, with several records finished or several.
  2. Aggregation functions can also be used in window functions, which I'll illustrate with examples.

2. Use of window functions

Basic grammar:

<Window Functions> OVER (PARTITION BY <Column name for grouping> ORDER BY <Column name for sorting>);
-- over The keyword is used to specify the window range of the function.
-- partition by Used to group tables,
-- order by Clauses are used to sort the results after grouping.

Note: Window functions operate on the results of where or group by clause processing again, so they are run in the order of the SQL statement. Window functions are usually placed in the select clause (before the front), such as the previous SQL, and can be dragged up to see ~

What are window functions? I'm too lazy to draw. I'll use Lulin 916's guide instead.

  • Sequence number function: row_number() / rank() / dense_rank()
  • Distribution function: percent_rank() / cume_dist()
  • Front and Back Functions: lag() / lead()
  • Head-end function: first_val() / last_val()
  • Other functions: nth_value() / nfile()

Let's take a look at each of these examples:

1. Sequence number function: row_number() / rank() / dense_rank()

  • ROW_NUMBER(): Ordered Sort - 1, 2, 3
  • RANK(): Sort side by side, skip duplicate serial numbers - 1, 1, 3
  • DENSE_RANK(): Sort side by side without skipping duplicate ordinal numbers - 1, 1, 2
mysql> SELECT *,ROW_NUMBER() over(ORDER BY sales desc) as pro_ROW_NUMBER,rank() over(ORDER BY sales desc) as pro_rank,DENSE_RANK() over(ORDER BY sales desc) as pro_DENSE_RANK from chh_baozipu where product='Pork and scallion bun*';
+----+--------------------+-------+---------+----------------+----------+----------------+
| id | product            | sales | month   | pro_ROW_NUMBER | pro_rank | pro_DENSE_RANK |
+----+--------------------+-------+---------+----------------+----------+----------------+
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |              1 |        1 |              1 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |              2 |        1 |              1 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |              3 |        3 |              2 |
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |              4 |        3 |              2 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |              5 |        5 |              3 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |              6 |        6 |              4 |
+----+--------------------+-------+---------+----------------+----------+----------------+
6 rows in set (0.00 sec)

_As the example above shows, three window function services and three different typical business needs are sufficient to accommodate our sorting statistics.

_When students are asked in an interview or a written test in the future, please don't talk about lowB schemes like self-inquiry nesting anymore, or you don't know me~dogs

2. Distribution function: percent_rank() / cume_dist()

This distribution function is basically unnecessary, let alone say. Interested students own Baidu~

3. Front and Back Functions: lag(expr,n) / lead(expr,n)

expr can be an expression or a column name.

The forward and backward functions are often used to return the value of the expr in the first n rows (LAG(expr,n)) or the last n rows (LEAD(expr,n)) of the current line

Scenario: Query the difference between the results of the first n students and those of the current students

_Inner SQL first obtains the results of the first student through LAG() function, and then outer SQL obtains the result difference diff by doing a bad job between the current and the first student.

_It's a bit embarrassing to switch to a Hago test form here. But you certainly understand that. Let's check it out:

mysql> SELECT *,lag(sales,1) over win as pro_lag,lead(sales,1) over win as pro_lead from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY sales desc);
+----+--------------------+-------+---------+---------+----------+
| id | product            | sales | month   | pro_lag | pro_lead |
+----+--------------------+-------+---------+---------+----------+
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |    NULL |     1600 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |    1600 |     1000 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |    1600 |     1000 |
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |    1000 |      800 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |    1000 |      600 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |     800 |     NULL |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |    NULL |      300 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |     700 |      200 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |     300 |      200 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |     200 |      100 |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |     200 |        0 |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |     100 |     NULL |
+----+--------------------+-------+---------+---------+----------+
12 rows in set (0.00 sec)

Here I want to ask if my classmates find this SQL different from the previous one? What are the differences?

SELECT *,
lag(sales,1) over win as pro_lag,
lead(sales,1) over win as pro_lead
from chh_baozipu where product='Pork and scallion bun*' 
WINDOW win as (PARTITION BY product ORDER BY sales desc);

1. Alias the window by extracting it

_In fact, this is to bring the window out and set the alias: win, like the alias we use when writing SQL, so it looks simpler and more comfortable, right.

_Someone asked the programmer what to be concise? Code cow B can't be understood by others. This kind of classmate has never been beaten by the society at first glance. When you encounter the centuries-old ancestral code, you will know what to call the Avenue Simple (use Fatty Goto).

2. Add PARTITION BY product to the window

In the over clause, this keyword means to control the contents of the window. In the above basic grammar, I tell you that there are two keywords in over:

  • partition by is the grouping of window contents;
  • order by is to sort the contents of windows after grouping them;

In fact, there are more interesting ways to control the window range ~~

_There are two ways to specify the range of a sliding window, line-based and range-based. I'll focus on the commonly used line-based methods to control the range of a window.

Usually BETWEEN frame_is used Start AND frame_end syntax to represent line range, frame_start and frame_end can support the following keywords to determine different dynamic row records:

  • CURRENT ROW boundaries are the current line and are generally used with other range keywords
  • The UNBOUNDED PRECEDING boundary is the first row in the partition
  • The UNBOUNDED FOLLOWING boundary is the last line in the partition
  • expr PRECEDING boundary is the current line minus expr value
  • expr FOLLOWING boundary is the value of the current line plus expr

Here are a few examples:

(1) Calculate aggregate window functions for the current row and the first n rows (n+1 rows in total)

In the following example, the control window size is the sum of profits for the current month + the first two months to see the effect:

SELECT *,SUM(sales) OVER win as 'Profit added up in the last three months'
FROM chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month` ROWS 2 PRECEDING);
mysql> SELECT *,SUM(sales) OVER win as 'Profit added up in the last three months'
    -> FROM chh_baozipu 
	-> WINDOW win as (PARTITION BY product ORDER BY `month` ROWS 2 PRECEDING);
+----+--------------------+-------+---------+--------------------------+
| id | product            | sales | month   | Profit added up in the last three months         |
+----+--------------------+-------+---------+--------------------------+
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |                     1000 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |                     2600 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |                     3400 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |                     3400 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |                     3400 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |                     3200 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |                      200 |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |                      300 |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |                      300 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |                      400 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |                      500 |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |                     1200 |
+----+--------------------+-------+---------+--------------------------+
12 rows in set (0.00 sec)

(2) Calculate the aggregate window function of the current row and the first n1 rows and the last n2 rows

In the following example, the control window size is the sum of profits from the previous month to the next month of the current month to see the effect:

SELECT *,SUM(sales) OVER win as 'First three months profit plus' 
FROM chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month` ROWS BETWEEN n1 PRECEDING AND n2 FOLLOWING);
mysql> SELECT *,SUM(sales) OVER win as 'Incremental Profit from Previous Month to Next Month' FROM chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month` ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING);
+----+--------------------+-------+---------+--------------------------+
| id | product            | sales | month   |Incremental Profit from Previous Month to Next Month|
+----+--------------------+-------+---------+--------------------------+
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |                     2600 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |                     3400 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |                     3400 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |                     3400 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |                     3200 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |                     2200 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |                      300 |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |                      300 |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |                      400 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |                      500 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |                     1200 |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |                      900 |
+----+--------------------+-------+---------+--------------------------+
12 rows in set (0.00 sec)

4. Head and tail functions: FIRST_VALUE(expr), LAST_VALUE(expr)

The head-end function is applied to: return the value of the first or last expr;

Scenario: As of now, query the current maximum and minimum monthly income by date.

SELECT *,
FIRST_VALUE(sales) over win as 'Current maximum monthly income',
LAST_VALUE(sales) over win as 'Current minimum monthly income' 
from chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month`);
mysql> SELECT *,FIRST_VALUE(sales) over win as 'Current maximum monthly income',LAST_VALUE(sales) over win as 'Current minimum monthly income' from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month`);
+----+--------------------+-------+---------+-----------------------+-----------------------+
| id | product            | sales | month   | Current maximum monthly income        | Current minimum monthly income        |
+----+--------------------+-------+---------+-----------------------+-----------------------+
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |                  1000 |                  1000 |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |                  1000 |                  1600 |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |                  1000 |                   800 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |                  1000 |                  1000 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |                  1000 |                  1600 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |                  1000 |                   600 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |                   200 |                   200 |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |                   200 |                   100 |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |                   200 |                     0 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |                   200 |                   300 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |                   200 |                   200 |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |                   200 |                   700 |
+----+--------------------+-------+---------+-----------------------+-----------------------+
12 rows in set (0.00 sec)

5. Other functions: nth_value() / nfile()

nfile() is not commonly used and will not be repeated. Here we only mention NTH_VALUE(expr,n) function;

NTH_VALUE Purpose: Returns the value of the nth expr in the window.

Scenario: As of now, it shows the profit of the 2nd and 3rd place in the monthly profit list of Chen Habaozi Shop.

SELECT *,
nth_value(sales,2) over win as 'Currently ranked 2nd monthly revenue',
nth_value(sales,3) over win as 'Current 3rd Monthly Revenue' 
from chh_baozipu 
WINDOW win as (PARTITION BY product ORDER BY `month`);
mysql> SELECT *,nth_value(sales,2) over win as 'Currently ranked 2nd monthly revenue',nth_value(sales,3) over win as 'Current 3rd Monthly Revenue' from chh_baozipu WINDOW win as (PARTITION BY product ORDER BY `month`);
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
| id | product            | sales | month   | Currently ranked 2nd monthly revenue           | Current 3rd Monthly Revenue           |
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
|  6 | Pork and scallion bun*       |  1000 | 2021-06 |                           NULL |                           NULL |
|  5 | Pork and scallion bun*       |  1600 | 2021-07 |                           1600 |                           NULL |
|  4 | Pork and scallion bun*       |   800 | 2021-08 |                           1600 |                            800 |
|  3 | Pork and scallion bun*       |  1000 | 2021-09 |                           1600 |                            800 |
|  2 | Pork and scallion bun*       |  1600 | 2021-10 |                           1600 |                            800 |
|  1 | Pork and scallion bun*       |   600 | 2021-11 |                           1600 |                            800 |
| 12 | Pasta stuffed buns         |   200 | 2021-06 |                           NULL |                           NULL |
| 11 | Pasta stuffed buns         |   100 | 2021-07 |                            100 |                           NULL |
| 10 | Pasta stuffed buns         |     0 | 2021-08 |                            100 |                              0 |
|  9 | Pasta stuffed buns         |   300 | 2021-09 |                            100 |                              0 |
|  8 | Pasta stuffed buns         |   200 | 2021-10 |                            100 |                              0 |
|  7 | Pasta stuffed buns         |   700 | 2021-11 |                            100 |                              0 |
+----+--------------------+-------+---------+--------------------------------+--------------------------------+
12 rows in set (0.00 sec)

Summary

As far as window functions are concerned, window functions are new things that I have discovered since I came into contact with MySQL8. Suddenly, the MySQL development team is still very intelligent. Every version will add some new ways of playing, which is also very useful. I hope MySQL 9.0 will bring us more surprises.

Well, let's not say more. I advise you to use rat tail juice, but I recommend you pay attention to me, because I will let you learn a lot in happiness!

MySQL Series Articles Summary and MySQL River and Lake Road | Column Catalog

Previous popular MySQL articles:

Topics: Database MySQL