Detailed explanation of total record optimization of count query in paging component

Posted by silas101 on Wed, 18 Mar 2020 08:21:33 +0100

1 background

When using the paging function of mybatis plus (hereinafter referred to as MBP). Found a JsqlParserCountOptimize The official didn't give a detailed introduction to the paging optimization processing class of, and didn't find a few words on the Internet to analyze the logic of this class, which we dare not use. It's also convenient for others to analyze in depth.

2 Principle

First of all, the principle of PaginationInterceptor is not covered here (the implementation principle of mybatis general pagination encapsulation is quite simple, which is the same thing). Finally, the query is basically divided into two sql: total count records + real pagination records. This class uses optimization to check the count. How to optimize the count query? Here's a real scenario to help you understand: if there are two tables, user, user address and user account, which record user and user address and user account respectively, a user may have multiple addresses, i.e. 1-to-many relationship; a user can only have one account, i.e. 1-to-1 relationship.

2.1 optimize order by

First look at the following sql and put it under paging query

select * from user order by age desc, update_time desc

Traditional paging components are often

check count: 
select count(1) from (select * from user order by age desc, update_time desc)
//Record:
select * from user order by age desc, update_time desc limit 0,50

Any problems? The order by can be completely removed when checking the count! In the case of complex query, large table, sorting of non index fields and so on, it's very slow to query records. Count again! Therefore, it is obvious that you want to optimize count to select count(1) from (select * from user).

2.1.1 restriction

But not all scenarios can be optimized, such as queries with group by

2.1.2 source code

So the MBP source code is implemented as follows. If there is no group by and order by statement, remove the order by

// Add include groupBy without removing orderBy
if (null == groupBy && CollectionUtils.isNotEmpty(orderBy)) {
        plainSelect.setOrderByElements(null);
        sqlInfo.setOrderBy(false);
}

2.2 optimize the join scenario

In the join operation, there is also the possibility of optimization. See the following sql

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid

In this case, you can delete the left join direct query user when querying the count page by page, because the relationship between user and user account is 1-to-1, as follows

check count: 
select count(1) from user u
//Record: 
select u.id,ua.account from user u left join user_account ua on u.id=ua.uid limit 0,50

2.2.1 restriction

There are many restrictions on whether count can remove the join direct query of the first table, as follows:

The number of records cannot be enlarged after the table records the join

From the above case, if the number of records after left join is larger than the total number of records in the first direct query table, this optimization cannot be carried out. For example, three users each record two addresses

select u.id,ua.address from user u left join user_address ua on u.id=ua.uid (6 Strip)
vs
select count(1) from user u (3 Strip)

In this case, if you remove the left join and check the count, you will get less total records. Note that this may turn into a join. MBP cannot automatically determine whether the paging query will be record enlarged. Therefore, optimizeJoin is off by default. If you want to open the JsqlParserCountOptimize bean that needs to be declared by yourself, and set optionjoin to true, as follows

 @Bean
    public PaginationInterceptor paginationInterceptor() {
        PaginationInterceptor paginationInterceptor = new PaginationInterceptor();
        paginationInterceptor.setCountSqlParser(new JsqlParserCountOptimize(true));
        return paginationInterceptor;
    }

In fact, the source code design here is somewhat unreasonable, because after opening, you have to carefully review the paging codes of your various left join s. If there is amplification, when you can only construct Page objects, set the optimizeCountSql to false (default true), which is equivalent to turning off all count optimizations in this query. Then not only join, but also order by and other optimizations will not be carried out. It is recommended to obtain the optimizeJoin from Page (or ThreadLocal?) instead, and change it to the configuration that can be configured for each query level, which is off by default. Only after the developer confirms that the join can be optimized can the active setting of this query level be turned on.

left join only

If it is an inner join or a right join, the number of records will always be enlarged, so MBP optimization will automatically judge that if there is any non left join in multiple joins, it will not be optimized. For example, from a left join b.. Right join c.. Left join D will not be optimized directly at this time

on statement has query conditions

such as

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid and ua.account > ?

where statement contains conditions for joining tables

such as

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid where ua.account > ?

2.2.2 source code

MBP's join optimization source code is roughly as follows, corresponding to the above optimization and limitations

List<Join> joins = plainSelect.getJoins();
// Whether optimizeJoin is enabled globally (it is recommended that you can also set it from Page by each query)
if (optimizeJoin && CollectionUtils.isNotEmpty(joins)) {
    boolean canRemoveJoin = true;
    String whereS = Optional.ofNullable(plainSelect.getWhere()).map(Expression::toString).orElse(StringPool.EMPTY);
    for (Join join : joins) {
            // left join only
            if (!join.isLeft()) {
                    canRemoveJoin = false;
                    break;
            }
            Table table = (Table) join.getRightItem();
            String str = Optional.ofNullable(table.getAlias()).map(Alias::getName).orElse(table.getName()) + StringPool.DOT;
            String onExpressionS = join.getOnExpression().toString();
            /* If the join contains? (represents that the on statement has query conditions) 
            perhaps 
            where Statement contains conditions for joining tables
            Do not remove the join */
            if (onExpressionS.contains(StringPool.QUESTION_MARK) || whereS.contains(str)) {
                    canRemoveJoin = false;
                    break;
            }
    }
    if (canRemoveJoin) {
            plainSelect.setJoins(null);
    }
}

2.3 optimize the location of select count(1)

In traditional pagination, select count(1) is often set in the outer layer of the original query sql, such as

select count(1) from (select * from user)

The real purpose of count is to get the number of records, which does not need the extra time of select * in the original query, so it can be optimized as follows

select count(1) from user

2.3.1 restriction

In the same way, there are some scenarios where count location cannot be optimized

The field of select contains parameters

This optimization can not be performed if the select contains parameters {}, ${} waiting to be replaced, because the subsequent placeholder replacement real value stage will result in an error due to the decrease of the number of placeholders, such as

select count(1) from (select power(#{aSelectParam},2) from user_account where uid=#{uidParam}) ua
vs
select count(1) from user_account where uid=#{uidParam} ua

MBP official issue Chen 95 registered this issue

Include distinct

select contains the statement of distinct de duplication, which may increase the number of count records, so this optimization cannot be carried out. such as

select count(1) from (select distinct(uid) from user_address) ua
vs
select count(1) from user_address ua  #Number of records may increase

Include group by

For statements containing group by, because there are always aggregation functions in select, the built-in semantics of count(1) becomes aggregation functions, which cannot be optimized. such as

select count(1) from (select uid,count(1) from user_address group by uid) ua #Return the total records of single row and single column
vs
select count(1) from user_address group by uid #Return the number of aggregate count of multiple rows and single columns

2.3.2 source code

The related source code in MBP is as follows

//The parameter contained in the field of select is not optimized
for (SelectItem item : plainSelect.getSelectItems()) {
        if (item.toString().contains(StringPool.QUESTION_MARK)) {
                return sqlInfo.setSql(SqlParserUtils.getOriginalCountSql(selectStatement.toString()));
        }
}
// Include distinct, groupBy not optimized
if (distinct != null || null != groupBy) {
        return sqlInfo.setSql(SqlParserUtils.getOriginalCountSql(selectStatement.toString()));
}
...
// To optimize SQL, count select item is actually a select count(1) statement
plainSelect.setSelectItems(COUNT_SELECT_ITEM);

3 Summary

In fact, this paper aims at some optimization ideas for the step of checking the number of count records in the general paging component

Optimize order by
Optimize join statement
Optimize the location of select count(1)
Pay attention to the restrictions corresponding to the above optimization, otherwise business errors may be caused (especially the join optimization, which is relatively hidden)

In fact, it's not limited to MBP. You can also try to use the customized page blocker, which has a significant effect on page optimization

"Used to record the evolution of life, the iteration of story. Looking forward to be a platform to help and think for you

Topics: Java SQL Mybatis less

Programmer Think