Introduction
When I first started working on projects, I had little development experience. When a problem came up, all I cared about was getting the query to return the right results; I gave little thought to how efficiently it ran. That is fine while there is little data, but as soon as the data grows, the efficiency problems show up. Every time a query was slow, the project manager would ask: "Are you using SELECT * again? Are the query conditions indexed?" One remark wakes the dreamer; a quick check and... that was it.
Sometimes when we write SQL statements we do not consider their performance, or have no awareness of optimization at all. The statement may meet the basic functional requirements for the moment, but once the amount of data grows, you can imagine the consequences.
Let's talk about some common methods of SQL optimization. How should we write SQL statements so that database queries run more efficiently?
- 1. SQL statement structure optimization
- 2. Composite indexes (an index of the form (x, y, uid))
- 3. LIKE statement optimization
- 4. Optimizing the != or <> operator in the WHERE clause
- 5. Avoid NULL checks (IS NULL / IS NOT NULL) on columns in the WHERE clause
- 6. Optimizing OR in the WHERE clause
- 7. Optimizing IN or NOT IN in the WHERE clause
- 8. Avoiding expression operations on columns in the WHERE clause
- 9. Do not use SELECT * anywhere
- 10. Staging intermediate results in temporary tables
- 11. LIMIT paging optimization
- 12. Batch insert optimization
- 13. Use LIMIT 1 / TOP 1 to fetch a single row
- 14. Avoid ORDER BY RAND()
- 15. Indexes and sorting
- 16. Prefer UNION ALL to UNION
- 17. INNER JOIN vs LEFT JOIN, RIGHT JOIN, and subqueries
- 18. When to use EXISTS vs IN
- 19. Use indexes sensibly
- 20. Paginate first, then join
- 21. Name the column explicitly when counting
- 22. Prefer numeric types over character types
- 23. Prefer VARCHAR for variable-length columns
1. SQL statement structure optimization
A. ORDER BY + LIMIT combined index optimization:
SELECT [column1], [column2], ... FROM [table] ORDER BY [sort] LIMIT [offset], [limit];
Create a plain index on [sort].
B. WHERE + ORDER BY + LIMIT combined index optimization:
SELECT [column1], [column2], ... FROM [table] WHERE [columnX] = [value] ORDER BY [sort] LIMIT [offset], [limit];
Create a composite index on (columnX, sort).
C. WHERE + ORDER BY on multiple columns + LIMIT:
SELECT * FROM [table] WHERE uid = 1 ORDER BY x, y LIMIT 0, 10;
Create a composite index on (uid, x, y).
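As a minimal sketch of patterns B and C (the table and column names below are hypothetical, not from the original examples), the corresponding MySQL indexes could be created like this:

```sql
-- Pattern B: WHERE status = ? ORDER BY created_at LIMIT ...
-- A composite index (status, created_at) lets MySQL filter and sort from the index.
CREATE INDEX idx_status_created ON orders (status, created_at);

-- Pattern C: WHERE uid = ? ORDER BY x, y LIMIT ...
-- A composite index (uid, x, y) covers the filter column plus both sort columns.
CREATE INDEX idx_uid_x_y ON messages (uid, x, y);
```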
2. Composite indexes (an index of the form (x, y, uid))
SELECT * FROM users WHERE area = 'beijing' AND age = 22;
Creating a composite index on (area, age, salary) is equivalent to creating three indexes: (area, age, salary), (area, age) and (area). This is called the leftmost prefix property.
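A minimal sketch of the leftmost prefix property, using the users table from the example (the exact index and column definitions are assumptions):

```sql
CREATE INDEX idx_area_age_salary ON users (area, age, salary);

-- These filters can use the index (leftmost prefixes: area, then area + age):
SELECT * FROM users WHERE area = 'beijing';
SELECT * FROM users WHERE area = 'beijing' AND age = 22;

-- This filter cannot use it, because the leading column `area` is missing:
SELECT * FROM users WHERE age = 22 AND salary = 3000;
```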
3. LIKE statement optimization
SELECT id FROM A WHERE name LIKE '%abc%'
Because the pattern starts with '%', the query has to scan the whole table. Unless it is really necessary, do not put '%' in front of the keyword. Optimized:
SELECT id FROM A WHERE name LIKE 'abc%'
4. Optimizing the != or <> operator in the WHERE clause
Using the != or <> operator in the WHERE clause makes the engine give up the index and perform a full table scan. For example:
SQL: SELECT id FROM A WHERE id != 5
Optimized: SELECT id FROM A WHERE id > 5 OR id < 5
5. Avoid NULL checks (IS NULL / IS NOT NULL) on columns in the WHERE clause
A NULL check in the WHERE clause makes the engine give up the index and perform a full table scan instead of an index scan. Consider giving the column a database default value instead. For example:
SQL: SELECT id FROM A WHERE num IS NULL
Optimization: give num a default value of 0 and make sure the column never contains NULL; the SQL then becomes:
SELECT id FROM A WHERE num = 0
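A minimal sketch of the schema change this relies on (the column type INT is an assumption; the table and column names come from the example above):

```sql
-- Backfill existing NULLs, then forbid NULL and give the column a default of 0.
UPDATE A SET num = 0 WHERE num IS NULL;
ALTER TABLE A MODIFY num INT NOT NULL DEFAULT 0;

-- The NULL check can now be rewritten as an indexable equality test.
SELECT id FROM A WHERE num = 0;
```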
6. Optimizing OR in the WHERE clause
Replacing OR with UNION ALL, or with UNION when deduplication is needed, often gives better results, because OR in the WHERE clause makes the engine give up the index. For example:
SQL: SELECT id FROM A WHERE num = 10 OR num = 20
Optimized:
SELECT id FROM A WHERE num = 10
UNION ALL
SELECT id FROM A WHERE num = 20
7. Optimizing IN or NOT IN in the WHERE clause
Use IN and NOT IN with caution as well, otherwise they can also lead to a full table scan.
Option 1: replace IN with BETWEEN.
SQL: SELECT id FROM A WHERE num IN (1, 2, 3)
Optimized: SELECT id FROM A WHERE num BETWEEN 1 AND 3
Option 2: replace IN with EXISTS.
SQL: SELECT id FROM A WHERE num IN (SELECT num FROM B)
Optimized: SELECT id FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.num = A.num)
Option 3: replace IN with a LEFT JOIN.
SQL: SELECT id FROM A WHERE num IN (SELECT num FROM B)
Optimized: SELECT A.id FROM A LEFT JOIN B ON A.num = B.num WHERE B.num IS NOT NULL
8. Optimization of expression operations on fields in the where clause
Don't be where In Clause“="Perform function, arithmetic operation or other expression operation on the left, otherwise the system may not use the index correctly. A:as SQL:SELECT id FROM A WHERE num/2 = 100 Optimized to: SELECT id FROM A WHERE num = 100*2 B:as SQL:SELECT id FROM A WHERE year(addate) <2016 Optimized to: SELECT id FROM A where addate<'2016-01-01'
9. Do not use SELECT * anywhere
Replace the '*' in SELECT * FROM table with the specific list of columns you need; never return columns that are not used.
10. Staging intermediate results in temporary tables
Guidelines for staging intermediate results in temporary tables:
(1) It avoids scanning the main table repeatedly, reduces the "shared locks" and blocked "update locks" during execution, reduces blocking, and improves concurrency.
(2) Prefer table variables over temporary tables where possible; note that if a table variable holds a large amount of data, its indexes are very limited (only the primary key index).
(3) Avoid creating and dropping temporary tables frequently, to reduce the waste of system resources.
(4) Avoid returning large amounts of data to the client; if the volume is too large, reconsider whether the requirement is reasonable.
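As a minimal sketch in MySQL (the orders, users and tmp_user_totals names are hypothetical), staging an aggregated intermediate result might look like this:

```sql
-- Stage the aggregation once instead of rescanning the main table in every later query.
CREATE TEMPORARY TABLE tmp_user_totals AS
SELECT user_id, SUM(amount) AS total_amount
FROM orders
WHERE created_at >= '2016-01-01'
GROUP BY user_id;

-- Reuse the staged result within the same session.
SELECT u.name, t.total_amount
FROM users u
JOIN tmp_user_totals t ON t.user_id = u.id
WHERE t.total_amount > 1000;

DROP TEMPORARY TABLE tmp_user_totals;
```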
11. LIMIT paging optimization
With roughly 5 million rows in the table, for example:
SELECT * FROM wl_tagindex WHERE byname = 'f' ORDER BY id LIMIT 300000, 10;
Execution time: 3.21s. After optimization:
SELECT * FROM (SELECT id FROM wl_tagindex WHERE byname = 'f' ORDER BY id LIMIT 300000, 10) a LEFT JOIN wl_tagindex b ON a.id = b.id;
Execution time: 0.11s, a significant improvement. Note that the columns used here are byname and id; you need a composite index on these two columns, otherwise the improvement will not be significant.
12. Batch insert optimization
INSERT INTO person(name, age) VALUES ('A', 14);
INSERT INTO person(name, age) VALUES ('B', 15);
INSERT INTO person(name, age) VALUES ('C', 16);
Optimized:
INSERT INTO person(name, age) VALUES ('A', 14), ('B', 15), ('C', 16);
13. Use LIMIT 1 / TOP 1 to fetch a single row
Sometimes you query a table knowing that you only need one record, for example when looking up one specific row. You can use LIMIT 1 (or TOP 1 in SQL Server) so that the database stops scanning the index or table as soon as it finds a match. For example:
SQL: SELECT id FROM A WHERE name LIKE 'abc%'
Optimized: SELECT id FROM A WHERE name LIKE 'abc%' LIMIT 1
14. Avoid ORDER BY RAND()
ORDER BY RAND() executes the RAND() function for every single row in the table and then sorts on the result, which wastes a lot of processor time.
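A minimal sketch of one common alternative (it assumes an auto-increment id column without large gaps, and the resulting distribution is only approximately uniform):

```sql
-- Expensive: RAND() is evaluated for every row and the whole table is then sorted.
SELECT * FROM A ORDER BY RAND() LIMIT 1;

-- Cheaper approximation: pick a random id within the existing range and seek to it.
SELECT t.*
FROM A AS t
JOIN (SELECT FLOOR(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS rid FROM A) AS r
  ON t.id >= r.rid
ORDER BY t.id
LIMIT 1;
```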
15. Indexes and sorting
A MySQL query uses only one index per table, so if the WHERE clause already uses an index, the columns in ORDER BY will not use one. Therefore do not add a sort if the database's default order already meets the requirement; try not to sort on multiple columns; and if you must, create a composite index covering those columns.
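A minimal sketch (the articles table and its columns are hypothetical) of a composite index that serves both the filter and the sort:

```sql
-- Without a suitable index, MySQL filters with one index and then does a separate
-- filesort for the ORDER BY.
SELECT id, title
FROM articles
WHERE category_id = 3
ORDER BY published_at DESC
LIMIT 20;

-- A composite index on (category_id, published_at) can satisfy both the WHERE
-- clause and the ORDER BY, avoiding the filesort.
CREATE INDEX idx_category_published ON articles (category_id, published_at);
```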
16. Prefer UNION ALL to UNION
The main difference between UNION and UNION ALL is that UNION has to merge the result sets and then filter out duplicates, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when you can be sure there are no duplicate rows, or you do not care about duplicates, use UNION ALL instead of UNION.
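A minimal sketch of the difference (the log_2015 and log_2016 tables are hypothetical):

```sql
-- UNION deduplicates the combined result set, which costs extra sorting/deduplication work.
SELECT user_id FROM log_2015
UNION
SELECT user_id FROM log_2016;

-- UNION ALL simply concatenates the two result sets; use it when duplicates are
-- impossible or acceptable.
SELECT user_id FROM log_2015
UNION ALL
SELECT user_id FROM log_2016;
```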
17. INNER JOIN vs LEFT JOIN, RIGHT JOIN, and subqueries
First: INNER JOIN is also called an equi-join; LEFT JOIN and RIGHT JOIN are outer joins.
SELECT A.id, A.name, B.id, B.name FROM A LEFT JOIN B ON A.id = B.id;
SELECT A.id, A.name, B.id, B.name FROM A RIGHT JOIN B ON A.id = B.id;
SELECT A.id, A.name, B.id, B.name FROM A INNER JOIN B ON A.id = B.id;
Experience repeatedly shows that INNER JOIN performs well, because it is an equi-join and usually returns fewer rows. Remember that some statements use an equi-join implicitly, for example:
SELECT A.id, A.name, B.id, B.name FROM A, B WHERE A.id = B.id;
Recommendation: when an inner join is what you mean, write it as an explicit INNER JOIN.
Second: subqueries perform worse than joins; try to replace subqueries with joins.
Third: when using JOIN, the small result set should drive the large one (with LEFT JOIN, keep the left table's result set as small as possible; if conditions allow, put the smaller set on the left and process it first; for RIGHT JOIN, the reverse). Also try to split a query that touches many tables at once into several separate queries, because multi-table queries are inefficient and prone to locking and blocking. For example:
SELECT * FROM A LEFT JOIN B ON A.id = B.ref_id WHERE A.id > 10;
can be optimized to:
SELECT * FROM (SELECT * FROM A WHERE id > 10) T1 LEFT JOIN B ON T1.id = B.ref_id;
18. When to use EXISTS vs IN
SELECT * FROM A WHERE id IN (SELECT id FROM B)
SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE A.id = B.id)
IN compares in memory, while EXISTS has to query the database, so when B holds a large amount of data, EXISTS is more efficient than IN.
IN() executes the subquery only once: it caches all the id values of table B, then checks for each id of table A whether it equals some id in B; matching rows of A are added to the result set, until every row of A has been visited. The process works roughly like the following code:
List resultSet = [];
Array A = (select * from A);
Array B = (select id from B);
for (int i = 0; i < A.length; i++) {
    for (int j = 0; j < B.length; j++) {
        if (A[i].id == B[j].id) {
            resultSet.add(A[i]);
            break;
        }
    }
}
return resultSet;
As you can see, IN() is not suitable when table B is large, because it traverses all of B's rows.
Example: if A has 10,000 rows and B has 1,000,000 rows, up to 10,000 * 1,000,000 comparisons may be made, which is very inefficient.
Another example: if A has 10,000 rows and B has 100 rows, at most 10,000 * 100 comparisons are made; far fewer, so efficiency improves a lot.
Conclusion: IN() suits the case where B has less data than A.
EXISTS() executes A.length times; the process works roughly like the following code:
List resultSet = [];
Array A = (select * from A);
for (int i = 0; i < A.length; i++) {
    if (exists(A[i].id)) {  // executes: select 1 from B where B.id = A.id, returning whether a record exists
        resultSet.add(A[i]);
    }
}
return resultSet;
When B has more data than A, EXISTS() is the better choice, because it does not do all those in-memory traversals; it only runs one query per row of A.
Example: if A has 10,000 rows and B has 1,000,000 rows, EXISTS() executes 10,000 times to check whether each id in A matches an id in B. If B has 100,000,000 rows, EXISTS() still executes only 10,000 times, because it runs A.length times; the larger B is, the more suitable EXISTS() becomes.
Another example: if A has 10,000 rows and B has only 100 rows, EXISTS() still issues 10,000 queries; in that case it is better to use IN() and do 10,000 * 100 in-memory comparisons, because IN() compares in memory while EXISTS() has to query the database, and querying the database costs far more than comparing in memory.
Conclusion: EXISTS() suits the case where B has more data than A.
When A and B hold about the same amount of data, IN and EXISTS perform about the same; use whichever you prefer.
19. Use indexes sensibly
Columns involved in WHERE clauses and in ORDER BY should be indexed where possible; not necessarily all of them, it depends on the business. Columns that appear in many WHERE clauses are good candidates for an index. Indexes are not "the more the better": they certainly improve the efficiency of SELECT, but they reduce the efficiency of INSERT and UPDATE.
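A minimal sketch (the orders table and its columns are hypothetical) of indexing exactly the columns that appear in the WHERE clause and ORDER BY of a frequent query:

```sql
-- Frequent query pattern: filter by user_id and status, newest rows first.
SELECT id, amount, created_at
FROM orders
WHERE user_id = 42 AND status = 'paid'
ORDER BY created_at DESC;

-- One composite index tailored to that pattern, rather than one index per column.
CREATE INDEX idx_user_status_created ON orders (user_id, status, created_at);
```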
20. Paginate first, then join
When joining multiple tables with a large amount of data (here even a few hundred rows counts as large), paginate first and then JOIN; otherwise the logical reads will be very high and performance very poor.
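A minimal sketch of the idea (the orders and users tables are hypothetical): apply LIMIT in a derived table first, then join only the selected rows.

```sql
-- Join first, paginate last: every matching row is joined before LIMIT discards most of them.
SELECT o.id, o.amount, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
ORDER BY o.id
LIMIT 100000, 20;

-- Paginate first, then join: only the 20 selected rows reach the JOIN.
SELECT o.id, o.amount, u.name
FROM (SELECT id, amount, user_id FROM orders ORDER BY id LIMIT 100000, 20) o
JOIN users u ON u.id = o.user_id;
```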
21. Name the column explicitly when counting
SELECT COUNT(*) FROM table; an unconditional COUNT like this causes a full table scan and has no business meaning, so it must be eliminated. Change it to SELECT COUNT(id) FROM table.
22. Prefer numeric types over character types
Use numeric columns wherever possible. If a column that contains only numeric information is designed as a character type, it reduces query and join performance and increases storage overhead.
23. Prefer VARCHAR for variable-length columns
Use VARCHAR instead of CHAR wherever possible. First, a variable-length column takes less storage space, saving disk. Second, for queries, searching within a smaller column is clearly more efficient.
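A minimal sketch illustrating tips 22 and 23 together (the table and column definitions are hypothetical):

```sql
-- Harder to compare and index efficiently: numeric data stored as strings,
-- and fixed-width CHAR padding every value to its full length.
CREATE TABLE user_bad (
    id     CHAR(32),
    age    VARCHAR(10),
    remark CHAR(255)
);

-- Numeric columns use numeric types; variable-length text uses VARCHAR.
CREATE TABLE user_good (
    id     BIGINT UNSIGNED PRIMARY KEY,
    age    TINYINT UNSIGNED,
    remark VARCHAR(255)
);
```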