Reproduced from: https://blog.csdn.net/JAVA528416037/article/details/91998019
Brief Introduction
PostgreSQL bills itself as "the world's most advanced open source relational database". Because it appeared later, its user base is smaller than MySQL's, but it is developing strongly, and its biggest advantage is that it is fully open source.
MySQL is the world's most popular open source relational database and currently has a large user base. Since its acquisition by Oracle, its openness has decreased; in particular, the recent split-off of the free MariaDB fork suggests that MySQL is drifting toward closed source.
The relative merits of the two are not the focus of this article; broadly speaking there is little difference between them. Below we discuss only execution plans in PG.
Execution Plans
During query planning, the different execution strategies for a query are represented as different paths. After the candidate paths have been generated, the one with the lowest cost is selected, converted into an execution plan, and handed to the executor. How is the lowest-cost plan found? Statistics are used to estimate the cost of every node in the plan; the relevant parameters are as follows.
Cost calculation:
# Estimated cost:
total_cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples

# Sometimes we do not want the system's default execution plan. We can then steer
# the planner by disabling/enabling certain operations:
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on       # index scan
enable_indexonlyscan = on   # index-only scan
enable_material = on        # materialization
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on

# Scan cost including filtering (cpu_operator_cost is the per-operator CPU cost):
cost = seq_page_cost * relpages + cpu_tuple_cost * reltuples + cpu_operator_cost * reltuples
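As a minimal sketch of how these numbers combine, the sequential-scan estimate can be reproduced by hand from pg_class statistics (the table name test and the figures in the comments are assumptions, not values from this article):

-- Planner statistics for a hypothetical table named test:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'test';
-- Suppose this returns relpages = 100 and reltuples = 10000. With the default
-- seq_page_cost = 1.0 and cpu_tuple_cost = 0.01:
--   total_cost = 1.0 * 100 + 0.01 * 10000 = 200
-- which should match the total cost printed by:
EXPLAIN SELECT * FROM test;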
Each SQL statement has its own execution plan. We can view it with the EXPLAIN command, whose syntax is as follows:
nsc=# \h explain
Command:     EXPLAIN
Description: show the execution plan of a statement
Syntax:
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

where option can be one of:
    ANALYZE [ boolean ]   -- whether to actually execute the statement; default false
    VERBOSE [ boolean ]   -- whether to show details; default false
    COSTS   [ boolean ]   -- whether to show cost information; default true
    BUFFERS [ boolean ]   -- whether to show buffer usage; default false, requires ANALYZE
    TIMING  [ boolean ]   -- whether to show timing information
    FORMAT { TEXT | XML | JSON | YAML }   -- output format; default TEXT
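For example, to actually execute a statement and include buffer usage in the output (the statement itself is only illustrative):

EXPLAIN (ANALYZE, BUFFERS) SELECT count(1) FROM t_event;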
Take cost=1000.00..1205.30 as an example. The cost estimate has two parts: the first number is the startup cost, i.e. the estimated cost incurred before the first row can be returned, and the second is the total cost of executing the whole SQL. Note that these are planner cost units (anchored at seq_page_cost = 1.0), not milliseconds. rows is the estimated number of rows returned, which may differ from the actual count; the more regularly the table is VACUUMed and ANALYZEd, the closer this estimate gets to reality. width is the estimated average width of the result rows, here 285 bytes.
Adding the ANALYZE keyword after EXPLAIN actually executes the SQL, producing the real execution plan and timings. In actual time, the first number is the time (in ms) needed to return the first row, and the second is the time taken by the whole node. loops is the number of times the node was executed; when loops is greater than 1, the node's true total time is actual time * loops.
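Borrowing the Append node from the plan later in this article, a node line reads as follows (a sketch, not new output):

-- Append (cost=0.00..1448.84 rows=25119 width=12)
--        (actual time=0.027..7.975 rows=26225 loops=1)
-- means: the planner estimated 25119 rows; at run time the node produced its first
-- row after 0.027 ms, finished after 7.975 ms, actually returned 26225 rows, and
-- was executed once (loops=1).
EXPLAIN ANALYZE SELECT event_type, event_id FROM t_event;  -- an illustrative invocation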
Execution Plan Node Types
A PostgreSQL execution plan is read from the innermost (most indented) nodes outward: child nodes are executed first and feed their parents, and indentation indicates how plan nodes nest. There are four kinds of plan nodes: control nodes, scan nodes, materialization nodes, and join nodes.
Control nodes: e.g. Append, an execution node that combines multiple child tables or subqueries, mainly used for UNION operations.
Scan nodes: scan objects such as tables and return the qualifying tuples (see the sketch after this list).
Seq Scan: read all data blocks of the table from beginning to end and pick out the tuples that satisfy the conditions.
Index Scan: to speed up the query, find the physical location of the required rows in the index, then read the corresponding tuples from the table's data blocks. Index types include B-tree, GiST, GIN, BRIN, and Hash.
Bitmap Index Scan / Bitmap Heap Scan: build an in-memory bitmap of the qualifying rows or blocks, then read the corresponding data from the data files in bitmap order. Concretely, Bitmap Index Scan first finds the qualifying rows in the index and builds the bitmap; Bitmap Heap Scan then scans the table according to it.
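A quick way to watch the planner switch between these scan nodes is to toggle the enable_* switches (a sketch; the table test and its index on column id are assumptions):

-- Assume a hypothetical table test with an index on column id.
EXPLAIN SELECT * FROM test WHERE id = 100;    -- typically an Index Scan
EXPLAIN SELECT * FROM test WHERE id < 5000;   -- may pick Bitmap Index Scan + Bitmap Heap Scan
SET enable_indexscan = off;                   -- disable index scans for this session
SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM test WHERE id = 100;    -- now falls back to Seq Scan
RESET enable_indexscan;
RESET enable_bitmapscan;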
Materialization nodes: can cache execution results, i.e. the tuples produced on first execution are kept for upper-level nodes to consume. For example, a Sort node fetches all tuples returned by its child, sorts them by the specified attributes, and caches the sorted result; each time the upper node asks for a tuple, one is read from the cache on demand.
Materialize: caches the tuples returned by the lower node (e.g. during a join).
Sort: sorts the tuples returned by the lower node (if the data exceeds the size set by the work_mem parameter, the node spills to temporary files on disk, causing a dramatic performance drop).
Group: groups the sorted tuples coming from the lower node.
Agg: executes aggregate functions (sum/max/min/avg).
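Statements that typically produce these nodes, as a sketch against the t_event table used later (the work_mem value is arbitrary):

EXPLAIN ANALYZE SELECT * FROM t_event ORDER BY begin_time;                      -- Sort node
EXPLAIN ANALYZE SELECT event_type, count(*) FROM t_event GROUP BY event_type;   -- Agg node
-- When a sort overflows work_mem, EXPLAIN ANALYZE reports
-- "Sort Method: external merge  Disk: ..."; lowering work_mem reproduces this easily:
SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM t_event ORDER BY begin_time;
RESET work_mem;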
Conditional filtering: filter conditions are usually written in the WHERE clause; while scanning, only the rows satisfying them are kept, and they appear in the execution plan as Filter. If the filtered column has an index, the planner may use an index scan instead of filtering.
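The difference is visible directly in the plan (a sketch; whether event_type is indexed here is an assumption):

-- Without a usable index the condition appears as "Filter: ...";
-- with an index on event_type it may appear as "Index Cond: ..." instead.
EXPLAIN SELECT * FROM t_event WHERE event_type = 17;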
Join nodes: the join operation of relational algebra can be implemented in several ways (conditional join / left join / right join / full join / natural join):
Nested Loop Join: the outer table drives the inner table; every row returned by the outer table is matched against the inner table, so the result set of the outer side must not be too large. The table returning the smaller subset should serve as the outer table, and the join column of the inner table should be indexed. Execution fixes one table as outer and the other as inner, and matches each outer row against the corresponding inner-table records.
Hash Join: the optimizer builds an in-memory hash table from the smaller of the two tables using the join attribute, then scans the larger table and probes the hash table for matching rows.
Merge Join: ordinarily a hash join performs better than a merge join, but if the source data is already indexed or already sorted, the merge join performs better.
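The join strategy can likewise be steered with the enable_* switches to compare plans, sketched here with the two tables from the example below:

EXPLAIN SELECT te.event_type, tett.feat_bytes
FROM t_event te JOIN t_event_traffic_total tett ON tett.event_id = te.event_id;
SET enable_mergejoin = off;   -- forbid merge joins; the planner picks hash join or nested loop
EXPLAIN SELECT te.event_type, tett.feat_bytes
FROM t_event te JOIN t_event_traffic_total tett ON tett.event_id = te.event_id;
RESET enable_mergejoin;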
Operation types in EXPLAIN output:

Operation type | Description | Startup time?
Seq Scan | sequential scan of a table | no
Index Scan | index scan | no
Bitmap Index Scan | index scan that builds a bitmap | yes
Bitmap Heap Scan | heap scan driven by a bitmap | yes
Subquery Scan | subquery | no
Tid Scan | fetch by row ID (ctid) | no
Function Scan | function scan | no
Nested Loop Join | nested loop join | no
Merge Join | merge join | yes
Hash Join | hash join | yes
Sort | ORDER BY | yes
Hash | hash operation (e.g. the build side of a hash join) | yes
Result | expression evaluation not tied to a specific table | no
Unique | DISTINCT / UNION | yes
Limit | LIMIT / OFFSET | yes
Aggregate | aggregate functions such as count, sum, avg | yes
Group | GROUP BY | yes
Append | UNION operations | no
Materialize | subquery materialization | yes
SetOp | INTERSECT / EXCEPT | yes
Example Walkthrough
The slow SQL is as follows:
SELECT te.event_type,
       sum(tett.feat_bytes) AS traffic
FROM t_event te
LEFT JOIN t_event_traffic_total tett
       ON tett.event_id = te.event_id
WHERE ((te.event_type >= 1 AND te.event_type <= 17)
    OR (te.event_type >= 23 AND te.event_type <= 26)
    OR (te.event_type >= 129 AND te.event_type <= 256))
  AND te.end_time >= '2017-10-01 09:39:41+08:00'
  AND te.begin_time <= '2018-01-01 09:39:41+08:00'
  AND tett.stat_time >= '2017-10-01 09:39:41+08:00'
  AND tett.stat_time < '2018-01-01 09:39:41+08:00'
GROUP BY te.event_type
ORDER BY traffic DESC
LIMIT 10;
Elapsed time: about 4 seconds.
Purpose: join the event table with the event traffic table to find the TOP 10 event types by total traffic over a given time period.
Number of records:
select count(1) from t_event;                -- 535881 rows
select count(1) from t_event_traffic_total;  -- 2123235 rows
Result:
event_type | traffic
17         | 2.26441505638877E17
2          | 2.25307250128674E17
7          | 1.20629298837E15
26         | 285103860959500
1          | 169208970599500
13         | 47640495350000
6          | 15576058500000
3          | 12671721671000
15         | 1351423772000
11         | 699609230000
Execution plan:
Limit  (cost=5723930.01..5723930.04 rows=10 width=12) (actual time=3762.383..3762.384 rows=10 loops=1)
  Output: te.event_type, (sum(tett.feat_bytes))
  Buffers: shared hit=1899 read=16463, temp read=21553 written=21553
  ->  Sort  (cost=5723930.01..5723930.51 rows=200 width=12) (actual time=3762.382..3762.382 rows=10 loops=1)
        Output: te.event_type, (sum(tett.feat_bytes))
        Sort Key: (sum(tett.feat_bytes))
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=1899 read=16463, temp read=21553 written=21553
        ->  HashAggregate  (cost=5723923.69..5723925.69 rows=200 width=12) (actual time=3762.360..3762.363 rows=18 loops=1)
              Output: te.event_type, sum(tett.feat_bytes)
              Buffers: shared hit=1899 read=16463, temp read=21553 written=21553
              ->  Merge Join  (cost=384982.63..4390546.88 rows=266675361 width=12) (actual time=2310.395..3119.886 rows=2031023 loops=1)
                    Output: te.event_type, tett.feat_bytes
                    Merge Cond: (te.event_id = tett.event_id)
                    Buffers: shared hit=1899 read=16463, temp read=21553 written=21553
                    ->  Sort  (cost=3284.60..3347.40 rows=25119 width=12) (actual time=21.509..27.978 rows=26225 loops=1)
                          Output: te.event_type, te.event_id
                          Sort Key: te.event_id
                          Sort Method: external merge  Disk: 664kB
                          Buffers: shared hit=652, temp read=84 written=84
                          ->  Append  (cost=0.00..1448.84 rows=25119 width=12) (actual time=0.027..7.975 rows=26225 loops=1)
                                Buffers: shared hit=652
                                ->  Seq Scan on public.t_event te  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                                      Output: te.event_type, te.event_id
                                      Filter: ((te.end_time >= '2017-10-01 09:39:41+08'::timestamp with time zone) AND (te.begin_time <= '2018-01-01 09:39:41+08'::timestamp with time zone) AND (((te.event_type >= 1) AND (te.event_type <= 17)) OR ((te.event_type >= 23) AND (te.event_type <= 26)) OR ((te.event_type >= 129) AND (te.event_type <= 256))))
                                ->  (child-table scans omitted ...)
                    ->  Materialize  (cost=381698.04..392314.52 rows=2123296 width=16) (actual time=2288.881..2858.256 rows=2123235 loops=1)
                          Output: tett.feat_bytes, tett.event_id
                          Buffers: shared hit=1247 read=16463, temp read=21469 written=21469
                          ->  Sort  (cost=381698.04..387006.28 rows=2123296 width=16) (actual time=2288.877..2720.994 rows=2123235 loops=1)
                                Output: tett.feat_bytes, tett.event_id
                                Sort Key: tett.event_id
                                Sort Method: external merge  Disk: 53952kB
                                Buffers: shared hit=1247 read=16463, temp read=21469 written=21469
                                ->  Append  (cost=0.00..49698.20 rows=2123296 width=16) (actual time=0.026..470.610 rows=2123235 loops=1)
                                      Buffers: shared hit=1247 read=16463
                                      ->  Seq Scan on public.t_event_traffic_total tett  (cost=0.00..0.00 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1)
                                            Output: tett.feat_bytes, tett.event_id
                                            Filter: ((tett.stat_time >= '2017-10-01 09:39:41+08'::timestamp with time zone) AND (tett.stat_time < '2018-01-01 09:39:41+08'::timestamp with time zone))
                                      ->  (child-table scans omitted ...)
Total runtime: 3771.346 ms
Interpreting the execution plan:
Lines 40 -> 30: sequentially scan t_event_traffic_total and its child tables, filtering by the three-month stat_time range; 2,123,235 rows match.
Lines 26 -> 21: filter t_event by the time conditions, yielding 26,225 qualifying rows.
Lines 30 -> 27: sort the filtered rows by event_id and materialize them for the merge join.
Lines 12 -> 09: merge-join the two row sets (2,031,023 rows) and aggregate them by event_type (estimated 200 groups, actually 18).
Lines 08 -> 04: sort the aggregated groups by total traffic.
Line 01: apply LIMIT to return the top 10 rows.
The most time-consuming step in the whole plan is filtering t_event_traffic_total by the time condition: the table has many child tables and a large number of rows, costing about 2.8 s. The optimization idea is therefore straightforward: check whether the time column used in the filter is indexed on the child tables. On investigation, a single child table turned out to hold 1,531,147 rows.
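A hedged sketch of the implied fix; the child-table (partition) name below is hypothetical, since the article does not give the real names:

-- Substitute the real child table; an index on the time column lets the planner
-- avoid sequentially scanning and filtering every row.
CREATE INDEX ON t_event_traffic_total_201712 (stat_time);
ANALYZE t_event_traffic_total_201712;   -- refresh planner statistics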