Label
PostgreSQL , Heap Only Tuple , HOT
background
PostgreSQL's current default storage engine generates a new version in the heap when updating records. The old version needs to recycle all index POINT associated with this version before it needs to recycle the old version after VACUUM is used.
The KEY of the index of PG is the value of the index field or expression, and the VALUE is the line number.
Before 8.3, each TUPLE version (line number) had its corresponding index POINT, so the update enlarged a lot.
Starting with 8.3, the concept of HOT is introduced. When updating records, if two conditions can be met, all TUPLE versions are strung through the LINK inside HEAP PAGE, so the index remains unchanged.
HOT must satisfy the following two conditions:
Necessary Condition A: UPDATE does not change any of the index keys Necessary Condition B: The new version should fit in the same old block – HOT chains can not cross block boundary. 1,The value of the index field remains unchanged.(If the value of any of these index fields changes, all indexes need to be updated.) 2,The new version is the same as the old one. HEAP PAGE Medium.
After 10 years, secondary index can be used to solve the problem of index enlargement introduced by update:
PostgreSQL 10.0 preview Performance Enhancement - Secondary Index
Interpretation of HOT Examples
1. Create test tables and write 10 test data
postgres=# create table a(id int, c1 int, c2 int, c3 int); CREATE TABLE postgres=# insert into a select generate_series(1,10), random()*100, random()*100, random()*100; INSERT 0 10
2. Creating Index
postgres=# create index idx_a_1 on a (id); CREATE INDEX postgres=# create index idx_a_2 on a (c1); CREATE INDEX postgres=# create index idx_a_3 on a (c2); CREATE INDEX
3. Observing the contents of index pages through pageinspect plug-ins
postgres=# SELECT * FROM bt_metap('idx_a_1'); magic | version | root | level | fastroot | fastlevel --------+---------+------+-------+----------+----------- 340322 | 2 | 1 | 0 | 1 | 0 (1 row) postgres=# SELECT * FROM bt_metap('idx_a_2'); magic | version | root | level | fastroot | fastlevel --------+---------+------+-------+----------+----------- 340322 | 2 | 1 | 0 | 1 | 0 (1 row) postgres=# SELECT * FROM bt_metap('idx_a_3'); magic | version | root | level | fastroot | fastlevel --------+---------+------+-------+----------+----------- 340322 | 2 | 1 | 0 | 1 | 0 (1 row)
postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
4. HOT updates (there is no index on the updated field, and the new version is recorded on the same HEAP PAGE)
postgres=# update a set c3=c3+1 where id=1; UPDATE 1 postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
postgres=# update a set c3=c3 where id=1 returning ctid,*; ctid | id | c1 | c2 | c3 --------+----+----+----+---- (0,13) | 1 | 13 | 20 | 15 (1 row) UPDATE 1 postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
postgres=# update a set c3=c3 where id=1; UPDATE 1 postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
5. HOT updates. Fields with indexes are updated, but the value remains unchanged. And the new version is in the same PAGE.
postgres=# update a set c2=c2 where id=1 returning ctid,*; ctid | id | c1 | c2 | c3 --------+----+----+----+---- (0,14) | 1 | 13 | 20 | 15 (1 row) UPDATE 1 postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
With the above HOT updates, all indexes have not changed.
6. NON-HOT updates, updates the values of index fields, all indexes have changed, at least three index IO s have occurred.
postgres=# update a set c2=c2+1 where id=1 returning ctid,*; ctid | id | c1 | c2 | c3 --------+----+----+----+---- (0,15) | 1 | 13 | 21 | 15 (1 row) UPDATE 1 postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,15) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 3 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 4 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 6 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 8 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 9 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 10 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 11 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (11 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,15) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00 5 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 6 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 8 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 9 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 11 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (11 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00 5 | (0,15) | 16 | f | f | 15 00 00 00 00 00 00 00 6 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 7 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 8 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 10 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 11 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (11 rows)
7. Garbage collection, first recycle index garbage version, and finally recycle table garbage version.
postgres=# vacuum verbose a; INFO: vacuuming "public.a" INFO: scanned index "idx_a_1" to remove 1 row versions DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: scanned index "idx_a_2" to remove 1 row versions DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: scanned index "idx_a_3" to remove 1 row versions DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: "a": removed 1 row versions in 1 pages DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: index "idx_a_1" now contains 10 row versions in 2 pages DETAIL: 1 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_a_2" now contains 10 row versions in 2 pages DETAIL: 1 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: index "idx_a_3" now contains 10 row versions in 2 pages DETAIL: 1 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: "a": found 5 removable, 10 nonremovable row versions in 1 out of 1 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 716311280 There were 4 unused item pointers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM
8. After garbage collection, the garbage version of the index is removed.
postgres=# SELECT * FROM bt_page_items('idx_a_1',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,15) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_2',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00 3 | (0,15) | 16 | f | f | 0d 00 00 00 00 00 00 00 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00 (10 rows) postgres=# SELECT * FROM bt_page_items('idx_a_3',1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00 4 | (0,15) | 16 | f | f | 15 00 00 00 00 00 00 00 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00 (10 rows)
performance testing
1. Create 32 indexes of test tables.
do language plpgsql $$ declare sql text; begin sql := 'create table a (id int primary key,'; for i in 1..32 loop sql := sql||'c'||i||' int default random()*1000,'; end loop; sql := rtrim(sql,','); sql := sql||') with (fillfactor=80)'; execute sql; for i in 2..32 loop execute 'create index idx_a_c'||i||' on a (c'||i||')'; end loop; end; $$;
2. Write 10 million records
insert into a (id) select generate_series(1,10000000);
3. non-hot update
vi test_non_hot.sql \set id random(1,10000000) update a set c2=c2+random()*100-100 where id=:id;
4. HOT Update
vi test_hot1.sql \set id random(1,10000000) update a set c1=c1+random()*100-100 where id=:id; vi test_hot2.sql \set id random(1,10000000) update a set c2=c2 where id=:id;
5. Performance comparison
5.1,HOT
pgbench -M prepared -n -r -P 1 -f ./test_hot1.sql -c 28 -j 28 -T 120 transaction type: ./test_hot1.sql scaling factor: 1 query mode: prepared number of clients: 28 number of threads: 28 duration: 120 s number of transactions actually processed: 9139010 latency average = 0.368 ms latency stddev = 0.187 ms tps = 76157.798606 (including connections establishing) tps = 76174.469712 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.001 \set id random(1,10000000) 0.366 update a set c1=c1+random()*100-100 where id=:id; Total DISK READ : 0.00 B/s | Total DISK WRITE : 13.14 M/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 13.82 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 45828 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 1.06 % postgres: postgres postgres 127.0.0.1(41326) UPDATE 45810 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.99 % postgres: postgres postgres 127.0.0.1(41290) UPDATE 45821 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.98 % postgres: postgres postgres 127.0.0.1(41312) idle 45820 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41310) UPDATE 45822 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41314) UPDATE 45819 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.94 % postgres: postgres postgres 127.0.0.1(41308) UPDATE 45806 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.92 % postgres: postgres postgres 127.0.0.1(41282) idle 45824 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.90 % postgres: postgres postgres 127.0.0.1(41318) UPDATE 45827 be/4 postgres 0.00 B/s 54.58 K/s 0.00 % 0.89 % postgres: postgres postgres 127.0.0.1(41324) UPDATE 45814 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.89 % postgres: postgres postgres 127.0.0.1(41298) UPDATE 45818 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.88 % postgres: postgres postgres 127.0.0.1(41306) idle 45823 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.87 % postgres: postgres postgres 127.0.0.1(41316) UPDATE 45805 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.87 % postgres: postgres postgres 127.0.0.1(41280) UPDATE 45826 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.86 % postgres: postgres postgres 127.0.0.1(41322) UPDATE 45809 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.86 % postgres: postgres postgres 127.0.0.1(41288) UPDATE 45808 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.85 % postgres: postgres postgres 127.0.0.1(41286) UPDATE 45825 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.85 % postgres: postgres postgres 127.0.0.1(41320) UPDATE 45804 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.84 % postgres: postgres postgres 127.0.0.1(41278) UPDATE 49040 be/4 postgres 0.00 B/s 12.85 M/s 0.00 % 0.84 % postgres: wal writer process 45816 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.66 % postgres: postgres postgres 127.0.0.1(41302) idle 45829 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.64 % postgres: postgres postgres 127.0.0.1(41328) UPDATE 45803 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.62 % postgres: postgres postgres 127.0.0.1(41276) BIND 45795 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.59 % postgres: postgres postgres 127.0.0.1(41274) UPDATE 45807 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.58 % postgres: postgres postgres 127.0.0.1(41284) UPDATE 45812 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.56 % postgres: postgres postgres 127.0.0.1(41294) UPDATE 45811 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41292) UPDATE 45817 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41304) UPDATE 45815 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41300) UPDATE 45813 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.51 % postgres: postgres postgres 127.0.0.1(41296) UPDATE
5.2,NON-HOT
pgbench -M prepared -n -r -P 1 -f ./test_non_hot.sql -c 28 -j 28 -T 120 transaction type: ./test_non_hot.sql scaling factor: 1 query mode: prepared number of clients: 28 number of threads: 28 duration: 120 s number of transactions actually processed: 6472445 latency average = 0.519 ms latency stddev = 0.707 ms tps = 53922.273197 (including connections establishing) tps = 53933.908671 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.002 \set id random(1,10000000) 0.517 update a set c2=c2+random()*100-100 where id=:id; Total DISK READ : 0.00 B/s | Total DISK WRITE : 191.66 M/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 142.11 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 49040 be/4 postgres 0.00 B/s 136.56 M/s 0.00 % 7.17 % postgres: wal writer process 45997 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.60 % postgres: postgres postgres 127.0.0.1(41384) BIND 45983 be/4 postgres 0.00 B/s 1903.25 K/s 0.00 % 1.56 % postgres: postgres postgres 127.0.0.1(41356) UPDATE 45977 be/4 postgres 0.00 B/s 1829.75 K/s 0.00 % 1.54 % postgres: postgres postgres 127.0.0.1(41344) UPDATE 45984 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.53 % postgres: postgres postgres 127.0.0.1(41358) UPDATE 45985 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.50 % postgres: postgres postgres 127.0.0.1(41360) UPDATE 45986 be/4 postgres 0.00 B/s 1748.52 K/s 0.00 % 1.49 % postgres: postgres postgres 127.0.0.1(41362) UPDATE 45995 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.47 % postgres: postgres postgres 127.0.0.1(41380) UPDATE 45988 be/4 postgres 0.00 B/s 1910.99 K/s 0.00 % 1.46 % postgres: postgres postgres 127.0.0.1(41366) UPDATE 45979 be/4 postgres 0.00 B/s 1763.99 K/s 0.00 % 1.46 % postgres: postgres postgres 127.0.0.1(41348) UPDATE 45976 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.45 % postgres: postgres postgres 127.0.0.1(41342) UPDATE 45982 be/4 postgres 0.00 B/s 1887.78 K/s 0.00 % 1.43 % postgres: postgres postgres 127.0.0.1(41354) UPDATE 45987 be/4 postgres 0.00 B/s 2019.31 K/s 0.00 % 1.42 % postgres: postgres postgres 127.0.0.1(41364) UPDATE 45992 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.40 % postgres: postgres postgres 127.0.0.1(41374) UPDATE 45994 be/4 postgres 0.00 B/s 1941.94 K/s 0.00 % 1.38 % postgres: postgres postgres 127.0.0.1(41378) UPDATE 45990 be/4 postgres 0.00 B/s 1794.94 K/s 0.00 % 1.36 % postgres: postgres postgres 127.0.0.1(41370) UPDATE 45975 be/4 postgres 0.00 B/s 1934.20 K/s 0.00 % 1.35 % postgres: postgres postgres 127.0.0.1(41340) UPDATE 45974 be/4 postgres 0.00 B/s 1910.99 K/s 0.00 % 1.35 % postgres: postgres postgres 127.0.0.1(41338) UPDATE 45980 be/4 postgres 0.00 B/s 2003.83 K/s 0.00 % 1.28 % postgres: postgres postgres 127.0.0.1(41350) UPDATE 45991 be/4 postgres 0.00 B/s 1748.52 K/s 0.00 % 1.13 % postgres: postgres postgres 127.0.0.1(41372) UPDATE 45996 be/4 postgres 0.00 B/s 2003.83 K/s 0.00 % 1.04 % postgres: postgres postgres 127.0.0.1(41382) UPDATE 45993 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41376) UPDATE 45972 be/4 postgres 0.00 B/s 1903.25 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41334) UPDATE 45978 be/4 postgres 0.00 B/s 1740.78 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41346) UPDATE 45998 be/4 postgres 0.00 B/s 1841.36 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41386) UPDATE 45981 be/4 postgres 0.00 B/s 1818.15 K/s 0.00 % 1.01 % postgres: postgres postgres 127.0.0.1(41352) UPDATE 45989 be/4 postgres 0.00 B/s 1895.52 K/s 0.00 % 1.01 % postgres: postgres postgres 127.0.0.1(41368) UPDATE 45973 be/4 postgres 0.00 B/s 1941.94 K/s 0.00 % 0.99 % postgres: postgres postgres 127.0.0.1(41336) idle 45961 be/4 postgres 0.00 B/s 1872.31 K/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41332) UPDATE 49039 be/4 postgres 0.00 B/s 4.37 M/s 0.00 % 0.00 % postgres: writer process 49036 be/4 postgres 0.00 B/s 3.87 K/s 0.00 % 0.00 % postgres: logger process
Using HOT technology, TPS has been upgraded from 53922 to 76157. IO resource consumption decreased from 192MB/s to 14MB/s.
Summary
HOT, secondary index and zheap storage engine can solve the problem of index enlargement introduced by update.
This paper introduces HOT, which must satisfy the following two conditions:
Necessary Condition A: UPDATE does not change any of the index keys Necessary Condition B: The new version should fit in the same old block – HOT chains can not cross block boundary. 1,The value of the index field remains unchanged.(If the value of any of these index fields changes, all indexes need to be updated.) 2,The new version is the same as the old one. HEAP PAGE Medium.
Reference resources
PostgreSQL 11 preview - Surjective indexes - index HOT enhancement (expression) update Evaluation
HOT Inside - The Technical Architecture
PostgreSQL 10.0 preview Performance Enhancement - Secondary Index
src/backend/access/heap/README.HOT