PostgreSQL Multi-field Arbitrary Combination Search

Posted by JNorman on Fri, 06 Sep 2019 11:37:56 +0200

In real life, we may encounter such a scenario: when you open Taobao to buy things, you often face as many as cattle hair goods, which may be screened according to the conditions you want, and then slowly choose, for example, to choose a certain brand, price, color and so on.
At this point, you will encounter any combination of columns query, because you do not know what conditions the user will filter according to, you may say that I can not index all columns? It is indeed a method, but if there are hundreds of columns in a table, PostgreSQL provides us with several different kinds of columns. Method.

  • gin index (queries that support any combination of fields)
  • bloom Index (Equivalent Queries Supporting Any Read-Only Combination)
  • Each single column btree index (queries that support any combination of fields)

1. bloom Index
Bloom index interface is an index interface constructed by PostgreSQL based on Bloom filter (refer to this article if you don't know the algorithm of bloom filter: https://blog.csdn.net/hguisu/article/details/7866173), which belongs to lossy index and can converge the result set (excluding the results that absolutely do not meet the conditions, and then select the full result in the remaining results). As a result of sufficient conditions, a second check is needed. Bloom supports the equivalent query of any combination of columns.
bloom stores signatures. The larger the signature, the more space it consumes, but the exclusion is more accurate. There are both advantages and disadvantages.
Grammar:

CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)  
       WITH (length=80, col1=2, col2=2, col3=4);
//Signature length 80 bit, maximum allowable 4096 bits
col1 - col32,Specify each column separately bits,Default length 2, maximum allowable 4095 bits.
bill=# create table t1(c1 int, c2 int, c3 int, c4 int, c5 int);  
CREATE TABLE
bill=# 
bill=# create index idx_bloom on t1 using bloom(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10; 
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=14.05..20.56 rows=8 width=20)
   Recheck Cond: (c1 = 10)
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..14.05 rows=8 width=0)
         Index Cond: (c1 = 10)
(4 rows)


bill=# explain select * from t1 where c1 =10 and c2 =20;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=18.30..19.62 rows=1 width=20)
   Recheck Cond: ((c1 = 10) AND (c2 = 20))
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..18.30 rows=1 width=0)
         Index Cond: ((c1 = 10) AND (c2 = 20))
(4 rows)
bill=# explain select * from t1 where c1 =10 and c2 =20 and c3 =30;     
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=22.55..23.87 rows=1 width=20)
   Recheck Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..22.55 rows=1 width=0)
         Index Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
(4 rows)

However, the effect of bloom filtering method is limited. It is recommended to observe and use it.

2. gin index

bill=# create index idx_gin on t1 using gin(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10 or c2 =20;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=5.34..15.59 rows=17 width=20)
   Recheck Cond: ((c1 = 10) OR (c2 = 20))
   ->  BitmapOr  (cost=5.34..5.34 rows=17 width=0)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c2 = 20)
(7 rows)

bill=# explain select * from t1 where c1 =10 or c2 =20 and c3 =30;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=6.57..14.07 rows=9 width=20)
   Recheck Cond: ((c1 = 10) OR ((c2 = 20) AND (c3 = 30)))
   ->  BitmapOr  (cost=6.57..6.57 rows=9 width=0)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..3.90 rows=1 width=0)
               Index Cond: ((c2 = 20) AND (c3 = 30))
(7 rows)

GIN index implements internal bitmap and or bitmap Or, which is actually equivalent to establishing a separate B-Tree index for each field. The composite index method of GIN is used. However, when the data volume is very large or the columns are very large, GIN index will be relatively large.
At the same time, GIN suggests using the features of fastupdate and delayed merging to speed up insertion, deletion and update operations.

3,multi-btree

bill=# create index idx_t11 on t1(c1);
CREATE INDEX
bill=# create index idx_t12 on t1(c2); 
CREATE INDEX
bill=# create index idx_t13 on t1(c3); 
CREATE INDEX
bill=# create index idx_t14 on t1(c4); 
CREATE INDEX
bill=# create index idx_t15 on t1(c5); 
CREATE INDEX
bill=# explain select * from t1 where  c1 = 10  and c2 =20;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=3.28..4.59 rows=1 width=20)
   Recheck Cond: ((c2 = 20) AND (c1 = 10))
   ->  BitmapAnd  (cost=3.28..3.28 rows=1 width=0)
         ->  Bitmap Index Scan on idx_t12  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c2 = 20)
         ->  Bitmap Index Scan on idx_t11  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c1 = 10)
(7 rows)

bill=# explain select * from t1 where  c1 = 10  and c2 =20 or c3 =30;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=4.79..12.29 rows=9 width=20)
   Recheck Cond: (((c2 = 20) AND (c1 = 10)) OR (c3 = 30))
   ->  BitmapOr  (cost=4.79..4.79 rows=9 width=0)
         ->  BitmapAnd  (cost=3.28..3.28 rows=1 width=0)
               ->  Bitmap Index Scan on idx_t12  (cost=0.00..1.51 rows=8 width=0)
                     Index Cond: (c2 = 20)
               ->  Bitmap Index Scan on idx_t11  (cost=0.00..1.51 rows=8 width=0)
                     Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_t13  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c3 = 30)
(10 rows)

This method is also more convenient in the case of fewer columns.

In addition, the above three methods can be used without changing the table structure, but in practical application, we can also consider such as merging multiple columns into a single text, using PostgreSQL full-text retrieval function with gin index to query, which is also an optimization idea.

Topics: PostgreSQL