Optimization of Scalar Quantum Query

Posted by sonoton345 on Wed, 03 Jul 2019 22:09:26 +0200

First, let's understand scalar quantum queries: sub-queries before from after select are called scalar quantum queries. For example, select num1,cal,(select name from t2 where t2.id = t1.id)from t1.

The disadvantage of scalar quantum query is very obvious: the driver table is fixed by the appearance t1, and the result set value t2 returned by T1 gets the result. So if the T1 table is too large. This will cause a lot of performance problems [the data warehouse batch process should disable scalar quantum queries]

Today, this SQL has been running for five hours without any results... It's good to have to be patient. I usually wait up to ten minutes.

SELECT /*+ NO_USE_HASH(C,B)*/
 C.CUST_ACCT_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_T C
 INNER JOIN (SELECT
             DISTINCT CUSTOMER_NO,
                      (SELECT SJJGM
                         FROM JGDY H
                        WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
               FROM CB_ACCT) B ON C.CUST_ACCT_NO = B.CUSTOMER_NO;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 2079508004
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |
|*  1 |  INDEX SKIP SCAN              | JGDY_IDX3 |     1 |     8 |       |     1   (0)| 00:00:01 |
|   2 |  MERGE JOIN                   |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |
|   3 |   SORT JOIN                   |           |    18M|   397M|       |   147K  (1)| 00:00:24 |
|   4 |    VIEW                       |           |    18M|   397M|       |   147K  (1)| 00:00:24 |
|   5 |     HASH UNIQUE               |           |    18M|   380M|  1107M|   147K  (1)| 00:00:24 |
|   6 |      TABLE ACCESS STORAGE FULL| CB_ACCT2  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|*  7 |   SORT JOIN                   |           |    19M|    14G|    35G|  1645K  (1)| 00:04:18 |
|   8 |    TABLE ACCESS STORAGE FULL  | CUSM_T    |    19M|    14G|       |   306K  (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("H"."JGM"=:B1)
       filter("H"."JGM"=:B1)
   7 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")
       filter("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")
Both SQL and PLAN can easily find scalar quantum queries.

1. Before from after select, of course, the SQL scalar is hidden in the inline view.


(SELECT
             DISTINCT CUSTOMER_NO,
                      (SELECT SJJGM
                         FROM JGDY H
                        WHERE H.JGM = CB_ACCT.BRANCH_NO) BRANCH_NO
               FROM CB_ACCT) B

2.PLAN's id=1 and 2 steps, indentation is the same, and there is no way to connect the father node

|*  1 |  INDEX SKIP SCAN              | JGDY_IDX3 |     1 |     8 |       |     1   (0)| 00:00:01 |
|   2 |  MERGE JOIN                   |           |    18M|    14G|       |  1793K  (1)| 00:04:41 |

From these two points, we can judge that there are scalars in SQL. If the SQL is too long, just look at PLAN directly.

Whether scalars cause performance problems depends on the number of rows returned from the main table (appearance). In fact, we all know that such data warehouse batch tables cannot be small. Symbolic Check


I have sent this script in my previous blog, and I can write it myself. It's very simple.

It can be seen that the number of rows in the table is very large and not suitable for scalar (> 10000 rows)

For scalar quantum queries, they can only be rewritten [scalar quantum queries can be rewritten equivalently to external links]

Of course, scalar rewriting here is very simple. Some complex scalar queries, such as aggregate classes, unequal values and tree queries, need to pay attention to whether the rewriting is equivalent before and after.

SELECT /*+ NO_USE_HASH(C,B)*/
 C.CUST_ACCT_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_T C
 INNER JOIN (SELECT DISTINCT CUSTOMER_NO,
                      sjjgm BRANCH_NO
               FROM CB_ACCT  LEFT JOIN jgdy  ON  cb_acct.branch_no=jgm
               ) B ON C.CUST_ACCT_NO = B.CUSTOMER_NO;
Plan hash value: 2285049241
 
----------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |    36M|    28G|       |  1834K  (1)| 00:04:47 |
|   1 |  MERGE JOIN                    |           |    36M|    28G|       |  1834K  (1)| 00:04:47 |
|   2 |   SORT JOIN                    |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   3 |    VIEW                        |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   4 |     HASH UNIQUE                |           |    36M|  1037M|  1384M|   188K  (1)| 00:00:30 |
|*  5 |      HASH JOIN RIGHT OUTER     |           |    36M|  1037M|       | 71501   (1)| 00:00:12 |
|   6 |       INDEX FULL SCAN          | JGDY_IDX3 |  1241 |  9928 |       |     1   (0)| 00:00:01 |
|   7 |       TABLE ACCESS STORAGE FULL| CB_ACCT2  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|*  8 |   SORT JOIN                    |           |    19M|    14G|    35G|  1645K  (1)| 00:04:18 |
|   9 |    TABLE ACCESS STORAGE FULL   | CUSM_T    |    19M|    14G|       |   306K  (1)| 00:00:48 |
----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))
   8 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")
       filter("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")
After rewriting, the scalar disappears. SQL runs for 7 minutes to produce results. But there is no unequal connection in this SQL. It is obviously meaningless to go MERGE JOIN. Obviously HASH is the best choice

Has been unable to understand the meaning of /* + NO_USE_HASH(C,B)*/ above SQL, and finally developed the response that this HINT is to let SQL go nested loop, because NL is faster. I heard the reason, too!

Here I briefly talk about how to choose NL, HASH and SMJ in practical work.

Nested loops:
If the number of returned SQL statements is too large, it is usually wrong.
Look at the number of rows returned by the driver table. Generally, the number of rows returned cannot exceed 1w, preferably within 1k.
See if the linked column of the driven table is included in the index (must be included in the index)
See distinct, group by, sum () generally do not take nested loops (data volume is super many to group by) of course, data volume is small, you can also take NL.

Hash connections can only be used for equivalent connections

The only role of sorted merge joins: non-equivalence joins

After removing /* + NO_USE_HASH(C,B)*/ SQL runs for 30 seconds and the result comes out.

SELECT
 C.CUST_ACCT_NO,
 C.PRIM_ACCT,
 ACCOUNT_SYSTEM,
 CUSTOMER_TYPE,
 CUSTOMER_STATUS,
 CREATE_DT,
 HOME_BRANCH_NO,
 COMPANY_SIZE,
 NOTICE_IND,
 NOTICE_CUST_NO,
 STMT_FREQUENCY,
 STMT_CYCLE,
 STMT_DAY,
 ID_NO,
 ID_TYPE,
 SHORT_NAME,
 EMAIL_ADD1,
 EMAIL_ADD2,
 CREDIT_RANKING,
 TITLE_CODE,
 NAME1,
 ADD1,
 POSTCODE,
 PHONE_NO_RES,
 PHONE_RES_EXT,
 PHONE_NO_BUS,
 PHONE_BUS_EXT,
 FAX_NO,
 TELEX_NO,
 PCODE_RGSTER,
 REGSTR_ADD1,
 REGSTR_ADD2,
 PHONE_RGSTR_NO,
 PHONE_RGSTR_EXT,
 BIRTH_DATE_1,
 SEX_CODE,
 EMPLOYER_NAME,
 EMPLOYED_FROM,
 EMPLOYER_ADDR,
 OCCUP_DESCRIP,
 OCCUPATION_CODE,
 INCOME,
 INCOME_WMY,
 COMPANY_NO,
 BUSINESS_NO,
 LICENCE_NO,
 BOSS_NAME,
 BOSS_BDAY,
 BUS_RGSTR_DATE,
 CAPITAL_AMT,
 CONTACT_REL_1,
 PHONE_NO_1,
 ADD2,
 ADD3,
 ADD4,
 MOBILE_NO,
 FXSP_TYPE,
 INDUSTRY_CODE,
 BUS_SECTOR_CODE,
 CUST_SUB_TYPE,
 DEP_STMT_TYPE,
 ID_ISSUE_DATE,
 ID_EXP_DATE,
 REGISTRY_ADD,
 ID_ISSUE_PLAC,
 LST_MNT_DATE,
 B.BRANCH_NO
  FROM CUSM_T C
 INNER JOIN (SELECT DISTINCT CUSTOMER_NO,
                      sjjgm BRANCH_NO
               FROM CB_ACCT  LEFT JOIN jgdy  ON  cb_acct.branch_no=jgm
               ) B ON C.CUST_ACCT_NO = B.CUSTOMER_NO

Plan hash value: 967350049
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    36M|    28G|       |  1059K  (1)| 00:02:46 |
|*  1 |  HASH JOIN                    |           |    36M|    28G|  1244M|  1059K  (1)| 00:02:46 |
|   2 |   VIEW                        |           |    36M|   829M|       |   188K  (1)| 00:00:30 |
|   3 |    HASH UNIQUE                |           |    36M|  1037M|  1384M|   188K  (1)| 00:00:30 |
|*  4 |     HASH JOIN RIGHT OUTER     |           |    36M|  1037M|       | 71501   (1)| 00:00:12 |
|   5 |      INDEX FULL SCAN          | JGDY_IDX3 |  1241 |  9928 |       |     1   (0)| 00:00:01 |
|   6 |      TABLE ACCESS STORAGE FULL| CB_ACCT2  |    36M|   760M|       | 71431   (1)| 00:00:12 |
|   7 |   TABLE ACCESS STORAGE FULL   | CUSM_T    |    19M|    14G|       |   306K  (1)| 00:00:48 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("C"."CUST_ACCT_NO"="B"."CUSTOMER_NO")
   4 - access("CB_ACCT"."BRANCH_NO"="JGM"(+))
In fact, this SQL can continue to optimize, the step of ID=5 INDEX FULL SCAN is read from a single block to a full table scan can be increased by 100 + times, plus the integrated machine itself full table scan optimization TABLE ACCESS STORAGE FULL. Promotion will be more!!!





Topics: SQL