ClickHouse Advanced Optimize Explain View Execution Plan

Posted by Doug G on Wed, 22 Sep 2021 19:44:28 +0200

This article is about Learning Guide for Big Data Specialists from Zero (Full Upgrade) ClickHouse: Partial supplement.

0 Explain View Execution Plan

To view the execution plan of an sql statement before Clickhouse version 20.6, you need to set the log level to trace to be visible and can only actually execute sql and view it in the execution log. Native execution plan syntax was introduced in version 20.6. Version 20.6.3 became an official version of functionality.

This document is based on the newer stable version 21.7.3.14.

1 Basic Grammar

EXPLAIN [AST | SYNTAX | PLAN | PIPELINE] [setting = value, ...] 
SELECT ... [FORMAT ...]

PLAN: Used to view the execution plan, default value.

*◼header Prints the head er description of each step in the plan, off by default, with a default value of 0;

*◼Description Prints the description of each step in the plan, turned on by default, default value 1;

*◼actions prints the details of each step in the plan, turned off by default, with a default value of 0.

_AST: used to view the grammar tree;

_SYNTAX: used to optimize grammar;

_PIPELINE: Used to view PIPELINE plans.

*◼header prints the head er description of each step in the plan, turned off by default;

*◼The graph describes the pipeline graph in DOT graphics language. By default, it is closed. The related graph needs to be viewed with graphviz.

*◼ActionsIf graph is turned on, compact printing is turned on by default.

Note: PLAN and PPELINE can also have additional display settings, as shown in the parameters above.

2 Case Practices

2.1 New version uses EXPLAIN

You can install another version above 20.6, or you can test it directly on the official website by selecting a higher version.

Official Web Test Link:https://play.clickhouse.tech/?file=welcome

1) View PLAIN

Simple Query

explain plan select arrayJoin([1,2,3,null,null]);

Execution plan for complex SQL

explain select database,table,count(1) cnt from system.parts where 
database in ('datasets','system') group by database,table order by 
database,cnt desc limit 2 by database;

Open execution plan for all parameters

EXPLAIN header=1, actions=1,description=1 SELECT number from 
system.numbers limit 10;

2) AST Grammar Tree

EXPLAIN AST SELECT number from system.numbers limit 10;

3) SYNTAX syntax optimization

//Make a query first
SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu') FROM 
numbers(10);
//View grammar optimization
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 
'atguigu') FROM numbers(10);
//Turn on ternary operator optimization
SET optimize_if_chain_to_multiif = 1;
//Review grammar optimization again
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 
'atguigu') FROM numbers(10);
//Return optimized statement
SELECT multiIf(number = 1, \'hello\', number = 2, \'world\', \'xyz\')
FROM numbers(10)

4) View PIPELINE

EXPLAIN PIPELINE SELECT sum(number) FROM numbers_mt(100000) GROUP BY number % 20; //Open Other Parameters

EXPLAIN PIPELINE header=1,graph=1 SELECT sum(number) FROM numbers_mt(10000) GROUP BY number%20;

2.2 Old Version View Execution Plan

clickhouse-client -h host name --send_logs_level=trace <<< "sql" > /dev/null

The send_logs_level parameter specifies that the log level is trace, <<<< Redirect the SQL statement to clickhouse-client for query, >/dev/null redirects the query result to empty device swallow for observation of the log.

Be careful:

1. By setting ClickHouse's service log to the DEBUG or TRACE level, EXPLAIN queries can be implemented in disguised form.

2. Executing a real SQL query is required for CH to print the plan log, so if the table has a large amount of data, it is best to use the LIMIT clause to reduce the amount of data returned by the query.

Topics: SQL clickhouse