1, Introduction
1.1 Overview
ClickHouse is a columnar database management system (DBMS) open-sourced by Russia's Yandex in 2016. It is written in C++ and is mainly used for online analytical processing (OLAP). It can generate analytical reports in real time using SQL queries.
- Characteristics
- Columnar storage: aggregation and statistical operations are faster, and data compresses better;
- Pluggable table engines: the storage engine is set at the table level, so each table can use a different engine according to its requirements. There are currently more than 20 engines across four categories, including the MergeTree, Log, and Integration families.
- High write throughput: ClickHouse uses an LSM-tree-like structure and performs compaction in the background at regular intervals. Data import is a sequential, append-only write; once written, data parts are immutable. During background compaction, multiple parts are merge-sorted and written back to disk.
- Data partitioning and thread-level parallelism: ClickHouse divides data into partitions, and each partition is further divided into blocks of index_granularity rows; multiple CPU cores process different blocks, so data is handled in parallel. Under this design, a single query can use all the CPUs of the machine.
- Usage scenarios
ClickHouse excels at aggregate analysis over a single, very large, wide table.
- Most requests are reads: not point lookups, but range queries or full table scans;
- Data is updated in large batches (more than 1,000 rows) rather than row by row, or not updated at all;
- When reading data, a large number of rows will be extracted from the database, but only a small number of columns will be used;
- The table is "wide", that is, the table contains a large number of columns;
- The query frequency is relatively low (usually hundreds or less queries per second per server);
- For simple queries, a delay of about 50 milliseconds is allowed;
- Column values are relatively small numbers or short strings (for example, around 60 bytes per URL);
- High throughput is required when processing a single query (up to billions of rows per second per server);
- No transaction is required;
- Low data consistency requirements;
- Each query will only query one large table, and the rest are small tables;
- The query result is significantly smaller than the data source, that is, the data is filtered or aggregated, and the returned result does not exceed the memory size of a single server.
1.2 Single-machine deployment
Reference: https://clickhouse.tech/docs/en/getting-started/install/#from-rpm-packages
A single machine is enough for most learning scenarios; a high-availability or distributed cluster is only needed for production (it adds operational overhead).
sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/stable/x86_64
sudo yum install clickhouse-server clickhouse-client
sudo systemctl start clickhouse-server.service
clickhouse-client
[root@cloud-mn01 ~]# clickhouse-client ClickHouse client version 21.7.5.29 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 21.7.5 revision 54449. cloud-mn01 :) show databases; SHOW DATABASES Query id: 7efd3129-2076-4eff-8bac-2a314abf7b78 ┌─name────┐ │ default │ │ system │ └─────────┘ 2 rows in set. Elapsed: 0.002 sec. cloud-mn01 :) Bye. [root@cloud-mn01 ~]# clickhouse-client --query "show databases" default system [root@cloud-mn01 ~]#
1.3 High-availability cluster
- Replica write process
- Configuration file
/etc/clickhouse-server/config.xml
<zookeeper>
    <node index="1">
        <host>hadoop102</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>hadoop103</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>hadoop104</host>
        <port>2181</port>
    </node>
</zookeeper>
- Create tables
Replicas only synchronize data, not table structures, so the tables must be created manually on each machine.
In ReplicatedMergeTree, the first parameter is the ZooKeeper path of the shard, usually written as /clickhouse/table/{shard}/{table_name}; if there is only one shard, use 01. The second parameter is the replica name; replica names within the same shard must be different.
CREATE TABLE t_order_rep2
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = ReplicatedMergeTree('/clickhouse/table/01/t_order_rep', 'rep_101')
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY id
ORDER BY (id, sku_id)
CREATE TABLE t_order_rep2
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = ReplicatedMergeTree('/clickhouse/table/01/t_order_rep', 'rep_102')
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY id
ORDER BY (id, sku_id)
1.4 Distributed cluster
- Cluster write process
- Cluster read process
- configuration file
/etc/clickhouse-server/config.xml
<remote_servers>
    <gmall_cluster> <!-- cluster name -->
        <shard> <!-- shard 1 of the cluster -->
            <internal_replication>true</internal_replication>
            <replica> <!-- replica 1 of this shard -->
                <host>hadoop101</host>
                <port>9000</port>
            </replica>
            <replica> <!-- replica 2 of this shard -->
                <host>hadoop102</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard> <!-- shard 2 of the cluster -->
            <internal_replication>true</internal_replication>
            <replica> <!-- replica 1 of this shard -->
                <host>hadoop103</host>
                <port>9000</port>
            </replica>
            <replica> <!-- replica 2 of this shard -->
                <host>hadoop104</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard> <!-- shard 3 of the cluster -->
            <internal_replication>true</internal_replication>
            <replica> <!-- replica 1 of this shard -->
                <host>hadoop105</host>
                <port>9000</port>
            </replica>
            <replica> <!-- replica 2 of this shard -->
                <host>hadoop106</host>
                <port>9000</port>
            </replica>
        </shard>
    </gmall_cluster>
</remote_servers>

<zookeeper>
    <node index="1">
        <host>hadoop102</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>hadoop103</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>hadoop104</host>
        <port>2181</port>
    </node>
</zookeeper>

<macros>
    <shard>01</shard>          <!-- different machines get different shard numbers -->
    <replica>rep_1_1</replica> <!-- different machines get different replica names -->
</macros>
- Create table statement
- Local table
This table only contains the data of its own shard.
CREATE TABLE st_order_mt ON CLUSTER gmall_cluster
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/st_order_mt', '{replica}')
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY id
ORDER BY (id, sku_id)
- Distributed table
All data across the shards can be queried through this table.
CREATE TABLE st_order_mt_all2 ON CLUSTER gmall_cluster
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Datetime
)
ENGINE = Distributed(gmall_cluster, default, st_order_mt, hiveHash(sku_id))
1.5 Data types
Full data type documentation reference: https://clickhouse.tech/docs/en/sql-reference/data-types/#data_types
- Integer types
Type | Range |
---|---|
Int8 | [-128 : 127] |
Int16 | [-32768 : 32767] |
Int32 | [-2147483648 : 2147483647] |
Int64 | [-9223372036854775808 : 9223372036854775807] |
UInt8 | [0 : 255] |
UInt16 | [0 : 65535] |
UInt32 | [0 : 4294967295] |
UInt64 | [0 : 18446744073709551615] |
- Float types
Type | Description |
---|---|
Float32 | float |
Float64 | double |
It is recommended to store data as integers as much as possible, because floating-point calculations may cause rounding errors.
cloud-mn01 :) select 1.0 - 0.9; SELECT 1. - 0.9 Query id: ec6a31cf-42df-418e-bcd5-63b3732ecb44 ┌──────minus(1., 0.9)─┐ │ 0.09999999999999998 │ └─────────────────────┘ 1 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
- Boolean
There is no separate type to store Boolean values. UInt8 type can be used, and the value is limited to 0 or 1.
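A minimal sketch of this convention (the table and column names here are invented for illustration):

```sql
-- Hypothetical example: model a Boolean as UInt8 restricted to 0 / 1
CREATE TABLE t_flag
(
    `id` UInt32,
    `is_deleted` UInt8 DEFAULT 0   -- acts as the Boolean flag
)
ENGINE = TinyLog;

INSERT INTO t_flag VALUES (1, 0), (2, 1);

SELECT count() FROM t_flag WHERE is_deleted = 1;
```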
- Decimal type
A signed fixed-point number that keeps precision during addition, subtraction, and multiplication. For division, the least significant digits are discarded (not rounded).
Type | Description |
---|---|
Decimal32(S) | Equivalent to Decimal(9, S): up to 9 significant digits in total, S of them after the decimal point |
Decimal64(S) | Equivalent to Decimal(18, S): up to 18 significant digits in total |
Decimal128(S) | Equivalent to Decimal(38, S): up to 38 significant digits in total |
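A small sketch of the truncation behavior on division (the literal values are arbitrary):

```sql
SELECT
    toDecimal32(2.00, 2) * 3 AS mul_keeps_precision,  -- 6.00
    toDecimal32(1.00, 2) / 3 AS div_truncates         -- 0.33: the trailing digits are discarded, not rounded
```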
- String types
- String
A string of arbitrary length. It can contain an arbitrary set of bytes, including null bytes.
- FixedString(N)
A string of fixed length N, where N must be a strictly positive natural number. When a value shorter than N is stored, it is padded with null bytes at the end to reach N bytes; when a value longer than N is stored, an error is returned.
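A short hypothetical sketch of the padding behavior (the table name t_fixed is invented here):

```sql
-- Hypothetical example: FixedString(2) pads short values and rejects long ones
CREATE TABLE t_fixed
(
    `code` FixedString(2)
)
ENGINE = TinyLog;

INSERT INTO t_fixed VALUES ('a');       -- stored as 'a' plus one null byte of padding
INSERT INTO t_fixed VALUES ('ab');      -- fits exactly
-- INSERT INTO t_fixed VALUES ('abc');  -- would fail: longer than N

SELECT code, length(code) FROM t_fixed; -- length() reports 2 bytes for both rows
```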
- Enumeration type
Includes Enum8 and Enum16 types. Enum saves the correspondence of 'string' = integer.
# Use the - m option to support line breaks [root@cloud-mn01 ~]# clickhouse-client -m ClickHouse client version 21.7.5.29 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 21.7.5 revision 54449. # establish cloud-mn01 :) CREATE TABLE t_enum :-] ( :-] x Enum8('hello' = 1, 'world' = 2) :-] ) :-] ENGINE = TinyLog; CREATE TABLE t_enum ( `x` Enum8('hello' = 1, 'world' = 2) ) ENGINE = TinyLog Query id: b1bdb268-0cd1-4d1a-ad5a-59fc767bb85d Ok. 0 rows in set. Elapsed: 0.008 sec. # insert cloud-mn01 :) INSERT INTO t_enum VALUES ('hello'), ('world'), ('hello'); INSERT INTO t_enum VALUES Query id: 16a4ae7c-20a8-4a2c-a4f3-0201823740ca Ok. 3 rows in set. Elapsed: 0.002 sec. # View the number corresponding to the enum value cloud-mn01 :) SELECT CAST(x, 'Int8') FROM t_enum; SELECT CAST(x, 'Int8') FROM t_enum Query id: f9a69904-c5ef-4157-940b-bd171c040063 ┌─CAST(x, 'Int8')─┐ │ 1 │ │ 2 │ │ 1 │ └─────────────────┘ 3 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
- Time type
Type | Description |
---|---|
Date | Stores year-month-day, e.g. '2019-12-16' |
Datetime | Stores year-month-day hour:minute:second, e.g. '2019-12-16 20:50:10' |
Datetime64 | Stores year-month-day hour:minute:second with sub-second precision, e.g. '2019-12-16 20:50:10.66' |
- Array
Array(T): an array of elements of type T.
# Create array mode 1, using the array function cloud-mn01 :) SELECT array(1, 2) AS x, toTypeName(x); SELECT [1, 2] AS x, toTypeName(x) Query id: 30ac6d4c-854e-49b2-bc19-ed1529aa0dde ┌─x─────┬─toTypeName(array(1, 2))─┐ │ [1,2] │ Array(UInt8) │ └───────┴─────────────────────────┘ 1 rows in set. Elapsed: 0.002 sec. # How to create an array 2: use square brackets cloud-mn01 :) SELECT [1, 2] AS x, toTypeName(x); SELECT [1, 2] AS x, toTypeName(x) Query id: 9a6701df-9622-46a3-9a91-a0ad968f6f0a ┌─x─────┬─toTypeName([1, 2])─┐ │ [1,2] │ Array(UInt8) │ └───────┴────────────────────┘ 1 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- Null value
To distinguish NULL from ordinary values, ClickHouse has to store an extra mask file, which consumes more space than normal values. Avoid NULL where possible; instead, have the business side store a value that is otherwise meaningless, such as -1, to represent NULL.
2, Table engine
For a complete list of table engines, see: https://clickhouse.tech/docs/en/engines/table-engines/#table_engines
- brief introduction
The table engine is a distinctive feature of ClickHouse: it determines how a table's data is stored and processed, including:
- Storage mode and location of data, where to write and where to read data.
- Which queries are supported and how.
- Concurrent data access.
- Use of indexes, if any.
- Whether multithreaded requests can be executed.
- Data replication parameters.
2.1 Log
- TinyLog
Data is saved on disk as one file per column; indexes are not supported and there is no concurrency control. It is generally used for small tables with little data, plays a limited role in production, and is mainly suitable for practice and testing.
cloud-mn01 :) show create table t_tinylog; SHOW CREATE TABLE t_tinylog Query id: 9f444ef0-6b2d-4cc7-af79-32e885db9c7a ┌─statement─────────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE default.t_tinylog ( `id` String, `name` String ) ENGINE = TinyLog │ └───────────────────────────────────────────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
2.2 Integration
- MySQL
The MySQL engine allows you to perform SELECT and INSERT queries on data that is stored on a remote MySQL server.
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause'])
SETTINGS
    [connection_pool_size=16, ]
    [connection_max_tries=3, ]
    [connection_auto_close=true ]
;
cloud-mn01 :) SHOW CREATE TABLE mysql; SHOW CREATE TABLE mysql Query id: 1d8f5ea0-0f46-4ad8-8033-aa96b8cdb2b1 ┌─statement───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE default.mysql ( `name` String, `age` Int8 ) ENGINE = MySQL('127.0.0.1:3306', 'clickhouse', 'person', 'root', '') SETTINGS connection_pool_size = 16, connection_max_tries = 3, connection_auto_close = 1 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.002 sec. cloud-mn01 :) SELECT * FROM mysql; SELECT * FROM mysql Query id: 724bea04-d126-474b-b814-ab9162f41822 ┌─name────┬─age─┐ │ rayslee │ 18 │ └─────────┴─────┘ 1 rows in set. Elapsed: 0.002 sec. cloud-mn01 :) INSERT INTO mysql VALUES('lily', 19); INSERT INTO mysql VALUES Query id: 27d72eaa-4c10-461f-ace4-ffab589493a4 Ok. 1 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
2.3 Special
- Memory
- With the Memory engine, data is kept in memory in its original, uncompressed form, and disappears when the server restarts.
- Reads and writes do not block each other, and indexes are not supported. Performance on simple queries is extremely high (over 10 GB/s).
- It is not widely used: besides testing, it fits scenarios that need very high performance on a data set that is not too large (roughly 100 million rows at most).
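A minimal sketch of a Memory table for quick tests (the table name t_mem is invented for this example):

```sql
-- Hypothetical example: a Memory table; all data is lost on server restart
CREATE TABLE t_mem
(
    `id` UInt32,
    `name` String
)
ENGINE = Memory;

INSERT INTO t_mem VALUES (1, 'aa'), (2, 'bb');

SELECT * FROM t_mem;  -- served directly from RAM: no compression, no index
```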
2.4 MergeTree
2.4.1 MergeTree
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx' [, ...] ]
    [WHERE conditions]
    [GROUP BY key_expr [SET v1 = aggr_func(v1) [, v2 = aggr_func(v2) ...]] ] ]
[SETTINGS name=value, ...]
cloud-mn01 :) SHOW CREATE TABLE t_order_mt; SHOW CREATE TABLE t_order_mt Query id: e8257846-ed21-40b8-854f-e91ecb6f5a02 ┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE default.t_order_mt ( `id` UInt32, `sku_id` String, `total_amount` Decimal(16, 2), `create_time` DateTime ) ENGINE = MergeTree PARTITION BY toYYYYMMDD(create_time) PRIMARY KEY id ORDER BY (id, sku_id) SETTINGS index_granularity = 8192 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.004 sec. cloud-mn01 :)
- PARTITION BY
- The main purpose of partitioning is to reduce the scanning range and optimize the query speed;
- After partitioning, ClickHouse will process query statistics involving cross partitions in parallel.
- Data writing and partition merging: every batch of written data forms a new temporary part and is not merged into any existing partition immediately. At some point after the write (roughly 10-15 minutes later), ClickHouse automatically merges the parts belonging to the same partition; a merge can also be triggered manually with OPTIMIZE TABLE ... FINAL, as in the example below.
cloud-mn01 :) insert into t_order_mt values :-] (101,'sku_001',1000.00,'2020-06-01 12:00:00') , :-] (102,'sku_002',2000.00,'2020-06-01 11:00:00'), :-] (102,'sku_004',2500.00,'2020-06-01 12:00:00'), :-] (102,'sku_002',2000.00,'2020-06-01 13:00:00'), :-] (102,'sku_002',12000.00,'2020-06-01 13:00:00'), :-] (102,'sku_002',600.00,'2020-06-02 12:00:00'); INSERT INTO t_order_mt VALUES Query id: 60cdb8e5-3f91-4a42-9d2c-14d93698b23e Ok. 6 rows in set. Elapsed: 0.002 sec. cloud-mn01 :) select * from t_order_mt; SELECT * FROM t_order_mt Query id: 96fb5d88-18a8-4bf5-89df-3e87cfaa301f ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 102 │ sku_002 │ 600.00 │ 2020-06-02 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 11:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 12000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ 6 rows in set. Elapsed: 0.002 sec. cloud-mn01 :) insert into t_order_mt values :-] (101,'sku_001',1000.00,'2020-06-01 12:00:00') , :-] (102,'sku_002',2000.00,'2020-06-01 11:00:00'), :-] (102,'sku_004',2500.00,'2020-06-01 12:00:00'), :-] (102,'sku_002',2000.00,'2020-06-01 13:00:00'), :-] (102,'sku_002',12000.00,'2020-06-01 13:00:00'), :-] (102,'sku_002',600.00,'2020-06-02 12:00:00'); INSERT INTO t_order_mt VALUES Query id: e58fb4bf-2d69-40e6-857e-d91b049a974d Ok. 6 rows in set. Elapsed: 0.003 sec. cloud-mn01 :) select * from t_order_mt; SELECT * FROM t_order_mt Query id: 98b097f3-489c-4cc7-81ff-d1fe77e21a50 ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 102 │ sku_002 │ 600.00 │ 2020-06-02 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 102 │ sku_002 │ 600.00 │ 2020-06-02 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 11:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 12000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 11:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 12000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ 12 rows in set. Elapsed: 0.004 sec. cloud-mn01 :) optimize table t_order_mt final; OPTIMIZE TABLE t_order_mt FINAL Query id: 6c242f60-e096-4293-b936-9df17763ffb1 Ok. 0 rows in set. Elapsed: 0.005 sec. 
cloud-mn01 :) select * from t_order_mt; SELECT * FROM t_order_mt Query id: 670d8871-bec7-43d4-9141-1347602ed24a ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 102 │ sku_002 │ 600.00 │ 2020-06-02 12:00:00 │ │ 102 │ sku_002 │ 600.00 │ 2020-06-02 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 11:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 12000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 11:00:00 │ │ 102 │ sku_002 │ 2000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_002 │ 12000.00 │ 2020-06-01 13:00:00 │ │ 102 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ │ 102 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ 12 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
# Part directory naming: 20200602_2_4_1 = PartitionId_MinBlockNum_MaxBlockNum_Level
# => PartitionId : the partition value
# => MinBlockNum : minimum block number; an auto-incrementing counter starting at 1, each new part gets the next number
# => MaxBlockNum : maximum block number; for a newly created part it equals MinBlockNum
# => Level       : merge level, i.e. how many times the part has been merged; more merges mean a higher value
[root@cloud-mn01 t_order_mt]# pwd
/var/lib/clickhouse/data/default/t_order_mt
[root@cloud-mn01 t_order_mt]# ll
total 4
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:32 20200601_1_1_0
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:34 20200601_1_3_1
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:33 20200601_3_3_0
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:32 20200602_2_2_0
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:34 20200602_2_4_1
drwxr-x--- 2 clickhouse clickhouse 203 Aug  1 12:33 20200602_4_4_0
drwxr-x--- 2 clickhouse clickhouse   6 Aug  1 12:29 detached
-rw-r----- 1 clickhouse clickhouse   1 Aug  1 12:29 format_version.txt
[root@cloud-mn01 t_order_mt]#
- PRIMARY KEY
The primary key in ClickHouse differs from other databases: it only provides a primary (sparse) index over the data, not a uniqueness constraint. This means rows with the same primary key can exist.
- index_granularity
- Index granularity is the interval, in rows, between two adjacent entries of the sparse index.
- MergeTree defaults to 8192. The official recommendation is not to change it unless the column contains a large number of duplicate values (for example, only tens of thousands of distinct values within a partition).
- The advantage of a sparse index is that a small amount of index data can locate a large amount of row data. The cost is that it can only locate the first row of an index granule, after which a short range scan is needed.
- ORDER BY
- ORDER BY defines the order in which the data within each part is stored on disk.
- ORDER BY is the only mandatory clause in MergeTree, even more important than the primary key: when the user does not set a primary key, much of the processing (including the primary index) falls back to the ORDER BY fields.
- The primary key must be a prefix of the ORDER BY fields. For example, if ORDER BY is (id, sku_id), the primary key must be id or (id, sku_id), as in the sketch below.
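A hedged sketch of the prefix rule (the table name t_pk_ok is invented; the column layout mirrors the tables used elsewhere in this chapter):

```sql
-- Hypothetical example: the primary key (id) is a prefix of ORDER BY (id, sku_id)
CREATE TABLE t_pk_ok
(
    `id` UInt32,
    `sku_id` String,
    `create_time` Datetime
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(create_time)
PRIMARY KEY id            -- OK: (id) is a prefix of (id, sku_id)
ORDER BY (id, sku_id);

-- PRIMARY KEY sku_id with ORDER BY (id, sku_id) would be rejected,
-- because sku_id is not a prefix of the ORDER BY tuple.
```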
- Secondary index
Secondary (data skipping) indexes help queries that filter on non-primary-key columns.
GRANULARITY N sets the granularity of the secondary index relative to the primary index: one secondary-index entry is created for every N * index_granularity rows.
cloud-mn01 :) SHOW CREATE TABLE t_order_mt2; SHOW CREATE TABLE t_order_mt2 Query id: d23601cf-3f9a-44f6-bde1-51735d7c31a4 ┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ CREATE TABLE default.t_order_mt2 ( `id` UInt32, `sku_id` String, `total_amount` Decimal(16, 2), `create_time` DateTime, INDEX a total_amount TYPE minmax GRANULARITY 5 ) ENGINE = MergeTree PARTITION BY toYYYYMMDD(create_time) PRIMARY KEY id ORDER BY (id, sku_id) SETTINGS index_granularity = 8192 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- TTL
TTL (Time To Live): MergeTree can manage the life cycle of a table's data or of individual columns.
- Column level
# Specify at creation time
CREATE TABLE example_table
(
    d DateTime,
    a Int TTL d + INTERVAL 1 MONTH,
    b Int TTL d + INTERVAL 1 MONTH,
    c String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;

# Specify after creation
ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 DAY;
- Table level
# Specify at creation time
CREATE TABLE example_table
(
    d DateTime,
    a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH [DELETE],
    d + INTERVAL 1 WEEK TO VOLUME 'aaa',
    d + INTERVAL 2 WEEK TO DISK 'bbb';

# Specify after creation
ALTER TABLE example_table
    MODIFY TTL d + INTERVAL 1 DAY;
2.4.2 ReplacingMergeTree
- brief introduction
- ReplacingMergeTree is a variant of MergeTree. Its storage behavior is fully inherited from MergeTree, but it adds deduplication within a partition.
- Deduplication only happens when data of the same batch is inserted and when parts are merged. Merging happens in the background at an unknown time, so you cannot plan around it.
- If the table is partitioned, deduplication is performed only within a partition; it does not work across partitions.
- use
create table t_order_rmt(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time Datetime
) engine = ReplacingMergeTree(create_time)
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id, sku_id);
- The parameter of ReplacingMergeTree() is the version column; among duplicate rows, the one with the largest version value is kept.
- If no version column is given, the last row in insertion order is kept by default.
- The ORDER BY columns are used as the deduplication (unique) key, as shown in the sketch below.
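A short sketch of this behavior against the t_order_rmt table created above; the inserted rows are sample values, and OPTIMIZE ... FINAL is used only to force the merge immediately:

```sql
-- Sample rows: same (id, sku_id) key, different create_time versions
INSERT INTO t_order_rmt VALUES
(101, 'sku_001', 1000.00, '2020-06-01 12:00:00'),
(102, 'sku_002', 2000.00, '2020-06-01 11:00:00'),
(102, 'sku_002', 2000.00, '2020-06-01 13:00:00'),
(102, 'sku_002', 12000.00, '2020-06-01 13:00:00');

-- Force the merge now instead of waiting for the background task
OPTIMIZE TABLE t_order_rmt FINAL;

-- Within each partition, only one row per (id, sku_id) remains:
-- the one with the largest create_time (the version column)
SELECT * FROM t_order_rmt;
```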
2.4.3 SummingMergeTree
SummingMergeTree: an engine that "pre-aggregates" data within each partition.
create table t_order_smt(
    id UInt32,
    sku_id String,
    total_amount Decimal(16,2),
    create_time Datetime
) engine = SummingMergeTree(total_amount)
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id, sku_id);
- The columns specified in SummingMergeTree() are the columns to be summed;
- Multiple columns can be specified, and they must all be numeric; if none are specified, all numeric columns that are not dimension (key) columns are summed;
- The ORDER BY columns serve as the dimension columns;
- All other columns keep the value of the first row in insertion order;
- Rows in different partitions are never aggregated together;
- Aggregation only happens when rows are inserted in the same batch (in newer versions) or when parts are merged, as the sketch below illustrates.
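A short sketch against the t_order_smt table created above (sample rows; OPTIMIZE ... FINAL forces the merge so the effect is visible right away):

```sql
-- Sample rows sharing the same (id, sku_id) key within one partition
INSERT INTO t_order_smt VALUES
(101, 'sku_001', 1000.00, '2020-06-01 12:00:00'),
(102, 'sku_002', 2000.00, '2020-06-01 11:00:00'),
(102, 'sku_002', 2000.00, '2020-06-01 13:00:00');

-- Force the merge so the pre-aggregation is visible immediately
OPTIMIZE TABLE t_order_smt FINAL;

-- Rows with the same key collapse into one: total_amount is summed,
-- create_time keeps the value of the first inserted row
SELECT * FROM t_order_smt;

-- Because merges are not guaranteed to have happened, production queries
-- should still aggregate explicitly rather than rely on the merged rows:
SELECT id, sku_id, sum(total_amount) FROM t_order_smt GROUP BY id, sku_id;
```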
3, SQL operation
- DDL data definition language
- DCL data control language
- DML data manipulation language
- DQL data query language
3.1 DML
- brief introduction
- ClickHouse does provide Delete and Update. This kind of operation is called a Mutation query and can be regarded as a kind of ALTER.
- Although modification and deletion are possible, unlike ordinary OLTP databases, Mutation statements are very "heavy" operations and do not support transactions.
- The main reason they are "heavy" is that every modification or deletion abandons the data parts containing the target data and rebuilds new ones. Therefore, prefer large batch changes over frequent small operations.
- Because the operation is heavy, a Mutation is executed in two steps: the synchronous part only writes new parts and marks the old parts as logically invalid; the old data is not physically deleted, and disk space is not freed, until a part merge is triggered.
- Delete
alter table t_order_smt delete where sku_id ='sku_001';
- Update
alter table t_order_smt update total_amount=toDecimal32(2000.00,2) where id = 102;
3.2 DQL
- Group By
GROUP BY can be combined with WITH ROLLUP / WITH CUBE / WITH TOTALS to compute subtotals and grand totals.
cloud-mn01 :) select * from t_order_mt; SELECT * FROM t_order_mt Query id: be6ba693-9ad0-45e9-9b05-9cb0aeba1213 ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 105 │ sku_003 │ 600.00 │ 2020-06-02 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 101 │ sku_001 │ 1000.00 │ 2020-06-01 12:00:00 │ │ 101 │ sku_002 │ 2000.00 │ 2020-06-01 12:00:00 │ │ 103 │ sku_004 │ 2500.00 │ 2020-06-01 12:00:00 │ │ 104 │ sku_002 │ 2000.00 │ 2020-06-01 12:00:00 │ │ 110 │ sku_003 │ 600.00 │ 2020-06-01 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ ┌──id─┬─sku_id──┬─total_amount─┬─────────create_time─┐ │ 106 │ sku_001 │ 1000.00 │ 2020-06-04 12:00:00 │ │ 107 │ sku_002 │ 2000.00 │ 2020-06-04 12:00:00 │ │ 108 │ sku_004 │ 2500.00 │ 2020-06-04 12:00:00 │ │ 109 │ sku_002 │ 2000.00 │ 2020-06-04 12:00:00 │ └─────┴─────────┴──────────────┴─────────────────────┘ 10 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- with rollup: remove dimensions from right to left for subtotal
cloud-mn01 :) select id , sku_id,sum(total_amount) from t_order_mt group by :-] id,sku_id with rollup; SELECT id, sku_id, sum(total_amount) FROM t_order_mt GROUP BY id, sku_id WITH ROLLUP Query id: a43fba98-5bcf-4bf2-96d1-cf2a41b2891d ┌──id─┬─sku_id──┬─sum(total_amount)─┐ │ 110 │ sku_003 │ 600.00 │ │ 109 │ sku_002 │ 2000.00 │ │ 107 │ sku_002 │ 2000.00 │ │ 106 │ sku_001 │ 1000.00 │ │ 104 │ sku_002 │ 2000.00 │ │ 101 │ sku_002 │ 2000.00 │ │ 103 │ sku_004 │ 2500.00 │ │ 108 │ sku_004 │ 2500.00 │ │ 105 │ sku_003 │ 600.00 │ │ 101 │ sku_001 │ 1000.00 │ └─────┴─────────┴───────────────────┘ ┌──id─┬─sku_id─┬─sum(total_amount)─┐ │ 110 │ │ 600.00 │ │ 106 │ │ 1000.00 │ │ 105 │ │ 600.00 │ │ 109 │ │ 2000.00 │ │ 107 │ │ 2000.00 │ │ 104 │ │ 2000.00 │ │ 103 │ │ 2500.00 │ │ 108 │ │ 2500.00 │ │ 101 │ │ 3000.00 │ └─────┴────────┴───────────────────┘ ┌─id─┬─sku_id─┬─sum(total_amount)─┐ │ 0 │ │ 16200.00 │ └────┴────────┴───────────────────┘ 20 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- with cube: remove the dimension from right to left for subtotal, and then remove the dimension from left to right for subtotal
cloud-mn01 :) select id , sku_id,sum(total_amount) from t_order_mt group by :-] id,sku_id with cube; SELECT id, sku_id, sum(total_amount) FROM t_order_mt GROUP BY id, sku_id WITH CUBE Query id: df7b72c9-5e06-4ddf-a8f1-5ad1e3e195ad ┌──id─┬─sku_id──┬─sum(total_amount)─┐ │ 110 │ sku_003 │ 600.00 │ │ 109 │ sku_002 │ 2000.00 │ │ 107 │ sku_002 │ 2000.00 │ │ 106 │ sku_001 │ 1000.00 │ │ 104 │ sku_002 │ 2000.00 │ │ 101 │ sku_002 │ 2000.00 │ │ 103 │ sku_004 │ 2500.00 │ │ 108 │ sku_004 │ 2500.00 │ │ 105 │ sku_003 │ 600.00 │ │ 101 │ sku_001 │ 1000.00 │ └─────┴─────────┴───────────────────┘ ┌──id─┬─sku_id─┬─sum(total_amount)─┐ │ 110 │ │ 600.00 │ │ 106 │ │ 1000.00 │ │ 105 │ │ 600.00 │ │ 109 │ │ 2000.00 │ │ 107 │ │ 2000.00 │ │ 104 │ │ 2000.00 │ │ 103 │ │ 2500.00 │ │ 108 │ │ 2500.00 │ │ 101 │ │ 3000.00 │ └─────┴────────┴───────────────────┘ ┌─id─┬─sku_id──┬─sum(total_amount)─┐ │ 0 │ sku_003 │ 1200.00 │ │ 0 │ sku_004 │ 5000.00 │ │ 0 │ sku_001 │ 2000.00 │ │ 0 │ sku_002 │ 8000.00 │ └────┴─────────┴───────────────────┘ ┌─id─┬─sku_id─┬─sum(total_amount)─┐ │ 0 │ │ 16200.00 │ └────┴────────┴───────────────────┘ 24 rows in set. Elapsed: 0.003 sec. cloud-mn01 :)
- with totals: calculate totals only
cloud-mn01 :) select id , sku_id,sum(total_amount) from t_order_mt group by :-] id,sku_id with totals; SELECT id, sku_id, sum(total_amount) FROM t_order_mt GROUP BY id, sku_id WITH TOTALS Query id: 0e5d8a29-253e-4f5d-90a1-73c81122837f ┌──id─┬─sku_id──┬─sum(total_amount)─┐ │ 110 │ sku_003 │ 600.00 │ │ 109 │ sku_002 │ 2000.00 │ │ 107 │ sku_002 │ 2000.00 │ │ 106 │ sku_001 │ 1000.00 │ │ 104 │ sku_002 │ 2000.00 │ │ 101 │ sku_002 │ 2000.00 │ │ 103 │ sku_004 │ 2500.00 │ │ 108 │ sku_004 │ 2500.00 │ │ 105 │ sku_003 │ 600.00 │ │ 101 │ sku_001 │ 1000.00 │ └─────┴─────────┴───────────────────┘ Totals: ┌─id─┬─sku_id─┬─sum(total_amount)─┐ │ 0 │ │ 16200.00 │ └────┴────────┴───────────────────┘ 10 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- Data export
clickhouse-client --query "select * from t_order_mt where create_time='2020-06-01 12:00:00'" --format CSVWithNames> /opt/module/data/rs1.csv
3.3 DDL
- New field
alter table tableName add column newcolname String after col1;
- Modify field
alter table tableName modify column newcolname String;
- Delete field
alter table tableName drop column newcolname;
4, Optimization
4.1 Table creation optimization
4.1.1 Data types
- Time field
- When creating a table, do not use strings for fields that can be represented by numeric type or date time type;
- Although ClickHouse internally stores DateTime as a timestamp (a long integer), storing a long integer directly is not recommended;
- DateTime does not need function conversion, with high execution efficiency and good readability.
CREATE TABLE t_type2
(
    `id` UInt32,
    `sku_id` String,
    `total_amount` Decimal(16, 2),
    `create_time` Int32
)
ENGINE = ReplacingMergeTree(create_time)
PARTITION BY toYYYYMMDD(toDate(create_time)) -- a conversion is needed here, otherwise an error is reported
PRIMARY KEY id
ORDER BY (id, sku_id)
- Null value
- Nullable types almost always hurt performance: storing a Nullable column requires an extra file for the NULL mask, and Nullable columns cannot be indexed;
- Unless the case is very special, use the column's default value to represent "empty", or agree on a value that is meaningless to the business (such as -1).
CREATE TABLE t_null
(
    `x` Int8,
    `y` Nullable(Int8)
)
ENGINE = TinyLog

[root@cloud-mn01 t_null]# pwd
/var/lib/clickhouse/data/default/t_null
[root@cloud-mn01 t_null]# ll
total 16
-rw-r----- 1 clickhouse clickhouse 91 Aug 10 09:21 sizes.json
-rw-r----- 1 clickhouse clickhouse 28 Aug 10 09:21 x.bin
-rw-r----- 1 clickhouse clickhouse 28 Aug 10 09:21 y.bin
-rw-r----- 1 clickhouse clickhouse 28 Aug 10 09:21 y.null.bin
[root@cloud-mn01 t_null]#
4.1.2 Partitions and indexes
- Partition granularity should be chosen according to business characteristics and should be neither too coarse nor too fine. Partitioning by day is typical, or Tuple() can be specified for no partitioning. For a single table with around 100 million rows, it is best to keep the number of partitions between roughly 10 and 30;
- Index columns must be specified. In ClickHouse the index columns determine the row order and are declared with ORDER BY; they are usually the attributes most often used as filter conditions in queries;
- The index can be a single column or a combination of columns; generally, put higher-priority, more frequently queried columns first;
- Columns with very high cardinality (such as the user_id column of a user table) are not suitable as index columns; ideally, filtering on the index narrows the data down to within about a million rows.
-- official hits_v1
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))

-- official visits_v1
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
4.1.3 Table parameters
- index_granularity controls the index granularity. The default is 8192; adjusting it is not recommended unless necessary.
- If the table does not need to keep the full history, it is recommended to specify a TTL (time to live);
4.1.4 Writing and deleting
- principle
- Avoid single-row or small-batch delete and insert operations; they create many small parts and put heavy pressure on background merges;
- Do not write to too many partitions at once, and do not write data too fast; writing too fast makes merges fall behind and causes errors;
- A general recommendation is 2-3 insert operations per second, each writing 20,000-50,000 rows (depending on server performance).
- Writing too fast
-- Error 1: too many parts; merges are falling behind inserts.
1. Code: 252, e.displayText() = DB::Exception: Too many parts(304). Merges are processing significantly slower than inserts
-- Mitigation: use the WAL (write-ahead log) to improve write performance (in_memory_parts_enable_wal defaults to true).

-- Error 2: the memory limit for a query is exceeded.
2. Code: 241, e.displayText() = DB::Exception: Memory limit (for query) exceeded: would use 9.37 GiB (attempt to allocate chunk of 301989888 bytes), maximum: 9.31 GiB
-- Mitigation: when server memory is plentiful, raise the memory quota via max_memory_usage;
-- when server memory is insufficient, allow spilling the excess to disk (slower) via
-- max_bytes_before_external_group_by and max_bytes_before_external_sort.
4.1.5 Common configurations
Setting | Description |
---|---|
background_pool_size | The size of the background thread pool. Merge threads are executed in this thread pool. This thread pool is not only used for merge threads. The default value is 16. If allowed, it is recommended to change it to twice the number of CPUs (number of threads). |
background_schedule_pool_size | The number of threads executing background tasks (replication table, Kafka stream, DNS cache update). The default is 128. It is recommended to change it to twice the number of CPUs (number of threads). |
background_distributed_schedule_pool_size | It is set as the number of threads for distributed sending and executing background tasks. The default is 16. It is recommended to change it to twice the number of CPUs (threads). |
max_concurrent_queries | Maximum number of concurrent requests (including select, insert, etc.). The default is 100; it is recommended to raise it to 150-300. |
max_threads | Set the maximum number of CPUs that can be used by a single query. The default is the number of cpu cores |
max_memory_usage | This parameter lives in users.xml and limits the memory a single query may use. It can be set fairly high to raise the ceiling for cluster queries; leave some headroom for the OS, e.g. set it to 100 GB on a machine with 128 GB of memory. |
max_bytes_before_external_group_by | Generally set to half of max_memory_usage. When a GROUP BY uses more memory than this threshold, it spills to disk, because ClickHouse aggregation runs in two stages: producing intermediate data and merging intermediate data. Combined with the previous item, 50 GB is recommended. |
max_bytes_before_external_sort | When ORDER BY uses more memory than this value, it spills to disk (disk-based sorting). If this value is not set, an error is thrown when memory runs out; if it is set, ORDER BY can still complete, but much more slowly than in memory (measured speeds are very slow). |
max_table_size_to_drop | This parameter lives in config.xml and takes effect when dropping tables or partitions. The default is 50 GB, meaning a drop of a table or partition larger than 50 GB fails. It is recommended to set it to 0 so tables of any size can be dropped. |
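Several of these (max_memory_usage, max_bytes_before_external_group_by, max_bytes_before_external_sort) are user-level settings, so they can also be tried per session before editing users.xml. A hedged sketch; the numeric values below are only examples, not recommendations:

```sql
-- Session-level overrides for experimentation; the numbers are illustrative only
SET max_memory_usage = 20000000000;                    -- ~20 GB per query
SET max_bytes_before_external_group_by = 10000000000;  -- spill GROUP BY to disk beyond ~10 GB
SET max_bytes_before_external_sort = 10000000000;      -- spill ORDER BY to disk beyond ~10 GB

-- Check the values currently in effect
SELECT name, value
FROM system.settings
WHERE name IN ('max_memory_usage',
               'max_bytes_before_external_group_by',
               'max_bytes_before_external_sort');
```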
4.2 Syntax optimization
4.2.1 EXPLAIN
- EXPLAIN
EXPLAIN [AST | SYNTAX | PLAN | PIPELINE] [setting = value, ...] SELECT ... [FORMAT ...]
- PLAN: view the execution plan; this is the default.
  - header: print the header of each step in the plan; off by default (0);
  - description: print the description of each step; on by default (1);
  - actions: print detailed information about each step; off by default (0).
- AST: view the abstract syntax tree;
- SYNTAX: view the query after syntax optimization;
- PIPELINE: view the execution pipeline.
  - header: print the header of each step; off by default;
  - graph: describe the pipeline as a graph in DOT language; off by default; the output is viewed with graphviz;
  - actions: when graph is enabled, compact printing is on by default.
EXPLAIN SELECT number FROM system.numbers LIMIT 10 Query id: bbc68c47-0848-4219-ae39-ba0a744df1dd ┌─explain───────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ Limit (preliminary LIMIT) │ │ ReadFromStorage (SystemNumbers) │ └───────────────────────────────────────────────────────────────────────────┘
EXPLAIN header = 1, actions = 1, description = 1 SELECT number FROM system.numbers LIMIT 10 Query id: 4d53ac26-5d3e-4217-9ee2-bd798413c1f6 ┌─explain───────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Header: number UInt64 │ │ Actions: INPUT :: 0 -> number UInt64 : 0 │ │ Positions: 0 │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ Header: number UInt64 │ │ Limit (preliminary LIMIT) │ │ Header: number UInt64 │ │ Limit 10 │ │ Offset 0 │ │ ReadFromStorage (SystemNumbers) │ │ Header: number UInt64 │ └───────────────────────────────────────────────────────────────────────────┘
- SYNTAX syntax optimization
-- Ternary operator optimization needs to be turned on SET optimize_if_chain_to_multiif = 1; EXPLAIN SYNTAX SELECT if(number = 1, 'hello', if(number = 2, 'world', 'atguigu')) FROM numbers(10) Query id: d4bd3df8-6a70-4831-8c19-cdfc0ed9da25 ┌─explain─────────────────────────────────────────────────────────────┐ │ SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') │ │ FROM numbers(10) │ └─────────────────────────────────────────────────────────────────────┘
4.2.2 Optimization rules
- Prepare test data
[root@cloud-mn01 ~]# ll
total 1792572
-rw-r--r-- 1 root root 1271623680 Aug 10 09:58 hits_v1.tar
-rw-r--r-- 1 root root  563968000 Aug 10 09:58 visits_v1.tar
[root@cloud-mn01 ~]# tar -xf hits_v1.tar -C /var/lib/clickhouse
# hits_v1 has more than 130 columns and about 8.8 million rows
[root@cloud-mn01 ~]# tar -xf visits_v1.tar -C /var/lib/clickhouse
# visits_v1 has more than 180 columns and about 1.6 million rows
[root@cloud-mn01 ~]# systemctl restart clickhouse-server.service
[root@cloud-mn01 ~]#
- COUNT optimization
When calling count, if count() or count(*) is used and there is no WHERE condition, ClickHouse returns total_rows from system.tables directly instead of scanning the table.
cloud-mn01 :) EXPLAIN SELECT count()FROM datasets.hits_v1; EXPLAIN SELECT count() FROM datasets.hits_v1 Query id: 6ac410cd-81f1-4d96-bd35-2d45bbfb276d ┌─explain──────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ MergingAggregated │ │ ReadFromPreparedSource (Optimized trivial count) │ └──────────────────────────────────────────────────────┘ 3 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
If count is a specific column field, this optimization will not be used
cloud-mn01 :) EXPLAIN SELECT count(CounterID)FROM datasets.hits_v1; EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1 Query id: be5182bb-4808-4e8f-a395-1fedd1bdd0be ┌─explain───────────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ SettingQuotaAndLimits (Set limits and quota after reading from storage) │ │ ReadFromMergeTree │ └───────────────────────────────────────────────────────────────────────────────┘ 5 rows in set. Elapsed: 0.002 sec. cloud-mn01 :)
- Predicate push down
When GROUP BY has a HAVING clause that is not combined with WITH CUBE, WITH ROLLUP, or WITH TOTALS, the HAVING filter is pushed down into WHERE so filtering happens earlier.
EXPLAIN SYNTAX SELECT UserID FROM datasets.hits_v1 GROUP BY UserID HAVING UserID = '8585742290196126178' Query id: 1ecd2b6b-6a0d-400a-aca1-3e9eac0b7874 ┌─explain──────────────────────────────┐ │ SELECT UserID │ │ FROM datasets.hits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ GROUP BY UserID │ └──────────────────────────────────────┘
Subqueries also support predicate push down
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM datasets.visits_v1 ) WHERE UserID = '8585742290196126178' Query id: 4cece210-36f7-45c3-95a3-acb75a72ad09 ┌─explain──────────────────────────────────┐ │ SELECT UserID │ │ FROM │ │ ( │ │ SELECT UserID │ │ FROM datasets.visits_v1 │ │ WHERE UserID = '8585742290196126178' │ │ ) │ │ WHERE UserID = '8585742290196126178' │ └──────────────────────────────────────────┘
- Moving computation outside aggregate functions
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM datasets.visits_v1 Query id: bf6d9094-9c2a-4c70-acc4-574dee713e9a ┌─explain─────────────────┐ │ SELECT sum(UserID) * 2 │ │ FROM datasets.visits_v1 │ └─────────────────────────┘
- Aggregate function elimination
If min, max, or any is applied to a key of the aggregation (i.e. a GROUP BY key), the aggregate function is eliminated.
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM visits_v1 GROUP BY UserID Query id: 88dc8579-e110-433b-bc52-46d90078b187 ┌─explain──────────────┐ │ SELECT │ │ sum(UserID) * 2, │ │ max(VisitID), │ │ UserID │ │ FROM visits_v1 │ │ GROUP BY UserID │ └──────────────────────┘
- Scalar replacement
If a subquery returns only one row, references to it are replaced with a scalar.
EXPLAIN SYNTAX WITH ( SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10 Query id: dd48033c-d21b-41f2-87f0-78f3c529dc6b ┌─explain─────────────────────────────────────────────────────────────────────────┐ │ WITH identity(CAST(0, 'UInt64')) AS total_disk_usage │ │ SELECT │ │ (sum(bytes_on_disk AS bytes) / total_disk_usage) * 100 AS table_disk_usage, │ │ table │ │ FROM system.parts │ │ GROUP BY table │ │ ORDER BY table_disk_usage DESC │ │ LIMIT 10 │ └─────────────────────────────────────────────────────────────────────────────────┘
- Ternary operator optimization
If the optimize_if_chain_to_multiif setting is enabled, chained ternary operators (nested if calls) are rewritten to the multiIf function.
EXPLAIN SYNTAX SELECT if(number = 1, 'hello', if(number = 2, 'world', 'atguigu')) FROM numbers(10) SETTINGS optimize_if_chain_to_multiif = 1 Query id: e1cb685f-a2b9-4121-bd8e-374b0c76d735 ┌─explain─────────────────────────────────────────────────────────────┐ │ SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') │ │ FROM numbers(10) │ │ SETTINGS optimize_if_chain_to_multiif = 1 │ └─────────────────────────────────────────────────────────────────────┘
4.3 Query optimization
4.3.1 Single-table queries
- PREWHERE
- PREWHERE serves the same purpose as WHERE: filtering data.
- The difference is that PREWHERE is only supported by tables of the *MergeTree family. It first reads only the columns needed by the filter condition; after filtering, it reads the columns declared in SELECT for the surviving rows.
- When the number of queried columns is much larger than the number of filter columns, PREWHERE can improve query performance roughly tenfold, because it optimizes how data is read during the filtering stage and reduces I/O.
- By default ClickHouse moves WHERE conditions to PREWHERE automatically (optimize_move_to_prewhere), but in some scenarios the automatic rewrite does not apply and PREWHERE must be specified manually (see the sketch after this list):
  - queries using constant expressions;
  - queries using columns of ALIAS type;
  - queries containing arrayJoin, globalIn, globalNotIn, or indexHint;
  - queries whose SELECT columns are the same as those in the WHERE predicate;
  - queries filtering on primary key columns.
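A minimal sketch of an explicit PREWHERE against the hits_v1 test table imported in section 4.2.2 (the UserID literal is arbitrary):

```sql
-- Explicit PREWHERE: only UserID is read during the filtering stage;
-- the other SELECT columns are read only for the rows that survive the filter
SELECT WatchID, Title, EventTime
FROM datasets.hits_v1
PREWHERE UserID = 3198390223272470366;

-- Equivalent WHERE form: with optimize_move_to_prewhere = 1 (the default),
-- ClickHouse normally rewrites this to PREWHERE on its own
SELECT WatchID, Title, EventTime
FROM datasets.hits_v1
WHERE UserID = 3198390223272470366;
```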
- Sampling
The SAMPLE modifier is only valid for MergeTree-family tables, and the sampling expression must be declared when the table is created.
CREATE TABLE datasets.hits_v1 ( `WatchID` UInt64, `JavaEnable` UInt8, ... `RequestTry` UInt8 ) ENGINE = MergeTree ... SAMPLE BY intHash32(UserID) ... SELECT Title, count(*) AS PageViews FROM hits_v1 SAMPLE 0.1 -- Representative sampling 10%Data,It can also be a specific number WHERE CounterID = 57 GROUP BY Title ORDER BY PageViews DESC LIMIT 1000 Query id: 269e0b6b-4e78-4282-8e35-f5bdc73c69c3 ┌─Title────────────────────────────────────────────────────────────────┬─PageViews─┐ │ │ 77 │ │ Фильмы онлайн на сегодня │ 6 │ │ Сбербанка «Работа, мебель обувь бензор.НЕТ « Новости, аксессионально │ 6 │ └──────────────────────────────────────────────────────────────────────┴───────────┘
- Deduplicated counting
- Use uniqCombined instead of distinct; performance can improve by more than 10x;
- uniqCombined is based on the HyperLogLog algorithm, with an error of roughly 2%; when approximate results are acceptable, use it directly for better query performance;
- count(distinct ...) uses uniqExact underneath for exact deduplication;
- It is not recommended to run exact distinct counts over tens of millions of distinct values; use the approximate uniqCombined instead.
cloud-mn01 :) select count(distinct rand()) from hits_v1; SELECT countDistinct(rand()) FROM hits_v1 Query id: 4c17ea01-14f6-4c83-9990-450bb30a2f59 ┌─uniqExact(rand())─┐ │ 8864520 │ └───────────────────┘ 1 rows in set. Elapsed: 0.483 sec. Processed 8.87 million rows, 80.31 MB (18.37 million rows/s., 166.26 MB/s.) cloud-mn01 :) SELECT uniqCombined(rand()) from datasets.hits_v1 ; SELECT uniqCombined(rand()) FROM datasets.hits_v1 Query id: 63912054-e9ed-47d1-a4db-1923a5c8f9c1 ┌─uniqCombined(rand())─┐ │ 8878876 │ └──────────────────────┘ 1 rows in set. Elapsed: 0.102 sec. Processed 8.87 million rows, 80.31 MB (86.86 million rows/s., 786.11 MB/s.) cloud-mn01 :)
- Notes
- Query circuit breaking: to avoid service avalanches caused by individual slow queries, besides setting a timeout for a single query, circuit breaking per period can be configured: if a user runs slow queries too often within one period and exceeds the configured threshold, further queries are rejected.
- Turn off virtual memory (swap): swapping between physical memory and disk slows queries down; if resources allow, disable it.
- Configure join_use_nulls: add the join_use_nulls setting for each account; when a row of the left table has no match in the right table, the corresponding right-table columns return the default value of their data type rather than the NULL of standard SQL.
- Sort before batch writes: when writing data in batches, control how many partitions each batch touches, and sort the data before importing it. Unsorted data or too many partitions prevents ClickHouse from merging newly imported data in time, which hurts query performance.
- Watch CPU usage: query latency generally starts to fluctuate when CPU usage reaches about 50%, and widespread query timeouts appear around 70%. CPU is the most critical metric and should be monitored closely.
4.3.2 Multi-table queries
- Use IN instead of JOIN: when a multi-table query only needs data from one of the tables, consider IN instead of JOIN (see the sketch after this list);
- Big table JOIN small table: put the small table on the right; the right table is loaded into memory and then matched against the left table;
- Mind predicate pushdown: ClickHouse does not push predicates down automatically in JOIN queries, so each subquery should do its filtering ahead of time;
- Use GLOBAL with distributed tables: IN and JOIN between two distributed tables must be preceded by the GLOBAL keyword, so the right table is queried only once on the node that received the request and then distributed to the other nodes. Without GLOBAL, every node issues its own query against the right table, and since the right table is itself distributed, it ends up being queried N² times in total (N being the number of shards), i.e. query amplification with very high overhead.
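A hedged sketch of the IN-versus-JOIN point, using the hits_v1/visits_v1 test tables from section 4.2.2; the distributed-table line in the last comment uses placeholder table names:

```sql
-- The result only needs columns from hits_v1, so filter with IN
SELECT UserID
FROM datasets.hits_v1
WHERE UserID IN (SELECT UserID FROM datasets.visits_v1);

-- The equivalent JOIN has to build and probe a hash table of the right side
SELECT a.UserID
FROM datasets.hits_v1 AS a
INNER JOIN datasets.visits_v1 AS b ON a.UserID = b.UserID;

-- For distributed tables, add GLOBAL so the right side is evaluated once on the
-- initiator and broadcast, instead of once per shard (dist_a / dist_b are placeholders):
-- SELECT ... FROM dist_a WHERE key GLOBAL IN (SELECT key FROM dist_b);
```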
5, Advanced
5.1 Data consistency
ReplacingMergeTree and SummingMergeTree only guarantee eventual consistency; after table data is updated there can be a period of short-term inconsistency.
- Environmental preparation
-- user_id identifies the rows for deduplication/updates;
-- create_time is the version column: within each group, the row with the largest create_time is the latest data;
-- deleted is a custom flag: 0 means not deleted, 1 means deleted.
CREATE TABLE test_a
(
    `user_id` UInt64,
    `score` String,
    `deleted` UInt8 DEFAULT 0,
    `create_time` DateTime DEFAULT toDateTime(0)
)
ENGINE = ReplacingMergeTree(create_time)
ORDER BY user_id

INSERT INTO test_a (user_id, score)
WITH
(
    SELECT ['A', 'B', 'C', 'D', 'E', 'F', 'G']
) AS dict
SELECT number AS user_id, dict[(number % 7) + 1]
FROM numbers(100000)
- Deduplication with FINAL
Add the FINAL modifier to the query so that the engine's merge-time logic (deduplication, pre-aggregation, etc.) is applied while the query runs.
-- Update data
INSERT INTO test_a (user_id, score, create_time)
WITH
(
    SELECT ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG']
) AS dict
SELECT number AS user_id, dict[(number % 7) + 1], now() AS create_time
FROM numbers(5000)

-- Normal query
SELECT COUNT() FROM test_a WHERE user_id < 5000
┌─count()─┐
│   10000 │
└─────────┘

-- Query with FINAL
SELECT COUNT() FROM test_a FINAL WHERE user_id < 5000
┌─count()─┐
│    5000 │
└─────────┘
- Deduplication with GROUP BY
-- argMax(field1, field2): return the value of field1 from the row where field2 is largest.
-- Each update writes a new row, so the query below picks, for every user_id,
-- the score of the row with the largest create_time.
SELECT
    user_id,
    argMax(score, create_time) AS score,
    argMax(deleted, create_time) AS deleted,
    max(create_time) AS ctime
FROM test_a
GROUP BY user_id
HAVING deleted = 0;

-- Create a view to make testing easier
CREATE VIEW view_test_a AS
SELECT
    user_id,
    argMax(score, create_time) AS score,
    argMax(deleted, create_time) AS deleted,
    max(create_time) AS ctime
FROM test_a
GROUP BY user_id
HAVING deleted = 0

-- Update data
INSERT INTO test_a (user_id, score, create_time) VALUES

-- Query the view: the data is deduplicated
SELECT * FROM view_test_a WHERE user_id = 0

Query id: a11e2648-cba4-4fde-9e95-3a6896f0adca

┌─user_id─┬─score─┬─deleted─┬───────────────ctime─┐
│       0 │ AAAA  │       0 │ 2021-08-10 11:17:49 │
└─────────┴───────┴─────────┴─────────────────────┘

-- Query the original table: all versions of the record are still there
SELECT * FROM test_a WHERE user_id = 0

Query id: 55af213a-f8b7-4238-8456-bc5df1b62562

┌─user_id─┬─score─┬─deleted─┬─────────create_time─┐
│       0 │ AAAA  │       0 │ 2021-08-10 11:17:49 │
└─────────┴───────┴─────────┴─────────────────────┘
┌─user_id─┬─score─┬─deleted─┬─────────create_time─┐
│       0 │ A     │       0 │ 1970-01-01 08:00:00 │
└─────────┴───────┴─────────┴─────────────────────┘

-- "Delete" data (insert a row with deleted = 1)
INSERT INTO test_a (user_id, score, deleted, create_time) VALUES

-- Query the view: the record disappears
SELECT * FROM view_test_a WHERE user_id = 0

Query id: c6157128-84ac-4e86-92a9-68aad99b539d

Ok.
0 rows in set. Elapsed: 0.006 sec. Processed 8.19 thousand rows, 188.47 KB (1.47 million rows/s., 33.80 MB/s.)

-- Query the original table: the data is still physically there
SELECT * FROM test_a WHERE user_id = 0

Query id: 482cbcdb-f2d1-45b4-ba05-7153c0e0a6ef

┌─user_id─┬─score─┬─deleted─┬─────────create_time─┐
│       0 │ AAAA  │       0 │ 2021-08-10 11:17:49 │
└─────────┴───────┴─────────┴─────────────────────┘
┌─user_id─┬─score─┬─deleted─┬─────────create_time─┐
│       0 │ AAAA  │       1 │ 2021-08-10 11:19:10 │
└─────────┴───────┴─────────┴─────────────────────┘
┌─user_id─┬─score─┬─deleted─┬─────────create_time─┐
│       0 │ A     │       0 │ 1970-01-01 08:00:00 │
└─────────┴───────┴─────────┴─────────────────────┘
5.2 Materialized views
- brief introduction
- An ordinary view stores no data, only the query statement, whereas a ClickHouse materialized view persists the query result;
- From the user's point of view it is queried just like a table; it is a table, and it behaves like a table whose contents are being pre-computed at all times;
- Advantage: fast query speed;
- Disadvantage: it is essentially an accumulative, streaming-style technique and is not suitable for re-analysing historical data; if many materialized views are attached to a table, writes to that table consume a lot of machine resources.
- grammar
It uses the same CREATE syntax. By default a hidden target table is created to store the view data; with TO [db.]table_name the data can be written to an explicit table instead. Without TO, the target table name defaults to .inner.<materialized view name>.
CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO [db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
- The engine that stores the materialized view's data must be specified (when no TO target table is used);
- When using the TO [db.]table syntax, POPULATE cannot be used (a sketch of the TO variant follows these notes);
- The POPULATE keyword determines the initial-load strategy: with POPULATE, existing historical data is loaded into the view; without it, the view only applies to newly inserted data;
- Deletions are not synchronized: if data is deleted from the source table, the corresponding data in the materialized view is still retained.
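A minimal sketch of the TO variant, using hypothetical names agg_target and mv_to_target and a source table shaped like the hits_test table created in the demonstration below; the aggregated rows land in the explicit target table instead of a hidden .inner table:
-- Explicit target table that will hold the aggregated data
CREATE TABLE agg_target
(
    `UserID` UInt64,
    `EventDate` Date,
    `ClickCount` UInt64,
    `IncomeSum` UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, UserID);

-- The materialized view acts as an insert trigger that writes into agg_target
CREATE MATERIALIZED VIEW mv_to_target TO agg_target AS
SELECT
    UserID,
    EventDate,
    count(URL) AS ClickCount,
    sum(Income) AS IncomeSum
FROM hits_test
GROUP BY UserID, EventDate;
Querying mv_to_target is then equivalent to querying agg_target directly.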
- demonstration
-- Create the source table
CREATE TABLE hits_test
(
    `EventDate` Date,
    `CounterID` UInt32,
    `UserID` UInt64,
    `URL` String,
    `Income` UInt8
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
SETTINGS index_granularity = 8192

-- Insert data
INSERT INTO hits_test
SELECT EventDate, CounterID, UserID, URL, Income
FROM datasets.hits_v1
LIMIT 10000

-- Create the materialized view
CREATE MATERIALIZED VIEW hits_mv
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, intHash32(UserID)) AS
SELECT
    UserID,
    EventDate,
    count(URL) AS ClickCount,
    sum(Income) AS IncomeSum
FROM hits_test
WHERE EventDate >= '2014-03-20'
GROUP BY UserID, EventDate

-- View the generated inner table
SHOW TABLES

┌─name───────────────────────────────────────────┐
│ .inner_id.069f7d89-bd86-4ae6-869f-7d89bd86fae6  │
│ hits_mv                                         │
│ hits_test                                       │
└─────────────────────────────────────────────────┘

-- Import incremental data
INSERT INTO hits_test
SELECT EventDate, CounterID, UserID, URL, Income
FROM datasets.hits_v1
WHERE EventDate >= '2014-03-23'
LIMIT 10

-- Query the materialized view
SELECT * FROM hits_mv

Query id: 9af4d8b2-7e9d-48d1-b950-a923e95a047c

┌──────────────UserID─┬──EventDate─┬─ClickCount─┬─IncomeSum─┐
│ 8585742290196126178 │ 2014-03-23 │          8 │        16 │
│ 1095363898647626948 │ 2014-03-23 │          2 │         0 │
└─────────────────────┴────────────┴────────────┴───────────┘

2 rows in set. Elapsed: 0.002 sec.
5.3 materialized MySQL engine
- brief introduction
- The MaterializeMySQL database engine maps to a database in MySQL and automatically creates a corresponding ReplacingMergeTree table in ClickHouse for each MySQL table;
- Acting as a MySQL replica, the ClickHouse service reads the binlog and executes the DDL and DML requests, implementing real-time synchronization of the business database based on the MySQL binlog mechanism;
- MaterializeMySQL supports both full and incremental synchronization: when the database is created, the MySQL tables and data are synchronized in full, after which incremental synchronization is carried out through the binlog;
- The MaterializeMySQL database automatically adds hidden _sign and _version fields to every ReplacingMergeTree table it creates;
- ClickHouse automatically converts the MySQL primary key and index clauses into the ORDER BY tuple.
- _sign + _version
_version is used as a version number: whenever an insert, update or delete event is received, it is incremented globally within the database; _sign marks whether the row has been deleted, taking the value 1 or -1 (both are queried in the sketch after the demonstration below).
- MYSQL_WRITE_ROWS_EVENT: _sign = 1, _version++
- MYSQL_DELETE_ROWS_EVENT: _sign = -1, _version++
- MYSQL_UPDATE_ROWS_EVENT: new data _sign = 1, _version++
- MYSQL_QUERY_EVENT: supports CREATE TABLE, DROP TABLE, RENAME TABLE, etc.
- MySQL configuration
The binlog (in ROW format) and GTID mode need to be enabled.
[root@cloud-dn02 ~]# cat /etc/my.cnf
[mysqld]
user=root
basedir=/opt/module/mysql
datadir=/opt/module/mysql/data
socket=/tmp/mysql.sock
port=3306
# Make sure that MySQL has binlog enabled and the format is ROW
server_id=6
log-bin=mysql-bin
binlog_format=ROW
# Turn on GTID mode, which ensures data consistency in MySQL master-slave replication (during master-slave switching)
gtid-mode=on
enforce-gtid-consistency=1
log-slave-updates=1
# The MaterializeMySQL engine requires default_authentication_plugin=mysql_native_password
default_authentication_plugin=mysql_native_password
[mysql]
socket=/tmp/mysql.sock
[root@cloud-dn02 ~]#
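The synchronization database created below connects with a dedicated MySQL account rather than root (the demo uses rayslee, and root may be rejected, as noted below). A hedged sketch of creating such an account on the MySQL side; the exact privilege set may vary, but binlog-based replication typically needs at least SELECT plus the replication privileges:
-- Run on MySQL; the account name and password match the demo below.
CREATE USER 'rayslee'@'%' IDENTIFIED WITH mysql_native_password BY 'abcd1234..';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'rayslee'@'%';
FLUSH PRIVILEGES;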
- Demonstration
- Create MySQL tables and insert data
CREATE DATABASE testck;

CREATE TABLE `testck`.`t_organization` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `code` int NOT NULL,
    `name` text DEFAULT NULL,
    `updatetime` datetime DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY (`code`)
) ENGINE=InnoDB;

INSERT INTO testck.t_organization (code, name, updatetime) VALUES(1000, 'Realinsight', NOW());
INSERT INTO testck.t_organization (code, name, updatetime) VALUES(1001, 'Realindex', NOW());
INSERT INTO testck.t_organization (code, name, updatetime) VALUES(1002, 'EDT', NOW());

CREATE TABLE `testck`.`t_user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `code` int,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO testck.t_user (code) VALUES(1);
- Create ClickHouse synchronization database
SET allow_experimental_database_materialize_mysql = 1

CREATE DATABASE test_binlog ENGINE = MaterializeMySQL('cloud-dn02:3306', 'testck', 'rayslee', 'abcd1234..')

-- Note: there may be an unknown error when using root:
-- Code: 100 DB::Exception: Received from localhost:9000. DB::Exception: Access denied for user root.

cloud-mn01 :) show tables;

SHOW TABLES

Query id: 95891aee-ffbc-4b0c-abc1-881266785c86

┌─name───────────┐
│ t_organization │
│ t_user         │
└────────────────┘

2 rows in set. Elapsed: 0.003 sec.

cloud-mn01 :) select * from t_user;

SELECT * FROM t_user

Query id: 9fa08f8a-4cd3-4c85-a71d-e3a92237caa8

┌─id─┬─code─┐
│  1 │    1 │
└────┴──────┘

1 rows in set. Elapsed: 0.003 sec.

cloud-mn01 :) select * from t_organization;

SELECT * FROM t_organization

Query id: eb1662db-101e-4132-aa7c-0326099ed084

┌─id─┬─code─┬─name────────┬──────────updatetime─┐
│  1 │ 1000 │ Realinsight │ 2021-08-11 11:22:34 │
│  2 │ 1001 │ Realindex   │ 2021-08-11 11:22:34 │
│  3 │ 1002 │ EDT         │ 2021-08-11 11:22:34 │
└────┴──────┴─────────────┴─────────────────────┘

3 rows in set. Elapsed: 0.003 sec.

cloud-mn01 :)
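To see the _sign and _version bookkeeping described earlier, the virtual columns can be selected explicitly from a synchronized table; a small sketch (the values shown depend on the actual change history):
-- Inspect the replication metadata of a synchronized table
SELECT *, _sign, _version
FROM t_organization
ORDER BY id;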
5.4 cluster monitoring
- Install monitoring
- Install Prometheus
[root@cloud-mn01 ~]# tar -zxf /opt/soft/prometheus-2.26.0.linux-amd64.tar.gz -C /opt/module/
[root@cloud-mn01 ~]# ln -s /opt/module/prometheus-2.26.0.linux-amd64 /opt/module/prometheus
[root@cloud-mn01 ~]# vi /opt/module/prometheus/prometheus.yml
[root@cloud-mn01 ~]# tail /opt/module/prometheus/prometheus.yml
      # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  # Add ClickHouse monitoring
  - job_name: 'clickhouse'

    static_configs:
    - targets: ['cloud-mn01:9363']
[root@cloud-mn01 ~]# nohup /opt/module/prometheus/prometheus --config.file=/opt/module/prometheus/prometheus.yml &>> /var/log/prometheus.log &
[1] 1683
[root@cloud-mn01 ~]#
http://192.168.1.101:9090/targets
- Install Grafana
[root@cloud-mn01 ~]# tar -zxf /opt/soft/grafana-7.5.2.linux-amd64.tar.gz -C /opt/module/
[root@cloud-mn01 ~]# ln -s /opt/module/grafana-7.5.2 /opt/module/grafana
[root@cloud-mn01 ~]# cd /opt/module/grafana
# Note: grafana-server must be started from the Grafana home directory
[root@cloud-mn01 grafana]# nohup ./bin/grafana-server web &>> /var/log/grafana-server.log &
[2] 1748
[root@cloud-mn01 grafana]#
http://192.168.1.101:3000/ (default username/password: admin/admin)
- ClickHouse configuration
[root@cloud-mn01 ~]# vim /etc/clickhouse-server/config.xml
# Enable external access
<listen_host>::</listen_host>
# Enable the monitoring metrics endpoint
<prometheus>
    <endpoint>/metrics</endpoint>
    <port>9363</port>
    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
    <status_info>true</status_info>
</prometheus>
[root@cloud-mn01 ~]# systemctl restart clickhouse-server.service
[root@cloud-mn01 ~]#
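After the restart, the metrics endpoint can be sanity-checked directly before adding it to Prometheus and Grafana; a quick check, assuming the port and endpoint configured above:
# Should print ClickHouse metrics in Prometheus text format
[root@cloud-mn01 ~]# curl -s http://cloud-mn01:9363/metrics | head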
- Add monitoring
- Add data source
- Select a monitoring dashboard template
https://grafana.com/grafana/dashboards/14432
- Import template
'+' => Import
- View monitoring effect
5.5 backup and recovery
https://github.com/AlexAkulov/clickhouse-backup/releases/tag/v1.0.0
# Note: the tool may temporarily have version-compatibility issues
# (the latest ClickHouse release is from July, while the latest tool release is from June).
[root@cloud-mn01 soft]# rpm -ivh clickhouse-backup-1.0.0-1.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:clickhouse-backup-1.0.0-1        ################################# [100%]
[root@cloud-mn01 soft]# cd /etc/clickhouse-backup/
[root@cloud-mn01 clickhouse-backup]# ll
total 4
-rw-r--r-- 1 root root 1682 Jun 17 00:49 config.yml.example
[root@cloud-mn01 clickhouse-backup]#
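With the package installed, a typical local backup/restore cycle is sketched below; this assumes a config.yml copied from the example above and uses the tool's standard create / list / restore sub-commands (the backup name bak_20210811 is arbitrary):
[root@cloud-mn01 clickhouse-backup]# cp config.yml.example config.yml
# Create a local backup, list available backups, then restore schema and data
[root@cloud-mn01 clickhouse-backup]# clickhouse-backup create bak_20210811
[root@cloud-mn01 clickhouse-backup]# clickhouse-backup list
[root@cloud-mn01 clickhouse-backup]# clickhouse-backup restore bak_20210811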