Doris series 6-Doris rollup

Posted by apulmca2k4 on Wed, 05 Jan 2022 09:32:16 +0100

I Basic concepts

In Doris, we call the table created by the user through the table creation statement as the Base Table. The Base Table holds the basic data stored in the way specified by the user's table creation statement.

On top of the Base table, we can create any number of ROLLUP tables. These ROLLUP data are generated based on the Base table and are physically stored independently.

The basic function of the ROLLUP table is to obtain coarser grained aggregate data based on the Base table.

Let's use examples to illustrate the ROLLUP table and its functions in different data models.

II ROLLUP in Aggregate and Uniq models

Because Uniq is only a special case of the Aggregate model, we will not distinguish it here.

2.1 test data preparation

MySQL data preparation:
MySQL test data preparation

Data import doris:

CREATE TABLE ods_fact_sale_doris
    id	      BIGINT DEFAULT '0',
    sale_date varchar(100),
    prod_name varchar(32),
    sale_nums BIGINT SUM
AGGREGATE KEY(id,sale_date,prod_name)
PROPERTIES("replication_num" = "3");

LOAD LABEL table3_20211213_1
    DATA INFILE("hdfs://*")
    INTO TABLE ods_fact_sale_doris
    FORMAT AS 'orc'
        id = id,
WITH BROKER broker_name

SHOW LOAD WHERE LABEL = "table3_20211213_1"\G

2.2 get daily sales

2.2.1 write sql directly

select sale_date,sum(sale_nums) as all_nums
  from ods_fact_sale_doris
 group by sale_date;

It took almost 38 seconds.

2.2.2 first use rollup aggregation and then query

ALTER TABLE ods_fact_sale_doris ADD ROLLUP Rollup5(sale_date,sale_nums);

As you can see, only each sale is retained in the ROLLUP_ Date, in sale_ The result of SUM on the num column. Then, when we make the following query:

select sale_date,sum(sale_nums) as all_nums
  from ods_fact_sale_doris
 group by sale_date;

Doris will automatically hit the ROLLUP table, so that only a small amount of data needs to be scanned to complete the aggregation query.

III Some notes on ROLLUP

  1. The most fundamental function of ROLLUP is to improve the query efficiency of some queries (whether it is to reduce the amount of data through aggregation or modify the column order to match the prefix index). Therefore, the meaning of ROLLUP has gone beyond the scope of "roll up". This is why we named it Materialized Index in the source code.

  2. ROLLUP is attached to the Base table and can be regarded as an auxiliary data structure of the Base table. You can create or delete rollups based on the Base table, but you cannot explicitly specify a ROLLUP in the query. Whether to hit the ROLLUP is completely determined by Doris system automatically.

  3. ROLLUP data is physically stored independently. Therefore, the more rollups you create, the more disk space you use. At the same time, it will also affect the import speed (the imported ETL stage will automatically generate all ROLLUP data), but it will not reduce the query efficiency (only better).

  4. The data update of ROLLUP is completely synchronized with Base. Users do not need to care about this issue.

  5. The aggregation method of columns in ROLLUP is exactly the same as that of the Base table. When creating a ROLLUP, you do not need to specify it or modify it.

  6. A necessary condition (non sufficient condition) for a query to hit the ROLLUP is that all columns involved in the query (including the query condition columns in select list and where) exist in the ROLLUP column. Otherwise, the query can only hit the Base table.

  7. Some types of queries (such as count(*) cannot hit ROLLUP under any conditions.

  8. You can use EXPLAIN your_sql; Command to obtain the query execution plan. In the execution plan, check whether ROLLUP is hit.

  9. You can use DESC tbl_name ALL; Statement displays the Base table and all created rollups.

reference resources: