Introduction to TPC-H-based performance testing in Hologres

Posted by blues on Mon, 17 Jan 2022 08:47:52 +0100

Introduction: This article describes how to run a performance test based on the TPC-H data set in Hologres, and provides reference test results to help you choose product specifications.

Background information

TPC-H is a benchmark developed by the Transaction Processing Performance Council (TPC) to simulate decision-support applications. It is widely used in academia and industry to evaluate the performance of decision-support systems. TPC-H is modeled on a real production environment and simulates the data warehouse of a sales system. It contains 8 tables, and the data volume can be set from 1 GB to 3 TB. The benchmark includes 22 queries, and the main evaluation metric is the response time of each query, that is, the time from submitting the query to receiving the result. The test results comprehensively reflect the system's ability to process queries. For details, refer to the TPC-H documentation.

Data set introduction

The data set contains 8 tables (LINEITEM, ORDERS, PARTSUPP, PART, CUSTOMER, SUPPLIER, NATION, and REGION); the relationships between them are shown in the figure below.

Test details

Description of test data volume

The amount of test data directly affects the test results. The TPC-H generation tool uses SF (scale factor) to control the amount of data generated; 1 SF corresponds to 1 GB.

Note: the data volumes above refer only to the raw data and do not include space used by indexes and other structures, so reserve extra space when preparing the environment.

Testing environment

This test uses an exclusive instance (pay-as-you-go). Since it is only for illustration, an 8-core 32 GB configuration is selected for computing resources.

Test scenario

This test mainly includes three parts:

  1. The OLAP query scenario test mainly uses column-oriented tables and directly runs the 22 TPC-H queries;
  2. The Key/Value point query scenario test mainly uses row-oriented tables; a row-oriented copy of the orders table is created, and point queries filtering on the primary key are executed against it;
  3. The data import test measures the time to load the generated data into Hologres.

Basic environment preparation

  • This step prepares the data required for both the OLAP query scenario and the Key/Value point query scenario.

1. Create an ECS instance

Log in to Alibaba Cloud and create an ECS instance for data generation, data import into Hologres, and client testing. Recommended specifications:

  • ecs.g6.4xlarge specification
  • CentOS 7.9
  • ESSD data disk; size it according to the amount of data to be tested
  • It is recommended that the ECS and Hologres instances use the same region and VPC network

2. Create Hologres instance

  • Log in to Alibaba Cloud, go to the Hologres console, and click Add Engine Instance
  • Select the configuration and enter an instance name. For details, refer to the official documentation.

3. Create test database

  • After creating the instance, log in to it and create a database. In this test the database is named tpch_1sf. For detailed steps, refer to the official documentation.

Generate TPC-H data

1. Prepare data generation tools

  • Connect to the ECS instance remotely
  • Update all packages
yum update
  • Install git
yum install git
  • Install gcc
yum install gcc
  • Download TPC-H data generation code
git clone https://github.com/gregrahn/tpch-kit.git
  • Enter the data generation tool code directory
cd tpch-kit/dbgen
  • Compile data generation tool code
make

2. Generate data

  • After the compilation succeeds, you can use the following command to view the data generation tool's parameters.
./dbgen --help
  • This test generates only 1 GB of data, so run the following command to generate the data.
./dbgen -vf -s 1
If you need more data, adjust the SF parameter. For example, the following command generates 1 TB of data
./dbgen -vf -s 1000
  • As a rough guide, 32 CU can run TPC-H SF10, and 256 CU can run TPC-H SF50
  • After the data is generated, you can use the following command to view the generated files. The tool produces 8 data files, one for each table in the data set.
ls | grep '.*.tbl'
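  • Before importing, it is worth a quick sanity check of the generated files. The following sketch is an optional addition, not part of the original procedure; the row counts in the comment assume SF1.
# Print size and row count for every generated .tbl file.
# At SF1, expect roughly: lineitem 6,001,215 rows, orders 1,500,000,
# partsupp 800,000, part 200,000, customer 150,000, supplier 10,000,
# nation 25, region 5.
for f in *.tbl; do
    echo "$f: $(wc -l < "$f") rows, $(du -h "$f" | cut -f1)"
done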

OLAP query scenario test

Prepare data

1. Create table

  • Since this article uses psql for data import, run the following command on the ECS instance to install psql
yum install postgresql-server
  • After installing psql, you can log in to the Hologres instance using the following command
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
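  • To verify connectivity before creating tables, you can run a quick smoke test (a minimal sketch; replace the angle-bracket placeholders with your own values, as above):
# Confirm that the client can reach the Hologres instance.
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "SELECT version();"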
  • After connecting to Hologres with psql, you can create the tables using the following DDL statements
DROP TABLE IF EXISTS LINEITEM;

BEGIN;
CREATE TABLE LINEITEM
(
    L_ORDERKEY      INT         NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS ORDERS;

BEGIN;
CREATE TABLE ORDERS
(
    O_ORDERKEY      INT         NOT NULL PRIMARY KEY,
    O_CUSTKEY       INT         NOT NULL,
    O_ORDERSTATUS   TEXT        NOT NULL,
    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
    O_ORDERDATE     timestamptz NOT NULL,
    O_ORDERPRIORITY TEXT        NOT NULL,
    O_CLERK         TEXT        NOT NULL,
    O_SHIPPRIORITY  INT         NOT NULL,
    O_COMMENT       TEXT        NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PARTSUPP;

BEGIN;
CREATE TABLE PARTSUPP
(
    PS_PARTKEY    INT    NOT NULL,
    PS_SUPPKEY    INT    NOT NULL,
    PS_AVAILQTY   INT    NOT NULL,
    PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    PS_COMMENT    TEXT   NOT NULL,
    PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PART;

BEGIN;
CREATE TABLE PART
(
    P_PARTKEY     INT    NOT NULL PRIMARY KEY,
    P_NAME        TEXT   NOT NULL,
    P_MFGR        TEXT   NOT NULL,
    P_BRAND       TEXT   NOT NULL,
    P_TYPE        TEXT   NOT NULL,
    P_SIZE        INT    NOT NULL,
    P_CONTAINER   TEXT   NOT NULL,
    P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    P_COMMENT     TEXT   NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;



DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
    C_CUSTKEY    INT    NOT NULL PRIMARY KEY,
    C_NAME       TEXT   NOT NULL,
    C_ADDRESS    TEXT   NOT NULL,
    C_NATIONKEY  INT    NOT NULL,
    C_PHONE      TEXT   NOT NULL,
    C_ACCTBAL    DECIMAL(15,2) NOT NULL,
    C_MKTSEGMENT TEXT   NOT NULL,
    C_COMMENT    TEXT   NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS SUPPLIER;

BEGIN;
CREATE TABLE SUPPLIER
(
    S_SUPPKEY   INT    NOT NULL PRIMARY KEY,
    S_NAME      TEXT   NOT NULL,
    S_ADDRESS   TEXT   NOT NULL,
    S_NATIONKEY INT    NOT NULL,
    S_PHONE     TEXT   NOT NULL,
    S_ACCTBAL   DECIMAL(15,2) NOT NULL,
    S_COMMENT   TEXT   NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS NATION;

BEGIN;
CREATE TABLE NATION
(
    N_NATIONKEY INT  NOT NULL PRIMARY KEY,
    N_NAME      TEXT NOT NULL,
    N_REGIONKEY INT  NOT NULL,
    N_COMMENT   TEXT NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS REGION;

BEGIN;
CREATE TABLE REGION
(
    R_REGIONKEY INT  NOT NULL PRIMARY KEY,
    R_NAME      TEXT NOT NULL,
    R_COMMENT   TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
  • After creating the tables, you can run the following command in psql to check whether they were created successfully
tpch_1sf=# \dt
  • If successful, the output looks like the following
tpch_1sf=# \dt
               List of relations
 Schema |   Name   | Type  |       Owner        
--------+----------+-------+--------------------
 public | customer | table | tpch_1sf_developer
 public | lineitem | table | tpch_1sf_developer
 public | nation   | table | tpch_1sf_developer
 public | orders   | table | tpch_1sf_developer
 public | part     | table | tpch_1sf_developer
 public | partsupp | table | tpch_1sf_developer
 public | region   | table | tpch_1sf_developer
 public | supplier | table | tpch_1sf_developer
(8 rows)

2. Import data

  • This test mainly uses the COPY FROM STDIN method to import data; for details, refer to the official documentation. The previously generated .tbl data files are imported into the tables created in Hologres.
  • You can import the data with a shell script like the following, run from the data generation tool's directory
for i in `ls *.tbl`; do
    echo $i;
    name=`echo $i| cut -d'.' -f1`;
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;
done
  • You have now finished importing data
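  • To confirm that every table loaded completely, you can compare each .tbl file's line count with the corresponding row count in Hologres. A minimal sketch (run from the dbgen directory, with the same placeholders as above):
# Compare source-file row counts with the row counts now in Hologres.
for i in `ls *.tbl`; do
    name=`echo $i | cut -d'.' -f1`;
    file_rows=`wc -l < $i`;
    db_rows=`PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -t -A -c "SELECT count(*) FROM $name;"`;
    echo "$name: file=$file_rows, database=$db_rows";
done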

3. Collect statistical information

  • To help the optimizer plan queries well, run the following statements in psql so that Hologres collects statistics for each table.
vacuum  region;
vacuum  nation;
vacuum  supplier;
vacuum  customer;
vacuum  part;
vacuum  partsupp;
vacuum  orders;
vacuum  lineitem;

analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;
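  • Optionally, the same maintenance can be driven from the shell, one table at a time. A convenience sketch (placeholders as above; passing multiple -c options requires psql 9.6 or later, and each command runs outside a transaction, which VACUUM requires):
# Run VACUUM and ANALYZE for every TPC-H table from the shell.
for t in region nation supplier customer part partsupp orders lineitem; do
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "VACUUM $t;" -c "ANALYZE $t;"
done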

Execute query

  • To make it easier to run the queries and collect statistics, this test uses the pgbench tool, which you can install with the following command. (If pgbench already exists on the test machine, make sure its version is greater than 9.6, preferably greater than 13; otherwise the tests below will hit various incompatibilities.)
yum install postgresql-contrib
  • For convenience, you can download the required 22 SQL files directly from the following link

tpch_data_tpch_query.zip

  • Then upload the archive to the ECS instance
  • On the ECS instance, go to the directory where you uploaded the file and extract it with the following shell command
unzip tpch_data_tpch_query
  • The preparations are now complete, and you can use pgbench to run the test. Use the following command to execute a single query
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
  • Parameter description

Parameter | Meaning                                        | Notes
-h        | Endpoint of the Hologres instance              | View it in the Hologres console
-p        | Port of the Hologres instance                  | View it in the Hologres console
-d        | Name of the database in the Hologres instance  |
-c        | Number of clients (concurrency)                | Example: 1. This test measures only query performance, not concurrency, so 1 is sufficient
-t        | Number of queries each client executes         | Example: 50
-f        | SQL file to stress test                        | Example: 6.sql
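  • For example, a single run of query 6 with one client and three executions per client might look like this (the port and database name here are illustrative, matching the example output below; use your own instance's values):
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p 80 -d tpch_1sf -c 1 -t 3 -n -f ./tpch_data_tpch_query/6.sql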
  • You can also run the following shell script to execute all 22 queries in a batch and write the results to the file hologres_tpch_test.out
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
  • View hologres_tpch_test.out for the query results, as in the following example

    • transaction type: the SQL file that was executed
    • latency average: the average latency of the executions of that SQL file (3 in this example)
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...
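  • Because each query's pgbench report is appended to hologres_tpch_test.out, a short text-processing sketch (assuming the output format shown above) can produce a compact per-query summary:
# Pair each SQL file name with its average latency.
paste \
    <(grep 'transaction type' hologres_tpch_test.out | awk -F/ '{print $NF}') \
    <(grep 'latency average' hologres_tpch_test.out | awk '{print $4, $5}')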

The 22 TPC-H query statements

Q1

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day 
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

Q2

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 15
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
limit 100;

Q3

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-15'
    and l_shipdate > date '1995-03-15'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
limit 10;

Q4

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;

Q5

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc;

Q6

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    and l_discount between .06 - 0.01 and .06 + 0.01
    and l_quantity < 24;

Q7

set hg_experimental_enable_double_equivalent=on;
select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and (
                (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            )
            and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

Q8

set hg_experimental_enable_double_equivalent=on;
select
    o_year,
    sum(case
        when nation = 'BRAZIL' then volume
        else 0
    end) / sum(volume) as mkt_share
from
    (
        select
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) as volume,
            n2.n_name as nation
        from
            part,
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2,
            region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'AMERICA'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = 'STANDARD POLISHED TIN'
    ) as all_nations
group by
    o_year
order by
    o_year;

Q9

set hg_experimental_enable_double_equivalent=on;
select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;

Q10

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1993-10-01'
    and o_orderdate < date '1993-10-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
limit 20;

Q11

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
group by
    ps_partkey having
        sum(ps_supplycost * ps_availqty) > (
            select
                sum(ps_supplycost * ps_availqty) * 0.0000010000
            from
                partsupp,
                supplier,
                nation
            where
                ps_suppkey = s_suppkey
                and s_nationkey = n_nationkey
                and n_name = 'GERMANY'
        )
order by
    value desc
    limit 100;

Q12

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode;

Q13

select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                c_custkey = o_custkey
                and o_comment not like '%special%requests%'
        group by
            c_custkey
    ) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc;

Q14

select
    100.00 * sum(case
        when p_type like 'PROMO%'
            then l_extendedprice * (1 - l_discount)
        else 0
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-09-01' + interval '1' month;

Q15

with revenue0(SUPPLIER_NO, TOTAL_REVENUE)  as
    (
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '1995-12-01'
        and l_shipdate < date '1995-12-01' + interval '3' month
    group by
        l_suppkey
    )
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey;

Q16

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

Q17

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
            and l_partkey in(
        select p_partkey 
        from part 
        where p_brand = 'Brand#23' and p_container = 'MED BOX')
    );

Q18

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 300
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

Q19

select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#12'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 1 and l_quantity <= 1 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#34'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 20 and l_quantity <= 20 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

Q20

select
    s_name,
    s_address
from
    supplier,
    nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'forest%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1994-01-01'
                    and l_shipdate < date '1994-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
order by
    s_name;

Q21

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
group by
    s_name
order by
    numwait desc,
    s_name
limit 100;

Q22

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
    (
        select
            substring(c_phone from 1 for 2) as cntrycode,
            c_acctbal
        from
            customer
        where
            substring(c_phone from 1 for 2) in
                ('13', '31', '23', '29', '30', '18', '17')
            and c_acctbal > (
                select
                    avg(c_acctbal)
                from
                    customer
                where
                    c_acctbal > 0.00
                    and substring(c_phone from 1 for 2) in
                        ('13', '31', '23', '29', '30', '18', '17')
            )
            and not exists (
                select
                    *
                from
                    orders
                where
                    o_custkey = c_custkey
            )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

Key/Value point query scenario test

Prepare data

1. Create table

  • Continue to use the database created for the OLAP query scenario. We will use the orders table from the TPC-H data set for this test. After connecting to Hologres with psql, you can create the table using the following DDL statement.
Note: the point query scenario requires a row-oriented table, so a new table must be created; you cannot reuse the column-oriented tables from the OLAP query scenario
DROP TABLE IF EXISTS orders_row;

BEGIN;
CREATE TABLE public.orders_row (
 "o_orderkey" int8 NOT NULL,
 "o_custkey" int8,
 "o_orderstatus" bpchar(1),
 "o_totalprice" numeric(15,2),
 "o_orderdate" date,
 "o_orderpriority" bpchar(15),
 "o_clerk" bpchar(15),
 "o_shippriority" int8,
 "o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;

2. Import data using COPY

  • This test mainly uses COPY FROM STDIN to import data; for details, refer to the official documentation. The previously generated orders.tbl data file is imported into the table created in Hologres.
  • You can import the data with the following command, run from the data generation tool's directory
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl

3. Import data using INSERT INTO

  • Alternatively, since the orders table was already populated in the OLAP scenario, you can run the following SQL statement to copy the data
INSERT INTO public.orders_row
SELECT  *
FROM    public.orders;
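  • Whichever method you use, you can verify that orders_row matches the column-oriented orders table by comparing row counts (a minimal check, using the same placeholders as above):
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "SELECT (SELECT count(*) FROM public.orders) AS orders_rows, (SELECT count(*) FROM public.orders_row) AS orders_row_rows;"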

Query

1. Generate query statement

  • Queries in the Key/Value point query scenario have the following general shapes
SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk = value_x
;

or

SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk IN ( value_a, value_b,..., value_x )
;
  • You can use the following script to generate the required SQL. The script produces two SQL files:

    • kv_query_single.sql: a query that filters on a single value
    • kv_query_in.sql: a query that filters on multiple values; each execution randomly generates 10 values
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql

2. Query

  • pgbench is required for the queries; you can install it with the following command
yum install postgresql-contrib
  • You can then use pgbench to stress test the single-value filtering scenario.
Note: run the following command in the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
  • For the multi-value filtering scenario:
Note: run the following command in the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
  • Parameter description

Parameter | Meaning                                        | Notes
-h        | Endpoint of the Hologres instance              | View it in the Hologres console
-p        | Port of the Hologres instance                  | View it in the Hologres console
-d        | Name of the database in the Hologres instance  |
-c        | Number of clients (concurrency)                | Example: 8
-t        | Number of queries each client executes         | Example: 50
-f        | SQL file to stress test                        | Example: kv_query_single.sql
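  • For example, a stress test of the single-value query with 8 concurrent clients, each executing 50 queries, might look like this (the port and database name are illustrative; use your own instance's values):
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p 80 -d tpch_1sf -c 8 -t 50 -n -f kv_query_single.sql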

Test result reference

Amount of test data:

  • This test is based on the TPC-H 100 GB (SF100) data set; the specific data volumes are shown in the table below

Table name | Number of rows
LINEITEM   | 600,037,902
ORDERS     | 150,000,000
PARTSUPP   | 80,000,000
PART       | 20,000,000
CUSTOMER   | 15,000,000
SUPPLIER   | 1,000,000
NATION     | 25
REGION     | 5

Cluster specification

Computing resources                      | Storage capacity | Software version | Remarks
64 CU (CPU: 64 cores, memory: 256 GB)    | 100 GB           | r0.10.20         | Default cluster configuration; 40 shards
128 CU (CPU: 128 cores, memory: 512 GB)  | 100 GB           | r0.10.20         | Default cluster configuration; 80 shards

Test time: June 2021

Test results

Data import time

  • Data import execution times are in seconds (s).
  • Import time refers to the time to load the data into the corresponding table in Hologres.
  • When importing with COPY, each table corresponds to one data file, and imports are not run concurrently.
  • The specific values are shown in the table below.
All imports target a Hologres 64 CU instance; times are in seconds.

Table name | Number of rows | Data volume | COPY (public network) | COPY (VPC network) | MaxCompute import
LINEITEM   | 600,037,902    | 73.6 GB     | 3,070.453             | 694.364            | 148.165
ORDERS     | 150,000,000    | 16.4 GB     | 691.060               | 172.529            | 37.741
PARTSUPP   | 80,000,000     | 2.3 GB      | 468.560               | 107.092            | 18.488
PART       | 20,000,000     | 11.3 GB     | 96.342                | 24.020             | 8.083
CUSTOMER   | 15,000,000     | 2.3 GB      | 95.190                | 22.937             | 10.363
SUPPLIER   | 1,000,000      | 132 MB      | 5.057                 | 1.803              | 1.503
NATION     | 25             | 2 KB        | 0.580                 | 0.584              | 0.747
REGION     | 5              | 0.375 KB    | 0.168                 | 0.153              | 0.430
Total      |                | 106 GB      | 4,427.410             | 1,023.482          | 225.52
[Figure: data import time per table, in seconds. Blue = COPY over the public network, green = COPY over the VPC network, gray = import from MaxCompute. Horizontal axis: table name; vertical axis: import time (s). Lower is faster.]

  • As the figure shows, network bandwidth dominates COPY imports of local files: importing over the VPC network is significantly faster than over the public network, and importing from MaxCompute is significantly faster than importing local files with COPY.

Query time

  • Query execution times are in seconds (s).
  • The query results are measured against Hologres internal tables.
  • The specific values are shown in the table below.
TPC-H query number | Hologres 64 CU (s) | Hologres 128 CU (s)
1     | 3.120  | 2.150
2     | 0.581  | 0.467
3     | 1.735  | 1.005
4     | 1.558  | 0.836
5     | 2.921  | 1.917
6     | 0.297  | 0.096
7     | 2.006  | 1.029
8     | 2.674  | 1.679
9     | 5.298  | 2.796
10    | 1.944  | 0.924
11    | 0.397  | 0.297
12    | 1.531  | 0.852
13    | 1.741  | 0.971
14    | 0.286  | 0.160
15    | 0.293  | 0.177
16    | 1.223  | 1.020
17    | 1.405  | 0.607
18    | 3.817  | 2.169
19    | 1.400  | 0.622
20    | 1.358  | 0.868
21    | 4.164  | 2.047
22    | 1.121  | 0.654
Total | 40.870 | 23.343
[Figure: query execution time per TPC-H query, in seconds. Blue = 64 CU instance, green = 128 CU instance. Horizontal axis: query number; vertical axis: execution time (s). Lower is better.]

  • As the figure shows, query time decreases roughly linearly as the instance size increases.

Original link: https://developer.aliyun.com/article/785226?

Copyright notice: the content of this article is spontaneously contributed by Alibaba cloud real name registered users. The copyright belongs to the original author. Alibaba cloud developer community does not own its copyright or bear corresponding legal liabilities. Please refer to Alibaba cloud developer community user service agreement and Alibaba cloud developer community intellectual property protection guidelines for specific rules. If you find any content suspected of plagiarism in the community, fill in the infringement complaint form to report. Once verified, the community will immediately delete the content suspected of infringement.

Topics: Database git shell SQL IDE