Druid supports SQL. The general form of a Druid SQL query is:

```sql
[ EXPLAIN PLAN FOR ]
[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]
SELECT [ ALL | DISTINCT ] { * | exprs }
FROM table
[ WHERE expr ]
[ GROUP BY exprs ]
[ HAVING expr ]
[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]
[ LIMIT limit ]
[ UNION ALL <another query> ]
```
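A minimal example of this syntax, using the `currency_record` datasource and `android_id` column that appear in the queries later in these notes (the query itself is only illustrative):

```sql
-- Ten most active devices over the last day.
SELECT "android_id", COUNT(*) AS "events"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY "android_id"
ORDER BY "events" DESC
LIMIT 10
```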
Duration data type

The duration data type specifies a time interval. The interval is written in the ISO 8601 format "PnYnMnDTnHnMnS", where:

- P marks the period (required)
- nY is the number of years
- nM is the number of months
- nD is the number of days
- T marks the start of the time-of-day part (required if you want to specify hours, minutes, or seconds)
- nH is the number of hours
- nM is the number of minutes
- nS is the number of seconds

Related time expressions (an example follows this list):

- CURRENT_TIMESTAMP: the current timestamp in the connection time zone.
- CURRENT_DATE: the current date in the connection time zone.
- INTERVAL: a time offset given as a quantity and a unit, e.g. INTERVAL '1' DAY.
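For example, the queries later in these notes combine CURRENT_TIMESTAMP and INTERVAL to restrict the time range:

```sql
-- Events in the last day; CURRENT_TIMESTAMP is evaluated in the connection time zone.
SELECT COUNT(*) AS "events"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```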
**Event selection:** you can select any event to view the corresponding data.

- Number of times (pv): how many times users performed the event [count()]
- Number of users (uv): how many distinct users performed the event [?]
- Times per capita (pv/uv): the average number of times a person performs the event; combinations of events can also be queried (e.g. the total of video play success events)
- Times per capita over all active users: out of the total number of active users, how many times per person on average; combined events can also be queried here
  - Mainly used to check the number of advertisements per capita
  - [how to count active users]
- Sum by a parameter: sum the values of an event parameter; most often used for total durations
  - For example, sum playTime to get total play duration. Remove outliers first, then check a game's activity, duration, and retention
  - play_game: play duration, summed over playTime (in ms)
  - playTime > 2000 (only count sessions longer than 2 s)
  - playTime < 2160000 (drop abnormally large values)
- Average by a parameter: not commonly used
- Per-capita value by a parameter: not commonly used
- Quantile by a parameter:
- Deduplicate by a parameter
  - Deduplicates on the event parameter. This can be used to check how many distinct games our platform pulls up each day, and the total number of games pulled up by the platform [deduplicate by parameter: count(distinct)] (see the sketch after this list)
  - Play \
- Deduplicate by a parameter and user
  - Deduplicates on the combination of the event parameter and the users who performed the event. If 100 people play game A, 200 people play game B, and 300 people play game C on the platform, the count obtained this way is 100 + 200 + 300 = 600. This can be used to check the number of games per capita in a given version/experiment (see the sketch after this list)
  - Play \
- View distribution by a parameter
  - Shows how often an event occurs, i.e. the frequency distribution of the event
  - event_scratcher_show_reward_dialog: the generic reward pop-up for interactive games
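A minimal sketch of the two deduplicated counts above, assuming a hypothetical "game_id" dimension for the game parameter and using "android_id" as the user identifier (as in the UV queries below):

```sql
-- Distinct games pulled up today (deduplicate by parameter),
-- and distinct (user, game) pairs (deduplicate by parameter and user).
SELECT
  COUNT(DISTINCT "game_id") AS "distinct_games",
  COUNT(DISTINCT "android_id" || ':' || "game_id") AS "user_game_pairs"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```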
**New indicators and combined indicators:** several indicators can be added, subtracted, multiplied, and divided.
Data query:

- Time interval: month, week, day, hour, 5 minutes (see the sketch after this list)
- Data query scheme: SQL + JSON native queries
- Query event: play_game (directly pulls up and plays a game)
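The time intervals above correspond to ISO 8601 period strings when bucketing with TIME_FLOOR in SQL; the native JSON queries below express the same choice through their "granularity" field ("hour", "day", and so on). A sketch using the datasource from the examples below:

```sql
-- Bucket the last day of data by hour; swap 'PT1H' for 'PT5M', 'P1D', 'P1W', or 'P1M'
-- to get 5-minute, daily, weekly, or monthly buckets.
SELECT TIME_FLOOR("__time", 'PT1H') AS "Time", COUNT(*) AS "pv"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY "Time"
```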
Indicators

- Times (pv)

Total number of play_game events.

```sql
SELECT COUNT(*) AS "PV"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```

```json
{
  "queryType": "groupBy",
  "dataSource": "currency_record",
  "granularity": "hour",
  "aggregations": [
    { "type": "count", "name": "count" }
  ],
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Number of people (uv)

```sql
SELECT COUNT(DISTINCT "android_id") AS "UV"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```

```json
{
  "queryType": "groupBy",
  "dataSource": "currency_record",
  "granularity": "day",
  "aggregations": [
    // thetaSketch: approximate distinct count of android_id; a plain count aggregator counts rows, not users
    { "type": "thetaSketch", "name": "uv", "fieldName": "android_id" }
  ],
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Times per capita (pv/uv)

```sql
SELECT SUM("recordtype") AS "sum_recordtype",
       COUNT(DISTINCT "imei") AS "imei_count",
       TRUNCATE(SUM("recordtype") / (COUNT(DISTINCT "imei") * 1.0), 2) AS "avg_recordtype"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```

```json
{
  // Query daily UV. 51s. 30s. 25s average. 353s
  // The query type takes one of three values: timeseries, topN, groupBy.
  // To group and aggregate a single dimension and indicator, topN is recommended and
  // gives better performance; groupBy suits multi-dimension, multi-indicator aggregation.
  // timeseries returns the aggregated result set for the specified time period according
  // to the query rules; the rules can set query granularity, result ordering, and filter
  // criteria. Filters can be nested, and post-aggregation is supported.
  "queryType": "timeseries",
  "dataSource": "currency_record",
  "granularity": "day",
  "aggregations": [
    { "type": "count", "name": "pv" },
    // thetaSketch
    { "type": "thetaSketch", "name": "uv", "fieldName": "uuid" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "click_rate",
      "fn": "/",
      "fields": [
        { "type": "hyperUniqueCardinality", "name": "click_cnt", "fieldName": "pv" },
        // If the type uses fieldAccess, an error is reported saying it cannot be converted to a Number
        { "type": "hyperUniqueCardinality", "name": "pv_cnt", "fieldName": "uv" }
      ]
    }
  ],
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Sum by a parameter

```sql
SELECT SUM("recordtype")
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND "recordtype" <= 40
  AND "recordtype" > 25
```

```json
{
  "queryType": "timeseries",
  "dataSource": "currency_record",
  "granularity": "day",
  "aggregations": [
    { "type": "longSum", "name": "recordtype", "fieldName": "recordtype" }
  ],
  "filter": {
    "type": "bound",
    "dimension": "recordtype",
    "lower": "25",
    "upper": "40",
    "alphaNumeric": true
  },
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Average by a parameter

```sql
SELECT SUM("recordtype") AS "sum_recordtype",
       COUNT(*) AS "count",
       AVG("recordtype") AS "avg_recordtype"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND "recordtype" <= 40
  AND "recordtype" > 25
```

```json
{
  "queryType": "timeseries",
  "dataSource": "currency_record",
  "granularity": "month",
  "aggregations": [
    { "type": "longSum", "name": "recordtype", "fieldName": "recordtype" },
    { "type": "count", "name": "pv" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg",
      "fn": "/",
      "fields": [
        { "type": "hyperUniqueCardinality", "name": "click_cnt", "fieldName": "recordtype" },
        // If the type uses fieldAccess, an error is reported saying it cannot be converted to a Number
        { "type": "hyperUniqueCardinality", "name": "pv_cnt", "fieldName": "pv" }
      ]
    }
  ],
  "filter": {
    "type": "bound",
    "dimension": "recordtype",
    "lower": "25",
    "upper": "40",
    "alphaNumeric": true
  },
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Per-capita value by a parameter

```sql
SELECT SUM("recordtype") AS "sum_recordtype",
       COUNT(DISTINCT "imei") AS "imei_count",
       SUM("recordtype") / COUNT(DISTINCT "imei") AS "avg_recordtype"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND "recordtype" <= 40
  AND "recordtype" > 25
```

```json
{
  "queryType": "timeseries",
  "dataSource": "currency_record",
  "granularity": "month",
  "aggregations": [
    { "type": "longSum", "name": "recordtype", "fieldName": "recordtype" },
    { "type": "thetaSketch", "name": "imeiCount", "fieldName": "imei" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg",
      "fn": "/",
      "fields": [
        { "type": "hyperUniqueCardinality", "name": "click_cnt", "fieldName": "recordtype" },
        // If the type uses fieldAccess, an error is reported saying it cannot be converted to a Number
        { "type": "hyperUniqueCardinality", "name": "pv_cnt", "fieldName": "imeiCount" }
      ]
    }
  ],
  "filter": {
    "type": "bound",
    "dimension": "recordtype",
    "lower": "25",
    "upper": "40",
    "alphaNumeric": true
  },
  "intervals": [ "2000-01-01T00:00Z/3000-01-01T00:00Z" ]
}
```
- Quantile by a parameter

```sql
SELECT "Time",
       SUM("=1")    AS "=1",
       SUM("2~3")   AS "2~3",
       SUM("4~8")   AS "4~8",
       SUM("9~16")  AS "9~16",
       SUM("17~25") AS "17~25",
       SUM(">25")   AS ">25"
FROM (
  SELECT "Time",
         CASE WHEN "Count1" = 1 THEN 1 ELSE 0 END AS "=1",
         CASE WHEN "Count1" > 1  AND "Count1" <= 3  THEN 1 ELSE 0 END AS "2~3",
         CASE WHEN "Count1" > 3  AND "Count1" <= 8  THEN 1 ELSE 0 END AS "4~8",
         CASE WHEN "Count1" > 8  AND "Count1" <= 16 THEN 1 ELSE 0 END AS "9~16",
         CASE WHEN "Count1" > 16 AND "Count1" <= 25 THEN 1 ELSE 0 END AS "17~25",
         CASE WHEN "Count1" > 25 THEN 1 ELSE 0 END AS ">25"
  FROM (
    SELECT TIME_FLOOR("__time", 'PT5M') AS "Time", "uuid", COUNT(*) AS "Count1"
    FROM "app_time"
    WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
      AND "__time" <= CURRENT_TIMESTAMP
    GROUP BY 1, 2
    ORDER BY "Time" ASC
  )
)
GROUP BY 1
ORDER BY "Time"
```

// JSON
Type conversion

- Query for a user segment (crowd classification)

```sql
SELECT SUM("recordtype") AS "sum_recordtype",
       COUNT(DISTINCT "imei") AS "imei_count",
       TRUNCATE(SUM("recordtype") / (COUNT(DISTINCT "imei") * 1.0), 2) AS "avg_recordtype"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
  AND "uuid" IN (
    SELECT DISTINCT "uuid"
    FROM "currency_record"
    WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
      AND "uuid" LIKE '%b31%'
  )
```
- View distribution by a parameter

```sql
SELECT "Time", SUM("A"), SUM("B"), SUM("C"), SUM("D")
FROM (
  SELECT "Time",
         (CASE WHEN "Count" <= 2 THEN 1 ELSE 0 END) AS A,
         (CASE WHEN "Count" > 2  AND "Count" <= 6  THEN 1 ELSE 0 END) AS B,
         (CASE WHEN "Count" > 6  AND "Count" <= 10 THEN 1 ELSE 0 END) AS C,
         (CASE WHEN "Count" > 10 THEN 1 ELSE 0 END) AS D
  FROM (
    SELECT TIME_FLOOR("__time", 'PT5M') AS "Time", "uuid", COUNT(*) AS "Count"
    FROM "app_time"
    WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
      AND "__time" <= CURRENT_TIMESTAMP
    GROUP BY 1, "uuid"
  )
)
GROUP BY 1
ORDER BY "Time"
```

- Times (pv), users (uv), and times per capita (pv/uv) in a single query

```sql
SELECT COUNT(*) AS "PV",
       COUNT(DISTINCT "android_id") AS "UV",
       COUNT(*) / COUNT(DISTINCT "android_id") AS "PV/UV"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```
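Note that COUNT(*) / COUNT(DISTINCT "android_id") divides two integers, so the ratio is truncated to a whole number. The per-capita queries above avoid this by multiplying the divisor by 1.0; the same adjustment applied here (a sketch, keeping two decimal places with TRUNCATE as above):

```sql
SELECT COUNT(*) AS "PV",
       COUNT(DISTINCT "android_id") AS "UV",
       TRUNCATE(COUNT(*) / (COUNT(DISTINCT "android_id") * 1.0), 2) AS "PV/UV"
FROM "currency_record"
WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```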