function
View all built-in functions
show functions;
How to use query function
desc function [extended]Detailed display function name
- UDF one in one out is measured by line
- UDAF enters one more place
- UDTF one in many out
UDF
NVL: assign a value to the data whose value is NULL. Its format is NVL (value, default_value). Its function is to return default if value is NULL_ Value, otherwise return the value of value. If both parameters are NULL, return NULL.
CASE WHEN THEN ELSE END
data source
dept_id sex name of a fictitious monkey with supernatural powers A male sea A male Song song B male Miss Luo Yu feng A female Sister Ting B female Tingting B female
select dept_id, sum(case sex when 'male' then 1 else 0 end) male_count, sum(case sex when 'female' then 1 else 0 end) female_count from emp_sex group by dept_id;
Equivalent to if (sex = 'male', 1,0)
CONCAT
Splice string
CONCAT_WS(separator, str1, str2,...)
The first is a separator
COLLECT_SET(col)
The function only accepts basic data types. Its main function is to de summarize the values of a field and generate an Array type field.
SET can be replaced by list, and the difference is whether to de duplicate
UDTF
explode
You can split an array into multiple rows of data
Split (field name, ")
Same as java split
lateral view
Usage: final view udtf (expression) tablealias as columnalias
Explanation: used with split, expand and other udtfs. It can split a column of data into multiple rows of data. On this basis, it can aggregate the split data.
Example:
string pageid | Array adid_list |
---|---|
"front_page" | [1, 2, 3] |
"contact_page" | [3, 4, 5] |
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable(Profile table name) AS adid;
string pageid | int adid |
---|---|
"front_page" | 1 |
"front_page" | 2 |
"front_page" | 3 |
"contact_page" | 3 |
"contact_page" | 4 |
"contact_page" | 5 |
Window function
Over
The window function needs to be used with over
mysql8. Window functions are supported after 0
OVER(): Specifies the size of the data window in which the analysis function works. The size of the data window may change with the change of rows
partition by: can be understood as being divided into several windows
CURRENT ROW: CURRENT ROW
N predicting: data in the previous n rows
n FOLLOWING: data in the next n rows
UNBOUNDED: starting point,
Unbounded forecasting means starting from the front and UNBOUNDED FOLLOWING means ending at the back
LAG(col,n, default): data in the nth row ahead
LEAD(col,n, default_val): data of the nth row in the future
NTILE(n): distribute the rows of the ordered window to the groups of the specified data. Each group has a number, starting from 1. For each row, NTILE returns the number of the group to which the row belongs. Note: n must be of type int.
Data preparation: name, orderdate, cost
jack,2017-01-01,10 tony,2017-01-02,15 jack,2017-02-03,23 tony,2017-01-04,29 jack,2017-01-05,46 jack,2017-04-06,42 tony,2017-01-07,50 jack,2017-01-08,55 mart,2017-04-08,62 mart,2017-04-09,68 neil,2017-05-10,12 mart,2017-04-11,75 neil,2017-06-12,80 mart,2017-04-13,94
(1) Query the customers and total number of people who purchased in April 2017
select name,count(*) over () from business where substring(orderdate,1,7) = '2017-04' group by name;
(2) Query the customer's purchase details and monthly total purchase amount
select name,orderdate,cost,sum(cost) over(partition by month(orderdate)) from business;
The cost of each customer is accumulated according to the date
select *,sum(cost) over(partition by name order by orderdate)from business; orderby Equivalent to accumulation select *,sum(cost) over(partition by name rows between ubounded preceding[Start line] and current row[Current row])from business;
result
jack 2017-01-05 46 46 jack 2017-01-08 55 101 jack 2017-01-01 10 111 jack 2017-04-06 42 153 jack 2017-02-03 23 176 mart 2017-04-13 94 94 mart 2017-04-11 75 169 mart 2017-04-09 68 237 mart 2017-04-08 62 299 neil 2017-05-10 12 12 neil 2017-06-12 80 92 tony 2017-01-04 29 29 tony 2017-01-02 15 44 tony 2017-01-07 50 94
Note: if you encounter the same value, it will be treated as a window function
example
id 1 2 3 3 select id ,sum(id) over(order by id) from test; id sum(id) 1 1 2 3 3 6 3 6
Rank
Duplicate sorting is allowed when RANK(), and the total number will not change
DENSE_ Duplicate sorting is allowed during rank() sorting, and the total number will be reduced
ROW_NUMBER() cannot be repeated and will be calculated in order
select *,rank() over(order by score desc) from score; Sun WuKong mathematics 95 1 sea language 94 2 Sun WuKong language 87 3 Song song mathematics 86 4 Tingting mathematics 85 5 Song song English 84 6 sea English 84 6 Tingting English 78 8 Sun WuKong English 68 9 Tingting language 65 10 Song song language 64 11 sea mathematics 56 12
Song song is on the sea. Because of the shuffle reverse overflow, the data are the same and not sorted
Other common functions
Common date functions
unix_timestamp: returns the timestamp of the current or specified time (obsolete)
select unix_timestamp(); unix_timestamp(void) is deprecated. Use current_timestamp instead. unix_timestamp(void) is deprecated. Use 1628579776 ---------------------------------------- select unix_timestamp('2021-8-10','YYYY-MM-DD'); 1609027200
current_timestamp
hive (default)> select current_timestamp; OK _c0 2021-08-10 15:17:37.051
from_unixtime(time,format): convert timestamp to date format
select from_unixtime(1603843200); 2020-10-28 00:00:00
current_date: get the current date
select current_date;
to_date: extract the date part
select to_date('2020-10-28 12:12:12'); 2020-10-28 ---------------------------------------- The same is true below, but the date format must be correct
Year: get year
select year('2020-10-28 12:12:12');
Month: get month
select month('2020-10-28 12:12:12');
Day: get day
select day('2020-10-28 12:12:12');
hour: when getting
select hour('2020-10-28 12:12:12');
minute: get score
select minute('2020-10-28 12:12:12');
Second: gets the second
select second('2020-10-28 12:12:12');
weekofyear: the current time is the week of the year
Day of month: the current time is the day of the month
select weekofyear('2021-8-10 12:12:12'); 32 ---------------------------------------- select dayofmonth('2021-8-35 12:12:12'); 4 //Even if it exceeds the normal month / date size, no error will be reported, and it will automatically enter the next month
months_between: the month between two dates, before minus after
select months_between('2020-04-01','2021-10-28'); -18.87096774
add_months: date plus or minus month
select add_months('2020-8-28',10); 2021-06-28
datediff: the number of days between two dates
select datediff('2020-11-04','2021-8-28'); -297
date_add: date plus days
date_sub: date minus days
select date_add('2020-10-28',4); select date_sub('2020-10-28',-4);
last_day: the last day of the month of the date
select last_day('2020-08-40'); 2020-09-30
date_format(date/timestamp/string, fmt) formats the date
select date_format('2020-10-28 12:12:12','yyyy/MM/dd HH:mm:ss'); 2020/10/28 12:12:12
Data rounding
Round: round
select round(3.14); 3 select round(3.54); 4
ceil: round up
select ceil(3.14); select ceil(3.54); 4
floor: round down
select floor(3.14); select floor(3.54); 3
String operation
It is similar to Java's string method
upper: convert to uppercase
lower: convert to lowercase
Length: length
select length("atguigu");
trim: remove spaces before and after
select trim(" atguigu ");
lpad: fill to the left to the specified length
select lpad('test',5,'g');
rpad: fill right to the specified length
select rpad('test',5,'g');
regexp_replace: use regular expression to match the target string. Replace after matching successfully!
sub: intercept string
split: splits the string by the specified character
Set function
Similar to Java collection functions
size: the number of elements in the collection
select size(col) from table_name;
map_keys: returns the key in the map
select map_keys(col) from table_name;
map_values: returns the value in the map
select map_values(col) from table_name;
array_contains: determines whether an element is included in the array
select array_contains(col,'test') from table_name;
sort_array: sort the elements in the array
select sort_array(col) from table_name;