Download scripts for creating tables and inserting data
Here, for your convenience, the data script has been prepared for you.
Click the link below to download directly
Create data table script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/create_table.sql
Insert data script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/data.zip
After downloading the script, run create in MySQL environment first_ table. SQL script, create a data table, and then extract the downloaded data Zip. The extracted directory is as follows:
8-10ccf_offline_stage1_train.sql 6-winequality-white.sql 5-8-10ccf_online_stage1_train.sql 4-macro industry.sql 3-ccf_offline_stage1_test_revised.sql 2-winequality-red.sql 1-9income statement.sql 1-9company operating.sql 1-7market data.sql
The sequence number in front of the script file name indicates the topic sequence number of the data set, such as 1-7market data SQL indicates that the data set is used in questions 1 and 7.
Similarly, the sql script is also given here, which is the statement to insert data. You can import the data into the data table by simply opening it and running it in MySQL environment.
Exercise 1:
Data source: Dataset - Alibaba cloud Tianchi
Please use the quarterly revenue forecast data sets "Income Statement.xls", "Company Operating.xlsx" and "Market Data.xlsx" of A-share listed companies, with Market Data as the main table and ticker in the three tables_ The information with symbol 600383 and 600048 is combined. Only the following fields need to be displayed.
Table name | Field name |
---|---|
Income Statement | TICKER_SYMBOL |
Income Statement | END_DATE |
Income Statement | T_REVENUE |
Income Statement | T_COGS |
Income Statement | N_INCOME |
Market Data | TICKER_SYMBOL |
Market Data | END_DATE_ |
Market Data | CLOSE_PRICE |
Company Operating | TICKER_SYMBOL |
Company Operating | INDIC_NAME_EN |
Company Operating | END_DATE |
Company Operating | VALUE |
result:
select m.TICKER_SYMBOL,m.END_DATE,m.CLOSE_PRICE,i.*,c.* from (SELECT TICKER_SYMBOL,END_DATE,CLOSE_PRICE from `market data` where TICKER_SYMBOL in ('600383','600048')) as m left JOIN (select TICKER_SYMBOL,END_DATE,T_REVENUE,T_COGS,N_INCOME from `income statement` where TICKER_SYMBOL in ('600383','600048')) as i on i.TICKER_SYMBOL=m.TICKER_SYMBOL left JOIN (SELECT TICKER_SYMBOL,INDIC_NAME_EN,END_DATE,`VALUE` from `company operating` where TICKER_SYMBOL in ('600383','600048')) as c on c.TICKER_SYMBOL=m.TICKER_SYMBOL
Exercise 2:
Data source: Dataset - Alibaba cloud Tianchi
Please use Wine Quality Data set "winequality red. CSV" to find out all red wines with pH=3.03, and then rank their city acid in Chinese style (the next ranking in the same ranking should be the next consecutive integer value. In other words, there should be no "interval" between ranking)
result:
SELECT pH,`citric acid`,DENSE_RANK() OVER (ORDER BY `citric acid`) as dense_rank from `winequality-red` WHERE pH=3.03;
Exercise 3:
Using the data set ccf_offline_stage1_test_revised.csv in Coupon Usage Data for O2O, try to find out the merchants with the largest total amount of coupons and the largest number of coupons during July 2016.
Here, only the amount of full reduction is considered, and coupons with a few discounts are not considered.
Results: 1. Merchants with the largest total amount of coupons:
SELECT Merchant_id, SUM(SUBSTRING_INDEX(Discount_rate,':',-1)) as Amount from ccf_offline_stage1_test_revised WHERE Date_received>='2016-07-01' AND Date_received<='2016-07-31' GROUP BY Merchant_id ORDER BY Amount desc limit 1
2. Businesses that issue the most Coupons:
SELECT Merchant_id,count(0) as num from ccf_offline_stage1_test_revised WHERE Date_received>='2016-07-01' AND Date_received<='2016-07-31' GROUP BY Merchant_id ORDER BY num desc limit 1
Exercise 4:
Data source: Dataset - Alibaba cloud Tianchi
Please use the sheet index in macro & industry.xlsx, the data set in the quarterly revenue forecast of A-share listed companies_ Data, please calculate the power consumption of the whole society: primary industry: the value of the current month is in which month did the peak of power consumption occur in 2015? And what percentage increase / decrease compared with the same period last year?
Results: 1. In which month did the peak power consumption occur in 2015?
SELECT PERIOD_DATE,MAX(DATA_VALUE) AS maxv from `macro industry` WHERE indic_id='2020101522' AND PERIOD_DATE>='2015-01-01' AND PERIOD_DATE<='2015-12-31' GROUP BY PERIOD_DATE ORDER BY maxv desc limit 1;
2. What percentage increase / decrease compared to the same period last year?
select (SELECT DATA_VALUE from `macro industry` where PERIOD_DATE='2015-08-31' and indic_id='2020101522')/(SELECT DATA_VALUE from `macro industry` where PERIOD_DATE='2014-08-31' and indic_id='2020101522') as addv from `macro industry` limit 1;
Exercise 5:
Using the data set ccf_online_stage1_train.csv in Coupon Usage Data for O2O, try to calculate the overall rejection rate of online coupons during June 2016? And find out the merchants with the highest rejection rate of coupons.
Rejection rate = number of coupons received but not used / total number of coupons received
Results: what was the overall rejection rate of online coupons during June 2016?
select SUM(CASE WHEN Date_received is not NULL AND Date IS NULL THEN 1 ELSE 0 END)/ SUM(CASE WHEN Date_received is not NULL THEN 1 ELSE 0 END) from ccf_offline_stage1_train WHERE Date_received BETWEEN '2016-01-01' AND '2016-06-30';
Merchants with the highest rejection rate of coupons?
select Merchant_id,SUM(CASE WHEN Date_received is not NULL AND Date IS NULL THEN 1 ELSE 0 END)/ SUM(CASE WHEN Date_received is not NULL THEN 1 ELSE 0 END) as droped from ccf_offline_stage1_train where Date_received BETWEEN '2016-06-01' AND '2016-06-30' GROUP BY Merchant_id ORDER BY droped desc limit 1;
Exercise 6:
Data source: Dataset - Alibaba cloud Tianchi
Please use Wine Quality Data set winequality-white.csv to find out all white wines with pH=3.63, and then rank their residual sugar in English (discontinuous ranking)
result:
SELECT pH,`residual sugar`,RANK() OVER (ORDER BY `residual sugar`) AS RK from `winequality-white` WHERE pH=3.63;
Exercise 7:
Data source: Dataset - Alibaba cloud Tianchi
Please use the sheet data in the data set Market Data.xlsx in the quarterly revenue forecast of A-share listed companies,
What are the three industries with the largest market value by the end of 2018? And what are the three companies with the largest market capitalization in these three industries? (find out the top three companies in each industry, that is, a total of 9 companies)
Results: 1. What are the three industries with the largest market value by the end of 2018?
SELECT TYPE_NAME_CN,MAX(MARKET_VALUE) as mm from `market data` WHERE YEAR(END_DATE)=2018 GROUP BY TYPE_NAME_CN ORDER BY mm limit 3;
2. What are the three largest companies?
SELECT * FROM ( SELECT TYPE_NAME_CN,MAX(MARKET_VALUE) as mm from `market data` WHERE YEAR(END_DATE)=2018 GROUP BY TYPE_NAME_CN ORDER BY mm limit 3 ) as ind3 LEFT JOIN ( SELECT TYPE_NAME_CN,MARKET_VALUE,ROW_NUMBER() OVER (PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE) AS Comrank from `market data` WHERE YEAR(END_DATE)=2018 GROUP BY TYPE_NAME_CN ORDER BY mm limit 3 ) as com3 on ind3.TYPE_NAME_CN=com3.TYPE_NAME_CN WHERE com3.Comrank<=3
Exercise 8:
Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the customers who used the most online and offline coupons during June 2016.
result:
select Merchant_id,SUM(CASE WHEN Date_received IS NOT NULL AND Date IS NOT NULL THEN 1 ELSE 0 END) as used from ccf_offline_stage1_train where Date_received BETWEEN '2016-06-01' AND '2016-06-30' GROUP BY User_id ORDER BY used desc limit 1;
Exercise 9:
Data source: Dataset - Alibaba cloud Tianchi
Please use sheet general business in Income Statement.xls and sheet en in Company Operating.xlsx, the quarterly revenue forecast data set of A-share listed companies.
Find out the corresponding net profit of the quarter with the highest passenger throughput of Baiyun Airport in all years of the data set? (note that it is the net profit corresponding to a single quarter, not the cumulative net profit.)
result:
SELECT * FROM (SELECT TICKER_SYMBOL, YEAR(END_DATE) Year, QUARTER(END_DATE) QUARTER, SUM(VALUE) Amount FROM `company operating` WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput' GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE) ORDER BY SUM(VALUE) DESC LIMIT 1 ) BaseData LEFT JOIN (SELECT TICKER_SYMBOL, YEAR(END_DATE) Year, QUARTER(END_DATE) QUARTER, SUM(N_INCOME) Amount FROM `income statement` GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE) ) Income ON BaseData.TICKER_SYMBOL = Income.TICKER_SYMBOL AND BaseData.Year = Income.Year AND BaseData.QUARTER = Income.QUARTER
Exercise 10:
Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the top three merchants who have been used with the most coupons online and offline during June 2016.
For example, if merchant A uses A 200 minus 50 coupon and consumer B uses A 30 minus 1 coupon, merchant A will be deducted by 51 yuan.
result:
SELECT Merchant_id, SUM(discount_amount) discount_amount FROM (SELECT Merchant_id, SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount FROM `ccf_online_stage1_train` WHERE (Date != 'null' AND Coupon_id != 'null') AND (LEFT(DATE,4)=2016 ) AND MID(DATE,5,2) = '06' GROUP BY Merchant_id UNION ALL SELECT Merchant_id, SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount FROM `ccf_offline_stage1_train` WHERE (Date != 'null' AND Coupon_id != 'null') AND (LEFT(DATE,4)=2016 ) AND MID(DATE,5,2) = '06' GROUP BY Merchant_id ) BaseData GROUP BY Merchant_id ORDER BY SUM(discount_amount) DESC LIMIT 1;
The answers to this exercise can be answered in Tianchi official account: Tianchi big data research platform: SQL training camp.