Alibaba cloud AI training camp - SQL basics 6: test questions

Posted by apaxson on Sat, 19 Feb 2022 19:29:45 +0100

Download scripts for creating tables and inserting data

Here, for your convenience, the data script has been prepared for you.
Click the link below to download directly
Create data table script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/create_table.sql
Insert data script: http://tianchi-media.oss-cn-beijing.aliyuncs.com/dragonball/SQL/data.zip

After downloading the script, run create in MySQL environment first_ table. SQL script, create a data table, and then extract the downloaded data Zip. The extracted directory is as follows:

8-10ccf_offline_stage1_train.sql
6-winequality-white.sql
5-8-10ccf_online_stage1_train.sql
4-macro industry.sql
3-ccf_offline_stage1_test_revised.sql
2-winequality-red.sql
1-9income statement.sql
1-9company operating.sql
1-7market data.sql

The sequence number in front of the script file name indicates the topic sequence number of the data set, such as 1-7market data SQL indicates that the data set is used in questions 1 and 7.
Similarly, the sql script is also given here, which is the statement to insert data. You can import the data into the data table by simply opening it and running it in MySQL environment.

Exercise 1:

Data source: Dataset - Alibaba cloud Tianchi

Please use the quarterly revenue forecast data sets "Income Statement.xls", "Company Operating.xlsx" and "Market Data.xlsx" of A-share listed companies, with Market Data as the main table and ticker in the three tables_ The information with symbol 600383 and 600048 is combined. Only the following fields need to be displayed.

Table nameField name
Income StatementTICKER_SYMBOL
Income StatementEND_DATE
Income StatementT_REVENUE
Income StatementT_COGS
Income StatementN_INCOME
Market DataTICKER_SYMBOL
Market DataEND_DATE_
Market DataCLOSE_PRICE
Company OperatingTICKER_SYMBOL
Company OperatingINDIC_NAME_EN
Company OperatingEND_DATE
Company OperatingVALUE

result:

select m.TICKER_SYMBOL,m.END_DATE,m.CLOSE_PRICE,i.*,c.* from (SELECT TICKER_SYMBOL,END_DATE,CLOSE_PRICE from `market data` where TICKER_SYMBOL in ('600383','600048')) as m
left JOIN (select TICKER_SYMBOL,END_DATE,T_REVENUE,T_COGS,N_INCOME from `income statement` where TICKER_SYMBOL in ('600383','600048')) as i on i.TICKER_SYMBOL=m.TICKER_SYMBOL
left JOIN (SELECT TICKER_SYMBOL,INDIC_NAME_EN,END_DATE,`VALUE` from `company operating` where TICKER_SYMBOL in ('600383','600048')) as c on c.TICKER_SYMBOL=m.TICKER_SYMBOL 

Exercise 2:

Data source: Dataset - Alibaba cloud Tianchi

Please use Wine Quality Data set "winequality red. CSV" to find out all red wines with pH=3.03, and then rank their city acid in Chinese style (the next ranking in the same ranking should be the next consecutive integer value. In other words, there should be no "interval" between ranking)

result:
 

SELECT pH,`citric acid`,DENSE_RANK() OVER (ORDER BY `citric acid`) as dense_rank 
from `winequality-red`
WHERE pH=3.03;

Exercise 3:

Data source: Tianchi rookie competition o2o coupon usage prediction questions and data - Tianchi competition - Alibaba cloud Tianchi

Using the data set ccf_offline_stage1_test_revised.csv in Coupon Usage Data for O2O, try to find out the merchants with the largest total amount of coupons and the largest number of coupons during July 2016.

Here, only the amount of full reduction is considered, and coupons with a few discounts are not considered.

Results: 1. Merchants with the largest total amount of coupons:

SELECT Merchant_id,
       SUM(SUBSTRING_INDEX(Discount_rate,':',-1)) as Amount
from ccf_offline_stage1_test_revised
WHERE Date_received>='2016-07-01' AND Date_received<='2016-07-31'
GROUP BY Merchant_id 
ORDER BY Amount desc
limit 1

2. Businesses that issue the most Coupons:

SELECT Merchant_id,count(0) as num
from ccf_offline_stage1_test_revised
WHERE Date_received>='2016-07-01' AND Date_received<='2016-07-31'
GROUP BY Merchant_id 
ORDER BY num desc
limit 1

Exercise 4:

Data source: Dataset - Alibaba cloud Tianchi

Please use the sheet index in macro & industry.xlsx, the data set in the quarterly revenue forecast of A-share listed companies_ Data, please calculate the power consumption of the whole society: primary industry: the value of the current month is in which month did the peak of power consumption occur in 2015? And what percentage increase / decrease compared with the same period last year?

Results: 1. In which month did the peak power consumption occur in 2015?

SELECT PERIOD_DATE,MAX(DATA_VALUE) AS maxv
from `macro industry`
WHERE indic_id='2020101522' AND PERIOD_DATE>='2015-01-01' AND PERIOD_DATE<='2015-12-31'
GROUP BY PERIOD_DATE
ORDER BY maxv desc
limit 1;

2. What percentage increase / decrease compared to the same period last year?

select 
(SELECT DATA_VALUE from `macro industry`
where PERIOD_DATE='2015-08-31' and indic_id='2020101522')/(SELECT DATA_VALUE from `macro industry`
where PERIOD_DATE='2014-08-31' and indic_id='2020101522') as addv
from `macro industry`
limit 1;

Exercise 5:

Data source: Tianchi rookie competition o2o coupon usage prediction questions and data - Tianchi competition - Alibaba cloud Tianchi

Using the data set ccf_online_stage1_train.csv in Coupon Usage Data for O2O, try to calculate the overall rejection rate of online coupons during June 2016? And find out the merchants with the highest rejection rate of coupons.

Rejection rate = number of coupons received but not used / total number of coupons received

Results: what was the overall rejection rate of online coupons during June 2016?

select SUM(CASE WHEN Date_received is not NULL AND Date IS NULL THEN 1 ELSE 0 END)/ 
       SUM(CASE WHEN Date_received is not NULL THEN 1 ELSE 0 END)
from ccf_offline_stage1_train
WHERE Date_received BETWEEN '2016-01-01' AND '2016-06-30';

Merchants with the highest rejection rate of coupons?

select Merchant_id,SUM(CASE WHEN Date_received is not NULL AND Date IS NULL THEN 1 ELSE 0 END)/ 
       SUM(CASE WHEN Date_received is not NULL THEN 1 ELSE 0 END) as droped
from ccf_offline_stage1_train
where Date_received BETWEEN '2016-06-01' AND '2016-06-30'
GROUP BY Merchant_id
ORDER BY droped desc
limit 1;

Exercise 6:

Data source: Dataset - Alibaba cloud Tianchi

Please use Wine Quality Data set winequality-white.csv to find out all white wines with pH=3.63, and then rank their residual sugar in English (discontinuous ranking)

result:

SELECT pH,`residual sugar`,RANK() OVER (ORDER BY `residual sugar`) AS RK
from `winequality-white`
WHERE pH=3.63;

Exercise 7:

Data source: Dataset - Alibaba cloud Tianchi

Please use the sheet data in the data set Market Data.xlsx in the quarterly revenue forecast of A-share listed companies,

What are the three industries with the largest market value by the end of 2018? And what are the three companies with the largest market capitalization in these three industries? (find out the top three companies in each industry, that is, a total of 9 companies)

Results: 1. What are the three industries with the largest market value by the end of 2018?

SELECT TYPE_NAME_CN,MAX(MARKET_VALUE) as mm
from `market data`
WHERE YEAR(END_DATE)=2018
GROUP BY TYPE_NAME_CN
ORDER BY mm
 limit 3;

2. What are the three largest companies?

SELECT * FROM
(
SELECT TYPE_NAME_CN,MAX(MARKET_VALUE) as mm
from `market data`
WHERE YEAR(END_DATE)=2018
GROUP BY TYPE_NAME_CN
ORDER BY mm
 limit 3
) as ind3
LEFT JOIN
(
SELECT TYPE_NAME_CN,MARKET_VALUE,ROW_NUMBER() OVER (PARTITION BY TYPE_NAME_CN ORDER BY MARKET_VALUE) AS Comrank
from `market data`
WHERE YEAR(END_DATE)=2018
GROUP BY TYPE_NAME_CN
ORDER BY mm
 limit 3
) as com3
on ind3.TYPE_NAME_CN=com3.TYPE_NAME_CN
WHERE com3.Comrank<=3

Exercise 8:

Data source: Tianchi rookie competition o2o coupon usage prediction questions and data - Tianchi competition - Alibaba cloud Tianchi

Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the customers who used the most online and offline coupons during June 2016.

result:

select Merchant_id,SUM(CASE WHEN Date_received IS NOT NULL AND Date IS NOT NULL THEN 1 ELSE 0 END) as used
from ccf_offline_stage1_train
where Date_received BETWEEN '2016-06-01' AND '2016-06-30'
GROUP BY User_id
ORDER BY used desc
limit 1;

Exercise 9:

Data source: Dataset - Alibaba cloud Tianchi

Please use sheet general business in Income Statement.xls and sheet en in Company Operating.xlsx, the quarterly revenue forecast data set of A-share listed companies.

Find out the corresponding net profit of the quarter with the highest passenger throughput of Baiyun Airport in all years of the data set? (note that it is the net profit corresponding to a single quarter, not the cumulative net profit.)

result:

SELECT * FROM 
(SELECT TICKER_SYMBOL, YEAR(END_DATE) Year, QUARTER(END_DATE) QUARTER, SUM(VALUE) Amount 
FROM `company operating` 
WHERE INDIC_NAME_EN = 'Baiyun Airport:Passenger throughput' 
GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE) 
ORDER BY SUM(VALUE) DESC LIMIT 1 ) BaseData 
LEFT JOIN 
(SELECT TICKER_SYMBOL, YEAR(END_DATE) Year, QUARTER(END_DATE) QUARTER, SUM(N_INCOME) Amount 
FROM `income statement` 
GROUP BY TICKER_SYMBOL,YEAR(END_DATE),QUARTER(END_DATE) ) Income 
ON BaseData.TICKER_SYMBOL = Income.TICKER_SYMBOL AND BaseData.Year = Income.Year AND BaseData.QUARTER = Income.QUARTER

Exercise 10:

Data source: Tianchi rookie competition o2o coupon usage prediction questions and data - Tianchi competition - Alibaba cloud Tianchi

Using the data sets ccf_online_stage1_train.csv and ccf_offline_stage1_train.csv in Coupon Usage Data for O2O, try to find out the top three merchants who have been used with the most coupons online and offline during June 2016.

For example, if merchant A uses A 200 minus 50 coupon and consumer B uses A 30 minus 1 coupon, merchant A will be deducted by 51 yuan.

result:

SELECT Merchant_id, SUM(discount_amount) discount_amount 
FROM 
(SELECT Merchant_id, SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount 
FROM `ccf_online_stage1_train` 
WHERE (Date != 'null' AND Coupon_id != 'null') AND (LEFT(DATE,4)=2016 ) AND MID(DATE,5,2) = '06' 
GROUP BY Merchant_id 
UNION ALL 
SELECT Merchant_id, SUM(SUBSTRING_INDEX(`Discount_rate`,':',-1)) AS discount_amount 
FROM `ccf_offline_stage1_train` 
WHERE (Date != 'null' AND Coupon_id != 'null') AND (LEFT(DATE,4)=2016 ) AND MID(DATE,5,2) = '06' 
GROUP BY Merchant_id ) BaseData
GROUP BY Merchant_id
ORDER BY  SUM(discount_amount) DESC
LIMIT 1;

The answers to this exercise can be answered in Tianchi official account: Tianchi big data research platform: SQL training camp.

Topics: SQL AI Alibaba Cloud