Summary of SQL interview questions

Posted by Paghilom on Sun, 02 Jan 2022 20:55:27 +0100

1. Table t_ act_ The records table contains two fields: uid (user ID), imp_date (date yyyy MM DD) and continuous time

1) Calculate the maximum number of consecutive check-in days for each user every month in 2020
2) It is calculated that there will be a list of logged in users for two consecutive days every month in 2020
3) Calculate the number of users logged in every month for 5 consecutive days in 2020

1) Calculate the maximum number of consecutive check-in days for each user every month in 2020

with temp_1 as (
	select distinct uid, imp_date
	from t_act_records
	where year(imp_date)=2020
	)
	,temp_2 as(
	select uid, imp_date,row_number()over(partition by uid,month(imp_date) order by imp_date) rank
	from temp_1
	)
	,temp_3 as(
	select uid,month(imp_date)as month,date_sub(imp_date,rank),count(1) as cnt
	from temp_2
	group by 1,2,3)
select month,uid,max(cnt)
from temp_3
group by 1,2

2) It is calculated that there will be a list of logged in users for two consecutive days every month in 2020

with temp_1 as (
	select distinct uid, imp_date
	from t_act_records
	where year(imp_date)=2020
	)
	,temp_2 as(
	select uid, imp_date,lead(impt_date,1) over(partiton by month(impt_date),uid order by imp_date) Second landing date
	from temp_1
	)
	,temp_3 as (
	select uid,month(impt_date),datediff(Second landing date,imp_date) Difference
	from temp_2
	)
select month(imp_date),uid
from temp_3
group by 1,2
having Difference=1

Method 2

with temp_1 as (
	select distinct uid, imp_date
	from t_act_records
	where year(imp_date)=2020
	)
	,temp_2 as(
	select uid, imp_date,row_number()over(partition by uid,month(imp_date) order by imp_date) rank
	from temp_1
	)
	,temp_3 as(
	select uid,month(imp_date)as month,date_sub(imp_date,rank) as diff
	from temp_2
	)
select month,uid,
from temp_3
group by 1,2
having count(diff)>2

3) Calculate the number of users logged in every month for 5 consecutive days in 2020

with temp_1 as (
	select distinct uid, imp_date
	from t_act_records
	where year(imp_date)=2020
	)
	,temp_2 as(
	select uid, imp_date,row_number()over(partition by uid,month(imp_date) order by imp_date) rank
	from temp_1
	)
	,temp_3 as(
	select uid,month(imp_date)as month,date_sub(imp_date,rank) as diff
	from temp_2
	)
select month,count(distinct uid),
from temp_3
group by 1
having count(diff)>5

– method 2

with temp_1 as (
	select distinct uid, imp_date
	from t_act_records
	where year(imp_date)=2020
	)
	,temp_2 as(
	select uid, imp_date,row_number()over(partition by uid,month(imp_date) order by imp_date) rank
	from temp_1
	)
	,temp_3 as(
	select uid,month(imp_date)as month,date_sub(imp_date,rank) as diff
	from temp_2
	)
select month,uid,
from temp_3
group by 1,2
having count(diff)>2

Note: (if there is an aggregate function in the select statement, other retrieved columns should also be included in the aggregate function or group by clause.)

2. Course sales order form t_ order_ The records table contains four fields: uid (user ID), order_time (date), order_id (order ID), subject (subject) aggregation function

1) Find the record of the first order of each user. If an order containing multiple courses is placed at the same time, it will be sorted according to the order of Chinese, mathematics and English

select uid,subject,min(order_time)
from t_order_records
group by uid,subject
order by field('subject','language','mathematics','English')

2) Count the daily order quantity and the order quantity up to the current day

with temp_1 as (
	select date(order_time) time_1,count(oder_id) as cnt_today
	from t_order_records
	group by 1
	)
	,temp_2 as (
	select date(order_time) time_2,sum(order_id) over(partition by date(order_time)) as cnt_sum
	from t_order_records
	)
select cnt_today,cnt_sum
from temp_1 
left join temp_2
on temp_1.time_1 = temp_2.time_2

3. Student transcript t_ student_ The score table contains four fields: class_ Class ID, student_id, course_time, score

Find out the three records with the highest scores of each student (hint: there is no gap between dense_rank rankings)
Find out the students who are higher than the average score of each class and course

1) Find the top three records of each student

(Note: there is no gap between the dense_rank rankings)

with temp_1 as (
select *,row_number() over (partition by studemt_id order by score desc)as rank
from t_student_score)
select class_id,student_id,course_time,score
from temp_1
where rank<=3

2) Find out the students who are higher than the average score of each class and course

with temp_2 as (
select class_id,course_id,avg(score)as avg_socre
from t_student_score
group by 1,2
)
select student_id
from t_student_score
left join temp_2
on t_student_score.course_id = temp_2.course_id 
and t_student_score.class_id = temp_2.class_id
on _student_score.score > temp_2.avg_score

4. A group's business face-to-face test questions

There is an employee table dept_emp has the following fields:

Employee ID: emp_no
Department ID: dept_no
Entry time: from_date
Resignation time: to_date

There are employee payroll salaries, including the following fields:

Employee ID: emp_no
Entry time: from_date
Resignation time: to_date
Salary: salary

Get the information about the highest salary of employees in all departments and give dept_no, emp_no and its corresponding salary are arranged in ascending order by department number
You can find the salary increase of all employees since their employment and give the employee number emp_no and its corresponding salary increase, in ascending order according to the growth

1) Get the information about the highest salary of employees in all departments and give dept_no, emp_no and its corresponding salary are arranged in ascending order by department number

with temp_1 as (
	select a.dept_no,a.emp_no,b.salary,dense_rank() over (partition by a.dept_no order by salary) rank
		from dept_emp a left join salaries b on a.emp_no = b.emp_no and a.from_date = b.from_date and a.to_date = b.to_date
)

select a.dept_no,a.emp_no,b.salary
	from temp_1 
	where rank =1
	order by a.dept_no desc

2) You can find the salary increase of all employees since their employment and give the employee number emp_no and its corresponding salary increase, in ascending order according to the growth

with temp_1 as(
select emp_no,min(salary) as min_salary
from salaries
group by 1
)
	,temp_2 as(
select emp_no, salary
from salaries
where
to_date = '9999-01-1')
select emp_no,temp_2.salary-temp_1.salary as growth
from temp_1
inner join temp_2
on temp_1.emp_no = temp_2.emp_no
order by growth asc

5. Calculate the monthly working hours and total working hours of each employee from January to March

There is a dataList of employee working hours, including the following fields:

Employee ID: emp_no
Month: date
Working hours: dur_time
Calculate the monthly working hours and total working hours of each employee from January to March

with temp_1 as(
select emp_no,month(date)sum(dur_time) mon_time
from dayaList
where month(date)>=1 and month(date)<=3
group by emp_no,month(date)
)
,temp_2 as(
select emp_no,sum() over (partition by emp_no order by date) as sum_time
from data2List
)
select emp_no,yue,mon_time,sum_time
from temp_1 inner join  temp_2
on temp_1.emp_no = temp_2.emp_no

6. quick Kwai points interview questions

User login table user_log, including the following fields:

log_day: login date
device_id: user equipment id
app_id: the id of the user app, where device_id and app_id determines the unique user

1) Calculate the retention rate of a day (the retention rate of the next day, 3 days, 7 days and 30 days). Take '2020-01-01' as an example

Idea:

Ensure that the user is a new user and has not logged in before January 1, 2020
Self connect on the next day, 3rd, 7th and 30th.
Calculate the daily increase, retention for the next day, retention for 3 days, retention for 7 days and retention for 30 days.

with user_log as(
select device_id||app_id user_id,log_day
from user_log)
-- Make sure you are a new user
,a as(
select user_id,log_day
from user_log
where log_day = '2020-01-01'
and user_id not in (
select distinct user_id
from user_log
where log_day< '2020-01-01'
)
,temp_2 as(
select a.log_day,count(distinct a.user_id) cn_0,count(distinct b.user_id) cn_1,count(distinct c.user_id) cn_3,count(distinct d.user_id) cn_7,count(distinct e.user_id) cn_30
from a
left join user_log b
on a.user_id = b.user_id and datediff(b.log_day,a.log_day) =1
left join user_log c
on a.user_id = c.user_id and datediff(c.log_day,a.log_day) =3
left join user_log d
on a.user_id = d.user_id and datediff(d.log_day,a.log_day) =7
left join user_log e
on a.user_id = e.user_id and datediff(e.log_day,a.log_day) =30
group by 1
)
select a.log_day,concat(round(100*cn_1/cn_0),'%') Next day retention rate
,concat(round(100*cn_3/cn_0),'%') 3 Daily retention rate
,concat(round(100*cn_7/cn_0),'%') 7 Daily retention rate
,concat(round(100*cn_30/cn_0),'%') 30 Daily retention rate
from temp_2

2) Calculate daily retention rate

with user_log as(
select device_id||app_id user_id,log_day
from user_log)
, a as (
select user_id, log_day
from user_log a
where a.log_day = a.log_day
and user_id not in 
(select user_id, log_day
from user_log b
where b.log_day < a.log_day)
)
,temp_2 as(
select a.log_day,count(distinct a.user_id) cn_0,count(distinct b.user_id) cn_1,count(distinct c.user_id) cn_3,count(distinct d.user_id) cn_7,count(distinct e.user_id) cn_30
from a
left join user_log b
on a.user_id = b.user_id and datediff(b.log_day,a.log_day) =1
left join user_log c
on a.user_id = c.user_id and datediff(c.log_day,a.log_day) =3
left join user_log d
on a.user_id = d.user_id and datediff(d.log_day,a.log_day) =7
left join user_log e
on a.user_id = e.user_id and datediff(e.log_day,a.log_day) =30
group by 1
)
select a.log_day,concat(round(100*cn_1/cn_0),'%') Next day retention rate
,concat(round(100*cn_3/cn_0),'%') 3 Daily retention rate
,concat(round(100*cn_7/cn_0),'%') 7 Daily retention rate
,concat(round(100*cn_30/cn_0),'%') 30 Daily retention rate
from temp_2

7. Character splitting (use hive lateral view)

RELATIONSHIP, a fan RELATIONSHIP table, contains the following fields:

User id: user_id
User attention media id: media_id
Date: imp_date
It is found that one user pays attention to multiple media at the same time in the fan relationship table,
Example: for a user with user ID 1001, the corresponding attention media ID data is 101010201031.

1) Please split this situation in the table into multiple entries so that the user id in each data corresponds to only - user attention media IDs.

select user_id,imp_date,lateral view explode(solit(media_id,',')) temp_1 as t_media_id from relationship

8. sql test questions of a foreign enterprise (use case when)

There is a user table user, which contains the following fields:

id: user id
Name: user name

With integral table grade_info, including the following fields:

user_id: user id
grade_num: integral score
Type: type, increase / decrease The values are 'add' and 'reduce'
When users participate in different activities on the platform every day, the platform points will rise or fall accordingly.

1) Now find the id and name of the user with the highest score and the total score (there may be multiple), and the query results are sorted in ascending order by id.

WITH t AS (
    SELECT  user_id,SUM(grade_num) AS grade_sum
    FROM (SELECT user_id,grade_num * IF(type= 'add',1,-1) AS grade_num
          FROM grade_info )AS t1
    GROUP BY user_id )
SELECT id,name,grade_sum
FROM t
JOIN user u ON u.id = t.user_id
WHERE grade_sum  = (SELECT MAX(grade_sum) FROM t)

9. Query the number of resumes received in each month for each position submitting resumes in 2025 and the corresponding number of resumes received in the same month and position in 2026, using left and right

ith a as (
	(select job
		,left(date,7) first_year_mon
		,sum(num) first_year_cnt 
	from resume_info 
	where left(date,4)='2025' 
	group by left(date,7),job) 
)
	,b as (select job
		,left(date,7) second_year_mon
		,sum(num) second_year_cnt from resume_info 
		where left(date,4)='2026' 
		group by left(date,7),job
)


# This topic mainly investigates the string interception function. Of course, you can also use date_format function to format the date
select a.job,first_year_mon,first_year_cnt,second_year_mon,second_year_cnt from
a
left join b
on a.job = b.job and right(a.first_year_mon,2) = right(b.second_year_mon,2)
order by first_year_mon desc ,job desc

10. Netease test questions

A user login table user_login_table, including the following fields:

login_time: login time
user_id: user id
Product: product
online_second: online time

Use sql statement to get the login number, effective login number and effective login number of official channels of each product, and then sort by product order (from small to large)
(effective login: if a single login is greater than or equal to 5min, one record is a login; official channel: take ''@ http://163.com ’(account number at the end)
Take out the products with the top 2 effective login users and the top 5 users with the total online time. Sort by product id (from small to large) and online duration (from large to small)

1) Use sql statement to get the login number, effective login number and effective login number of official channels of each product, and then sort by product order (from small to large)

with temp_1 as(
select product,count(user_id) as login_all
from user_login_table
group by product)
,temp_2 as(
select product,count(user_id) as login_real
from user_login_table
where online_second>=300
group by product
)
,temp_3 as (
	select product
			,count(distinct user_id) as login_real_official
	from user_login_table
	where online_second>=300 and user_id like '%@http://163.com'
	group by product
	)
select login_all,login_real,login_real_official
from temp_1
left join temp_2
on temp_1.product = temp_2.product
left join temp_3
on temp_1.product = temp_3.product
order by product

Method 2

select product
	,count(distinct user_id) as login_num
	,count(distinct if(online_second>=300,user_id,ull)) as effect_login_num
	,count(distinct if(online_second>=300 and user_id like '%@http://163.com',null))
	AS official_login_num
FROM user_login_table
GROUP BY product;

2) Take out the products with the top 2 effective login users and the top 5 users with the total online time. Sort by product id (from small to large) and online duration (from large to small)

with d as (
	select product
	from
	(
		select product
				,dense_rank() over (partition by product order by login_real) as cnt_rank
		from (
			select product
					,count(distinct user_id) as login_real
			from user_login_table
			where online_second>=300
			group by product
			)
	)
)
	,e as(
	select user_id
        ,product
        ,play_time
        ,dense_rank() over (partition by product order by play_time) as time_rank
    from 
    (
	select user_id
	        ,product
	        ,sum(online_second) as play_time 
	    from user_login_table
	    where online_second>=300
	    group by user_id,product
    	)
		)
select d.product,e.user_id,e.play_time
from d 
left join e
on d.product = e.product
where cnt_rank<=2 and time_rank<=5
order by pay_time desc;

11. Take out the music that the user likes and pays attention to, and can't communicate with the user_ The user with id = 1 has repeated his favorite music

A table of people loved and followed by cloud music users, including the following fields:

user_id: user ID
follower_id: attention person id

A list of music songs, including the following fields:

music_id: music number
music_name: music name

A record table of music each user likes_ Like, including the following fields:

user_id: user ID
music_id: music number

1. Now we're going to give you_ The user with id = 1 provides music recommendation and outputs some music that the user may like. The principle is to take out the music that the user likes and pays attention to, and can't communicate with the user_ The user with id = 1 has repeated his favorite music

1) Take out the music that the user likes and pays attention to, and can't communicate with the user_ The user with id = 1 has repeated his favorite music

with a as(
	select user_id
		   ,music_id
	from follow f 
	left join music_likes m_l
	on f.user_id = m_l.user_id
	where user_id=1
)
	,b(
	select f.user_id,f.followe_id,m_l.music_id
	from follow f
	left join music_likes m_l
	on f.followe_id = m_l.user_id
	where f.user_id=1
		)
select user_id,case when a.music_id is null then b.music_id end as rec_music_id
from a right left join b on a.music_id = b.music_id

12. lead() lag()

There is a user behavior table tracking_log, including the following fields:

user_id: user id
opr_id: operation number, value A,B
log_time: operation time

Calculate the number of visitors per day and their average number of operations
Count the number of users who meet the following conditions every day: operation A is followed by operation B, and operation AB must be adjacent

1) Calculate the number of visitors per day and their average number of operations

select date(log_time),count(distinct user_id),count(opr_id)/count(distinct user_id)
from tracking_log group by date(log_time)

2) Count the number of users who meet the following conditions every day: operation A is followed by operation B, and operation AB must be adjacent

with a as (
	select user_id,opr_id,date(log_time),lead(opr_id,1) over(partition by user_id,date(log_time) order by log_time) as opr_id_2
	from tracking_log
)

select date(log_time),count(distinct user_id)
from a
where (opr_id ='A' and opr_id_2 = 'B') or (opr_id ='B' and opr_id_2 = 'A')
group by date(log_time)

Topics: SQL

Programmer Think