1, Analysis target
1. Establish daily operation index system;
2. Analyze the composition of existing users and count the composition of daily users;
3. Analyze the re purchase situation of users and guide the subsequent re purchase operation.
2, Data description
User_info user information table:
user_action user behavior table:
3, Implementation steps
First, it is necessary to build an operation index system so that operators can see the user scale and the change status of the scale; Then we need to distinguish users according to user stages (new users, activated users, sleep users and lost users), and count the number of users in different stages every day, so that the operation can understand the user scale in different stages and design more appropriate user operation schemes.
Table creation:
create table user_info (user_id string, age_between string, sex int, user_level int, reg_time date ) row format delimited fields terminated by '\t' tblproperties("skip.header.line.count"="1"); load data local inpath '/home/hadoop/datas/HomeWork/user_info_sample.txt' into table user_info; ALTER TABLE user_info SET SERDEPROPERTIES ('serialization.encoding'='GBK'); create table User_action (user_id string, sku_id string, action_time Timestamp, model_id int, type int, cate int, brand int ) row format delimited fields terminated by '\t' tblproperties("skip.header.line.count"="1"); --Skip the first line when reprinting --Load data load data local inpath '/home/hadoop/datas/HomeWork/user_action_sample.txt' into table user_action;
1. Establish user index system
We need to gradually establish the user index system according to the user activity index:
- Number of active users per day: DAU, that is, the weight removal statistics of active users per day;
select date(action_time) action_time, count(distinct user_id) DAU from user_action group by date(action_time);
- Number of newly registered users per day: DNU, that is, the weight removal statistics of newly registered users per day;
select reg_time, count(distinct user_id) from user_info group by reg_time;
- Daily consumption conversion rate: that is, the percentage between the number of users who have placed orders and the number of active users every day;
select date(action_time) action_time, concat(round(count(distinct if(type=4,user_id,Null))/count(distinct user_id)*100,2),"%") as per_con_rate from user_action group by date(action_time) order by date(action_time);
- Daily consumption conversion rate of new users: that is, the percentage between the number of users who place orders and the number of new registered users every day;
select reg_time, concat(round(count(distinct if(type=4,user_info.user_id,Null))/count(distinct user_info.user_id)*100,2),"%") as new_per_con_rate from user_info left join user_action on user_info.reg_time=date(user_action.action_time) and user_info.user_id=user_action.user_id group by reg_time order by reg_time;
Analyze the change characteristics of various indicators in the near future:
select action_time, concat(round((DAU - DAU_1)/DAU_1 *100,2),"%") as DAU_DOD, concat(round((DNU - DNU_1)/DNU_1 *100,2),"%") as DNU_DOD, concat(round((per_con_rate - per_con_rate_1) / per_con_rate_1 *100,2),"%") as per_con_rate_DOD, concat(round((new_per_con_rate - new_per_con_rate_1) / new_per_con_rate_1 *100,2),"%") as new_per_con_rate_DOD, concat(round((DAU - DAU_7)/DAU_7 *100,2),"%") as DAU_WOW, concat(round((DNU - DNU_7)/DNU_7 *100,2),"%") as DNU_WOW, concat(round((per_con_rate - per_con_rate_7) / per_con_rate_7 *100,2),"%") as per_con_rate_WOW, concat(round((new_per_con_rate - new_per_con_rate_7) / new_per_con_rate_7 *100,2),"%") as new_per_con_rate_WOW from( select *, lag(DAU,1) over(order by action_time) as DAU_1, lag(DNU,1) over(order by action_time) as DNU_1, lag(per_con_rate,1) over(order by action_time) as per_con_rate_1, lag(new_per_con_rate,1) over(order by action_time) as new_per_con_rate_1, lag(DAU,7) over(order by action_time) as DAU_7, lag(DNU,7) over(order by action_time) as DNU_7, lag(per_con_rate,7) over(order by action_time) as per_con_rate_7, lag(new_per_con_rate,7) over(order by action_time) as new_per_con_rate_7 from( select date(action_time) action_time, count(distinct user_id) DAU, concat(round(count(distinct if(type=4,user_id,Null))/count(distinct user_id)*100,2),"%") as per_con_rate from user_action group by date(action_time) ) as table_a left join( select reg_time, count(distinct user_info.user_id) as DNU, concat(round(count(distinct if(type=4,user_info.user_id,Null))/count(distinct user_info.user_id)*100,2),"%") as new_per_con_rate from user_info left join user_action on user_info.user_id=user_action.user_id and reg_time=date(action_time) group by reg_time) as table_b on table_a.action_time=table_b.reg_time) as table_c order by action_time;
2. Establish phased user labels
- Take the user on the day of registration as a new user
riqi=reg_time
- Users who have been registered for less than one week are regarded as inactive users (to avoid new users becoming jump out users)
(max_time-reg_time<7 or max_time is Null) and riqi-reg_time<14
- On the 8th day after registration, the user enters the activation period. The activation period is from the 8th to the 14th day after registration. The user who visits back during the activation period becomes the activated user. (to make an inactive user an active user)
max_time-reg_time>7 and riqi-max_time<7
- If the user is not active for a week after becoming an active user; Or users who are not activated during the activation period (days 8 to 14) after registration and are not active after day 14 are regarded as sleep users. (mainly for wake-up)
(max_time-reg_time>7 and riqi-max-time>7 and riqi-max_time<=21) or(riqi-reg_time>14 and riqi-reg_time<=28 and (max_time-reg_time<7 or max_time is Null))
- After becoming a sleep user, there are no active users for two weeks as lost users. (although it is in the same wake-up range, the wake-up difficulty of lost users is greater than that of sleeping users, which will not be considered in wake-up activities with less budget.)
(max_time-reg_time>7 and riqi-max_time>21) or (riqi-reg_time>28 and (max_time-reg_time<7 or max_time is Null))
The reason why we need to count users at different stages is that it is difficult for us to observe the composition of users in our user pool only from DAU, DNU or MAU data. Have users settled down and become more and more? Or is it decreasing? These can only be counted after we break down users in different stages.
2.1 count the user size at different stages of the day according to user behavior
When the number of new users, inactive users and active users increases, and the number of sleep users and lost users decreases, our user scale is on the rise; On the contrary, it shows a downward trend. Analyze the recent change trend of our user scale and judge the health of our user pool.
-- Prepare statistical date data (statistics are made every day, so the time in the user behavior table is pulled out separately as the statistical time) create table riqi as select date(action_time) riqi from user_action group by date(action_time); -- Prepare the last user behavior time (since the statistics date is the last behavior time, the last behavior time on the statistics date is constrained as the last behavior time). riqi The table is a continuous time pulled out separately, not the last behavior time of the user, riqi Table time (independent of user behavior time) create table base_info as select riqi, user_id, reg_time, (select max(action_time) max_time from user_action where date(action_time)<riqi.riqi and user_id=user_info.user_id) as max_time from riqi,user_info; select riqi, user_type, count(*) from( select *, case when riqi=reg_time then "new_user" when ( day(max_time-reg_time)<7 or max_time is Null) and day(riqi-reg_time)<14 then "inaction_user" when day(max_time-reg_time)>7 and day(riqi-max_time)<7 then "action_user" when (day(max_time-reg_time)>7 and day(riqi-max_time)>7 and day(riqi-max_time)<=21) or (day(riqi-reg_time)>14 and day(riqi-reg_time)<=28 and (day(max_time-reg_time)<7 or max_time is Null)) then "sleep_user" when (day(max_time-reg_time)>7 and day(riqi-max_time)>21) or (day(riqi-reg_time)>28 and (day(max_time-reg_time)<7 or max_time is Null)) then "lose_user" end as user_type from base_info ) as a group by riqi,user_type;
2.2 analyze the changes of users in different stages in the last two weeks by week
Changes in the number of users, the number of views per capita, the number of clicks per capita, the number of shopping carts per capita, the number of orders per capita, and the purchase conversion rate.
select if(date(action_time)>"2016-04-08","week_1","week_2") as week, count(distinct user_id) DAU_week, round(count(if(type=1,user_id,Null))/count(distinct user_id),2) avg_user1, round(count(if(type=6,user_id,Null))/count(distinct user_id),2) avg_user6, round(count(if(type=2,user_id,Null))/count(distinct user_id),2) avg_user2, round(count(if(type=4,user_id,Null))/count(distinct user_id),2) avg_user4, concat(round(count(distinct if(type=4,user_id,Null))/count(distinct user_id)*100,2),"%") buy_rate from( select * from user_action where date(action_time)>"2016-04-01" ) as a group by date(action_time)>"2016-04-08";
2.3 compare and analyze the characteristics of crowd portrait and behavior data of new user group and activated user group
Briefly describe whether the new user group introduced by the product is healthy.
-- New user groups select age_between,sex, count(distinct a.user_id) count_user, round(count(if(type=1,a.user_id,Null))/count(distinct a.user_id),2) avg_user1, round(count(if(type=6,a.user_id,Null))/count(distinct a.user_id),2) avg_user6, round(count(if(type=2,a.user_id,Null))/count(distinct a.user_id),2) avg_user2, round(count(if(type=4,a.user_id,Null))/count(distinct a.user_id),2) avg_user4, concat(round(count(distinct if(type=4,a.user_id,Null))/count(distinct a.user_id)*100,2),"%") buy_rate from (select * from user_info where reg_time>"2016-04-08" and reg_time<="2016-04-15") as a left join (select * from user_action where date(action_time)>"2016-04-08") as b on a.user_id=b.user_id group by age_between,sex grouping sets(age_between,sex); -- Activate user groups select age_between,sex, count(distinct a.user_id) count_user, round(count(if(type=1,a.user_id,Null))/count(distinct a.user_id),2) avg_user1, round(count(if(type=6,a.user_id,Null))/count(distinct a.user_id),2) avg_user6, round(count(if(type=2,a.user_id,Null))/count(distinct a.user_id),2) avg_user2, round(count(if(type=4,a.user_id,Null))/count(distinct a.user_id),2) avg_user4, concat(round(count(distinct if(type=4,a.user_id,Null))/count(distinct a.user_id)*100,2),"%") buy_rate from (select * from user_info where reg_time<="2016-04-08") as a inner join (select * from user_action where date(action_time)>"2016-04-08") as b on a.user_id=b.user_id group by age_between,sex grouping sets(age_between,sex);
3. User repurchase analysis
For user operation, due to the high cost of acquiring customers, improving user repurchase is the key to maximize user value.
When should we pay attention to repurchase? Kevin hilstrom gives reference in lean data analysis:
-
The repeat purchase rate reaches 1% ~ 15% within 90 days; Indicates that you are in user acquisition mode; Put more energy and resources into the acquisition and transformation of new users;
-
The repeat purchase rate reaches 15 ~ 30% within 90 days; Indicates that you are in mixed mode; Balance the energy and resources spent on the transformation of new users and the retention and re purchase of old users;
-
The repeat purchase rate reaches more than 30% within 90 days; Indicates that you are in loyalty mode; Put more energy and resources into user repurchase;
3.1 calculate the re purchase of users from February 1 to April 30
- Re purchase rate of users = number of re purchase users / number of active users;
- Number of re purchase users: the number of users who have purchased twice or more in a period of time
After statistics, the repurchase rate of users is 8.87%, indicating that the product is in the user acquisition mode.
select count(if(buy_count>=2,user_id,Null)) re_buy, count(distinct user_id) AU, concat(round(count(if(buy_count>=2,user_id,Null))/count(distinct user_id)*100,2),"%") as re_buy_rate from (select user_id, count(if(type=4,user_id,Null)) buy_count from user_action where date(action_time)>="2016-02-01" group by user_id) as a;
3.2 compare the user portrait and behavior characteristics of re purchase users and non re purchase users
Combined with the user information table and user purchase behavior table, we can see the characteristics of re purchase users: re purchase users are mostly men aged 26-35.
-- Re purchase user select age_between,sex, count(if(buy_count>=2,a.user_id,Null)) re_buy, count(distinct a.user_id) AU, concat(round(count(if(buy_count>=2,a.user_id,Null))/count(distinct a.user_id)*100,2),"%") as re_buy_rate from (select user_id,count(if(type=4,user_id,Null)) buy_count from user_action group by user_id) as a left join (select user_id,age_between,sex from user_info) as b on a.user_id=b.user_id group by age_between,sex grouping sets(age_between,sex); -- Non repurchase user select age_between,sex, count(if(buy_count<2,a.user_id,Null)) not_re_buy, count(distinct a.user_id) AU, concat(round(count(if(buy_count<2,a.user_id,Null))/count(distinct a.user_id)*100,2),"%") as not_re_buy_rate from (select user_id,count(if(type=4,user_id,Null)) buy_count from user_action group by user_id) as a left join (select user_id,age_between,sex from user_info) as b on a.user_id=b.user_id group by age_between,sex grouping sets(age_between,sex);
3.3 analyze the user activation and loss of recent re purchase users in combination with the phased user labels in the last week
select user_type,count(distinct user_id) user_num from( select distinct a.user_id as user_id, case when reg_time="2016-04-15" then "new_user" when (day(max_time-reg_time)<=7 or max_time is Null)and reg_time>"2016-04-01" then "inaction_user" when day(max_time-reg_time)>7 and max_time>"2016-04-08" then "action_user" when (day(max_time-reg_time)>7 and max_time<"2016-04-08" and max_time>="2016-03-25") or (reg_time<"2016-04-01" and reg_time>="2016-03-18" and (day(max_time-reg_time)<7 or max_time is Null)) then "sleep_user" when (day(max_time-reg_time)>7 and max_time<"2016-03-25") or (reg_time<"2016-03-18" and (day(max_time-reg_time)<7 or max_time is Null)) then "lose_user" end as user_type from (select a1.user_id user_id,a1.action_time action_time,max(a1.action_time)over(partition by a1.user_id) max_time from( select user_id,date(action_time) action_time,count(user_id)over(partition by user_id) buy_count from user_action where type=4) as a1 where buy_count>=2)as a left join (select user_id,reg_time from user_info) as b on a.user_id=b.user_id ) as c group by user_type;
4. Report
https://gitee.com/Ariel999/data-analysis-case/blob/master/Hive%E5%AD%A6%E4%B9%A0/%E7%94%A8%E6%88%B7%E8%BF%90%E8%90%A5%E5%89%8D%E6%9C%9F%E5%88%86%E6%9E%90%E6%8A%A5%E5%91%8A-%E5%BC%A0%E8%8D%A3%E8%8D%A3.pptx