Data analysis and practice | explore the reasons for the decline in sales

Posted by gorskyLTD on Tue, 21 Dec 2021 03:30:54 +0100

Hello, I'm Ding Xiaojie.

The source of the case in this paper is the book data analysis practice, which uses R language. In the next period of time, I will reproduce the case as much as possible with Python + Tableau for everyone to learn.

Scene description

The monthly sales of a small APP game operated by a company increased steadily, but it suddenly decreased in July. Whether from the market environment or the environment of the game itself, the sales of the game still have room to continue to grow
, there may be two main factors affecting the sales of the game:

New game activities
Change of commercial publicity

After understanding, it is found that

There is no significant change between the game activities and the activities last month
Due to budget constraints, publicity efforts have decreased

Next, we will confirm the above conclusions based on data analysis and put forward methods to restore sales.

data description

DAU(Daily Active User)

User data visited at least once a day, 139112 lines.

field	type	meaning
log_date	str	Access time
app_name	str	Application name
user_id	numpy.int64	User ID

DPU(Daily Payment User)

884 lines of user data (0.056 yuan) consuming at least 1 yen per day.

field	type	meaning
log_date	str	Access time
app_name	str	Application name
user_id	numpy.int64	User ID
payment	numpy.int64	Consumption amount

Install

Record the time when each user logs in to the game for the first time, 29329 lines.

field	type	meaning
install_date	str	First login time
app_name	str	Application name
user_id	numpy.int64	User ID

Data analysis

data fetch

Read three data sets.

import pandas as pd

DAU_data = pd.read_csv('DAU.csv')
DPU_data = pd.read_csv('DPU.csv')
install_data = pd.read_csv('install.csv')

Displays the first five rows of the DAU dataset.

DAU_data.head()

Displays the first five rows of the DPU dataset.

DPU_data.head()

Displays the first five rows of the Install dataset.

install_data.head()

Data merging

Merge the daily active user data DAU with the user's first login data Install to save the user_id and
app_name as the reference key. In this way, you can get the first login time of the user.

all_data = pd.merge(DAU_data,
                    install_data,
                    on=['user_id', 'app_name'])
all_data.head()

After the first login time of the user is obtained, it is merged with the daily consumption user data DPU, and the left connection method is used to retain all_ For all data in data, the missing value is NaN by default.

all_data = pd.merge(all_data,
                    DPU_data,
                    on=['log_date', 'app_name', 'user_id'],
                    how='left')
all_data.head()

data processing

Fill the null value in the payment column with 0.

all_data['payment'] = all_data['payment'].fillna('0')
all_data

Convert the unit of payment column to meta, log_date and install_date only keeps the month.

all_data['payment'] = all_data['payment'].astype(float)
all_data['payment'] = all_data['payment'] * 0.056
all_data['log_date'] = pd.to_datetime(all_data['log_date']).map(lambda x : x.strftime("%m")[1] + 'month')
all_data['install_date'] = pd.to_datetime(all_data['install_date']).map(lambda x : x.strftime("%m")[1] + 'month')
all_data.head()

Division of new and old users

Users whose login month > first login month are defined as old users, and others are defined as new users.

all_data['user'] = all_data['log_date'] > all_data['install_date']
all_data['user'] = all_data['user'].map({False: 'new user', True: 'Old users'})
all_data.head()

According to log_ Date and user groups sum the payment and count the sales brought by new and old users in each month.

user_category = all_data.groupby(['log_date', 'user'])['payment'].sum().reset_index()
user_category.head()

It can be seen that the sales brought by old users in June and July are basically the same, but the sales brought by new users in July are significantly less than that in June.

Divide the sales into regions to see which level of user consumption is decreasing.

import numpy as np

sale_df = all_data.loc[all_data['payment'] > 0, ['log_date', 'payment']].copy()
bins = list(range(0, 101, 30)) + [np.inf]
labels = [str(n) + '-' + str(n + 30) for n in bins[:-2]] + ['>90']
sale_df['payment_cut'] = sale_df['payment'].apply(lambda x : pd.cut([x], bins=bins, labels=labels)[0])
sale_df.head()

As can be seen from the histogram above, compared with June, the number of users with consumption less than 60 yuan in July decreased by some.

At this point, we can draw some conclusions.

conclusion

Among the new users, the number of users producing consumption has decreased, especially the small consumption users with less consumption amount. Therefore, the company needs to carry out business publicity activities again and return to the previous level, so as to improve the awareness of potential users of the company's products and add new users.

Case reference

[1] Data analysis practice [Japan] jiujuan long Zhili Yangping / written by Xiao Feng / translated

For those who have just started Python or want to start python, you can contact the author through the small card below to communicate and learn together. They all come from novices. Sometimes a simple question card takes a long time, but they may suddenly realize it at the touch of others, and sincerely hope that everyone can make common progress. There are also nearly 1000 sets of resume templates and hundreds of e-books waiting for you to get them!

👇🏻 Pay attention to the small cards, reply to the "communication group" and learn Python together 👇🏻

Topics: Python Data Analysis Data Mining

Programmer Think