Trend Fitting-Logistic Growth Model for the Number of Customers Requesting Quotations in Post-automobile Market

Posted by ozkan on Tue, 04 Jan 2022 05:11:02 +0100

@TOC

1 Background description

Normally, the growth process of a company's number of customers in one line of business is similar to the population growth in a region. Generally, it will go through these growth stages, namely, silence, growth, explosion, stability. The trend curve of the whole process conforms to the "S" curve. Taking the number of inquiry customers in the automotive aftermarket as an example, when the penetration rate reaches the upper limit and the loss and retention reaches a balance, the number of customers will maintain a stable level. Based on this premise, we try to fit the growth trend of the number of customers with the logistic growth model.

2 Data Exploratory Analysis

Read the data with python, view the information of the data, see if there is missing data, calculate descriptive statistical indicators of the data, and get a general idea of the distribution of the data. The Python code is as follows:

import pandas as pd

data = pd.read_excel(r'D:\logistic Growth Fitting\High-end mid-end car inquiry customer data.xlsx',engine='openpyxl')
print(data.info())
print(data[['Number of high end car inquiry customers','Number of inquiry customers for mid-range vehicles']].describe())

Run result:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Month        59 non-null     int64
 1   Number of high-end car inquiry customers 59 non-null     int64
 2   Number of inquiry customers for mid-range vehicles 59 non-null     int64
dtypes: int64(3)
memory usage: 1.5 KB
None
           Number of high end car inquiry customers      Number of inquiry customers for mid-range vehicles
count     59.000000     59.000000
mean   14757.254237  12549.542373
std    12560.147425  13848.859081
min      263.000000     45.000000
25%     2693.500000    830.500000
50%    13423.000000   7604.000000
75%    21892.500000  19858.500000
max    42052.000000  47767.000000

From the results, we can see that there is no missing data, and the data of high-end cars is more stable than that of mid-end cars.

Here we draw a scatter chart to observe the current trend of the number of mid-and high-end car inquiry customers at this stage. The specific python code is as follows:

import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']    # Solve display problems in Chinese
plt.rcParams['axes.unicode_minus'] = False      # Solving the problem of negative number of coordinate axes

t = data['Month'].to_list()
t1 = np.array(range(1,len(t)+1))

P_A = data['Number of high end car inquiry customers'].values
P_B = data['Number of inquiry customers for mid-range vehicles'].values

# Plot scatterplots
fig1 = plt.figure(figsize=(15,6))
plt.scatter(x=t1,y=P_A,c='g',marker='o',label='Number of high end car inquiry customers')
plt.scatter(x=t1,y=P_B,c='b',marker='*',label='Number of inquiry customers for mid-range vehicles')
plt.xticks(ticks=t1,labels=t,rotation=90)
plt.xlabel('date')
plt.ylabel('Number of customers')
plt.legend(loc=0)
plt.show()

Run result:

As you can see from the picture, due to the influence of the Spring Festival, there has been a cliff drop in the number of inquiry clients in February from 19 to 21 years. Looking at the sample set, there are outliers. If you want to fit the growth rate more accurately, your friends can process the data in February each year. There are two ways to process this: (1) exclude the February data; (2) replace the February data with the arithmetic mean of the three data before February.

3 Model Fitting

Here's a logistic growth model P ( t ) = K P 0 e r ( t − t 0 ) K + P 0 ( e r ( t − t 0 ) − 1 ) P(t)=\frac{KP_0 e^{r(t-t_0)}}{K+P0( e^{r(t-t_0)}-1)} P(t) =K+P 0(er(t_t0) 1) K P 0 er(t_t0) is used as the fitting object to fit the sample data, and the parameters to be estimated are calculated by least squares method. K , P 0 , r K,P_0, r K, P 0, r, where

  • t0 is the start time
  • K is the environmental capacity, that is, the limit P(t) can reach when growing to the end.
  • P0 is the initial capacity, which is the number of times t=0.
  • R is the growth rate, the larger r the faster the growth, the faster the approximation to K value, the smaller r the slower the growth, the slower the approximation

The code is as follows:

import numpy as np
from scipy.optimize import curve_fit    # Call this library to fit curves

# Define logistic growth function
def logistic_increase_function(t1, K, P0, r):
    t0 =1
    exp_value = np.exp(r * (t1-t0))
    return (K * exp_value * P0) / (K + (exp_value - 1) * P0)

'''Fitting the number of high-end car inquiry customers'''
# Estimate Fitting Using Least Squares
popt_A, pcov_A = curve_fit(logistic_increase_function, t1, P_A)

# Obtaining fitting coefficients
print("K:capacity  P0:initial_value   r:increase_rate")
print(popt_A)	# [7.91986321e+04 1.85666466e+03 6.59639316e-02]

# Calculate Fit Value
P_A_fit = logistic_increase_function(t1, popt_A[0], popt_A[1], popt_A[2])

'''Fitting the number of mid-end car inquiry customers'''
popt_B, pcov_B = curve_fit(logistic_increase_function, t1, P_B)

# Obtaining fitting coefficients
print("K:capacity  P0:initial_value   r:increase_rate")
print(popt_B)	# [1.20368534e+05 6.55225678e+02 8.25444681e-02]

# Calculate Fit Value
P_B_fit = logistic_increase_function(t1, popt_B[0], popt_B[1], popt_B[2])

You can see that the growth rate of the number of mid-end car inquiry customers is larger than that of high-end cars, which can also be verified by the slope of the scatterplot curve above.

4 Fitting result mapping and future trend prediction

Future trends predict that we will take a little longer to predict when the stability period will be reached, coded as follows:

# Construct a future date list
t_future = [202112] + [i for i in range(202201,202213)] + [i for i in range(202301,202313)] + [i for i in range(202401,202413)] + [i for i in range(202501,202513)] + [i for i in range(202601,202613)] + [i for i in range(202701,202713)]
future = [ i for i in range(t1[-1]+1,t1[-1]+1+len(t_future))]
future = np.array(future)

# Future Forecast
A_future_predict = logistic_increase_function(future, popt_A[0], popt_A[1], popt_A[2])
B_future_predict = logistic_increase_function(future, popt_B[0], popt_B[1], popt_B[2])

# Mapping
fig2 = plt.figure(figsize=(15,6))
plot1_A = plt.plot(t1, P_A, 's', label="High-end cars values",)    # Scatter plot
plot2_A = plt.plot(t1, P_A_fit, 'r', label='High-end cars fit values')    # curve
plot3_A = plt.plot(future, A_future_predict, 's', label='High-end cars predict values')   # Scatter plot
plot1_B = plt.plot(t1, P_B, 's', label="Mid-End Car values",)    # Scatter plot
plot2_B = plt.plot(t1, P_B_fit, 'r', label='Mid-End Car fit values')    # curve
plot3_B = plt.plot(future, B_future_predict, 's', label='Mid-End Car predict values')   # Scatter plot
plt.xlabel('date')
plt.ylabel('Number of customers')
plt.title('Fitting the growth of inquiry customers')
plt.legend(loc=0)  # Specify the lower right corner of the legend position
plt.show()

Drawing results:

From the drawing results, the number of mid-and high-end car inquiry customers of the company is growing, and it is expected to reach a stable period in the first half of 2027.

5 Complete code

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit    # Call this library to fit curves
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']    # Solve display problems in Chinese
plt.rcParams['axes.unicode_minus'] = False      # Solving the problem of negative number of coordinate axes

data = pd.read_excel(r'D:\logistic Growth Fitting\High-end mid-end car inquiry customer data.xlsx',engine='openpyxl')
print(data.info())
print(data[['Number of high end car inquiry customers','Number of inquiry customers for mid-range vehicles']].describe())

t = data['Month'].to_list()
t1 = np.array(range(1,len(t)+1))

P_A = data['Number of high end car inquiry customers'].values
P_B = data['Number of inquiry customers for mid-range vehicles'].values

# Plot scatterplots
fig1 = plt.figure(figsize=(15,6))
plt.scatter(x=t1,y=P_A,c='g',marker='o',label='Number of high end car inquiry customers')
plt.scatter(x=t1,y=P_B,c='b',marker='*',label='Number of inquiry customers for mid-range vehicles')
plt.xticks(ticks=t1,labels=t,rotation=90)
plt.xlabel('date')
plt.ylabel('Number of customers')
plt.legend(loc=0)
plt.show()

# Define logistic growth function
def logistic_increase_function(t1, K, P0, r):
    t0 =1
    # t:time   t0:initial time    P0:initial_value    K:capacity  r:increase_rate
    '''
    K For environmental capacity, that is, to grow to the end, P(t)The limit that can be reached.
    P0 For initial capacity, this is t=0 Number of moments.
    r For growth rate, r The larger the growth, the faster the approaching K Value, r The smaller the growth, the slower the approximation
    '''
    exp_value = np.exp(r * (t1-t0))
    return (K * exp_value * P0) / (K + (exp_value - 1) * P0)

'''Fitting the number of high-end car inquiry customers'''
# Estimate Fitting Using Least Squares
popt_A, pcov_A = curve_fit(logistic_increase_function, t1, P_A)

# Obtaining fitting coefficients
print("K:capacity  P0:initial_value   r:increase_rate")
print(popt_A)

# Calculate Fit Value
P_A_fit = logistic_increase_function(t1, popt_A[0], popt_A[1], popt_A[2])

'''Fitting the number of mid-end car inquiry customers'''
popt_B, pcov_B = curve_fit(logistic_increase_function, t1, P_B)

# Obtaining fitting coefficients
print("K:capacity  P0:initial_value   r:increase_rate")
print(popt_B)

# Calculate Fit Value
P_B_fit = logistic_increase_function(t1, popt_B[0], popt_B[1], popt_B[2])

# Construct a future date list
t_future = [202112] + [i for i in range(202201,202213)] + [i for i in range(202301,202313)] + [i for i in range(202401,202413)] + [i for i in range(202501,202513)] + [i for i in range(202601,202613)] + [i for i in range(202701,202713)]
future = [ i for i in range(t1[-1]+1,t1[-1]+1+len(t_future))]
future = np.array(future)

# Future Forecast
A_future_predict = logistic_increase_function(future, popt_A[0], popt_A[1], popt_A[2])
B_future_predict = logistic_increase_function(future, popt_B[0], popt_B[1], popt_B[2])

# Mapping
fig2 = plt.figure(figsize=(15,6))
plot1_A = plt.plot(t1, P_A, 's', label="High-end cars values",)    # Scatter plot
plot2_A = plt.plot(t1, P_A_fit, 'r', label='High-end cars fit values')    # curve
plot3_A = plt.plot(future, A_future_predict, 's', label='High-end cars predict values')   # Scatter plot
plot1_B = plt.plot(t1, P_B, 's', label="Mid-End Car values",)    # Scatter plot
plot2_B = plt.plot(t1, P_B_fit, 'r', label='Mid-End Car fit values')    # curve
plot3_B = plt.plot(future, B_future_predict, 's', label='Mid-End Car predict values')   # Scatter plot
plt.xlabel('date')
plt.ylabel('Number of customers')
plt.title('Fitting the growth of inquiry customers')
plt.legend(loc=0)
plt.show()

Model Fitting Reference Blog: https://blog.csdn.net/weixin_36474809/article/details/104101055/

Topics: Python Data Mining data visualization