I forked someone else's project and completed the exercises again. My code is available at:
https://gitee.com/fakerlove/machine-learning/tree/master/code
1. Andrew Ng's machine learning course - Assignment 1 - Linear regression
Reference link
https://blog.csdn.net/qq_20412595/article/details/82181855
Definition and classification of regression analysis
Regression analysis is a statistical method for analyzing data. Its purpose is to determine whether two or more variables are related, to measure the direction and strength of that relationship, and to build a mathematical model that predicts the variable of interest from the observed variables. More specifically, regression analysis helps us understand how the dependent variable changes when only one independent variable changes.
Generally speaking, regression analysis estimates the conditional expectation of the dependent variable given the independent variables. In other words, it builds a model of the relationship between the dependent variable Y (also called the response variable) and the independent variable X (also called the explanatory variable).
The main algorithms of regression analysis include:
- Linear regression
- Logistic regression
- Polynomial regression
- Stepwise regression
- Ridge regression
- Lasso regression
- Elastic net regression
1.1 Univariate linear regression
The exercise statement is an English PDF.
1) Topic introduction
In this part of the exercise, you will implement linear regression with one variable to predict the profit of a food truck.
Suppose you are the CEO of a restaurant franchise that is considering different cities for opening a new outlet.
The chain already has food trucks in several cities, and you have data on the profit and population of each of those cities.
You want to use this data to help you choose the next city to expand into.
2) Data introduction
The format of ex1data1.txt is as follows:

    6.1101,17.592
    5.5277,9.1302
    8.5186,13.662
    7.0032,11.854
    5.8598,6.8233
    8.3829,11.886
    7.4764,4.3483
    8.5781,12
    6.4862,6.5987

The file ex1data1.txt contains the data set for the linear regression problem.
The first column is the population of a city, and the second column is the profit of a food truck in that city. A negative profit represents a loss.
The data is read row by row. Each row is split into two values, which are stored in an array for later use.
    import numpy as np

    # Read ex1data1.txt, using "," as the separator and double precision for the values
    data = np.loadtxt('ex1data1.txt', delimiter=',', dtype=np.double)

Next, visualize the data with the plotting library matplotlib:

    import matplotlib.pyplot as plt

    # The figure has a 2 x 2 grid (4 plots in total); draw the first one
    plt.subplot(2, 2, 1)
    # Font settings for Chinese labels (only needed when the labels are in Chinese)
    plt.rcParams["font.sans-serif"] = ["SimHei"]
    plt.rcParams["axes.unicode_minus"] = False
    plt.title("Linear regression with one variable: predicting food truck profit")
    plt.xlabel("Urban population, unit: 10000")  # x-axis label
    plt.ylabel("Profit in thousands")  # y-axis label
    # Scatter plot of the raw data
    plt.scatter(data[:, 0], data[:, 1], marker='x')
    plt.show()
[Figure: scatter plot of the training data (picture/image-20211018164844778.png)]
Use gradient descent to update the parameters (if you are not familiar with it, you can refer to the linked article).
$J(\theta)$ is the total loss over all samples and $m$ is the number of samples:

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^m(f(x_i)-y_i)^2, \qquad f(x)=wx+b$$
The end of that article gives the gradient descent update formula:

$$x^\prime \leftarrow x-\alpha \nabla f(x)$$
From this, the update formulas for $w$ and $b$ are:

$$w\leftarrow w-\alpha \frac{\partial J}{\partial w}$$

$$b\leftarrow b-\alpha \frac{\partial J}{\partial b}$$

so

$$w \leftarrow w-\alpha \frac{1}{m}\sum_{i=1}^m(wx_i+b-y_i)x_i$$

$$b \leftarrow b-\alpha \frac{1}{m}\sum_{i=1}^m(wx_i+b-y_i)$$

where $\alpha$, a positive number, is usually called the step size or learning rate.
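The explicit formulas above follow from applying the chain rule to $J$ with $f(x_i)=wx_i+b$. A short derivation, using only the symbols already defined in this section, shows that the factor $\frac{1}{2}$ in $J$ cancels against the exponent:

```latex
\begin{aligned}
\frac{\partial J}{\partial w}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\,(wx_i+b-y_i)\cdot\frac{\partial (wx_i+b-y_i)}{\partial w}
   = \frac{1}{m}\sum_{i=1}^{m}(wx_i+b-y_i)\,x_i \\
\frac{\partial J}{\partial b}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\,(wx_i+b-y_i)\cdot\frac{\partial (wx_i+b-y_i)}{\partial b}
   = \frac{1}{m}\sum_{i=1}^{m}(wx_i+b-y_i)
\end{aligned}
```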
3) Code
    import matplotlib.pyplot as plt
    import numpy as np


    class Linear_Regression:

        # One step of batch gradient descent: update w and b
        def gradient_descent(self, w, b, alpha):
            grad_w = 0.0
            grad_b = 0.0
            # Accumulate both gradients using the current w and b
            for i in range(self.m):
                x_i = self.data[i][0]
                y_i = self.data[i][1]
                error = w * x_i + b - y_i
                grad_w += error * x_i
                grad_b += error
            # Update both parameters simultaneously, as in the formulas above
            w -= (alpha / self.m) * grad_w
            b -= (alpha / self.m) * grad_b
            return w, b

        # Calculate the loss value
        def cal_loss(self, w, b):
            # J is the total loss
            J = 0
            for i in range(self.m):
                x_i = self.data[i][0]
                y_i = self.data[i][1]
                J += 1.0 / (2 * self.m) * pow((w * x_i + b - y_i), 2)
            return J

        # Load the data and draw the scatter plot
        def __init__(self):
            # Read ex1data1.txt, using "," as the separator and double precision
            self.data = np.loadtxt('ex1data1.txt', delimiter=',', dtype=np.float64)
            # m is the number of rows (samples)
            self.m = self.data.shape[0]
            # The figure has a 2 x 2 grid (4 plots in total); draw the first one
            plt.subplot(2, 2, 1)
            # Font settings for Chinese labels (only needed when the labels are in Chinese)
            plt.rcParams["font.sans-serif"] = ["SimHei"]
            plt.rcParams["axes.unicode_minus"] = False
            plt.title("Linear regression with one variable: predicting food truck profit")
            plt.xlabel("Urban population, unit: 10000")  # x-axis label
            plt.ylabel("Profit in thousands")  # y-axis label
            # Scatter plot of the raw data
            plt.scatter(self.data[:, 0], self.data[:, 1], marker='x')

        # Entry point
        def main(self):
            alpha = 0.01  # Learning rate
            iterations = 1500  # Number of gradient descent iterations
            # Parameters to be estimated, updated in every iteration
            w = 0.0
            b = 0.0
            w_all = []
            b_all = []
            # Loss value after each iteration
            cost = []
            print("-------Start the calculation-----")
            # Loss at the initial parameter values
            print("Initial loss value", self.cal_loss(w, b))
            # 1500 rounds of training
            for i in range(iterations):
                w_all.append(w)
                b_all.append(b)
                w, b = self.gradient_descent(w, b, alpha)
                temp = self.cal_loss(w, b)
                cost.append(temp)
            print("Final result w---", w, "Final result b---", b)
            # Draw the fitted line over the data
            x = [5.0, 22.5]
            y = [5.0 * w + b, 22.5 * w + b]
            plt.subplot(2, 2, 2)
            plt.plot(x, y, color="red")
            plt.title("Linear regression question 1")
            plt.xlabel("Population of the city")
            plt.ylabel("profit")
            plt.scatter(self.data[:, 0], self.data[:, 1], marker='x')
            print(str(cost))
            # Draw the loss curve
            plt.subplot(2, 2, 3)
            plt.title("loss function J")
            plt.xlabel("Number of iterations")
            plt.ylabel("magnitude of the loss")
            plt.plot(range(len(cost)), cost, color="red")
            plt.show()
            print("magnitude of the loss", cost[0:5])
            print("b--", b_all[0:5])
            print("w--", w_all[0:5])
            print("-------It's over-----")


    if __name__ == '__main__':
        obj = Linear_Regression()
        obj.main()
The result is as follows:
[Figure: scatter plot, fitted line and loss curve produced by the program (picture/image-20211020194136114.png)]
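The per-sample loop in the program above can also be written with NumPy array operations. The following is a minimal sketch, not the author's code: the function name `fit_line` and the synthetic, noise-free data generated from y = 2x + 1 are assumptions made for illustration, so that the recovered parameters can be checked against known values.

```python
import numpy as np


def fit_line(x, y, alpha=0.01, iterations=1500):
    """Batch gradient descent for f(x) = w * x + b, vectorized with NumPy."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(iterations):
        error = w * x + b - y            # residuals f(x_i) - y_i for all samples
        grad_w = (error * x).sum() / m   # dJ/dw
        grad_b = error.sum() / m         # dJ/db
        # Simultaneous update of both parameters
        w, b = w - alpha * grad_w, b - alpha * grad_b
    return w, b


# Synthetic, noise-free data from y = 2x + 1 (hypothetical, not the exercise data)
x_demo = np.linspace(0.0, 10.0, 50)
y_demo = 2.0 * x_demo + 1.0
w_demo, b_demo = fit_line(x_demo, y_demo, alpha=0.02, iterations=20000)
print(w_demo, b_demo)
```

Because both gradients are computed from the same `w` and `b` before either is changed, this matches the simultaneous update in the formulas of section 1.1.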
1.2 Multiple linear regression
1.2.1 Problems and precautions
Two issues need attention when solving a linear regression model.
The first is feature combination. For example, if the length and width of a house are both used as features, it is often better to multiply them to obtain the area and use that as a single feature. This reduces the number of features.
The second is feature scaling, which matters for many machine learning models.
For some models, such as SVM, the optimal solution after non-uniform scaling of the dimensions is not equivalent to the original one. For such a model, unless the value ranges of all dimensions are already close, the data must be standardized so that the model parameters are not dominated by the features with unusually large or small ranges.
For other models, such as logistic regression, the optimal solution after non-uniform scaling is equivalent to the original one. In theory, standardization does not change the optimal solution of such a model. In practice, however, the solution is usually found with an iterative algorithm, and if the objective function is too "flat" in some direction, the iteration may converge very slowly or not at all. It is therefore best to standardize the data even for scale-invariant models.
1) Benefits of normalization
- Improve the convergence speed of the model
As shown in the figures below, $x_1$ takes values in 0-2000 while $x_2$ takes values in 1-5. With only these two features, the contours of the loss function form a long, narrow ellipse, so gradient descent follows a zigzag path perpendicular to the contour lines and the iteration is very slow. After normalization, by contrast, the iteration is much faster.
Not normalized:
[Figure: picture/20171126162643419.png]
After normalization:
[Figure: picture/20171126162702871.png]
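One way to quantify the "long, narrow ellipse" is the condition number of the Hessian of the squared-error loss, $X^TX/m$: the larger the ratio between its largest and smallest eigenvalue, the more elongated the contours. The sketch below assumes randomly generated features with the ranges mentioned above (0-2000 and 1-5) and a helper name `hessian_condition` chosen for illustration; it compares the condition number before and after z-score standardization.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
x1 = rng.uniform(0.0, 2000.0, size=m)   # wide-range feature (synthetic)
x2 = rng.uniform(1.0, 5.0, size=m)      # narrow-range feature (synthetic)


def hessian_condition(features):
    """Condition number of X^T X / m, where X has a leading column of ones."""
    X = np.column_stack([np.ones(len(features)), features])
    return np.linalg.cond(X.T @ X / len(features))


raw = np.column_stack([x1, x2])
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # z-score per column

cond_raw = hessian_condition(raw)
cond_scaled = hessian_condition(scaled)
print(cond_raw, cond_scaled)
```

A much smaller condition number after scaling corresponds to the rounder contours in the second figure.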
- Improve the accuracy of the model
Normalization can also improve accuracy, which matters for algorithms that compute distances. For example, if the algorithm computes the Euclidean distance and, as in the figure above, the value range of $x_2$ is much smaller than that of $x_1$, then the influence of $x_2$ on the distance is far smaller than that of $x_1$, which causes a loss of accuracy. Normalization is therefore necessary so that each feature contributes comparably to the result.
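The effect on Euclidean distance can be seen with three hypothetical points. In this sketch the points and the assumed feature ranges (0-2000 for $x_1$, 1-5 for $x_2$) are made up for illustration: before scaling, the nearest neighbor of the first point is decided almost entirely by $x_1$, and after min-max scaling the difference in $x_2$ counts as well.

```python
import numpy as np

# Three hypothetical samples: columns are x1 (range 0-2000) and x2 (range 1-5)
points = np.array([
    [1000.0, 1.0],
    [1010.0, 5.0],   # close to the first point in x1, far in x2
    [1200.0, 1.0],   # far from the first point in x1, identical in x2
])
# Assumed feature ranges used for min-max scaling
lo = np.array([0.0, 1.0])
hi = np.array([2000.0, 5.0])
scaled = (points - lo) / (hi - lo)


def dist(a, b):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(a - b))


print(dist(points[0], points[1]), dist(points[0], points[2]))
print(dist(scaled[0], scaled[1]), dist(scaled[0], scaled[2]))
```

On the raw values the second point is the nearer one, while on the scaled values the third point is nearer, so the choice of scaling changes the outcome of any distance-based algorithm.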
2) Common normalization methods
- Linear normalization (Min-Max Normalization)
Linear normalization maps the input data to the range [0, 1]. The formula is:

$$X_{norm}=\frac{X-X_{min}}{X_{max}-X_{min}}$$
- Zero-mean standardization (Z-score)
Zero-mean standardization transforms the original data set into one with mean 0 and variance 1. The formula is:

$$z=\frac{x-\mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the original data set. This method works best when the original data is approximately Gaussian; otherwise the result can be poor.
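Both formulas translate directly into column-wise NumPy operations. The following is a minimal sketch; the function names `min_max_normalize` and `z_score_standardize` are chosen for illustration, and the sample rows are the house size and bedroom columns from the first rows of ex1data2.txt, which is introduced later in this section.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column of X to [0, 1]; assumes no column is constant."""
    X = np.asarray(X, dtype=np.float64)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)


def z_score_standardize(X):
    """Shift each column to mean 0 and scale it to standard deviation 1."""
    X = np.asarray(X, dtype=np.float64)
    return (X - X.mean(axis=0)) / X.std(axis=0)


# House size and number of bedrooms from the first rows of ex1data2.txt
sample = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
print(min_max_normalize(sample))
print(z_score_standardize(sample))
```

Both functions would divide by zero for a constant column, so a real pipeline would need to handle that case separately.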
1.2.2 Model introduction
Model

$$h_\theta(x)=\theta^TX=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n$$
Given the model, we next need a loss function. **For linear regression we generally use the mean squared error as the loss function.** Its algebraic form is:

$$J(\theta)=\frac{1}{2}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$

where $x^{(i)}$ and $y^{(i)}$ denote the $i$-th training sample. In matrix form:

$$J(\theta)=\frac{1}{2}(X\theta-Y)^T(X\theta-Y)$$

Two methods are commonly used to find the parameters $\theta$ that minimize the loss function: gradient descent and the least squares method.
If gradient descent is used, the iterative formula for $\theta$ is:

$$\theta=\theta-\alpha X^T(X\theta-Y)$$

The code below additionally divides the loss and the gradient by the number of samples $m$, as in section 1.1; this only rescales $\alpha$ and does not change the minimizer.
    # Methods of the Linear_Regression class (the full class is listed in section 1.2.5)
    def cost(self, X, theta, y):
        '''
        Calculate the loss function
        :param X: feature matrix
        :param theta: parameter vector
        :param y: target values
        :return: value of the loss J(theta)
        '''
        m = X.shape[0]
        temp = X.dot(theta) - y
        return temp.T.dot(temp) / (2 * m)

    def gradient_descent(self, X, theta, y, alpha, iterations):
        '''
        :param X: feature matrix
        :param theta: parameter vector
        :param y: target values
        :param alpha: step size (learning rate)
        :param iterations: number of iterations
        :return: the final theta and the list of loss values
        '''
        m = X.shape[0]
        print("Number of rows--", m)
        print("y Number of rows", len(y))
        c = []  # Store the loss value of each iteration
        for i in range(iterations):
            theta -= (alpha / m) * X.T.dot(X.dot(theta) - y)
            # Calculate the loss value
            c.append(self.cost(X, theta, y))
        return theta, c
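A common way to gain confidence in a gradient formula is to compare it with finite differences of the loss. The sketch below is an illustration under stated assumptions rather than part of the original program: it uses standalone functions instead of class methods, random synthetic data, and the same $1/m$ scaling as the code above.

```python
import numpy as np


def cost(X, theta, y):
    """Loss J = (X theta - y)^T (X theta - y) / (2m)."""
    m = X.shape[0]
    r = X.dot(theta) - y
    return r.dot(r) / (2 * m)


def gradient(X, theta, y):
    """Analytic gradient of the loss above: X^T (X theta - y) / m."""
    m = X.shape[0]
    return X.T.dot(X.dot(theta) - y) / m


def numerical_gradient(X, theta, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(X, theta + step, y) - cost(X, theta - step, y)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))   # synthetic feature matrix
y_demo = rng.normal(size=20)        # synthetic targets
theta_demo = rng.normal(size=3)
print(gradient(X_demo, theta_demo, y_demo))
print(numerical_gradient(X_demo, theta_demo, y_demo))
```

If the two printed vectors agree closely, the term `X.T.dot(X.dot(theta) - y)` used in the update is consistent with the loss being minimized.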
If the least squares method is used, the closed-form solution for $\theta$ is:

$$\theta=(X^TX)^{-1}X^TY$$

The Python code is as follows:

    def normal_equation(X, y):
        # Closed-form least squares solution; pinv also handles a singular X^T X
        return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
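As a sanity check, the closed-form result can be compared with NumPy's built-in least squares solver `np.linalg.lstsq`. This is a minimal sketch on synthetic data; the design matrix, the "true" parameters and the noise level are assumptions chosen for illustration.

```python
import numpy as np


def normal_equation(X, y):
    """Closed-form least squares solution via the pseudo-inverse of X^T X."""
    return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)


rng = np.random.default_rng(1)
m = 100
# Synthetic design matrix: a column of ones plus two random features
X_demo = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
true_theta = np.array([1.0, -2.0, 0.5])   # hypothetical parameters
y_demo = X_demo.dot(true_theta) + 0.01 * rng.normal(size=m)

theta_ne = normal_equation(X_demo, y_demo)
theta_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(theta_ne, theta_ls)
```

The two estimates should agree up to floating-point error, and both should be close to the parameters used to generate the data.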
Linear regression can also be solved with other common algorithms, such as Newton's method and quasi-Newton methods, which are not described in detail here.
Note that in practical data fitting we should also pay attention to multicollinearity and to tests of the regression equation.
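The text does not say how to detect multicollinearity. One widely used diagnostic, shown here as an assumption rather than the author's method, is the variance inflation factor (VIF): regress each feature on the others and compute $1/(1-R^2)$. The function name and the synthetic features below are illustrative; a VIF well above 10 is a common rule of thumb for a problematic feature.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (features only, without the ones column)."""
    X = np.asarray(X, dtype=np.float64)
    m, n = X.shape
    vifs = np.empty(n)
    for j in range(n):
        target = X[:, j]
        # Regress feature j on an intercept plus all remaining features
        others = np.column_stack([np.ones(m), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others.dot(coef)
        total = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - residual.dot(residual) / total
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)   # nearly a copy of a (synthetic)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

Here the first and third features are almost collinear, so their VIFs are large, while the independent second feature stays close to 1.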
1.2.3 Topic introduction
In this section, you will implement linear regression with multiple variables to predict house prices. Suppose you want to sell your house and want to know what a good market price would be.
One way to do this is to first collect information on recently sold houses and build a model of housing prices.
1.2.4 Data introduction

    2104,3,399900
    1600,3,329900
    2400,3,369000
    1416,2,232000
    3000,4,539900
    1985,4,299900
    1534,3,314900
    1427,3,198999
    1380,3,212000
    1494,3,242500
    1940,4,239999
    2000,3,347000
    1890,3,329999
    4478,5,699900

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon.
The first column is the size of the house (in square feet).
The second column is the number of bedrooms.
The third column is the price of the house.
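The column layout described above maps onto array slices as follows. This sketch embeds the first five rows of the listing in a string so that it runs without the file; with the real file, `np.loadtxt('ex1data2.txt', ...)` would be used instead.

```python
import io

import numpy as np

# The first rows of ex1data2.txt, embedded so the sketch runs without the file
sample_text = """2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
"""

data = np.loadtxt(io.StringIO(sample_text), delimiter=',', dtype=np.float64)
X = data[:, 0:2]   # size (square feet) and number of bedrooms
y = data[:, 2]     # price
print(X.shape, y.shape)
```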
1.2.5 Code

    import matplotlib.pyplot as plt
    import numpy as np


    class Linear_Regression:

        def normal_equation(self, X, y):
            '''
            Least squares method
            :param X: feature matrix
            :param y: target values
            :return: the least squares solution for theta
            '''
            return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)

        def cost(self, X, theta, y):
            '''
            Calculate the loss function
            :param X: feature matrix
            :param theta: parameter vector
            :param y: target values
            :return: value of the loss J(theta)
            '''
            m = X.shape[0]
            temp = X.dot(theta) - y
            return temp.T.dot(temp) / (2 * m)

        def gradient_descent(self, X, theta, y, alpha, iterations):
            '''
            :param X: feature matrix
            :param theta: parameter vector
            :param y: target values
            :param alpha: step size (learning rate)
            :param iterations: number of iterations
            :return: the final theta and the list of loss values
            '''
            m = X.shape[0]
            print("Number of rows--", m)
            print("y Number of rows", len(y))
            c = []  # Store the loss value of each iteration
            for i in range(iterations):
                theta -= (alpha / m) * X.T.dot(X.dot(theta) - y)
                # Calculate the loss value
                c.append(self.cost(X, theta, y))
            return theta, c

        def maxminnorm(self, array):
            '''
            Min-max normalization of a NumPy array, column by column
            :param array: the array to normalize
            :return: the normalized array
            '''
            maxcols = array.max(axis=0)
            mincols = array.min(axis=0)
            data_shape = array.shape
            data_rows = data_shape[0]
            data_cols = data_shape[1]
            t = np.empty((data_rows, data_cols))
            for i in range(data_cols):
                t[:, i] = (array[:, i] - mincols[i]) / (maxcols[i] - mincols[i])
            return t

        def __init__(self):
            # Read ex1data2.txt, using "," as the separator and double precision
            self.data = np.loadtxt('ex1data2.txt', delimiter=',', dtype=np.float64)
            # m is the number of rows (samples)
            self.m = self.data.shape[0]
            # Data normalization
            self.normalization_data = self.maxminnorm(self.data)

        def main(self):
            '''
            Main function
            '''
            alpha = 0.1
            iterations = 10000
            # The first and second columns are the features; prepend a column of
            # ones so that theta[0] plays the role of the intercept theta_0
            features = self.normalization_data[:, 0:2]
            X = np.column_stack((np.ones(self.m), features))
            # The third column is the price
            y = self.normalization_data[:, 2]
            theta = np.zeros((X.shape[1],))
            print(theta)
            theta, c = self.gradient_descent(X=X, theta=theta, y=y, alpha=alpha, iterations=iterations)
            # Visualize the descent process
            # Font settings for Chinese labels (only needed when the labels are in Chinese)
            plt.rcParams["font.sans-serif"] = ["SimHei"]
            plt.rcParams["axes.unicode_minus"] = False
            plt.title("loss function J(θ)")
            plt.xlabel("Number of iterations")
            plt.ylabel("magnitude of the loss")
            plt.plot([i for i in range(iterations)], c, color="red")
            plt.show()
            print("Use gradient descent:", theta)
            print("Using least squares", self.normal_equation(X, y))


    if __name__ == '__main__':
        obj = Linear_Regression()
        obj.main()
The result is as follows:
[Figure: loss curve of the gradient descent run (picture/image-20211103170549479.png)]
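Because the program normalizes the price column together with the features, the fitted $\theta$ lives in the normalized space, and a prediction has to be mapped back to obtain an actual price. The sketch below illustrates one way to do this under stated assumptions: it uses only the rows listed in section 1.2.4, a helper `with_bias` that adds an intercept column, the normal equation instead of gradient descent, and a hypothetical query house of 1650 square feet with 3 bedrooms.

```python
import numpy as np

# Rows listed in the data introduction (a subset of ex1data2.txt)
data = np.array([
    [2104, 3, 399900], [1600, 3, 329900], [2400, 3, 369000],
    [1416, 2, 232000], [3000, 4, 539900], [1985, 4, 299900],
    [1534, 3, 314900], [1427, 3, 198999], [1380, 3, 212000],
    [1494, 3, 242500], [1940, 4, 239999], [2000, 3, 347000],
    [1890, 3, 329999], [4478, 5, 699900],
], dtype=np.float64)

lo, hi = data.min(axis=0), data.max(axis=0)
scaled = (data - lo) / (hi - lo)   # min-max normalization of every column


def with_bias(features):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.column_stack([np.ones(features.shape[0]), features])


# Fit in the normalized space using the normal equation
Xs, ys = with_bias(scaled[:, 0:2]), scaled[:, 2]
theta = np.linalg.pinv(Xs.T.dot(Xs)).dot(Xs.T).dot(ys)


def predict_price(size_sqft, bedrooms):
    """Scale the inputs, apply the model, then map the output back to a price."""
    x = (np.array([size_sqft, bedrooms]) - lo[0:2]) / (hi[0:2] - lo[0:2])
    y_scaled = theta[0] + x.dot(theta[1:])
    return y_scaled * (hi[2] - lo[2]) + lo[2]


print(predict_price(1650, 3))   # hypothetical query house
```

The same minimum and maximum values used for training must be reused at prediction time; recomputing them from new data would change the meaning of $\theta$.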