Wu Enda machine learning course - Assignment 1 - linear regression

Posted by sean04 on Tue, 22 Feb 2022 09:21:01 +0100

I forked someone else's project and filled it in again. My code is at the following link:

https://gitee.com/fakerlove/machine-learning/tree/master/code

1. Wu Enda machine learning course - Assignment 1 - linear regression

Reference link

https://blog.csdn.net/qq_20412595/article/details/82181855

Definition and classification of regression analysis

Regression analysis is a statistical method for analyzing data. Its purpose is to understand whether two or more variables are related, and the direction and strength of that relationship, and to build a mathematical model that uses observed variables to predict the variable the researcher is interested in. More specifically, regression analysis helps quantify how the dependent variable changes when a single independent variable changes.

Generally speaking, regression analysis estimates the conditional expectation of the dependent variable given the independent variables. It builds a model of the relationship between the dependent variable Y (also called the response variable) and the independent variable X (also called the explanatory variable).

The main algorithms of regression analysis include:

  • Linear regression
  • Logistic regression
  • Polynomial regression
  • Stepwise regression
  • Ridge regression
  • Lasso regression
  • Elastic net regression

1.1 univariate linear regression

The problem statement in the assignment PDF is in English.

1) Topic introduction

In this part of the exercise, you will implement linear regression with one variable to predict the profits of a food truck.

Suppose you are the CEO of a restaurant franchise that is considering opening a new outlet in different cities.

The chain already has trucks in various cities, and you have data on the profit and population of each city.

You want to use this data to help you choose the next city to expand to.

2) Data introduction

The format of ex1data1.txt is as follows:

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987

The ex1data1.txt file contains the data set for the linear regression problem.

The first column is the population of a city, and the second column is the profit of a food truck in that city. A negative profit value represents a loss.

The data is read in row by row; each row is split on the separator into two values, which are stored in an array for later use.

# Read ex1data1.txt with "," as the separator, as double-precision floats
data = np.loadtxt('ex1data1.txt', delimiter=',', dtype=np.double)
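As a quick sanity check, you can print the shape of the loaded array. For the standard course file this should be (97, 2), that is, 97 samples with two columns (the row count is an assumption based on the usual ex1data1.txt):

print(data.shape)  # expected: (97, 2) for the standard ex1data1.txt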

Next, to visualize the data, we use the plotting library matplotlib:

# Create a 2x2 grid of subplots and select the first one
plt.subplot(2, 2, 1)
# Configure the font so Chinese labels and minus signs display correctly
plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
plt.title("Linear regression variables are used to predict the profit of food trucks")  # Set title
plt.xlabel("Urban population, unit: 10000")  # Set x axis
plt.ylabel("Profit in thousands")  # Set y axis
# Draw a scatter plot with scatter
plt.scatter(data[:, 0], data[:, 1], marker='x')

plt.show()

[Figure: scatter plot of city population versus food-truck profit]

We use gradient descent to update the parameters (if you are not familiar with it, you can refer to the reference article linked above).

Let $J(\theta)$ denote the total loss and $m$ the number of data points:

$$J(\theta)=\frac{1}{2m}\sum_{i=1}^m\big(f(x_i)-y_i\big)^2,\qquad f(x)=wx+b$$

The general gradient descent update rule is $x' \leftarrow x-\alpha \nabla f(x)$. The resulting update formulas for $w$ and $b$ are:

$$w\leftarrow w-\alpha \frac{\partial J}{\partial w},\qquad b\leftarrow b-\alpha \frac{\partial J}{\partial b}$$

Computing the partial derivatives gives:

$$w \leftarrow w-\alpha \frac{1}{m}\sum_{i=1}^m(wx_i+b-y_i)\,x_i$$

$$b \leftarrow b-\alpha \frac{1}{m}\sum_{i=1}^m(wx_i+b-y_i)$$

where $\alpha$ is a positive number, usually called the step size or learning rate.

3) Code

import matplotlib.pyplot as plt
import numpy as np


class Linear_Regression:

    # One pass of batch gradient descent: update w and b
    def gradient_descent(self, w, b, alpha):
        # Accumulate the gradients over all m samples first, then update once,
        # matching the batch update formulas derived above
        dw = 0.0
        db = 0.0
        for i in range(self.m):
            x_i = self.data[i][0]
            y_i = self.data[i][1]
            dw += (w * x_i + b - y_i) * x_i
            db += w * x_i + b - y_i
        w -= (alpha / self.m) * dw
        b -= (alpha / self.m) * db
        return w, b

    # Compute the loss value J(w, b)
    def cal_loss(self, w, b):
        # J accumulates the total loss over all samples
        J = 0.0
        for i in range(self.m):
            x_i = self.data[i][0]
            y_i = self.data[i][1]
            J += 1.0 / (2 * self.m) * (w * x_i + b - y_i) ** 2
        return J

    # Initialize: load the data and draw the scatter plot
    def __init__(self):
        # Read ex1data1.txt with "," as the separator, as double-precision floats
        self.data = np.loadtxt('ex1data1.txt', delimiter=',', dtype=np.float64)
        # m is the number of rows (training samples)
        self.m = self.data.shape[0]

        # Create a 2x2 grid of subplots and select the first one
        plt.subplot(2, 2, 1)
        # Configure the font so Chinese labels and minus signs display correctly
        plt.rcParams["font.sans-serif"] = ["SimHei"]
        plt.rcParams["axes.unicode_minus"] = False
        plt.title("Linear regression variables are used to predict the profit of food trucks")  # Set title
        plt.xlabel("Urban population, unit: 10000")  # Set x axis
        plt.ylabel("Profit in thousands")  # Set y axis
        # Draw a scatter plot with scatter
        plt.scatter(self.data[:, 0], self.data[:, 1], marker='x')

    # Start function
    def main(self):

        alpha = 0.01  # Learning rate
        iterations = 1500  # Number of gradient-descent iterations
        # Initial parameter estimates, updated on every iteration
        w = 0.0
        b = 0.0
        w_all = []
        b_all = []
        # History of the total loss
        cost = []
        print("-------Start the calculation-----")
        # Loss at the initial parameter values
        print("Initial loss value", self.cal_loss(w, b))
        # Run 1500 rounds of training
        for i in range(iterations):
            w_all.append(w)
            b_all.append(b)
            w, b = self.gradient_descent(w, b, alpha)
            temp = self.cal_loss(w, b)
            cost.append(temp)

        print("Final result w---", w, "Final result b---", b)
        x = [5.0, 22.5]
        y = [5.0 * w + b, 22.5 * w + b]
        plt.subplot(2, 2, 2)
        plt.plot(x, y, color="red")
        plt.title("Linear regression question 1")
        plt.xlabel("Population of the city")
        plt.ylabel("profit")
        plt.scatter(self.data[:, 0], self.data[:, 1], marker='x')

        plt.subplot(2, 2, 3)
        plt.title("loss function  J")
        plt.xlabel("Number of iterations")
        plt.ylabel("magnitude of the loss")
        plt.plot(range(len(cost)), cost, color="red")
        plt.show()
        print("magnitude of the loss", cost[0:5])
        print("b--", b_all[0:5])
        print("w--", w_all[0:5])
        print("-------It's over-----")

if __name__ == '__main__':
    obj = Linear_Regression()
    obj.main()

The results are as follows:

[Figure: data scatter plot, fitted regression line, and loss J over the iterations]
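For reference, the same batch update can also be written in vectorized NumPy form. This is a minimal sketch, not part of the original assignment code, assuming ex1data1.txt is in the working directory:

import numpy as np

data = np.loadtxt('ex1data1.txt', delimiter=',')
x, y = data[:, 0], data[:, 1]
m = len(y)

w, b, alpha = 0.0, 0.0, 0.01
for _ in range(1500):
    err = w * x + b - y               # residuals for all m samples at once
    w -= alpha / m * (err * x).sum()  # same update as the loop-based version
    b -= alpha / m * err.sum()

err = w * x + b - y
print("w:", w, "b:", b, "loss:", (err ** 2).sum() / (2 * m))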

1.2 multiple linear regression

1.2.1 problems and precautions

When solving a linear regression model, two issues deserve attention.

The first is feature combination. For example, rather than using the length and width of a house as two separate features in the model, it is often better to multiply them into a single area feature, which reduces the dimensionality of feature selection.

The second is feature scaling, an issue that many machine learning models need to pay attention to.

In some models, such as SVM, the optimal solution is no longer equivalent to the original one after the dimensions are scaled unevenly. For such models, unless the value ranges of the dimensions are already similar, the data must be standardized to prevent the model parameters from being dominated by features with very large or very small ranges.

In other models, such as logistic regression, the optimal solution is equivalent after uneven scaling. For these models, standardization does not change the optimal solution in theory. In practice, however, they are solved with iterative algorithms, and if the shape of the objective function is too "flat", the iterations may converge very slowly or not at all. So even for models with scaling invariance, it is best to standardize the data.

1) Benefits of normalization

  • Improve the convergence speed of the model

    As the figures below show, suppose $x_1$ takes values in 0 to 2000 while $x_2$ takes values in 1 to 5. With only these two features, the loss contours form a long, narrow ellipse, and gradient descent zigzags in the direction perpendicular to the contour lines, making the iteration very slow. After normalization the contours are much rounder and the iteration converges quickly.

    Not normalized

    [Figure: zigzagging gradient-descent path on elongated loss contours]

    After normalization

    [Figure: nearly direct gradient-descent path on round loss contours]

  • Improve the accuracy of the model

    Another benefit of normalization is improved accuracy, which matters when the algorithm involves distance computations such as the Euclidean distance. As in the figures above, the value range of $x_2$ is much smaller than that of $x_1$, so $x_1$ has a far greater influence on the computed distance and the contribution of $x_2$ is nearly lost, which costs accuracy. Normalization makes every feature contribute comparably to the result; a small numeric illustration follows this list.
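A toy illustration of this point (the numbers are made up, not from the assignment):

import numpy as np

# Two samples with x1 in [0, 2000] and x2 in [1, 5]
a = np.array([1800.0, 2.0])
b = np.array([200.0, 5.0])
print(np.linalg.norm(a - b))  # ~1600: driven almost entirely by x1

# After min-max scaling each feature to [0, 1], both features matter
a_s = np.array([1800.0 / 2000, (2.0 - 1) / 4])
b_s = np.array([200.0 / 2000, (5.0 - 1) / 4])
print(np.linalg.norm(a_s - b_s))  # ~1.1: x2 now contributes comparably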

2) Common normalization methods

  • Linear normalization (Min-Max Normalization)

    Linear normalization maps the input data into the range [0, 1]. The formula is as follows:

    $$X_{norm}=\frac{X-X_{min}}{X_{max}-X_{min}}$$

  • Zero-mean standardization

    Zero-mean (z-score) standardization transforms the original data into a data set with mean 0 and variance 1. The formula is as follows:

    $$z=\frac{x-\mu}{\sigma}$$

    where $\mu$ and $\sigma$ are the mean and standard deviation of the original data set. This method assumes the original data are approximately Gaussian; otherwise the standardization effect can be poor. A minimal NumPy sketch of both methods follows this list.
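Here is a minimal NumPy sketch of the two normalization methods above, applied column by column to a small made-up array (the sample values are not from the assignment data):

import numpy as np

def min_max_normalize(a):
    # Scale every column to [0, 1]
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

def z_score_standardize(a):
    # Shift every column to mean 0 and scale it to standard deviation 1
    return (a - a.mean(axis=0)) / a.std(axis=0)

data = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 3.0], [1416.0, 2.0]])
print(min_max_normalize(data))
print(z_score_standardize(data))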

1.2.2 model introduction

The model is:

$$h_\theta(x)=\theta^T x=\theta_0+\theta_1 x_1+\theta_2 x_2+\cdots+\theta_n x_n$$

After obtaining the model, we need a loss function. **For general linear regression, we use the mean squared error as the loss function.** Its algebraic form is as follows:

$$J(\theta)=\frac{1}{2}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)^2$$

Expressed in matrix form, $J(\theta)=\frac{1}{2}(X\theta-Y)^T(X\theta-Y)$.

We commonly use one of two methods to find the $\theta$ that minimizes the loss function: gradient descent or least squares.

If gradient descent is used, the iterative update for $\theta$ is as follows:

$$\theta \leftarrow \theta-\frac{\alpha}{m} X^T(X\theta-Y)$$

(The $\frac{1}{m}$ factor averages the gradient over the $m$ samples, matching the code below; it can equivalently be absorbed into $\alpha$.)

# Excerpts from the Linear_Regression class shown in full below:

def cost(self, X, theta, y):
    '''
    Compute the loss function
    :param X: design matrix
    :param theta: parameter vector
    :param y: target vector
    :return: the loss value J(theta)
    '''
    m = X.shape[0]
    temp = X.dot(theta) - y
    return temp.T.dot(temp) / (2 * m)

def gradient_descent(self, X, theta, y, alpha, iterations):
    '''
    Batch gradient descent
    :param X: design matrix
    :param theta: initial parameter vector
    :param y: target vector
    :param alpha: learning rate (step size)
    :param iterations: number of iterations
    :return: the fitted theta and the loss history
    '''
    m = X.shape[0]
    print("Number of rows--", m)
    print("Number of rows of y--", len(y))
    c = []  # Record the loss value after each iteration
    for i in range(iterations):
        theta -= (alpha / m) * X.T.dot(X.dot(theta) - y)
        # Record the loss value
        c.append(self.cost(X, theta, y))

    return theta, c

If the least squares method is used, the closed-form solution for $\theta$ is as follows:

$$\theta=(X^TX)^{-1}X^TY$$

The Python code is as follows:

def normal_equation(X, y):
    # Use pinv (pseudo-inverse) so a solution still exists when X.T.dot(X)
    # is singular, e.g. in the presence of multicollinearity
    return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
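As a quick cross-check of the formula (a sketch on a made-up toy matrix, using NumPy's built-in least-squares solver rather than anything from the assignment), the normal equation should agree with np.linalg.lstsq:

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # toy design matrix with an intercept column
y = np.array([1.0, 2.0, 2.5])

theta_ne = np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_ne, theta_ls)  # the two solutions should match closely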

Of course, other commonly used optimization algorithms, such as Newton's method and quasi-Newton methods, can also be applied to linear regression; they are not described in detail here.

Note that in practical data-fitting work, we should also pay attention to multicollinearity and to testing the regression equation.

1.2.3 topic introduction

In this section, you will implement linear regression with multiple variables to predict house prices. Suppose you want to sell your house and would like to know what a good market price is.

One approach is to first collect information about recently sold houses and build a model of house prices.

1.2.4 data introduction

2104,3,399900
1600,3,329900
2400,3,369000
1416,2,232000
3000,4,539900
1985,4,299900
1534,3,314900
1427,3,198999
1380,3,212000
1494,3,242500
1940,4,239999
2000,3,347000
1890,3,329999
4478,5,699900

The ex1data2.txt file contains a training set of house prices in Portland, Oregon.

The first column is the size of the house (in square feet),

the second column is the number of bedrooms,

and the third column is the price of the house.

1.2.5 code

import matplotlib.pyplot as plt
import numpy as np


class Linear_Regression:

    def normal_equation(self, X, y):
        '''
        Least squares (normal equation) solution
        :param X: design matrix
        :param y: target vector
        :return: the optimal theta
        '''
        return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)

    def cost(self, X, theta, y):
        '''
        Compute the loss function
        :param X: design matrix
        :param theta: parameter vector
        :param y: target vector
        :return: the loss value J(theta)
        '''
        m = X.shape[0]
        temp = X.dot(theta) - y
        return temp.T.dot(temp) / (2 * m)

    def gradient_descent(self, X, theta, y, alpha, iterations):
        '''
        Batch gradient descent
        :param X: design matrix
        :param theta: initial parameter vector
        :param y: target vector
        :param alpha: learning rate (step size)
        :param iterations: number of iterations
        :return: the fitted theta and the loss history
        '''
        m = X.shape[0]
        print("Number of rows--", m)
        print("Number of rows of y--", len(y))
        c = []  # Record the loss value after each iteration
        for i in range(iterations):
            theta -= (alpha / m) * X.T.dot(X.dot(theta) - y)
            # Calculate loss value
            c.append(self.cost(X, theta, y))

        return theta, c

    def maxminnorm(self, array):
        '''
        Column-wise min-max normalization of a NumPy array
        :param array: the array to normalize
        :return: the normalized array, each column scaled to [0, 1]
        '''
        maxcols = array.max(axis=0)
        mincols = array.min(axis=0)
        data_shape = array.shape
        data_rows = data_shape[0]
        data_cols = data_shape[1]
        t = np.empty((data_rows, data_cols))
        for i in range(data_cols):
            t[:, i] = (array[:, i] - mincols[i]) / (maxcols[i] - mincols[i])
        return t

    def __init__(self):
        # Read ex1data2.txt with "," as the separator, as double-precision floats
        self.data = np.loadtxt('ex1data2.txt', delimiter=',', dtype=np.float64)
        # m is the number of rows (training samples)
        self.m = self.data.shape[0]
        # Normalize the data to [0, 1], column by column
        self.normalization_data = self.maxminnorm(self.data)

    def main(self):
        '''
        Main function
        :return:
        '''
        alpha = 0.1  # Learning rate
        iterations = 10000  # Number of iterations
        # The first two columns (size, bedrooms) are the features;
        # prepend a column of ones so that theta[0] acts as the intercept term
        X = np.hstack([np.ones((self.m, 1)), self.normalization_data[:, 0:2]])
        # The third column is the (normalized) price
        y = self.normalization_data[:, 2]
        # One parameter per column of X
        theta = np.zeros((3,))
        theta, c = self.gradient_descent(X=X,
                                         theta=theta,
                                         y=y,
                                         alpha=alpha,
                                         iterations=iterations)
        # Visualize the descent process
        plt.rcParams["font.sans-serif"] = ["SimHei"]
        plt.rcParams["axes.unicode_minus"] = False
        plt.title("loss function  J(θ)")
        plt.xlabel("Number of iterations")
        plt.ylabel("magnitude of the loss")
        plt.plot(range(iterations), c, color="red")
        plt.show()
        print("Using gradient descent:", theta)
        print("Using least squares:", self.normal_equation(X, y))


if __name__ == '__main__':
    obj = Linear_Regression()
    obj.main()

The results are as follows:

[Figure: loss J(θ) decreasing over the iterations for the multivariate model]
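As a follow-up sketch (assuming the Linear_Regression class defined above; the 1650 square-foot, 3-bedroom house is a hypothetical example, not from the data set): since training ran on min-max-normalized data, a new input has to be normalized with the training columns' min and max, and the prediction mapped back to the original price scale.

import numpy as np

obj = Linear_Regression()
mins = obj.data.min(axis=0)
maxs = obj.data.max(axis=0)

# Solve directly with the normal equation for this sketch
X = np.hstack([np.ones((obj.m, 1)), obj.normalization_data[:, 0:2]])
y = obj.normalization_data[:, 2]
theta = obj.normal_equation(X, y)

# Normalize the new example's features with the training min/max
x_new = np.array([1650.0, 3.0])  # hypothetical house: 1650 sq ft, 3 bedrooms
x_norm = (x_new - mins[0:2]) / (maxs[0:2] - mins[0:2])

# Predict in normalized space, then map back to the original price scale
price_norm = np.hstack([1.0, x_norm]).dot(theta)
price = price_norm * (maxs[2] - mins[2]) + mins[2]
print("Predicted price:", price)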

Topics: Machine Learning