Machine learning: linear regression

Posted by cavedave on Sun, 20 Feb 2022 11:25:16 +0100

Linear regression

Linear regression, usually fitted by the ordinary least squares method, is the simplest and most classic approach to regression problems.
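Concretely, given sample points (x_i, y_i), ordinary least squares chooses the line y = a*x + b that minimizes the residual sum of squares

S(a, b) = Σ_i (y_i − (a*x_i + b))²

and the numpy and sklearn routines below solve this minimization for us.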

1. Linear regression with numpy

(1) Function: p = np.polyfit(x, y, n)

x holds the x-coordinates of the known discrete data points, y holds the y-coordinates,
and n is the degree (highest power) of the polynomial to fit, which we choose ourselves; different degrees give different fitting polynomials.

The return value p contains the coefficients of the polynomial p(x), ordered from the highest degree to the lowest, with length n+1:
p(x) = p_1*x^n + p_2*x^(n-1) + ... + p_n*x + p_(n+1)

(2) Function: np.poly1d()
This function takes the coefficient array returned by polyfit and returns a callable polynomial object, i.e. the fitted equation.

(3) Function: y = np.polyval(p, x)
Evaluates the fitted polynomial at x, returning the corresponding values of the dependent variable y. A quick sanity check of all three functions follows below.
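This is a minimal sketch (the sample values are arbitrary) that ties the three functions together: it fits exact quadratic data and confirms that the coefficients come back from highest to lowest degree:

import numpy as np

x = np.array([0, 1, 2, 3, 4])
y = 2 * x**2 + 3 * x + 1           # exact quadratic: 2x^2 + 3x + 1
p = np.polyfit(x, y, 2)            # coefficients from highest to lowest degree
print(p)                           # approximately [2. 3. 1.]
poly = np.poly1d(p)                # callable polynomial built from the coefficients
print(np.polyval(p, 5), poly(5))   # both evaluate p(5) = 66.0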

How to choose the polynomial degree n is illustrated in the two examples below.

1.1 Fitting a straight line

First, let's simulate some scattered data and draw the scatter plot:

import numpy as np
import matplotlib.pyplot as plt

Xi=np.array([6.19,2.51,7.29,7.01,5.7,2.66,3.98,2.5,9.1,4.2])
Yi=np.array([5.25,2.83,6.41,6.71,5.1,4.23,5.05,1.98,10.5,6.3])

plt.scatter(Xi,Yi)
plt.show()

Output: a scatter plot of the ten simulated points.

Next, we use the three functions introduced above to fit a straight line:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so Chinese labels render correctly
# Simulated data
Xi=np.array([6.19,2.51,7.29,7.01,5.7,2.66,3.98,2.5,9.1,4.2])
Yi=np.array([5.25,2.83,6.41,6.71,5.1,4.23,5.05,1.98,10.5,6.3])

X=np.linspace(1,10,100)
yi_fit=np.polyfit(Xi,Yi,1)  # least-squares fit; polynomial degree 1

yi_1d=np.poly1d(yi_fit)  # build the fitted equation from the coefficients

yi_hat=yi_1d(X)  # evaluate the fitted polynomial at the x-coordinates
plt.scatter(Xi,Yi)
plt.plot(X,yi_hat,c='red')  # draw the fitted line
plt.show()

Output: the scatter plot with the fitted line drawn in red.

Let's print these two objects:

print("yi_fit:",yi_fit)
yi_1d=np.poly1d(yi_fit)
print("yi_1d",yi_1d)
yi_hat=yi_1d(X)


Output result:

yi_fit: [0.90045842 0.83105566]
yi_1d :
0.9005 x + 0.8311
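Since a degree-1 polyfit is ordinary least squares, we can check the slope and intercept above against the closed-form solution of the normal equations; a minimal sketch reusing the same Xi and Yi:

import numpy as np

Xi = np.array([6.19,2.51,7.29,7.01,5.7,2.66,3.98,2.5,9.1,4.2])
Yi = np.array([5.25,2.83,6.41,6.71,5.1,4.23,5.05,1.98,10.5,6.3])

# Closed-form OLS for a line: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((Xi - Xi.mean()) * (Yi - Yi.mean())) / np.sum((Xi - Xi.mean()) ** 2)
intercept = Yi.mean() - slope * Xi.mean()
print(slope, intercept)  # should match yi_fit: approximately 0.9005 and 0.8311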

1.2 Fitting a curve with a higher-order polynomial

First, we use a trigonometric function to simulate scattered data and draw the plot:

import numpy as np
import matplotlib.pyplot as plt

# Simulated data
def f(x):
    return 2 * np.sin(x) + 3
# X and Y
x = np.linspace(0, 4 * np.pi)
y = f(x) + 0.2 * np.random.randn(len(x))  # add noise

plt.scatter(x,y)
plt.show()

Output: a scatter plot of the noisy sine data.

Next, we use the same three functions, this time with a higher-order polynomial, to fit the curve:

import numpy as np
import matplotlib.pyplot as plt

# Simulated data
def f(x):
    return 2 * np.sin(x) + 3
# X and Y
x = np.linspace(0, 2 * np.pi)  # domain of the trigonometric function
y = f(x) + 0.2 * np.random.randn(len(x))  # add noise

y_fit = np.polyfit(x, y, 3)  # least-squares fit; polynomial degree 3
print('y_fit:', y_fit)

y_fit_1d = np.poly1d(y_fit)  # build the fitted equation from the coefficients
print('y_fit_1d:\n', y_fit_1d)

y_hat = np.polyval(y_fit, x)  # evaluate the fitted polynomial at the data points
# Equivalent form: y_hat = y_fit_1d(x)
print('y_hat:', y_hat)

print('Correlation coefficients:')  # correlation between the fit and the data
print(np.corrcoef(y_hat, y))

plt.figure(dpi=200)
plot1 = plt.plot(x, y, 'o', label='Original Values')
plot3 = plt.plot(x, f(x), 'g', label='Original Curve')
plot2 = plt.plot(x, y_hat, 'r', label='Fitting Curve')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Polyfitting')
plt.show()

Output: the printed coefficients, plus a plot of the data points, the original curve, and the fitted curve.
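np.corrcoef returns a 2x2 symmetric matrix whose off-diagonal entry is the correlation coefficient r between y_hat and y; continuing the script above, a minimal sketch for extracting it:

r = np.corrcoef(y_hat, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print('r =', r, 'r^2 =', r ** 2)  # the closer r is to 1, the closer the fit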

If we change the domain of the simulated data to (0, 4*np.pi) without changing the polynomial degree, let's see what the fitted curve looks like.

The script is the same as above except for the domain of x:

x = np.linspace(0, 4 * np.pi)  # two full sine periods instead of one

Output:
Correlation coefficients:
[[1.         0.49067649]
 [0.49067649 1.        ]]

Clearly the fit is not very accurate, so we need to raise the degree to a more suitable value:

y_fit = np.polyfit(x, y, 7)
Output:
Correlation coefficients:
[[1.         0.99209803]
 [0.99209803 1.        ]]


A correlation coefficient close to 1 indicates a very accurate fit.
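Rather than guessing, one way to choose the degree is to scan several candidates and compare their correlation coefficients; here is a minimal sketch, with the candidate range 1 to 9 chosen arbitrarily:

import numpy as np

def f(x):
    return 2 * np.sin(x) + 3

x = np.linspace(0, 4 * np.pi)
y = f(x) + 0.2 * np.random.randn(len(x))  # same simulated data as above

for n in range(1, 10):  # candidate degrees (arbitrary range)
    y_hat = np.polyval(np.polyfit(x, y, n), x)
    r = np.corrcoef(y_hat, y)[0, 1]
    print(n, round(r, 4))

Note that r keeps inching upward as the degree grows, because a high-degree polynomial starts chasing the noise; the smallest degree that already gives r close to 1 is usually the better choice.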

2. Regression with sklearn

When using sklearn for regression, the input data must be two-dimensional: one row per sample and one column per feature, hence the reshape(-1,1) calls below.
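For example, a plain 1-D array has shape (n,), while sklearn estimators expect an (n_samples, n_features) matrix; reshape(-1, 1) produces the single-feature column. A minimal sketch:

import numpy as np

a = np.array([6.19, 2.51, 7.29])
print(a.shape)                 # (3,)   1-D, rejected by sklearn estimators as X
print(a.reshape(-1, 1).shape)  # (3, 1) one feature column, one row per sample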

2.1 Fitting a straight line

from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
# Sample data (Xi, Yi) must be converted to 2-D arrays: one column, one row per sample
Xi=np.array([6.19,2.51,7.29,7.01,5.7,2.66,3.98,2.5,9.1,4.2]).reshape(-1,1)
Yi=np.array([5.25,2.83,6.41,6.71,5.1,4.23,5.05,1.98,10.5,6.3]).reshape(-1,1)
# Set up the model
model = linear_model.LinearRegression()
# Train on the data
model.fit(Xi, Yi)
# Use the trained model to predict
y_plot = model.predict(Xi)
# Print the weight (slope) of the linear equation
print(model.coef_)  # approximately 0.90045842
# Plot
plt.scatter(Xi, Yi, color='red',label="sample",linewidth=2)
plt.plot(Xi, y_plot, color='green',label="fit",linewidth=2)
plt.legend(loc='best')
plt.show()

Output: the sample scatter in red with the fitted line in green.
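Besides the weight in model.coef_, the trained model also exposes the intercept, and predict accepts new inputs too; a minimal sketch continuing the script above, with 8.0 as an arbitrary test point:

print(model.intercept_)                  # approximately 0.8311, matching the numpy fit above
print(model.predict(np.array([[8.0]])))  # predicted y at x = 8.0 (note the 2-D input)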

2.2 Fitting a curve

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
# Sample data (Xi, Yi) must be converted to 2-D arrays: one column, one row per sample
Xi=np.array([1,2,3,4,5,6]).reshape(-1,1)
#Yi=np.array([9,18,31,48,69,94])
Yi=np.array([9.1,18.3,32,47,69.5,94.8]).reshape(-1,1)
# Degree-2 polynomial features as the basis, fitted with ridge regression
model = make_pipeline(PolynomialFeatures(2), Ridge())
model.fit(Xi, Yi)
# Predict with the fitted model
y_plot = model.predict(Xi)
# Plot
plt.scatter(Xi, Yi, color='red',label="sample",linewidth=2)
plt.plot(Xi, y_plot, color='green',label="fit",linewidth=2)
plt.legend(loc='lower right')
plt.show()

Output: the sample scatter in red with the fitted curve in green.
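To inspect what the pipeline learned, the fitted coefficients live on the final Ridge step; a minimal sketch continuing the script above (make_pipeline names each step after its lowercased class):

ridge = model.named_steps['ridge']
print(ridge.coef_, ridge.intercept_)  # weights for the [1, x, x^2] features, plus the separate intercept

# A denser grid gives a smoother plotted curve than the six sample points
X_plot = np.linspace(1, 6, 100).reshape(-1, 1)
plt.plot(X_plot, model.predict(X_plot), color='green')
plt.scatter(Xi, Yi, color='red')
plt.show()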

Topics: Machine Learning