# Machine learning linear regression

Posted by cavedave on Sun, 20 Feb 2022 11:25:16 +0100

Linear regression fitted by the ordinary least squares method is the simplest and most classic approach to regression problems.
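For a straight-line model, the least squares criterion can be written out explicitly. The symbols $a$ (slope), $b$ (intercept) and $m$ (number of data points) are notation chosen here for illustration; they do not appear in the original post:

```latex
\min_{a,\,b} \; \sum_{i=1}^{m} \bigl( y_i - (a x_i + b) \bigr)^2
```

The fitted line is the pair $(a, b)$ that makes this sum of squared vertical distances between the data points and the line as small as possible.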

## 1. Linear regression with numpy

(1) Function: p = polyfit(x,y,n)

x holds the x-coordinates of the known data points, y holds their y-coordinates,
and n is the highest power of the polynomial to fit. We choose n ourselves; different values of n give different fitted polynomials.

The return value p holds the coefficients of the polynomial p(x), ordered from the highest degree to the lowest, so its length is n+1:
$p(x) = p_1 x^n + p_2 x^{n-1} + \dots + p_n x + p_{n+1}$

(2) Function: np.poly1d()
This function takes the coefficients and returns a polynomial object, which can be printed as an equation and called like a function.

(3) Function: y = polyval(p, x)
Evaluates the fitted polynomial with coefficients p at x and returns the corresponding values of the dependent variable y.
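A minimal sketch of how the three functions fit together. The data below is hypothetical (it lies exactly on y = 2x² + 3x + 1) and the variable names are chosen only for illustration:

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x^2 + 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 + 3 * x + 1

# (1) polyfit returns n + 1 coefficients, highest power first
p = np.polyfit(x, y, 2)
print(p)  # approximately [2. 3. 1.]

# (2) poly1d wraps the coefficients in a callable polynomial object
poly = np.poly1d(p)

# (3) polyval evaluates the coefficients directly;
#     calling the poly1d object gives the same values
x_new = np.array([5.0, 6.0])
print(np.polyval(p, x_new))
print(poly(x_new))
```

Because the data is noise-free, the recovered coefficients should match 2, 3 and 1 up to floating-point error.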

Choosing the order n of the polynomial: n = 1 gives a straight line and larger n gives a curve, as the following two subsections show.

### 1.1 Fitting a straight line

First, let's create some sample data points:

```python
import numpy as np
import matplotlib.pyplot as plt

Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])
```

Draw a scatter plot:

```python
import numpy as np
import matplotlib.pyplot as plt

Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])
plt.scatter(Xi, Yi)
plt.show()
```

Output result: (scatter plot of the sample points)

Next, we use the three functions described above to fit a straight line:

```python
import numpy as np
import matplotlib.pyplot as plt

# Use a CJK-capable font so Chinese labels are not garbled (SimHei must be installed)
plt.rcParams['font.sans-serif'] = ['SimHei']

# Simulated data
Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

X = np.linspace(1, 10, 100)
# Least squares fit; the highest power of the polynomial is 1
yi_fit = np.polyfit(Xi, Yi, 1)
# Build a callable polynomial from the fitted coefficients
yi_1d = np.poly1d(yi_fit)
# Evaluate the fitted line at the x-coordinates in X
yi_hat = yi_1d(X)

plt.scatter(Xi, Yi)
plt.plot(X, yi_hat, c='red')  # draw the fitted line
plt.show()
```

Output result: (scatter plot of the sample points with the fitted line in red)

Let's print the fitted coefficients and the polynomial object:

```python
# Continues from the previous block (uses yi_fit and X)
print("yi_fit:", yi_fit)
yi_1d = np.poly1d(yi_fit)
print("yi_1d :", yi_1d)
yi_hat = yi_1d(X)
```

Output result:

```
yi_fit: [0.90045842 0.83105566]
yi_1d :
0.9005 x + 0.8311
```
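For a straight line, the least squares coefficients also have a well-known closed form: the slope is the sum of products of the centered x and y values divided by the sum of squared centered x values, and the intercept follows from the means. The sketch below recomputes the two numbers above with plain NumPy arithmetic; the names `dx`, `dy`, `slope` and `intercept` are chosen here for illustration:

```python
import numpy as np

Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2])
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])

# Center both variables on their means
dx = Xi - Xi.mean()
dy = Yi - Yi.mean()

# Closed-form least squares solution for a straight line
slope = np.sum(dx * dy) / np.sum(dx**2)
intercept = Yi.mean() - slope * Xi.mean()

print(slope, intercept)
print(np.polyfit(Xi, Yi, 1))  # should agree up to floating-point error
```

Both lines should print roughly 0.9005 and 0.8311, matching the `yi_fit` output shown above.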

### 1.2 Fitting a curve with a higher-order polynomial

First, we use a trigonometric function to simulate noisy data points and plot them:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data
def f(x):
    return 2 * np.sin(x) + 3

# X and Y
x = np.linspace(0, 4 * np.pi)
y = f(x) + 0.2 * np.random.randn(len(x))  # add noise

plt.scatter(x, y)
plt.show()
```

Output result: (scatter plot of the noisy sine samples)

Next, we use the same three functions with a higher-order polynomial to fit the curve:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data
def f(x):
    return 2 * np.sin(x) + 3

# X and Y
x = np.linspace(0, 2 * np.pi)  # domain of the trigonometric function
y = f(x) + 0.2 * np.random.randn(len(x))  # add noise

# Least squares fit; the highest power of the polynomial is 3
y_fit = np.polyfit(x, y, 3)
print('y_fit:', y_fit)

# Build a callable polynomial from the fitted coefficients
y_fit_1d = np.poly1d(y_fit)
print('y_fit_1d:\n', y_fit_1d)

# Evaluate the fitted polynomial at x
y_hat = np.polyval(y_fit, x)
# Equivalent form: y_hat = y_fit_1d(x)
print('y_hat:', y_hat)

# Correlation between the fitted values and the noisy data
print('Correlation coefficients:')
print(np.corrcoef(y_hat, y))

plt.figure(dpi=200)
plot1 = plt.plot(x, y, 'o', label='Original Values')
plot3 = plt.plot(x, f(x), 'g', label='Original Curve')
plot2 = plt.plot(x, y_hat, 'r', label='Fitting Curve')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Polyfitting')
plt.show()
```

Output result: (plot of the noisy samples, the original curve and the fitted curve)

If we widen the domain of the simulated data to 0 to 4π (`4 * np.pi`) without changing the highest power of the polynomial, let's see what the fitted curve looks like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data
def f(x):
    return 2 * np.sin(x) + 3

# X and Y
x = np.linspace(0, 4 * np.pi)  # wider domain of the trigonometric function
y = f(x) + 0.2 * np.random.randn(len(x))  # add noise

# Least squares fit; the highest power of the polynomial is still 3
y_fit = np.polyfit(x, y, 3)
print('y_fit:', y_fit)

# Build a callable polynomial from the fitted coefficients
y_fit_1d = np.poly1d(y_fit)
print('y_fit_1d:\n', y_fit_1d)

# Evaluate the fitted polynomial at x
y_hat = np.polyval(y_fit, x)
# Equivalent form: y_hat = y_fit_1d(x)
print('y_hat:', y_hat)

# Correlation between the fitted values and the noisy data
print('Correlation coefficients:')
print(np.corrcoef(y_hat, y))

plt.figure(dpi=200)
plot1 = plt.plot(x, y, 'o', label='Original Values')
plot3 = plt.plot(x, f(x), 'g', label='Original Curve')
plot2 = plt.plot(x, y_hat, 'r', label='Fitting Curve')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Polyfitting')
plt.show()
```

Output:
Correlation coefficients:
[[1. 0.49067649]
[0.49067649 1. ]]

The fit is clearly poor, so we need to raise the highest power to a more suitable value:
`y_fit = np.polyfit(x, y, 7)`

Output:
Correlation coefficients:
[[1. 0.99209803]
[0.99209803 1. ]]

A correlation coefficient close to 1 means the fitted values track the sample data closely.
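To see how the correlation coefficient changes with the highest power, one option is to loop over several candidate orders on the same data. This is a sketch rather than part of the original post: the fixed random seed, the range of orders tried and the name `corr_by_degree` are all assumptions made for illustration, so the printed numbers will differ slightly from the outputs quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatable numbers

def f(x):
    return 2 * np.sin(x) + 3

x = np.linspace(0, 4 * np.pi)  # 50 points by default
y = f(x) + 0.2 * rng.standard_normal(len(x))  # add noise

corr_by_degree = {}
for degree in range(1, 8):
    coeffs = np.polyfit(x, y, degree)   # least squares fit of this order
    y_hat = np.polyval(coeffs, x)       # fitted values at the sample points
    corr_by_degree[degree] = np.corrcoef(y_hat, y)[0, 1]
    print(degree, round(corr_by_degree[degree], 4))
```

Note that this correlation is measured on the same points used for fitting, where it cannot decrease as the order grows. A value close to 1 therefore shows that the curve follows the samples, not that the chosen order will generalize to new data.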

## 2. Regression with sklearn

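When using sklearn for regression, the input data X must be two-dimensional, with shape (n_samples, n_features). A one-dimensional array of x values is therefore reshaped into a single column with `reshape(-1, 1)`.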
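A small sketch of what that reshaping does to the array shape. The three values are simply the first few x-coordinates of the sample data, and the names `x_1d` and `x_2d` are chosen here for illustration:

```python
import numpy as np

x_1d = np.array([6.19, 2.51, 7.29])
print(x_1d.shape)  # (3,)

# reshape(-1, 1) turns the 1-D array into a single-feature column;
# -1 lets NumPy infer the number of rows
x_2d = x_1d.reshape(-1, 1)
print(x_2d.shape)  # (3, 1)
```

Each row of `x_2d` is one sample and the single column is the only feature.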

### 2.1 Fitting a straight line

```python
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt

# Sample data (Xi, Yi), reshaped into 2-D column arrays
Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2]).reshape(-1, 1)
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3]).reshape(-1, 1)

# Create the model
model = linear_model.LinearRegression()
# Train the model on the data
model.fit(Xi, Yi)
# Use the trained model to predict
y_plot = model.predict(Xi)

# Print the weight (slope) and intercept of the linear equation
print(model.coef_)       # [[0.90045842]]
print(model.intercept_)  # [0.83105566]

# Plot the samples and the fitted line
plt.scatter(Xi, Yi, color='red', label="samples", linewidth=2)
plt.plot(Xi, y_plot, color='green', label="fit", linewidth=2)
plt.legend(loc='best')
plt.show()
```
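As a cross-check, the sklearn model and `np.polyfit` solve the same least squares problem on this data, so their coefficients should agree. The sketch below compares them; passing a 1-D target array and the names `slope` and `intercept` are choices made here for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

Xi = np.array([6.19, 2.51, 7.29, 7.01, 5.7, 2.66, 3.98, 2.5, 9.1, 4.2]).reshape(-1, 1)
Yi = np.array([5.25, 2.83, 6.41, 6.71, 5.1, 4.23, 5.05, 1.98, 10.5, 6.3])  # 1-D target

model = LinearRegression().fit(Xi, Yi)
slope = model.coef_[0]        # one weight, because there is one feature
intercept = model.intercept_
print(slope, intercept)

# np.polyfit expects 1-D x, so flatten the column back
print(np.polyfit(Xi.ravel(), Yi, 1))
```

Both approaches should report a slope near 0.9005 and an intercept near 0.8311.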

Output: (plot of the sample points in red with the fitted line in green)

### 2.2 Fitting a curve

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# Sample data (Xi, Yi), reshaped into 2-D column arrays
Xi = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
# Exact values of 2x^2 + 3x + 4, kept for reference:
# Yi = np.array([9, 18, 31, 48, 69, 94])
Yi = np.array([9.1, 18.3, 32, 47, 69.5, 94.8]).reshape(-1, 1)

# Degree-2 polynomial features followed by ridge regression as the linear model
model = make_pipeline(PolynomialFeatures(2), Ridge())
model.fit(Xi, Yi)
# Predict with the fitted model
y_plot = model.predict(Xi)

# Plot the samples and the fitted curve
plt.scatter(Xi, Yi, color='red', label="samples", linewidth=2)
plt.plot(Xi, y_plot, color='green', label="fit", linewidth=2)
plt.legend(loc='lower right')
plt.show()
```
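To make the pipeline less of a black box, the sketch below shows what the `PolynomialFeatures(2)` step produces on its own. The three input values and the name `features` are chosen here for illustration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

Xi = np.array([1, 2, 3]).reshape(-1, 1)

# Each x becomes the row [1, x, x^2]
features = PolynomialFeatures(2).fit_transform(Xi)
print(features)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```

The `Ridge` step then fits an ordinary linear model on these columns, so the result is a quadratic curve in x. Because `Ridge` adds an L2 penalty (with `alpha=1.0` by default), its coefficients are shrunk slightly compared with a plain least squares fit.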

