# Linear regression

Posted by Tanus on Fri, 18 Feb 2022 04:07:40 +0100

# Univariate linear regression

## Model description

For the model affected by a single variable, we usually abstract it into the form of univariate primary function, and we assume a hypothetical function H( θ)=θ 0+ θ 1x

## Cost function

When we make a hypothetical function, we need to determine θ 0 and θ 1 the values of the two coefficients, so that the solved hypothetical function can better fit the data. At this time, we need to introduce the cost function to calculate the difference between the value obtained by the determined function and the value corresponding to the original data.
The specific formula is as follows:

With the cost function, we need to continuously reduce the cost through the cost function to achieve a better solution. At this time, we need to use gradient descent, which is also a method that will be used in many algorithms later.
The specific forms are as follows:

It should be noted here that all data should be calculated first θ The value is being updated.

## example

If we assume that the profit of opening a store is linked to the population where it is located, it is obvious that we can assume a univariate linear function to fit it.

```'''
Author: csc
Date: 2021-05-02 11:00:18
LastEditTime: 2021-05-03 23:53:47
Description: In User Settings Edit
FilePath: \undefinedd:\pycharm_code\simple_varia\main.py
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def costfunction(x, y, theta):
inner = np.power(x @ theta - y, 2)
return np.sum(inner) / (2 * len(x))

def gradientDescent(x, y, theta, alpha, iters):
costs = []

for i in range(iters):
theta = theta - (x.T @ (x @ theta - y)) * alpha / len(x)
cost = costfunction(x, y, theta)
costs.append(cost)

if i % 100 == 0:
print(cost)

return theta, costs

def main():
names=['population', 'profit'])

data.plot.scatter('population', 'profit', label='population')
data.insert(0, 'ones', 1)

x = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
x = x.values
y = y.values
y = y.reshape(x.shape[0], 1)
theta = np.zeros((x.shape[1], 1))

cost_init = costfunction(x, y, theta)
print("cost_init is  ", end=' ')
print(cost_init)

alpha = 0.01
iters = 2000
theta, costs = gradientDescent(x, y, theta, alpha, iters)

fig, ax = plt.subplots()
ax.plot(np.arange(iters), costs)
ax.set(xlabel='iters',
ylabel='cost',
title='cost vs iters')
plt.savefig("cost.png")
plt.show()

_x = np.linspace(y.min(), y.max(), 100)
_y = theta[0, 0] + theta[1, 0] * _x
fig, ax = plt.subplots()
ax.scatter(x[:, 1], y, label='training data')
ax.plot(_x, _y, 'r', label='predict')
ax.legend()
ax.set(xlabel='populaiton',
ylabel='profit')
plt.savefig("result.png")
plt.show()

if __name__ == '__main__':
main()

```

# Multivariate linear regression

## Feature scaling & learning rate

When our predicted value is no longer determined by one variable, but by multiple variables, the dimension of each variable needs to be considered at this time. If their data differ too much in the coordinate axis, the gradient will decline slowly and affect the operation efficiency of the algorithm.
X = (x-u) / s usually we use this form for feature scaling, where u is the mean, s can be the maximum or standard deviation.
Ideally, the feature scaling is between [- 1,1], but the actual scaling to [- 3,3] is also acceptable

It has been proved here that the learning rate is small enough, J( θ) Will be small enough

## example

Assuming a house is ready to be sold, it is known that the price of the house is affected by the size of the area and the number of bedrooms.

```'''
Author: csc
Date: 2021-05-02 18:31:06
LastEditTime: 2021-05-04 10:25:33
Description: In User Settings Edit
FilePath: \undefinedd:\pycharm_code\mul_varia\main.py
'''
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def normalize_feature(data):
return (data - data.mean()) / data.std()

def costfunction(x, y, theta):
inner = np.power(x @ theta - y, 2)
return np.sum(inner) / (2 * len(x))

def gradientDescent(x, y, theta, alpha, iters):
costs = []

for i in range(iters):
theta = theta - (x.T @ (x @ theta - y)) * alpha / len(x)
cost = costfunction(x, y, theta)
costs.append(cost)

if i % 100 == 0:
print(cost)

return theta, costs

def main():
names=['size', 'bedrooms', 'price'])
# normalization
data = normalize_feature(data)
# deal data
data.insert(0, 'ones', 1)
x = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
x = x.values
y = y.values
y = y.reshape(x.shape[0], 1)
# cost
theta = np.zeros((x.shape[1], 1))
cost_init = costfunction(x, y, theta)
print()
print(cost_init)
candinate_alpha = [0.0003, 0.003, 0.03, 0.0001, 0.001, 0.01]
iters = 2000
fig, ax = plt.subplots()
for alpha in candinate_alpha:
_, costs = gradientDescent(x, y, theta, alpha, iters)
print()
ax.plot(np.arange(iters), costs, label=alpha)
ax.legend()

ax.set(xlabel='iters',
ylabel='cost',
title='cost vs iters')
plt.savefig("alpha.png")
plt.show()

if __name__ == '__main__':
main()

```

# Normal equation

## concept

In the linear regression equation described above, we use the gradient descent algorithm to solve it iteratively θ， Although it is also applicable to many methods in later learning - strong applicability, it will become very annoying when we have enough variables. So is there a way to solve it directly - normal equation. As proved by mathematicians, the normal equation is as follows:

θ=(XTX)-1XTy

Obviously, we use i the inverse matrix, but we don't need to consider whether it is reversible. Even if it is irreversible, it can still be processed in the packaged function. There are two main reasons for the irreversibility of the matrix: 1. Too many characteristic variables; 2. Redundant characteristic variables

However, there are advantages and disadvantages. Although the normal equation is concise and can be solved directly without learning rate and iteration, we note that it performs multiple matrix multiplication operations, and the time complexity of each matrix operation is O(n3), so it can be used when the number of data sets m is tens of thousands or less. For larger data sets, we use the gradient descent method. Due to the characteristics of the gradient descent method, it can still work smoothly when the data set is large

## code

```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def normalEquation(x, y):
theta = np.linalg.inv(x.T@x)@x.T@y
return theta

def main():
names=['population', 'profit'])
data.insert(0, "ones", 1)
x = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
x = x.values
y = y.values
y = y.reshape(x.shape[0], 1)
theta = normalEquation(x, y)
print(theta)

if __name__ == '__main__':
main()

```