Some notes and a summary of my statistical-modeling study over the past six months

## 1, Linear regression

For data with an obvious linear trend, we look for a straight line that captures the trend and can be used for prediction, and we adopt the least squares estimate (LSE): the fitted line should minimize the sum of squared residuals. Translated into mathematical language, this gives the model

minimize Q(a, b) = Σᵢ (yᵢ − a − b·xᵢ)²  over a and b

This is a convex optimization problem. Using calculus, we can substitute any constraints into the objective to obtain an unconstrained problem, then set the partial derivatives with respect to a and b to zero and solve for the estimates. Statistically, it can be proved that the LSE gives unbiased estimates of a and b, i.e. E(â) = a and E(b̂) = b, where

b̂ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²,  â = ȳ − b̂·x̄
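As a concrete check, the closed-form estimates can be computed by hand in R and compared against lm; the data vectors below are the same ones used in the fitting example later in this post:

```r
## Closed-form least-squares estimates computed by hand
x <- c(0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
y <- c(42, 43, 45, 45, 45, 47.5, 49, 53, 50, 55, 55, 60)

b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
a_hat <- mean(y) - b_hat * mean(x)                                  # intercept

## These agree with coef(lm(y ~ x)): about 28.083 and 132.899
c(a_hat, b_hat)
```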

In addition, the estimates of a and b can also be obtained by maximum likelihood estimation (assuming normally distributed errors), and the result coincides with the LSE.

Sum of squares of total deviations: SST = Σᵢ (yᵢ − ȳ)²

Sum of squares of regression: SSR = Σᵢ (ŷᵢ − ȳ)²

Sum of squares of residuals: SSE = Σᵢ (yᵢ − ŷᵢ)²

It can be proved that the three sums of squares satisfy

SST = SSR + SSE

According to the least squares criterion, we can compare the residual sums of squares of different candidate models and select the one with the best fit.

Define the goodness of fit R² = SSR/SST = 1 − SSE/SST. The smaller the SSE, the larger R² becomes, but it never exceeds 1.
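A quick numerical sketch of this decomposition, using the same data as the lm example below (fitted values taken from lm):

```r
x <- c(0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
y <- c(42, 43, 45, 45, 45, 47.5, 49, 53, 50, 55, 55, 60)
y_hat <- fitted(lm(y ~ x))

SST <- sum((y - mean(y))^2)      # total sum of squares
SSR <- sum((y_hat - mean(y))^2)  # regression sum of squares
SSE <- sum((y - y_hat)^2)        # residual sum of squares

R2 <- 1 - SSE / SST              # about 0.9503, matching summary(lm(y ~ x))
```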

Supplement: in machine learning, a model's error is divided into training error and generalization error. The R² here only reflects the model's training error, and a small training error does not necessarily mean the model predicts well; that is, a small training error does not imply a small generalization error. When a model is overfitted, we consider it to have no real predictive ability; for details, refer to Li Hang's *Statistical Learning Methods*, which this article will not cover. When performing polynomial regression on two-dimensional data, using too high a polynomial degree often leads to overfitting (think of interpolation as the extreme case).
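An illustrative sketch of this point (simulated data; the seed, sample size, and noise level are arbitrary choices): the training SSE can only decrease as the polynomial degree grows, which is exactly why a small training error by itself cannot detect overfitting:

```r
set.seed(1)
x <- seq(0, 1, length.out = 12)
y <- 2 * x + rnorm(12, sd = 0.3)   # the truth is linear, plus noise

## Training residual sum of squares for polynomial degrees 1..9
sse <- sapply(1:9, function(d) sum(resid(lm(y ~ poly(x, d)))^2))

## Non-increasing in the degree: higher-degree fits never look worse in-sample
all(diff(sse) <= 1e-8)
```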

## 2, lm fitting + ggplot visualization in R

The lm function can be used not only for univariate linear fitting, but also for multivariate linear and nonlinear fitting.

```r
## Input data
x <- c(0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
y <- c(42, 43, 45, 45, 45, 47.5, 49, 53, 50, 55, 55, 60)
## Create data frame
linear <- data.frame(x, y)
## Fit
model <- lm(y ~ x, data = linear)
## Inspect
summary(model)
## Output:
## Call:
## lm(formula = y ~ x, data = linear)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.00449 -0.63600 -0.02401  0.71297  2.32451
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   28.083      1.567   17.92 6.27e-09 ***
## x            132.899      9.606   13.84 7.59e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.309 on 10 degrees of freedom
## Multiple R-squared:  0.9503, Adjusted R-squared:  0.9454
## F-statistic: 191.4 on 1 and 10 DF,  p-value: 7.585e-08
```

The model gives an estimate of each parameter: ŷ = 28.083 + 132.899x. The p-values are very small, so the regression equation can be considered significant.

We can use confint(model) to get the 95% confidence intervals (2.5%–97.5%) of the coefficients

```r
confint(model)
##                 2.5 %    97.5 %
## (Intercept)  24.59062  31.57455
## x           111.49556 154.30337
```
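The interval for the slope can also be reproduced by hand from the summary output above, as estimate ± t(0.975, df) × std. error:

```r
## Values taken from the summary(model) output above
est <- 132.899; se <- 9.606; df <- 10
est + c(-1, 1) * qt(0.975, df) * se   # roughly (111.50, 154.30)
```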

You can use the geom_smooth function in ggplot2 to visualize the model

geom_smooth(data, formula, method, se = TRUE, colour, size)

- data ~ dataset
- formula ~ fitting formula; see the lm function
- method ~ fitting method (loess: locally weighted regression; lm: linear regression; glm: generalized linear regression; gam: generalized additive regression)
- se ~ whether to draw the confidence band; default TRUE
- colour ~ line colour
- size ~ line width

```r
ggplot(linear, aes(x = x, y = y)) +
  geom_point(shape = 1, colour = "black") +
  ## geom_abline(intercept = 28.083, slope = 132.899)
  geom_smooth(method = lm, se = FALSE, colour = "steelblue", size = 0.5) +
  theme(plot.title = element_text(hjust = 0.5, size = 15)) +
  labs(title = "Linear Regression") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        axis.line = element_line(colour = "black")) +
  annotate("text", x = 0.2, y = 58, label = "y=132.9x+28.08") +
  annotate("text", x = 0.2, y = 59, parse = TRUE,
           label = "atop(R^2==0.9503)", size = 4)
```

If the formula to be fitted is y=ax^2+b, the following code can be used

```r
x <- c(0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.20, 0.21, 0.23)
y <- c(42, 43, 45, 45, 45, 47.5, 49, 53, 50, 55, 55, 60)
## Create data frame
unlinear <- data.frame(x, y)
model <- lm(y ~ I(x^2), data = unlinear)
summary(model)
## Call:
## lm(formula = y ~ I(x^2), data = unlinear)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.46660 -0.59878 -0.07453  0.32904  2.95051
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  38.3482     0.8144   47.09  4.5e-13 ***
## I(x^2)      404.8879    27.5137   14.72  4.2e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.234 on 10 degrees of freedom
## Multiple R-squared:  0.9559, Adjusted R-squared:  0.9514
## F-statistic: 216.6 on 1 and 10 DF,  p-value: 4.201e-08
```

The multiple R-squared here is 0.9559, even higher than the 0.9503 of the linear model:

ŷ = 38.3482 + 404.8879x²

Note: when using the lm function for nonlinear fitting, terms such as x^2 must be wrapped in I() in the formula

When selecting a model, we should consider not only its training error but also its generalization ability.

For large samples, one approach is to split the data into a training set and a prediction set: fit the model on the training set, then compute its residual sum of squares on the prediction set, which can be taken as an approximation of its generalization error.
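A minimal sketch of this hold-out procedure, assuming simulated data (the sample size, 70/30 split, and noise level are arbitrary choices):

```r
set.seed(42)
n <- 200
x <- runif(n)
y <- 28 + 133 * x + rnorm(n, sd = 1.3)   # simulated 'large sample'

idx   <- sample(n, size = 0.7 * n)                 # training indices
train <- data.frame(x = x[idx],  y = y[idx])
test  <- data.frame(x = x[-idx], y = y[-idx])

model    <- lm(y ~ x, data = train)
pred     <- predict(model, newdata = test)
test_sse <- sum((test$y - pred)^2)   # approximate generalization error
```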

In machine learning, a penalty term (regularization) is often added to the error function to prevent overfitting. The principle is to shrink the parameters in the model's parameter space (in an overfitted model, the variance of the parameter estimates is often large).

The regularization term is usually the L1 norm of the parameter vector (sum of absolute values) or the squared L2 norm (sum of squares).

The former is called lasso regression and the latter ridge regression; for details, refer to the machine-learning literature, which this article will not cover.
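A minimal base-R sketch of ridge regression via its closed form (in practice the glmnet package is the usual tool for both lasso and ridge; here lambda is an arbitrary illustrative value and the intercept is penalized only for simplicity):

```r
set.seed(7)
X <- cbind(1, matrix(rnorm(40), nrow = 20))   # design matrix with intercept column
y <- X %*% c(1, 2, -3) + rnorm(20, sd = 0.1)

lambda <- 0.5
## Ridge estimate: (X'X + lambda * I)^(-1) X'y
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
## Ordinary least squares for comparison
beta_ols   <- solve(t(X) %*% X, t(X) %*% y)

## The penalty shrinks the coefficient vector toward zero
sum(beta_ridge^2) < sum(beta_ols^2)
```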

ps: there is also a principle called Occam's razor: the simpler the model, the better (my personal understanding)