11. Ensemble learning practice

Posted by DocUK on Mon, 17 Jan 2022 23:56:47 +0100

API analysis

AdaBoost algorithm (a shrinkage coefficient needs to be added so that abnormal data does not have its weight amplified without bound and introduce error; see the usage sketch after the parameter table below)

Parameters of AdaBoostClassifier and AdaBoostRegressor:

base_estimator
    AdaBoostClassifier: the weak classifier object; the default is the CART classification tree DecisionTreeClassifier.
    AdaBoostRegressor: the weak regressor object; the default is the CART regression tree DecisionTreeRegressor.

algorithm
    AdaBoostClassifier: "SAMME" or "SAMME.R". SAMME uses the classification result on the sample set as the weight of each weak classifier, while SAMME.R uses the predicted class probabilities. Because SAMME.R works with continuous probability measures, it generally converges faster, so it is the default. Note: SAMME.R requires that the weak learner given by base_estimator supports probability prediction, i.e. it must have a predict_proba method.
    AdaBoostRegressor: not supported.

loss
    AdaBoostClassifier: not supported.
    AdaBoostRegressor: how the error is computed; one of "linear", "square", "exponential", with "linear" as the default. It generally does not need to be changed.

n_estimators
    Both: the number of weak learners. Too small a value may lead to underfitting and too large a value to overfitting; 50 ~ 100 is usually suitable, and the default is 50.

learning_rate
    Both: the shrinkage coefficient v applied to each weak learner; the default is 1. Tuning usually starts from a relatively small value, and the smaller the value, the more weak learners are needed. (Without this shrinkage, the samples before an abnormal point are fitted more and more accurately while the predictions after it get worse and worse.)
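
A minimal usage sketch of the classifier parameters above (the make_classification toy data and the concrete values such as learning_rate=0.5 are assumptions for illustration, not from the original post):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),  #Weak learner; must have predict_proba when algorithm="SAMME.R"
    algorithm="SAMME.R",  #The default; weights come from predicted class probabilities
    n_estimators=50,      #The default; 50 ~ 100 weak learners is usually enough
    learning_rate=0.5,    #Shrinkage coefficient v; smaller values need more weak learners
)
clf.fit(X, y)
print(clf.score(X, y))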

GBDT algorithm

Parameters of GradientBoostingClassifier and GradientBoostingRegressor:

alpha
    GradientBoostingClassifier: not supported.
    GradientBoostingRegressor: the quantile used by the huber and quantile loss functions; the default is 0.9. If the data contains a lot of noise, the value can be lowered to reduce the influence of abnormal samples.

loss
    GradientBoostingClassifier: the loss function; the log-likelihood loss "deviance" and the exponential loss "exponential" are available. The default is "deviance" and changing it is not recommended.
    GradientBoostingRegressor: the loss function; squared error "ls", absolute loss "lad", huber loss "huber" and quantile loss "quantile" are available. The default is "ls" and is generally kept; with a lot of noisy data, "huber" is recommended; for piecewise (interval) prediction, "quantile" is recommended.

n_estimators
    Both: the maximum number of boosting iterations. Too small a value may lead to underfitting and too large a value to overfitting; the default is 100, and it is usually tuned together with learning_rate.

learning_rate
    Both: the shrinkage coefficient v applied to each weak learner; the default is 0.1. Tuning usually starts from a relatively small value, and the smaller the value, the more weak learners are required.

subsample
    Both: the fraction of samples used to train each tree, in the range (0, 1]; the default is 1, i.e. no subsampling. A value below 1 trains each tree on only part of the data, which reduces overfitting; [0.5, 0.8] is recommended. The sampling is done without replacement.

init
    Both: the initial estimator; it can be left unset.
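
A hedged usage sketch of the regressor parameters above (the make_regression toy data, the injected outliers and the chosen values are assumptions for illustration, not from the original post):

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
y[::20] += 100  #Inject a few outliers so the robust loss has something to do
reg = GradientBoostingRegressor(
    loss="huber",       #Robust to noisy data; the squared error "ls" is the default
    alpha=0.9,          #Quantile used by the huber loss; lower it when there is more noise
    n_estimators=100,   #Maximum number of boosting iterations (the default)
    learning_rate=0.1,  #Shrinkage coefficient v (the default)
    subsample=0.8,      #Each tree sees 80% of the samples, drawn without replacement
)
reg.fit(X, y)
print(reg.score(X, y))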

GBDT regression code from scratch (the underlying idea of the boosting tree)

The core of GBDT is the residual.

Each base learner depends on the previous one: the current learner is fitted on the residuals left by the previous round.
The residuals keep shrinking, so the sum of all base learners approaches the true values.
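
In symbols, each step fits a new tree h_m(x) to the residuals r_i = y_i - F_{m-1}(x_i) of the current model F_{m-1} and then updates F_m(x) = F_{m-1}(x) + h_m(x). For example, if a label is 5.56 and the current model predicts 5.70 (an illustrative number, not an actual output of the code below), the next tree is trained to predict the residual -0.14.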

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
#Import dataset
df = pd.DataFrame([[1,5.56],[2,5.7],[3,5.91],[4,6.4],[5,6.8]
                   ,[6,7.05],[7,8.9],[8,8.7],[9,9],[10,9.05]],columns=["x","y"])
#Set the container for storing base learners and set the number of base learners
M = [] #A list that stores the decision tree models
n_trees = 4  #Set the number of trees
X = df.iloc[:,[0]]  #Construct X
Y = df.iloc[:,[-1]]  #Construct Y
y_ = Y.copy()  #Keep a copy of the original labels, because Y itself is reassigned below
#Storage model
#Calculate the residual as the input value of the next base learner
for i in range(n_trees):
    model = DecisionTreeRegressor(max_depth=2).fit(X,Y)  #New decision tree model
    M.append(model)  #Add decision tree model to array
    Y_het = model.predict(X) #Output model predictions
    Y_het = pd.DataFrame(Y_het,columns=["y"]) #Convert model predictions to DataFrame
    print(Y_het)
    Y = Y - Y_het #Change the original Y and let the next learner continue learning
    print(i, Y)
#Set up a container that accumulates the prediction for every sample
#Each base learner's output is added on top of the previous ones
res = np.zeros(df.shape[0]) #Initialize an all-zero vector
for i in M: #Traversal model array
    res += i.predict(X) #The predicted values of each model are superimposed on the res variable
print(res) #Output the final predicted value for each sample label
print(mean_squared_error(y_, res)) #Compare against the original labels kept in y_
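
As a small follow-up (not part of the original code), the stored trees in M can also be combined to predict an unseen input; the value 5.5 below is a hypothetical example:

X_new = pd.DataFrame({"x": [5.5]})        #A hypothetical new sample
y_new = sum(m.predict(X_new) for m in M)  #Boosted prediction = sum of all base learners' outputs
print(y_new)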

GBDT regression using the sklearn library

#Using the sklearn library is much simpler
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import GradientBoostingRegressor
import warnings
warnings.filterwarnings('ignore')
#Rebuild the training data, because Y was overwritten with residuals in the hand-written version above
df = pd.DataFrame([[1,5.56],[2,5.7],[3,5.91],[4,6.4],[5,6.8]
                   ,[6,7.05],[7,8.9],[8,8.7],[9,9],[10,9.05]],columns=["x","y"])
X = df.iloc[:,[0]]  #Construct X
Y = df.iloc[:,-1]   #Construct Y as a Series
model = GradientBoostingRegressor(n_estimators=50)
model.fit(X, Y)
y_ = model.predict(X)
print(y_)
print(mean_squared_error(Y, y_))
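
As an optional extension (an assumption, not part of the original post), staged_predict can be used to watch the training error shrink as trees are added, which helps when choosing n_estimators:

#Training MSE after each boosting stage (uses model, X and Y from the block above)
errors = [mean_squared_error(Y, pred) for pred in model.staged_predict(X)]
print(errors[::10])  #Print every 10th stage to keep the output short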

Topics: Python Algorithm Machine Learning AI Deep Learning