Implementing a basic machine learning workflow with Python's scikit-learn (sklearn) package is straightforward: most of the work comes down to calling the library's ready-made functions.

A basic machine learning workflow comes down to four parts:

- Import the corresponding packages and data
- Data preprocessing
- Model establishment and fitting
- Model evaluation and prediction

# 1. Import corresponding packages and data

First, let's take the first step: importing the packages and the data. scikit-learn is organized into six main categories of tools, which you can see on the main page of the official website. Here I import five pieces of sklearn, of which StandardScaler is used for data standardization, train_test_split splits the data into a training set and a test set, and RandomForestRegressor is used to build the regression model.

## 1.1 Import packages

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from xgboost import XGBRegressor as XGBR
```

## 1.2 Import data

```python
data = pd.read_csv(r'./data.csv')
data.info()
```

# 2. Data preprocessing

The basic, necessary preprocessing steps here are deleting missing values, standardizing the data, and splitting it into training and test sets.

## 2.1 Deletion of missing values

```python
data.isnull().any()   # check for missing values
data = data.dropna()  # delete rows with missing values
data.info()           # view the cleaned data
```
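As a quick illustration of what `dropna` does (on a toy DataFrame with made-up values, not the real data), any row containing a NaN is removed:

```python
import numpy as np
import pandas as pd

# Toy DataFrame with one missing value (illustrative data only)
df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})
print(df.isnull().any())  # which columns contain missing values
clean = df.dropna()       # drop every row with any NaN
print(len(clean))         # 2 rows remain
```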

## 2.2 Data standardization

```python
# Extract the feature variables and the dependent variable
XX = data.loc[:, ['SAVI_MIN', 'SAVI_MEDIAN', 'SAVI_MAX',
                  'NEAR_DIST', 'NEAR_NUMBER',
                  'NDMI_MIN', 'NDMI_MEDIAN', 'NDMI_MAX',
                  'ASPECT', 'DEM', 'SLOPE',
                  'NDRE_MEDIAN', 'NDRE_MIN', 'NDRE_MAX',
                  'elev_percentile_30th', 'elev_percentile_60th', 'elev_percentile_90th',
                  'elev_mean', 'elev_variance',
                  'CanopyCover', 'GapFraction',
                  'density_metrics[0]', 'density_metrics[3]',
                  'density_metrics[6]', 'density_metrics[9]']]
Y = data['cha_'].values

# Standardize the features and the target
X = StandardScaler().fit_transform(XX)
Y = StandardScaler().fit_transform(Y.reshape(-1, 1))
```
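To see what `StandardScaler` actually does, here is a minimal sketch on a made-up column: each feature is rescaled to mean 0 and (population) standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative array: StandardScaler rescales each column to mean 0, variance 1
arr = np.array([[1.0], [2.0], [3.0], [4.0]])
scaled = StandardScaler().fit_transform(arr)
print(scaled.mean(), scaled.std())  # approximately 0.0 and 1.0
```

Note that, strictly speaking, the scaler should be fit on the training data only and then applied to the test data, to avoid information leaking from the test set; the one-shot transform above simply mirrors the simple workflow in this post.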

## 2.3 Data segmentation

The key function here is train_test_split; it is worth reading its page in the official documentation carefully, since both its parameters and its return values matter.

```python
# Split the data: 70% training set, 30% test set
validation_size = 0.3
seed = 10
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=validation_size, random_state=seed)
```
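A small self-contained sketch (on synthetic arrays, not the real features) shows what the split returns: with test_size=0.3 on 100 samples you get a 70-row training set and a 30-row test set, and fixing random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (100 samples, 3 features) just to show the split sizes
X_demo = np.random.RandomState(0).rand(100, 3)
y_demo = np.random.RandomState(1).rand(100)

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=10)
print(x_tr.shape, x_te.shape)  # (70, 3) (30, 3)
```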

# 3. Model establishment and fitting

Building and fitting the model takes just two steps: the first line of code creates the model, and the second line fits it (fitting must use the training data).

The two most important parameters of random forest regression are n_estimators and max_depth; tuning them can noticeably improve model accuracy.

```python
RF = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=10)
RF.fit(x_train, y_train.ravel())  # ravel() flattens y to the 1-D shape sklearn expects
```
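As a rough sketch of how tuning n_estimators might look (on synthetic data from make_regression, not the real dataset), candidate values can be compared with cross_val_score, which was imported earlier:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data stands in for the real features
Xs, ys = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

best_score, best_n = -np.inf, None
for n in (10, 50, 100):
    model = RandomForestRegressor(n_estimators=n, random_state=10, n_jobs=-1)
    # Mean 5-fold cross-validated R2 for this candidate
    score = cross_val_score(model, Xs, ys, cv=5, scoring="r2").mean()
    if score > best_score:
        best_score, best_n = score, n
print(best_n, round(best_score, 3))
```

The same loop could be run over max_depth values; for larger grids, sklearn's GridSearchCV does this search more systematically.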

# 4. Model evaluation and prediction

Common regression evaluation metrics include MSE, MAE, and R2; their formulas are easy to look up. The scores below are the model's final results.

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred = RF.predict(x_test)
print("mse:", mean_squared_error(y_test, y_pred))   # MSE
print("mae:", mean_absolute_error(y_test, y_pred))  # MAE
print("r2:", r2_score(y_test, y_pred))              # R2
```
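If you want to see the formulas behind these metrics, here is a small check (on made-up toy vectors) that computes each one by hand with NumPy and compares it with sklearn's implementation:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Toy vectors to check the formulas against sklearn
y_t = np.array([1.0, 2.0, 3.0])
y_p = np.array([1.1, 1.9, 3.2])

mse = np.mean((y_t - y_p) ** 2)               # mean squared error
mae = np.mean(np.abs(y_t - y_p))              # mean absolute error
r2 = 1 - np.sum((y_t - y_p) ** 2) / np.sum((y_t - y_t.mean()) ** 2)

assert np.isclose(mse, mean_squared_error(y_t, y_p))
assert np.isclose(mae, mean_absolute_error(y_t, y_p))
assert np.isclose(r2, r2_score(y_t, y_p))
```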

That is the overall machine learning workflow. Thanks for reading!