Support vector machine theory
Summary
A support vector machine (SVM) is a binary classification model. Its basic model is the linear classifier with the largest margin defined in the feature space, which distinguishes it from the perceptron. The support vector machine also includes the kernel trick, which makes it in essence a nonlinear classifier. The learning strategy of the SVM is margin maximization.
- Support vector machine classification
Support vector machines fall into three classes: the linearly separable support vector machine (for the linearly separable case), the linear support vector machine, and the non-linear support vector machine.
Linearly separable support vector machine
Assume the input space and the feature space are two different spaces. The input space is a Euclidean space or a discrete set, and the feature space is a Euclidean space or a Hilbert space. The linearly separable support vector machine and the linear support vector machine assume that the elements of the two spaces correspond one to one, mapping each input in the input space to a feature vector in the feature space. The goal of learning is to find a separating hyperplane in the feature space that divides the instances into different classes. The separating hyperplane corresponds to the equation w · x + b = 0; it is determined by the normal vector w and the intercept b, and can be written as (w, b). The separating hyperplane divides the feature space into two parts: the side the normal vector points to is the positive class, and the other side is the negative class. The hyperplane is:
$$w \cdot x + b = 0$$

The classification decision function of the linearly separable support vector machine is:

$$f(x) = \mathrm{sign}(w \cdot x + b)$$

![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/ZulEO9.png)
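To make the decision function concrete, the short sketch below evaluates $\mathrm{sign}(w \cdot x + b)$ for a few points; the values of w, b, and the points are made up purely for illustration.

```python
import numpy as np

# Hypothetical hyperplane parameters (for illustration only)
w = np.array([2.0, -1.0])
b = -1.0

def decision(x):
    """Classification decision function f(x) = sign(w·x + b)."""
    return 1 if np.dot(w, x) + b > 0 else -1

for x in [np.array([2.0, 1.0]), np.array([0.0, 2.0])]:
    print(x, "->", decision(x))   # first point is classified +1, second -1
```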
Functional margin: generally speaking, the distance of a point from the hyperplane can represent the confidence of the classification prediction.

$|w \cdot x + b|$ can represent, relatively, how far the point $x$ is from the hyperplane, and whether the sign of $w \cdot x + b$ agrees with the sign of the class label $y$ indicates whether the classification is correct. Therefore $y(w \cdot x + b)$ can represent both the correctness and the confidence of the classification; this is the concept of the functional margin.

Geometric margin: the functional margin can represent the correctness and confidence of the classification prediction, but it alone is not enough for choosing the separating hyperplane. As long as $w$ and $b$ are scaled proportionally, for example to $2w$ and $2b$, the hyperplane does not change, yet the functional margin doubles. We can therefore constrain the normal vector of the separating hyperplane, for example by normalizing $||w|| = 1$, which gives the geometric margin:

$$d = y_i\left(\frac{w}{||w||} \cdot x_i + \frac{b}{||w||}\right)$$

where $||w||$ is the L2 norm of $w$.

Margin maximization: the basic idea of the support vector machine is to find the separating hyperplane that correctly divides the training set and has the largest geometric margin. This maximum-margin separating hyperplane is unique.

Support vectors: in the linearly separable case, the training sample points closest to the separating hyperplane are called support vectors. The support vectors lie on the two hyperplanes

$$H_1: w \cdot x + b = 1 \qquad H_2: w \cdot x + b = -1$$

The distance between $H_1$ and $H_2$ is the margin, equal to $\frac{2}{||w||}$, and $H_1$, $H_2$ are the margin boundaries, as shown in the figure below:

![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/n9XEPb.png)
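A minimal numerical sketch of the two margins (the sample point, w, and b below are made up purely for illustration): doubling w and b doubles the functional margin but leaves the geometric margin unchanged, which is exactly why the geometric margin is used for choosing the hyperplane.

```python
import numpy as np

# Hypothetical sample point and hyperplane parameters (for illustration only)
x, y = np.array([3.0, 3.0]), 1
w, b = np.array([1.0, 1.0]), -5.0

def functional_margin(w, b):
    # y * (w·x + b)
    return y * (np.dot(w, x) + b)

def geometric_margin(w, b):
    # functional margin divided by ||w||
    return functional_margin(w, b) / np.linalg.norm(w)

print(functional_margin(w, b), geometric_margin(w, b))                  # 1.0  0.707...
print(functional_margin(2 * w, 2 * b), geometric_margin(2 * w, 2 * b))  # 2.0  0.707...
```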
Linear support vector machine
The learning method for the linearly separable case does not apply to linearly non-separable training data. Given a training dataset

$$T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$$

linear non-separability means that some sample points $(x_i, y_i)$ cannot satisfy the constraint that the functional margin be greater than or equal to 1. To solve this problem, a slack variable $\xi_i \geq 0$ is introduced for each sample point $(x_i, y_i)$, so that the functional margin plus the slack variable is greater than or equal to 1:

$$y_i(w \cdot x_i + b) \geq 1 - \xi_i$$
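As an illustration (the points, w, and b below are hypothetical), the slack of each sample can be read off as $\xi_i = \max(0,\ 1 - y_i(w \cdot x_i + b))$: points outside the margin get $\xi_i = 0$, points inside the margin get $0 < \xi_i \leq 1$, and misclassified points get $\xi_i > 1$.

```python
import numpy as np

# Hypothetical data and hyperplane (for illustration only)
X = np.array([[3.0, 3.0], [2.0, 2.5], [1.0, 1.0]])
y = np.array([1, 1, 1])
w, b = np.array([1.0, 1.0]), -4.0

# Slack variables: xi_i = max(0, 1 - y_i * (w·x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)   # [0.  0.5  3.] -> outside margin, inside margin, misclassified
```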
Basic knowledge for the reproduction

Separating hyperplane:

$$w^T x+b=0$$

Distance from a point to the hyperplane:

$$r=\frac{|w^T x+b|}{||w||_2}$$

where $||w||_2$ is the 2-norm:

$$||w||_2=\sqrt{\sum^m_{i=1}w_i^2}$$

Extending the straight line to a hyperplane, the samples of the two classes can be expressed as:

$$w^T x+b \geq +1 \qquad w^T x+b \leq -1$$

Functional margin:

$$label\cdot(w^T x+b)\ \ \text{or}\ \ y_i(w^T x+b)$$

Geometric margin:

$$r=\frac{label\cdot(w^T x+b)}{||w||_2}$$

When the data is correctly classified, the geometric margin is the distance from the point to the hyperplane.
To maximize the geometric margin, the basic problem of the SVM can be transformed into solving the following, where $\frac{r^*}{||w||}$ is the geometric margin and $r^*$ is the functional margin:

$$\max\ \frac{r^*}{||w||} \qquad s.t.\ \ y_i(w^T x_i+b)\geq r^*,\ i=1,2,\dots,m$$

Since scaling $w$ and $b$ proportionally scales $r^*$ by the same factor without changing the hyperplane, the functional margin can be fixed to $r^*=1$ for convenience of calculation:

$$\max\ \frac{1}{||w||} \qquad s.t.\ \ y_i(w^T x_i+b)\geq 1,\ i=1,2,\dots,m$$

Maximizing $\frac{1}{||w||}$ is equivalent to minimizing $\frac{1}{2}||w||^2$; adding the slack variables with penalty coefficient $C$ gives the soft-margin objective function:

$$\min\ \frac{1}{2}||w||^2+C\sum_i\xi_i \qquad s.t.\ \ y_i(w^T x_i+b)\geq 1-\xi_i$$

The derivation process is as follows:
- ![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/28D53F8F-9310-432F-B8F9-2EE3331D77D9.png) ![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/764C87BD-59EE-4F1A-BB0F-F19DA1A5F057.png)
- ![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/F2313EB3-70C6-4282-9C74-908C36D98600.png)
- ![](https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/2B20B9F8-F861-4AED-9D74-B0A1B7E13325.png)
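Before turning to the implementation below, a minimal sketch of the soft-margin objective itself can help check the quantity being minimized; the data, w, b, and C here are made up purely for illustration.

```python
import numpy as np

# Hypothetical data and parameters (for illustration only)
X = np.array([[3.0, 3.0], [2.5, 2.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b, C = np.array([0.5, 0.5]), -2.0, 1.0

# Slack variables implied by (w, b): xi_i = max(0, 1 - y_i (w^T x_i + b))
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

# Soft-margin primal objective: 1/2 ||w||^2 + C * sum(xi_i)
objective = 0.5 * np.dot(w, w) + C * xi.sum()
print(xi, objective)   # the second point sits inside the margin, so its slack is positive
```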
NumPy reproduction
```python
# -*- coding:utf-8 -*-
# /usr/bin/python
import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tqdm import tqdm


class SVM():
    def __init__(self, maxIter, kernel="linear"):
        '''Hyperparameters'''
        self.maxIter = maxIter
        self._kernel = kernel

    def initArgs(self, features, labels):
        '''Initialize parameters'''
        self.m, self.n = features.shape
        self.X = features
        self.Y = labels          # the SMO update expects labels in {+1, -1}
        self.b = 0.0
        self.alpha = np.ones(self.m)
        # Cache the prediction errors E_i in a list
        self.E = [self._E(i) for i in range(self.m)]
        # Penalty coefficient (upper bound of alpha)
        self.C = 1.0

    def _KKT(self, i):
        y_g = self._g(i) * self.Y[i]
        if self.alpha[i] == 0:
            return y_g >= 1
        elif 0 < self.alpha[i] < self.C:
            return y_g == 1
        else:
            return y_g <= 1

    # g(x): predicted value for input x_i (X[i])
    def _g(self, i):
        r = self.b
        for j in range(self.m):
            r += self.alpha[j] * self.Y[j] * self.kernel(self.X[i], self.X[j])
        return r

    # Kernel function
    def kernel(self, x1, x2):
        if self._kernel == 'linear':
            return sum([x1[k] * x2[k] for k in range(self.n)])
        elif self._kernel == 'poly':
            return (sum([x1[k] * x2[k] for k in range(self.n)]) + 1) ** 2
        return 0

    # E(i): difference between the prediction g(x_i) and the label y_i
    def _E(self, i):
        return self._g(i) - self.Y[i]

    def _init_alpha(self):
        # The outer loop first traverses all sample points with 0 < alpha < C
        # to check whether they satisfy the KKT conditions
        index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]
        # Otherwise traverse the whole training set
        non_satisfy_list = [i for i in range(self.m) if i not in index_list]
        index_list.extend(non_satisfy_list)
        for i in index_list:
            if self._KKT(i):
                continue
            E1 = self.E[i]
            # If E1 is positive, pick the smallest E2; if negative, pick the largest
            if E1 >= 0:
                j = min(range(self.m), key=lambda x: self.E[x])
            else:
                j = max(range(self.m), key=lambda x: self.E[x])
            return i, j
        # All points satisfy the KKT conditions: nothing left to optimize
        return None

    def _compare(self, _alpha, L, H):
        if _alpha > H:
            return H
        elif _alpha < L:
            return L
        else:
            return _alpha

    def fit(self, features, labels):
        self.initArgs(features, labels)
        for _ in tqdm(range(self.maxIter)):  # train
            time.sleep(0.2)
            pair = self._init_alpha()
            if pair is None:
                break
            i1, i2 = pair
            # Bounds of alpha2
            if self.Y[i1] == self.Y[i2]:
                L = max(0, self.alpha[i1] + self.alpha[i2] - self.C)
                H = min(self.C, self.alpha[i1] + self.alpha[i2])
            else:
                L = max(0, self.alpha[i2] - self.alpha[i1])
                H = min(self.C, self.C + self.alpha[i2] - self.alpha[i1])

            E1 = self.E[i1]
            E2 = self.E[i2]
            # eta = K11 + K22 - 2 * K12
            eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(self.X[i2], self.X[i2]) \
                - 2 * self.kernel(self.X[i1], self.X[i2])
            if eta <= 0:
                continue

            alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (E2 - E1) / eta
            alpha2_new = self._compare(alpha2_new_unc, L, H)
            alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (self.alpha[i2] - alpha2_new)

            b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (alpha1_new - self.alpha[i1]) \
                - self.Y[i2] * self.kernel(self.X[i2], self.X[i1]) * (alpha2_new - self.alpha[i2]) + self.b
            b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (alpha1_new - self.alpha[i1]) \
                - self.Y[i2] * self.kernel(self.X[i2], self.X[i2]) * (alpha2_new - self.alpha[i2]) + self.b

            if 0 < alpha1_new < self.C:
                b_new = b1_new
            elif 0 < alpha2_new < self.C:
                b_new = b2_new
            else:
                # Take the midpoint
                b_new = (b1_new + b2_new) / 2

            # Update parameters
            self.alpha[i1] = alpha1_new
            self.alpha[i2] = alpha2_new
            self.b = b_new
            self.E[i1] = self._E(i1)
            self.E[i2] = self._E(i2)
        return 'train done!'

    def predict(self, data):
        r = self.b
        for i in range(self.m):
            r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])
        return 1 if r > 0 else -1

    def score(self, X_test, y_test):
        right_count = 0
        for i in range(len(X_test)):
            result = self.predict(X_test[i])
            if result == y_test[i]:
                right_count += 1
        return right_count / len(X_test)

    def _weight(self):
        # Weights of the linear model: w = sum_i alpha_i * y_i * x_i
        yx = self.Y.reshape(-1, 1) * self.X
        self.w = np.dot(yx.T, self.alpha)
        return self.w


data = pd.read_csv('dataset.csv')
print(data.label.value_counts())
print("data\n", data)

# Visualize the data to verify linear separability
plt.scatter(data[:50]['sepal length'], data[:50]['sepal width'], label='0')
plt.scatter(data[50:100]['sepal length'], data[50:100]['sepal width'], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.savefig("show.png")   # save before show, otherwise the saved figure is blank
plt.show()

data = np.array(data.iloc[:100, [0, 1, -1]])

# Train the model
X_train, X_test, y_train, y_test = train_test_split(data[:, :-1], data[:, -1], test_size=0.25)
svm = SVM(maxIter=150)
svm.fit(X_train, y_train)
result = svm.score(X_test, y_test)
print(result)
```
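As a sanity check on the hand-rolled SMO above, one can train scikit-learn's SVC on the same split and compare accuracies. This sketch assumes the X_train, X_test, y_train, y_test variables produced by the script above.

```python
from sklearn.svm import SVC

# Reference model with a linear kernel; C matches the self.C = 1.0 used above
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print("sklearn SVC accuracy:", clf.score(X_test, y_test))
```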