Section 2: Support Vector Machine (SVM), implemented in numpy

Posted by Byron on Sun, 05 Dec 2021 04:29:28 +0100


Support vector machine theory

Summary

The support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the largest margin defined in the feature space, which distinguishes it from the perceptron. In addition, the SVM uses the kernel trick, which makes it effectively a nonlinear classifier. The learning strategy of the SVM is margin maximization.

  • Support vector machine classification

Classification of support vector machines: the linearly separable support vector machine (for the linearly separable case), the linear support vector machine, and the nonlinear support vector machine.

Linearly separable support vector machine

It is assumed that the input space and the feature space are two different spaces: the input space is a Euclidean space or a discrete set, and the feature space is a Euclidean space or a Hilbert space. The linearly separable support vector machine and the linear support vector machine assume that the elements of the two spaces correspond one to one, mapping each input in the input space to a feature vector in the feature space. The goal of learning is to find a separating hyperplane in the feature space that divides the instances into different classes. The separating hyperplane corresponds to the equation w · x + b = 0, is determined by the normal vector w and the intercept b, and can be denoted by (w, b). The separating hyperplane divides the feature space into two parts: the side the normal vector points to is the positive class, and the other side is the negative class. The hyperplane is:

w\cdot x + b = 0

The classification decision function is (linear separable vector machine):

f(x)=sign(w\cdot x + b)
  • (figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/ZulEO9.png)
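
A minimal numpy sketch of this decision function (the values of w and b here are made up purely for illustration; in practice they come from training):

import numpy as np

w = np.array([0.5, -1.0])   # illustrative normal vector
b = 0.25                    # illustrative intercept

def decision(x, w, b):
    """Classify x by the sign of w . x + b (returns +1 or -1)."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(decision(np.array([2.0, 0.5]), w, b))    # +1: the point lies on the positive side
print(decision(np.array([-1.0, 2.0]), w, b))   # -1: the point lies on the negative side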

Functional margin: generally, the distance of a point from the hyperplane indicates the confidence of the classification prediction. The quantity

|w\cdot x+b|

represents, relatively, the distance between the point x and the hyperplane, while whether the sign of

w\cdot x+b

agrees with the sign of the label y indicates whether the classification is correct. Therefore

y(w\cdot x+b)

represents both the correctness and the confidence of the classification; this is the concept of the functional margin.
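
A minimal sketch of the functional margin with made-up w and b (illustrative only): the sign tells whether the point is classified correctly, the magnitude how confidently.

import numpy as np

w, b = np.array([0.5, -1.0]), 0.25   # illustrative hyperplane

def functional_margin(x, y, w, b):
    """Functional margin y * (w . x + b) of a labelled point (x, y)."""
    return y * (np.dot(w, x) + b)

print(functional_margin(np.array([2.0, 0.5]), +1, w, b))   # 0.75, correctly classified
print(functional_margin(np.array([2.0, 0.5]), -1, w, b))   # -0.75, misclassified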

Geometric margin: the functional margin can represent the correctness and confidence of a classification prediction, but it is not sufficient for choosing the separating hyperplane, because changing w and b proportionally (for example to 2w and 2b) leaves the hyperplane unchanged while doubling the functional margin. We can therefore constrain the normal vector of the separating hyperplane, e.g. by normalizing it so that

||w||=1

The geometric margin of a point x_i (its signed distance to the hyperplane) is

d = \frac{w}{||w||}\cdot x_i + \frac{b}{||w||}

where ||w|| is the L2 norm of w.

Margin maximization: the basic idea of the support vector machine is to find the separating hyperplane that correctly divides the training set and has the largest geometric margin; such a hyperplane is unique.

Support vectors: in the linearly separable case, the sample points of the training data set that are closest to the separating hyperplane are called support vectors.

All support vectors lie on the hyperplanes

H_1: w\cdot x +b = 1 \qquad H_2: w\cdot x +b = -1

The distance between H_1 and H_2 is the margin,

\frac{2}{||w||}

and H_1, H_2 are the margin boundaries, as shown in the figure below.

(figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/n9XEPb.png)
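
A small numpy check of the geometric margin and of the margin width 2/||w||, again with made-up values:

import numpy as np

w, b = np.array([2.0, -1.0]), 0.5    # illustrative hyperplane
x_i, y_i = np.array([1.0, 0.0]), 1   # illustrative sample point

# Geometric margin of (x_i, y_i): functional margin divided by ||w||
print(y_i * (np.dot(w, x_i) + b) / np.linalg.norm(w))   # ~1.118

# Width of the margin between H_1 and H_2
print(2 / np.linalg.norm(w))                            # ~0.894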

Linear support vector machine

The support vector machine learning method for the linearly separable problem is not applicable to linearly non-separable training data. Given a training dataset

T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}

linear non-separability means that some sample points (x_i,y_i) cannot satisfy the functional margin constraint. To solve this problem, a slack variable

\xi_i\geq0

is introduced for each sample point (x_i,y_i), so that the functional margin plus the slack variable is greater than or equal to 1:

y_i(w\cdot x_i + b) \geq 1-\xi_i
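
For a fixed (w, b), the smallest slack that satisfies the constraint for each point is \xi_i = \max(0,\ 1-y_i(w\cdot x_i+b)) (the hinge loss). A minimal sketch with made-up data:

import numpy as np

w, b = np.array([1.0, -1.0]), 0.0                    # illustrative hyperplane
X = np.array([[2.0, 0.0], [0.4, 0.0], [0.0, 1.0]])   # illustrative samples
y = np.array([1, 1, 1])

# Slack needed by each point: zero whenever the margin constraint already holds
xi = np.maximum(0, 1 - y * (X @ w + b))
print(xi)   # [0.  0.6 2. ]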

Basics for the implementation

Separating hyperplane:

w^Tx+b=0

Distance from a point to the hyperplane:

r=\frac{|w^Tx+b|}{||w||_2}

where ||w||_2 is the 2-norm:

||w||_2=\sqrt{\sum^m_{i=1}w_i^2}

The decision boundary is a hyperplane, and the samples can be scaled so that positive samples satisfy

w^Tx+b\ \geq+1

and negative samples satisfy

w^Tx+b\ \leq -1

Functional margin:

label(w^Tx+b)\ or\ y_i(w^Tx+b)

Geometric margin:

r=\frac{label(w^Tx+b)}{||w||_2}

When the data is correctly classified, the geometric margin is the distance from the point to the hyperplane.
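
A quick numeric check of the point-to-hyperplane distance formula (hyperplane and point are made up for illustration):

import numpy as np

w, b = np.array([3.0, 4.0]), -5.0   # illustrative hyperplane 3x_1 + 4x_2 - 5 = 0
x = np.array([1.0, 1.0])

# r = |w^T x + b| / ||w||_2
r = abs(np.dot(w, x) + b) / np.linalg.norm(w)
print(r)   # 0.4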

To maximize the geometric margin, the basic problem of the SVM can be written as the following optimization, where \frac{r^*}{||w||} is the geometric margin and {r^*} is the functional margin:

\max\ \frac{r^*}{||w||}\qquad s.t.\ y_i({w^T}x_i+{b})\geq {r^*},\ i=1,2,..,m

Since rescaling w and b rescales the functional margin r^* by the same factor without changing the hyperplane, r^* can be fixed to 1 for convenience of calculation:

\max\ \frac{1}{||w||}\qquad s.t.\ y_i({w^T}x_i+{b})\geq {1},\ i=1,2,..,m
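
Maximizing \frac{1}{||w||} is equivalent to minimizing \frac{1}{2}||w||^2 (the factor \frac{1}{2} only simplifies the later derivative), so the hard-margin problem can be written as the convex quadratic program

\min\ \frac{1}{2}||w||^2\qquad s.t.\ y_i(w^Tx_i+b)\geq1,\ i=1,2,..,m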

Adding the slack variables, the soft-margin objective function is:

\min\ \frac{1}{2}||w||^2+C\sum_i\xi_i\qquad s.t.\ y_i(w^Tx_i+b)\geq1-\xi_i,\ \xi_i\geq0

The derivation process is as follows:

  • (figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/28D53F8F-9310-432F-B8F9-2EE3331D77D9.png)
  • (figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/764C87BD-59EE-4F1A-BB0F-F19DA1A5F057.png)
  • (figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/F2313EB3-70C6-4282-9C74-908C36D98600.png)
  • (figure: https://raw.githubusercontent.com/errolyan/tuchuang/master/uPic/2B20B9F8-F861-4AED-9D74-B0A1B7E13325.png)
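
The figures above carry the detailed derivation; for reference, the resulting dual problem, which the SMO code below optimizes over the multipliers \alpha_i, has the standard form

\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_jK(x_i,x_j)\qquad s.t.\ \sum_{i=1}^{m}\alpha_iy_i=0,\ 0\leq\alpha_i\leq C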

numpy implementation

# -*- coding:utf-8 -*-
# /usr/bin/python

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tqdm import tqdm

class SVM:
    def __init__(self, maxIter, kernel="linear"):
        '''Simplified SMO-style SVM: maxIter iterations, linear or polynomial kernel'''
        self.maxIter = maxIter
        self._kernel = kernel

    def initArgs(self, features, labels):
        '''Initialize the training data, the Lagrange multipliers and the error cache'''
        self.m, self.n = features.shape
        self.X = features
        self.Y = labels
        self.b = 0.0

        # Lagrange multipliers, one per training sample
        self.alpha = np.ones(self.m)
        # Cache E_i = g(x_i) - y_i for every sample
        self.E = [self._E(i) for i in range(self.m)]
        # Penalty parameter C (upper bound on the multipliers)
        self.C = 1.0

    def _KKT(self, i):
        '''Check whether sample i satisfies the KKT conditions of the dual problem'''
        y_g = self._g(i) * self.Y[i]
        if self.alpha[i] == 0:
            return y_g >= 1
        elif 0 < self.alpha[i] < self.C:
            return y_g == 1
        else:
            return y_g <= 1

    # g(x_i): decision value of sample i, g(x_i) = sum_j alpha_j * y_j * K(x_i, x_j) + b
    def _g(self, i):
        r = self.b
        for j in range(self.m):
            r += self.alpha[j] * self.Y[j] * self.kernel(self.X[i], self.X[j])
        return r

    # kernel function
    def kernel(self, x1, x2):
        if self._kernel == 'linear':
            return sum([x1[k] * x2[k] for k in range(self.n)])
        elif self._kernel == 'poly':
            return (sum([x1[k] * x2[k] for k in range(self.n)]) + 1) ** 2

        return 0

    # E(i): difference between the prediction g(x_i) and the label y_i
    def _E(self, i):
        return self._g(i) - self.Y[i]

    def _init_alpha(self):
        # The outer loop first traverses all sample points satisfying 0 < a < C to check whether KKT is satisfied
        index_list = [i for i in range(self.m) if 0 < self.alpha[i] < self.C]
        # Otherwise, traverse the entire training set
        non_satisfy_list = [i for i in range(self.m) if i not in index_list]
        index_list.extend(non_satisfy_list)

        for i in index_list:
            if self._KKT(i):
                continue

            E1 = self.E[i]
            # Choose j so that |E1 - E2| is large: if E1 >= 0 take the smallest E, otherwise the largest
            if E1 >= 0:
                j = min(range(self.m), key=lambda x: self.E[x])
            else:
                j = max(range(self.m), key=lambda x: self.E[x])
            return i, j

    def _compare(self, _alpha, L, H):
        if _alpha > H:
            return H
        elif _alpha < L:
            return L
        else:
            return _alpha

    def fit(self, features, labels):
        self.initArgs(features, labels)

        for _ in tqdm(range(self.maxIter)):   # SMO training loop
            # Select the pair of multipliers (alpha_i1, alpha_i2) to optimize in this step
            selected = self._init_alpha()
            if selected is None:
                # Every sample already satisfies the KKT conditions: training has converged
                break
            i1, i2 = selected

            # boundary
            if self.Y[i1] == self.Y[i2]:
                L = max(0, self.alpha[i1] + self.alpha[i2] - self.C)
                H = min(self.C, self.alpha[i1] + self.alpha[i2])
            else:
                L = max(0, self.alpha[i2] - self.alpha[i1])
                H = min(self.C, self.C + self.alpha[i2] - self.alpha[i1])
            E1 = self.E[i1]
            E2 = self.E[i2]
            # eta=K11+K22-2K12
            eta = self.kernel(self.X[i1], self.X[i1]) + self.kernel(self.X[i2], self.X[i2]) - 2 * self.kernel(
                self.X[i1], self.X[i2])
            if eta <= 0:
                # print('eta <= 0')
                continue

            alpha2_new_unc = self.alpha[i2] + self.Y[i2] * (E2 - E1) / eta
            alpha2_new = self._compare(alpha2_new_unc, L, H)

            alpha1_new = self.alpha[i1] + self.Y[i1] * self.Y[i2] * (self.alpha[i2] - alpha2_new)

            b1_new = -E1 - self.Y[i1] * self.kernel(self.X[i1], self.X[i1]) * (alpha1_new - self.alpha[i1]) - self.Y[
                i2] * self.kernel(self.X[i2], self.X[i1]) * (alpha2_new - self.alpha[i2]) + self.b
            b2_new = -E2 - self.Y[i1] * self.kernel(self.X[i1], self.X[i2]) * (alpha1_new - self.alpha[i1]) - self.Y[
                i2] * self.kernel(self.X[i2], self.X[i2]) * (alpha2_new - self.alpha[i2]) + self.b

            if 0 < alpha1_new < self.C:
                b_new = b1_new
            elif 0 < alpha2_new < self.C:
                b_new = b2_new
            else:
                # Select midpoint
                b_new = (b1_new + b2_new) / 2

            # Update parameters
            self.alpha[i1] = alpha1_new
            self.alpha[i2] = alpha2_new
            self.b = b_new

            self.E[i1] = self._E(i1)
            self.E[i2] = self._E(i2)
        return 'train done!'

    def predict(self, data):
        '''Predict the label (+1 or -1) of a single sample'''
        r = self.b
        for i in range(self.m):
            r += self.alpha[i] * self.Y[i] * self.kernel(data, self.X[i])
        return 1 if r > 0 else -1

    def score(self,X_test,y_test):
        right_count = 0
        for i in range(len(X_test)):
            result = self.predict(X_test[i])
            if result == y_test[i]:
                right_count += 1
        return right_count / len(X_test)

    def _weight(self):
        # Recover w = sum_i alpha_i * y_i * x_i (only meaningful for the linear kernel)
        yx = self.Y.reshape(-1, 1) * self.X
        self.w = np.dot(yx.T, self.alpha)
        return self.w

data = pd.read_csv('dataset.csv')
print(data.label.value_counts())
print("data\n",data)

# Data visualization to verify linear separability
plt.scatter(data[:50]['sepal length'], data[:50]['sepal width'], label='0')
plt.scatter(data[50:100]['sepal length'], data[50:100]['sepal width'], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.savefig("show.png")   # save before show(); otherwise an empty figure is written
plt.show()

data = np.array(data.iloc[:100, [0, 1, -1]])
# The SVM expects labels in {-1, +1}; map any non-positive label (e.g. class 0 plotted above) to -1
data[:, -1] = np.where(data[:, -1] <= 0, -1, 1)

# Training model
X_train, X_test, y_train, y_test = train_test_split(data[:,:-1], data[:,-1], test_size=0.25)
svm = SVM(maxIter=150)
svm.fit(X_train,y_train)
result = svm.score(X_test, y_test)
print("test accuracy:", result)
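
Because the kernel is linear, the trained model can also be visualized by recovering w with _weight() and drawing the separating line over the training points. A minimal sketch that continues the script above (it assumes the second weight w[1] is nonzero):

# Recover w = sum_i alpha_i * y_i * x_i for the linear kernel and plot the decision boundary
w = svm._weight()
b = svm.b

x1 = np.linspace(data[:, 0].min(), data[:, 0].max(), 50)
# Points on the separating hyperplane satisfy w[0]*x1 + w[1]*x2 + b = 0
x2 = -(w[0] * x1 + b) / w[1]

plt.scatter(X_train[y_train == 1][:, 0], X_train[y_train == 1][:, 1], label='+1')
plt.scatter(X_train[y_train == -1][:, 0], X_train[y_train == -1][:, 1], label='-1')
plt.plot(x1, x2, 'k--', label='decision boundary')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.savefig("boundary.png")
plt.show()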