Machine Learning in Action reading notes
Chapter 6: Support Vector Machines
1, Terminology in this chapter
- Support Vector Machines (SVM): an "off-the-shelf" classifier, meaning it can be used directly without modification. SVMs make good classification decisions for data points outside the training set.
- Sequential Minimal Optimization (SMO) algorithm: an algorithm for training SVMs.
- Kernel: kernel functions extend SVMs to a wider range of data sets.
- Linearly separable data: the two groups of points are separated cleanly enough that a straight line can be drawn between them.
- Separating hyperplane: the line (in two dimensions) that separates a linearly separable data set.
- Hyperplane: the decision boundary of the classifier. The farther a data point lies from the decision boundary, the more reliable the prediction.
- Margin: the distance from a point to the separating surface. We want the margin to be as large as possible.
- Support vectors: the points closest to the separating hyperplane. We try to maximize the distance from the support vectors to the separating surface.
- Slack variable: allows some data points to lie on the wrong side of the separating surface.
- Array filtering: boolean array filtering only works on NumPy arrays/matrices, not on regular Python lists, e.g. print(alphas[alphas > 0]) (see the sketch after this list).
- Kernel (again): converts the data into a form that is easier for the classifier to understand.
- Radial basis function: one commonly used kernel function.
- Inner product (also called dot product): multiplying two vectors to obtain a single scalar value.
- Kernel trick: replacing inner products with kernel function evaluations is called the kernel trick or kernel substitution.
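As a quick illustration of the array-filtering point above, here is a minimal sketch of my own (the alpha values are made up):

```python
# A minimal sketch (not from the book) showing why boolean "array filtering"
# needs a NumPy array: plain Python lists do not support this syntax.
import numpy as np

alphas = np.array([0.0, 0.15, 0.0, 0.37, 0.05])
print(alphas[alphas > 0])   # -> [0.15 0.37 0.05], only the nonzero alphas
print((alphas > 0).sum())   # -> 3, i.e. how many support vectors there are

# alphas_list = [0.0, 0.15, 0.0, 0.37, 0.05]
# alphas_list[alphas_list > 0]  # TypeError: '>' not supported between list and int
```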
2, Support vector machine
- Advantages: low generalization error rate, low computational overhead and easy interpretation of results.
- Disadvantages: sensitive to parameter tuning and to the choice of kernel function; without modification, the basic classifier only handles two-class problems.
- Applicable data types: numerical and nominal data
1. Optimizing the classifier
For classification we apply a function similar to the Heaviside step function (i.e., the unit step function) to the quantity

$$w^Tx+b$$

That is, we compute

$$f(u),\qquad u = w^Tx+b$$

where f(u) outputs -1 when u < 0 and +1 otherwise.
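A minimal sketch of this decision rule (my own illustration; the w, b, and x values are made up):

```python
import numpy as np

def classify(w, b, x):
    # f(u) with u = w^T x + b: output -1 when u < 0, otherwise +1
    u = float(np.dot(w, x)) + b
    return -1 if u < 0 else 1

w = np.array([0.8, -0.3])   # hypothetical weights
b = -0.2                    # hypothetical bias
print(classify(w, b, np.array([1.0, 0.5])))   # u = 0.8*1.0 - 0.3*0.5 - 0.2 = 0.45 -> +1
print(classify(w, b, np.array([-1.0, 1.0])))  # u = -0.8 - 0.3 - 0.2 = -1.3 -> -1
```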
When computing the distance from a data point to the separating surface and deciding where to place the surface, the margin is computed via

$$label\cdot(w^Tx+b)$$

This is where using class labels of -1 and +1 pays off: if a data point lies on the positive (+1) side and far from the separating hyperplane, u = w^Tx+b is a large positive number and label·u is also a large positive number; if the point lies on the negative (-1) side and far from the hyperplane, label·u is still a large positive number because the label is -1.
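Continuing the made-up numbers from the sketch above, a quick check that label·(w^Tx+b) comes out as a large positive number on both sides when the point is far from the hyperplane:

```python
# Hypothetical points far from the hyperplane on each side (same made-up w, b as above).
u_pos = 0.8*5.0 - 0.3*0.0 - 0.2     # = 3.8, a point on the +1 side
u_neg = 0.8*(-5.0) - 0.3*0.0 - 0.2  # = -4.2, a point on the -1 side
print((+1) * u_pos)   # 3.8  -> large positive
print((-1) * u_neg)   # 4.2  -> also large positive, because the label is -1
```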
The goal now is to find the w and b in the classifier definition. To do that we must find the data points with the smallest margin, i.e., the support vectors, and then maximize that margin. This can be written as

$$\arg\max_{w,b}\ \left\{\min_{n}\left(label\cdot(w^Tx+b)\right)\cdot\frac{1}{\|w\|}\right\}$$
Directly optimizing this product is awkward, so we fix one factor and maximize the other. If we require label·(w^Tx+b) to be 1 for the points closest to the hyperplane, then we only need to maximize

$$\frac{1}{\|w\|}$$

to get the final solution. However, not every data point has label·(w^Tx+b) equal to 1; only the points closest to the separating hyperplane do.
Because we fixed one factor and then optimized the other under a condition, this is a constrained optimization problem. The constraint here is

$$label\cdot(w^Tx+b) \ge 1.0$$
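Equivalently (a standard reformulation, added here for clarity; it is not spelled out in the notes above), maximizing 1/||w|| under this constraint is the same as solving

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad label^{(i)}\cdot(w^Tx^{(i)}+b) \ge 1,\quad i=1,\dots,m$$

which is the form the Lagrange multiplier method is applied to.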
For this kind of optimization problem there is a very famous solution method, the Lagrange multiplier method.
Because the constraints are written in terms of the data points, we can rewrite the optimization objective in terms of the data points as well (the dual form):

$$\max_{\alpha}\ \left[\ \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i,j=1}^m label^{(i)}\cdot label^{(j)}\cdot \alpha_i\cdot\alpha_j\,\langle x^{(i)}, x^{(j)}\rangle\ \right]$$
The angle brackets denote the inner product of two vectors. The constraints become

$$\alpha \ge 0, \quad \text{and} \quad \sum_{i=1}^m \alpha_i\cdot label^{(i)} = 0$$
Slack variables allow some data points to lie on the wrong side of the separating surface. The optimization objective stays the same, but the constraints become

$$C \ge \alpha \ge 0, \quad \text{and} \quad \sum_{i=1}^m \alpha_i\cdot label^{(i)} = 0$$
The constant C here balances the two goals of "maximize the margin" and "ensure that most points have a functional margin of at least 1.0". C is a parameter, and we can obtain different results by tuning it. Once all the alphas are found, the separating hyperplane can be expressed in terms of those alphas.
The main work in an SVM is solving for these alphas.
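For reference (this is the standard result, added here for clarity), once the alphas are known, w and the decision function can be written in terms of them:

$$w = \sum_{i=1}^m \alpha_i\cdot label^{(i)}\cdot x^{(i)}, \qquad f(x) = \operatorname{sign}\!\left(\sum_{i=1}^m \alpha_i\cdot label^{(i)}\cdot\langle x^{(i)}, x\rangle + b\right)$$

Only the points with α_i > 0 (the support vectors) contribute to the sums.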
2. General process of SVM
- Collect data: any method can be used
- Prepare data: numerical data is required
- Analyze data: helps to visually separate hyperplanes
- Train the algorithm: most of the time with an SVM is spent on training, which mainly involves optimizing two parameters (the alphas and b)
- Test the algorithm: a very simple computation
- Use the algorithm: almost any classification problem can use an SVM. Note that an SVM itself is a two-class classifier; applying it to multi-class problems requires some modifications to the code
3, SMO efficient optimization algorithm
In 1996, John Platt released a powerful algorithm called SMO for training SVMs. Platt's SMO algorithm decomposes the large optimization problem into several small optimization problems.
The goal of the SMO algorithm is to find a set of alphas and b. Once these alphas are found, it is easy to compute the weight vector w and obtain the separating hyperplane.
The working principle of the SMO algorithm is: in each cycle, select two alphas and optimize them together. Once a suitable pair of alphas is found, increase one and decrease the other. "Suitable" means two conditions are met:
- both alphas must lie outside their margin boundary
- the alphas have not already been clamped to the boundary (0 or C) or otherwise bounded
1. Apply the simplified SMO algorithm to deal with small-scale data sets
The outer loop in Platt SMO algorithm determines the best alpha pair to be optimized, and the simplified version will skip this part. The process is as follows:
- First, traverse each alpha on the dataset
- Then another alpha is randomly selected from the remaining alpha set to build an alpha pair
We must change two alphas at the same time because we need to satisfy the constraint

$$\sum \alpha_i\cdot label^{(i)} = 0$$
Changing only one alpha would violate this constraint, so we always change two alphas at a time.
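To see why, write out the constraint for a pair (i, j): whatever change is made to α_j must be absorbed by α_i so that

$$\alpha_i^{new}\cdot label^{(i)} + \alpha_j^{new}\cdot label^{(j)} = \alpha_i^{old}\cdot label^{(i)} + \alpha_j^{old}\cdot label^{(j)}$$

This is exactly what the line `alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j])` in the code below enforces.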
To support this, we build one auxiliary function that randomly selects an integer from a range, and another that clips an alpha value when it goes out of bounds.
```python
import random
from numpy import *

def loadDataSet(filename):
    dataMat = []; labelMat = []
    fr = open(filename)
    for line in fr.readlines():
        lineArr = line.strip().split('\t')
        dataMat.append([float(lineArr[0]), float(lineArr[1])])
        labelMat.append(float(lineArr[2]))
    return dataMat, labelMat

# Auxiliary function 1: randomly select an integer within a certain range
def selectJrand(i, m):  # i is the index of the first alpha, m is the total number of alphas
    j = i
    while(j == i):
        j = int(random.uniform(0, m))
    return j

# Auxiliary function 2: clip a value when it goes out of bounds
def clipAlpha(aj, H, L):  # clip alpha values greater than H or less than L
    if aj > H:
        aj = H
    if L > aj:
        aj = L
    return aj
```
2. Simplified SMO algorithm
The pseudo code is as follows:
```
Create an alphas vector and initialize it to all zeros
While the number of iterations is less than the maximum number of iterations (outer loop):
    For each data vector in the dataset (inner loop):
        If the data vector can be optimized:
            Select another data vector at random
            Optimize the two vectors together
            If neither vector can be optimized, exit the inner loop
    If no vectors were optimized, increase the iteration count and continue the next cycle
```
Simplified SMO algorithm:
```python
'''Simplified Sequential Minimal Optimization (SMO) algorithm'''
def smoSimple(dataMatIn, classLabels, C, toler, maxIter):
    # dataset, class labels, constant C, tolerance, maximum number of cycles before exiting
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()  # after transposing, each row of the label vector matches a row of the data matrix
    b = 0; m, n = shape(dataMatrix)
    alphas = mat(zeros((m, 1)))  # initialize all alphas to 0
    iter = 0  # counts the passes over the dataset without any alpha changing
    # only after maxIter passes over the whole dataset with no alpha modified does the while loop exit
    while(iter < maxIter):
        alphaPairsChanged = 0  # reset each pass; records whether any alpha was optimized
        for i in range(m):
            # fXi is the class we predict for sample i
            fXi = float(multiply(alphas, labelMat).T * \
                        (dataMatrix * dataMatrix[i, :].T)) + b
            Ei = fXi - float(labelMat[i])  # error between the prediction and the real label
            # If the error is large, optimize the alpha for this instance (check both positive and negative margins,
            # and make sure alpha is not already 0 or C: values on the "boundary" cannot be decreased or increased,
            # so they are not worth optimizing)
            if((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or \
              ((labelMat[i]*Ei > toler) and (alphas[i] > 0)):  # constraint C >= alpha >= 0
                j = selectJrand(i, m)  # randomly select the second alpha with the helper function
                fXj = float(multiply(alphas, labelMat).T * \
                            (dataMatrix * dataMatrix[j, :].T)) + b
                Ej = fXj - float(labelMat[j])  # also compute the error for the second alpha
                alphaIold = alphas[i].copy()  # copy the old values for later comparison
                alphaJold = alphas[j].copy()
                if(labelMat[i] != labelMat[j]):  # keep alpha[j] between 0 and C
                    L = max(0, alphas[j] - alphas[i])
                    H = min(C, C + alphas[j] - alphas[i])
                else:
                    L = max(0, alphas[j] + alphas[i] - C)
                    H = min(C, alphas[j] + alphas[i])
                if L == H:
                    print('L==H')  # reaching here means this alpha pair was not changed (case 1)
                    continue
                # eta is the optimal amount to change alpha[j]
                eta = 2.0 * dataMatrix[i, :] * dataMatrix[j, :].T - \
                      dataMatrix[i, :] * dataMatrix[i, :].T - \
                      dataMatrix[j, :] * dataMatrix[j, :].T
                if eta >= 0:
                    print('eta>=0')  # reaching here means this alpha pair was not changed (case 2)
                    continue
                alphas[j] -= labelMat[j]*(Ei - Ej)/eta
                alphas[j] = clipAlpha(alphas[j], H, L)  # clip alpha[j] into [L, H]
                if (abs(alphas[j] - alphaJold) < 0.00001):  # check whether alpha[j] barely changed
                    print('j not moving enough')  # reaching here means this alpha pair was not changed (case 3)
                    continue
                alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j])
                # alpha[i] changes by the same amount as alpha[j], but in the opposite direction
                # (one increases, the other decreases)
                # after optimizing alpha[i] and alpha[j], set the constant term b for this pair of alphas
                b1 = b - Ei - labelMat[i]*(alphas[i]-alphaIold) * \
                     dataMatrix[i, :] * dataMatrix[i, :].T - \
                     labelMat[j]*(alphas[j]-alphaJold) * \
                     dataMatrix[i, :] * dataMatrix[j, :].T
                b2 = b - Ej - labelMat[i]*(alphas[i]-alphaIold) * \
                     dataMatrix[i, :] * dataMatrix[j, :].T - \
                     labelMat[j]*(alphas[j]-alphaJold) * \
                     dataMatrix[j, :] * dataMatrix[j, :].T
                if (0 < alphas[i]) and (C > alphas[i]):  # check which alpha lies strictly inside the bounds
                    b = b1
                elif (0 < alphas[j]) and (C > alphas[j]):
                    b = b2
                else:
                    b = (b1 + b2)/2.0
                alphaPairsChanged += 1
                # if no continue statement was hit before the end of the for loop body,
                # a pair of alphas was successfully changed and alphaPairsChanged is incremented
                print('iter: %d i: %d, pairs changed: %d' % (iter, i, alphaPairsChanged))
        if alphaPairsChanged == 0:  # check whether any alpha was updated this pass
            iter += 1
        else:
            iter = 0  # if there was an update, reset iter to 0 and keep iterating
        print('iteration number: %d' % iter)
    return b, alphas
```
From the alphas obtained, you can:
- Show the alpha values of the support vectors:

```python
print(alphas[alphas > 0])   # [[0.14742212 0.17251816 0.04511926 0.36505954]]
```

- Show the number of support vectors:

```python
print(shape(alphas[alphas > 0]))   # (1, 4)
```

- Show which data points are the support vectors:

```python
for i in range(100):
    if alphas[i] > 0.0:
        print(dataArr[i], labelArr[i])
'''
[4.658191, 3.507396] -1.0
[3.457096, -0.082216] -1.0
[5.286862, -2.358286] 1.0
[6.080573, 0.418886] 1.0
'''
```
However, the simplified SMO algorithm takes a long time to converge even on small data sets. We can speed things up by building the full SMO algorithm.
3. Full SMO algorithm
In the full version of the SMO algorithm, the only difference in the optimization step is the way the alphas are selected. The full Platt SMO algorithm applies some heuristics to speed this up.
Platt SMO algorithm selects the first alpha value through an outer loop, and the selection process will alternate between the two methods:
- One way is to perform a single pass scan on all data sets
- The other is a single-pass scan over the non-bound alphas, i.e., those alpha values not equal to the bounds 0 or C.
When scanning the non-bound alpha values, we first build a list of them and then traverse that list, skipping alphas that are already known not to change.
After the first alpha is chosen, the algorithm selects the second alpha in an inner loop. During optimization the second alpha is chosen so as to maximize the step size.
In the simplified SMO algorithm we computed the error Ej only after picking j; in the full SMO algorithm we keep a global cache of error values and pick the alpha that maximizes the step size |Ei - Ej|.
Support functions for the full Platt SMO algorithm:
```python
'''Full Platt SMO algorithm'''
# Support functions for the full SMO: a data structure to clean up the code
# and helper functions for caching the errors E
class optStruct:  # a class used purely as a data structure
    def __init__(self, dataMatIn, classLabels, C, toler):
        self.X = dataMatIn
        self.labelMat = classLabels
        self.C = C
        self.tol = toler
        self.m = shape(dataMatIn)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros((self.m, 2)))
        # error cache, an m*2 matrix: the first column is a flag marking whether the entry
        # is valid (i.e. has been computed), the second column holds the actual E value

def calcEk(oS, k):  # compute the error for sample k; used often, so factored into a function
    fXk = float(multiply(oS.alphas, oS.labelMat).T * \
                (oS.X * oS.X[k, :].T)) + oS.b
    Ek = fXk - float(oS.labelMat[k])
    return Ek

def selectJ(i, oS, Ei):
    # heuristic for the inner loop: choose the second alpha so that the step size is maximized
    maxK = -1; maxDeltaE = 0; Ej = 0
    oS.eCache[i] = [1, Ei]
    validEcacheList = nonzero(oS.eCache[:, 0].A)[0]  # list of valid (non-zero) cache entries
    if len(validEcacheList) > 1:
        for k in validEcacheList:
            if k == i:
                continue
            Ek = calcEk(oS, k)
            deltaE = abs(Ei - Ek)
            if deltaE > maxDeltaE:  # choose the j with the largest step size
                maxK = k
                maxDeltaE = deltaE
                Ej = Ek
        return maxK, Ej
    else:
        j = selectJrand(i, oS.m)
        Ej = calcEk(oS, j)
        return j, Ej

def updateEk(oS, k):  # compute the error for sample k and store it in the cache
    Ek = calcEk(oS, k)
    oS.eCache[k] = [1, Ek]
```
```python
# Optimization routine of the full Platt SMO algorithm
def innerL(i, oS):
    Ei = calcEk(oS, i)
    if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \
       ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):
        j, Ej = selectJ(i, oS, Ei)  # second-alpha selection heuristic
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print('L==H')
            return 0
        eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - \
              oS.X[i, :] * oS.X[i, :].T - \
              oS.X[j, :] * oS.X[j, :].T
        if eta >= 0:
            print('eta>=0')
            return 0
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEk(oS, j)  # new in the full version: update the error cache
        if (abs(oS.alphas[j] - alphaJold) < 0.00001):  # check whether alpha[j] barely changed
            print('j not moving enough')
            return 0
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])
        # alpha[i] changes by the same amount as alpha[j], but in the opposite direction
        updateEk(oS, i)  # new in the full version: update the error cache
        # after optimizing alpha[i] and alpha[j], set the constant term b for this pair
        b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * \
             oS.X[i, :] * oS.X[i, :].T - \
             oS.labelMat[j] * (oS.alphas[j] - alphaJold) * \
             oS.X[i, :] * oS.X[j, :].T
        b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * \
             oS.X[i, :] * oS.X[j, :].T - \
             oS.labelMat[j] * (oS.alphas[j] - alphaJold) * \
             oS.X[j, :] * oS.X[j, :].T
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0
```
The innerL() function does essentially the same work as the inner loop of smoSimple(), with a few changes:
- It uses its own data structure, passed in through the parameter oS
- The second alpha is selected with the selectJ() function instead of selectJrand()
- The error cache (eCache) is updated whenever an alpha changes
Next, wrap the pieces above together; this is the outer loop that selects the first alpha value.
```python
# Wrap the code above together: the outer loop that selects the first alpha value
def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)):
    oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler)  # use our own data structure
    iter = 0
    entireSet = True; alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
        alphaPairsChanged = 0
        if entireSet:  # pass over all values
            for i in range(oS.m):
                alphaPairsChanged += innerL(i, oS)  # the second alpha is chosen inside innerL
                print('fullSet, iter:%d i:%d, pairs changed %d' % (iter, i, alphaPairsChanged))
            iter += 1
        else:  # pass over the non-bound values
            nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0]
            for i in nonBoundIs:
                alphaPairsChanged += innerL(i, oS)
                print('non-bound, iter: %d i:%d, pairs changed %d' % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False
        elif (alphaPairsChanged == 0):
            entireSet = True
        print('iteration number: %d' % iter)
    return oS.b, oS.alphas
```
In the code above, the constant C balances two goals: ensuring that all samples have a functional margin of at least 1.0, and making the classification margin as large as possible.
We spent a lot of effort computing the alpha values, but how do we actually use them for classification?
- First, compute the hyperplane from the alphas, which includes computing w:
```python
'''Compute the value of w'''
def calcWs(alphas, dataArr, classLabels):
    X = mat(dataArr)
    labelMat = mat(classLabels).transpose()
    m, n = shape(X)
    w = zeros((n, 1))
    for i in range(m):
        w += multiply(alphas[i]*labelMat[i], X[i, :].T)
    return w
```
Testing the classifier's classification results:
```python
# Full SMO algorithm test
b, alphas = smoP(dataArr, labelArr, 0.6, 0.001, 40)
# print(b)                   # [[-2.89901748]]
# print(alphas[alphas > 0])  # [[0.06961952 0.0169055 0.0169055 0.0272699 0.04522972 0.0272699 0.0243898 0.06140181 0.06140181]]
ws = calcWs(alphas, dataArr, labelArr)
# print(ws)
datMat = mat(dataArr)
if (datMat[0] * mat(ws) + b) > 0:
    print('belongs to class +1')
else:
    print('belongs to class -1')
# Confirm whether the classification is correct:
print(labelArr[0])
'''
The output above is:
belongs to class -1
-1.0
'''
```
4, Applying kernel functions to complex data
1. Using kernel functions to map data to a higher-dimensional space
Mapping from one feature space to another takes the low-dimensional feature space into a higher-dimensional space; solving a linear problem in the high-dimensional space is equivalent to solving a nonlinear problem in the low-dimensional space. This mapping is realized by a kernel function. You can think of a kernel function as a wrapper or interface that converts the data from a form that is hard to work with into one that is easier.
A particularly nice property of the SVM optimization problem is that all of its operations can be written in terms of inner products (also called dot products).
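Concretely, the kernel trick means that every inner product ⟨x^(i), x^(j)⟩ in the optimization problem above can be replaced by a kernel evaluation (standard formulation, added here for clarity):

$$\langle x^{(i)}, x^{(j)}\rangle \;\longrightarrow\; k(x^{(i)}, x^{(j)}) = \langle \phi(x^{(i)}), \phi(x^{(j)})\rangle$$

where φ is the (possibly implicit) mapping into the high-dimensional space; the mapping itself never needs to be computed explicitly.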
2. Radial basis kernel function
The radial basis function is a commonly used kernel that depends on the distance between two vectors. The Gaussian version of the radial basis kernel is

$$k(x,y)=\exp\left(\frac{-\|x-y\|^2}{2\sigma^2}\right)$$
Here σ is a user-defined parameter that determines the reach, i.e., how quickly the function value falls off to 0.
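A tiny numeric sketch of this formula (my own made-up vectors and σ, just to show the computation):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
sigma = 1.3
# ||x - y||^2 = 1 + 4 = 5, so k = exp(-5 / (2 * 1.69)) ≈ 0.23
k = np.exp(-np.sum((x - y)**2) / (2 * sigma**2))
print(k)
```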
Kernel transformation function:
```python
'''Kernel transformation function'''
def kernelTrans(X, A, kTup):
    # Example call: kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1)), where sVs are the support vectors
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':      # kTup[0] selects the kernel type; two options are given here and more can be added
        K = X * A.T
    elif kTup[0] == 'rbf':    # radial basis function
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K/(-1*kTup[1]**2))
    else:                     # if the kernel type is not recognized, raise an exception and stop
        raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
    return K                  # return the vector of kernel values
```
```python
'''Using kernel functions requires changes to the innerL() and calcEk() functions. The modified functions are:'''
# Note: this version assumes optStruct has been extended with a precomputed kernel matrix
# self.K, as shown in the complete code at the end of these notes.
def innerL(i, oS):
    Ei = calcEk(oS, i)
    if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \
       ((oS.labelMat[i]*Ei > oS.tol) and (oS.alphas[i] > 0)):
        j, Ej = selectJ(i, oS, Ei)
        alphaIold = oS.alphas[i].copy()
        alphaJold = oS.alphas[j].copy()
        if (oS.labelMat[i] != oS.labelMat[j]):
            L = max(0, oS.alphas[j] - oS.alphas[i])
            H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i])
        else:
            L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C)
            H = min(oS.C, oS.alphas[j] + oS.alphas[i])
        if L == H:
            print('L==H')
            return 0
        eta = 2.0 * oS.K[i, j] - oS.K[i, i] - oS.K[j, j]  # modification 1: use the kernel matrix
        if eta >= 0:
            print('eta>=0')
            return 0
        oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta
        oS.alphas[j] = clipAlpha(oS.alphas[j], H, L)
        updateEk(oS, j)
        if (abs(oS.alphas[j] - alphaJold) < 0.00001):  # check whether alpha[j] barely changed
            print('j not moving enough')
            return 0
        oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (alphaJold - oS.alphas[j])
        # alpha[i] changes by the same amount as alpha[j], but in the opposite direction
        updateEk(oS, i)
        # after optimizing alpha[i] and alpha[j], set the constant term b
        # modification 2: use the kernel matrix
        b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, i] - \
             oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[i, j]
        b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, j] - \
             oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[j, j]
        if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]):
            oS.b = b1
        elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]):
            oS.b = b2
        else:
            oS.b = (b1 + b2) / 2.0
        return 1
    else:
        return 0

def calcEk(oS, k):
    fXk = float(multiply(oS.alphas, oS.labelMat).T * oS.K[:, k] + oS.b)
    Ek = fXk - float(oS.labelMat[k])
    return Ek
```
Use kernel functions in tests:
```python
'''Using the kernel function in testing'''
def testRbf(k1=1.3):  # the input parameter is the user-defined variable of the Gaussian radial basis function
    dataArr, labelArr = loadDataSet('testSet.txt')
    b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1))
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    svInd = nonzero(alphas.A > 0)[0]
    # alphas.A > 0 returns True/False; nonzero returns the row and column indices of the non-zero elements.
    # alphas is a 100*1 matrix with a single column, so this gives the row indices of the non-zero alphas
    # print('svInd:', svInd)      # svInd: [17 39 55 57 64 70 74 94]
    sVs = datMat[svInd]           # feature values of the rows with non-zero alphas
    # print('sVs:', sVs)          # sVs: [[ 4.658191  3.507396] .....
    labelSV = labelMat[svInd]     # class labels of the support vectors
    # print('labelSV:', labelSV)  # labelSV: [[-1.] ......
    print('there are %d Support Vectors' % shape(sVs)[0])  # number of support vectors
    m, n = shape(datMat)
    errorCount = 0
    for i in range(m):
        # the next two statements are the key lines for classifying with a kernel function
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print('the training error rate is: %f' % (float(errorCount)/m))
    dataArr, labelArr = loadDataSet('testSetRBF2.txt')
    errorCount = 0
    datMat = mat(dataArr)
    labelMat = mat(labelArr).transpose()
    m, n = shape(datMat)
    for i in range(m):
        # the same two key lines for classifying with a kernel function
        kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1))  # sVs: features of the support vectors
        # print('kernelEval:', kernelEval)
        predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
        if sign(predict) != sign(labelArr[i]):
            errorCount += 1
    print('the test error rate is: %f' % (float(errorCount)/m))
```
In the code above, the program first reads the data set from a file and then runs the Platt SMO algorithm on it, with the kernel type set to 'rbf'.
Finally, the training error rate, the test error rate, and the number of support vectors are printed; observe how they change as k1 changes.
Experimenting with different values shows that if σ is reduced, the training error rate decreases but the test error rate increases.
The advantage of the SVM is that it classifies data efficiently. If there are too few support vectors, we may get a poor decision boundary; if there are too many support vectors, it amounts to using the whole data set for every classification, which is essentially k-nearest neighbors.
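A simple way to observe all of this (a sketch, assuming the testRbf() defined above and its data files are available) is to sweep a few values of k1:

```python
# Sketch: sweep the user-defined RBF parameter k1 (i.e. sigma) and watch how the number
# of support vectors, the training error rate, and the test error rate change.
for k1 in [0.1, 1.3, 5.0]:   # hypothetical values to try
    print('--- k1 = %s ---' % k1)
    testRbf(k1)              # prints the support-vector count and both error rates
```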
5, Example: handwriting recognition problem
The code for this example is included in the complete code listing below.
6, Complete code
import random from numpy import * def loadDataSet(filename): dataMat = []; labelMat = [] fr = open(filename) for line in fr.readlines(): lineArr = line.strip().split('\t') dataMat.append([float(lineArr[0]), float(lineArr[1])]) labelMat.append(float(lineArr[2])) return dataMat, labelMat # Select any integer within a certain range def selectJrand(i, m): # i is the subscript of the first alpha, and m is the number of all alpha j = i while(j == i): j = int(random.uniform(0, m)) return j # Used to adjust the value when it is too large def clipAlpha(aj, H, L): # Used to adjust alpha values greater than H or less than L if aj > H: aj = H if L > aj: aj = L return aj '''Simplified sequence minimum optimization algorithm( SMO)''' def smoSimple(dataMatIn, classLabels, C, toler, maxIter): # Dataset, classification label, constant C, fault tolerance, maximum number of cycles before exiting dataMatrix = mat(dataMatIn) labelMat = mat(classLabels).transpose() b = 0; m, n = shape(dataMatrix) alphas = mat(zeros((m, 1))) iter = 0 # This variable stores the number of times the dataset was traversed without any alpha changes # Only after maxIter is traversed on all data sets without any alpha modification will the program stop and push out the while loop while(iter < maxIter): alphaPairsChanged = 0 # Each cycle should be set to 0 to record whether the alpha has been optimized for i in range(m): # fXi is the category we predict fXi = float(multiply(alphas, labelMat).T * \ (dataMatrix * dataMatrix[i,:].T)) + b Ei = fXi - float(labelMat[i]) #Error between predicted results and real results # If the error is large, optimize the alpha value corresponding to the data instance (judge the positive and negative intervals, and ensure that the alpha value is not equal to 0 or C) #Since the following alpha is adjusted to 0 or C when it is less than 0 or greater than C, once they are equal to these two values in the if statement, they are already on the "boundary" and can no longer be reduced or increased, so it is not worth optimizing them if((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or \ ((labelMat[i]*Ei > toler) and (alphas[i] > 0)): # Constraint C >= α >= 0 j = selectJrand(i, m) # Use the auxiliary function to randomly select the second alpha value fXj = float(multiply(alphas, labelMat).T * \ (dataMatrix * dataMatrix[j, :].T)) + b Ej = fXj - float(labelMat[j]) # The error is also calculated for the second alpha value alphaIold = alphas[i].copy() # Copy the results for the following comparison alphaJold = alphas[j].copy() if(labelMat[i] != labelMat[j]): # Ensure that alpha is between 0 and C, and adjust alpha[j] between 0 and C L = max(0, alphas[j] - alphas[i]) H = min(C, C + alphas[j] - alphas[i]) else: L = max(0, alphas[j] + alphas[i] - C) H = min(C, alphas[j] + alphas[i]) if L == H: print('L==H') # Output here means that the alpha pair has not changed successfully continue # eta is the optimal modification of alpha[j] eta = 2.0 * dataMatrix[i, :] * dataMatrix[j, :].T - \ dataMatrix[i, :] * dataMatrix[i, :].T - \ dataMatrix[j, :] * dataMatrix[j, :].T if eta >= 0: print('eta>=0') continue alphas[j] -= labelMat[j]*(Ei - Ej)/eta alphas[j] = clipAlpha(alphas[j], H, L) if (abs(alphas[j] - alphaJold) < 0.00001): # Check alpha[j] for minor changes print('j not moving enough') continue alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j]) # alpha[i] and alpha[j] are the same. 
The change is the same in size, but in the opposite direction (one increases and the other decreases) # After optimizing alpha[i] and alpha[j], set a constant term b for these two alpha values b1 = b - Ei - labelMat[i]*(alphas[i]-alphaIold) * \ dataMatrix[i,:] * dataMatrix[i,:].T - \ labelMat[j]*(alphas[j]-alphaJold) * \ dataMatrix[i, :] * dataMatrix[j, :].T b2 = b - Ej - labelMat[i]*(alphas[i]-alphaIold) * \ dataMatrix[i,:] * dataMatrix[j,:].T - \ labelMat[j]*(alphas[j]-alphaJold) * \ dataMatrix[j, :] * dataMatrix[j, :].T if (0 < alphas[i]) and (C > alphas[i]): b = b1 elif (0 < alphas[j]) and (C > alphas[j]): b = b2 else: b = (b1 + b2)/2.0 alphaPairsChanged += 1 # If the program does not execute the continue statement until the last line of the for loop, it indicates that a pair of alpha has been successfully changed and the value of alphapairschaved has been increased print('iter: %d i: %d, pairs changed: %d' % (iter, i, alphaPairsChanged)) if alphaPairsChanged == 0: # Check whether the value of alpha has been updated iter += 1 else: iter = 0 # If there is an update, set iter to 0 and continue running the program print('iteration number: %d' % iter) return b, alphas '''Full version Platt SMO algorithm''' # Supporting functions of the full version of SMO: including a data structure for cleaning up code and three auxiliary functions for caching E class optStruct: def __init__(self, dataMatIn, classLabels, C, toler, kTup): #Use kernel function to add kTup: tuple containing kernel function information self.X = dataMatIn self.labelMat = classLabels self.C = C self.tol = toler self.m = shape(dataMatIn)[0] self.alphas = mat(zeros((self.m, 1))) self.b = 0 self.eCache = mat(zeros((self.m, 2))) '''New content added using kernel function''' self.K = mat(zeros((self.m, self.m))) for i in range(self.m): # X is a string describing the type of kernel function used, and the other two parameters are optional parameters that may be required by the kernel function self.K[:, i] = kernelTrans(self.X, self.X[i, :], kTup) # The global matrix k value only needs to be calculated once. 
When you want to use the kernel function, you can call it, which saves a lot of redundant computing overhead # def calcEk(oS, k): # fXk = float(multiply(oS.alphas, oS.labelMat).T * \ # (oS.X * oS.X[k, :].T)) + oS.b # Ek = fXk - float(oS.labelMat[k]) # return Ek def selectJ(i, oS, Ei): maxK = -1; maxDeltaE = 0; Ej = 0 oS.eCache[i] = [1, Ei] validEcacheList = nonzero(oS.eCache[:, 0].A)[0] if len(validEcacheList) > 1: for k in validEcacheList: if k == i: continue Ek = calcEk(oS, k) deltaE = abs(Ei - Ek) if deltaE > maxDeltaE: maxK = k maxDeltaE = deltaE Ej = Ek return maxK, Ej else: j = selectJrand(i, oS.m) Ej = calcEk(oS, j) return j, Ej def updateEk(oS, k): Ek = calcEk(oS, k) oS.eCache[k] = [1, Ek] # Optimization routines in the complete Platt SMO algorithm # def innerL(i, oS): # Ei = calcEk(oS, i) # if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \ # ((oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)): # j, Ej = selectJ(i, oS, Ei) # alphaIold = oS.alphas[i].copy() # alphaJold = oS.alphas[j].copy() # if (oS.labelMat[i] != oS.labelMat[j]): # L = max(0, oS.alphas[j] - oS.alphas[i]) # H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) # else: # L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) # H = min(oS.C, oS.alphas[j] + oS.alphas[i]) # if L == H: # print('L==H') # return 0 # eta = 2.0 * oS.X[i, :] * oS.X[j, :].T - \ # oS.X[i, :] * oS.X[i, :].T - \ # oS.X[j, :] * oS.X[j, :].T # if eta >= 0: # print('eta>=0') # return 0 # oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta # oS.alphas[j] = clipAlpha(oS.alphas[j], H, L) # # updateEk(oS, j) # # if (abs(oS.alphas[j] - alphaJold) < 0.00001): # Check alpha[j] for minor changes # print('j not moving enough') # return 0 # oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (\ # alphaJold - oS.alphas[j]) # alpha[i] and alpha[j] are the same. 
The change is the same in size, but in the opposite direction (one increases and the other decreases) # # updateEk(oS, i) # # # After optimizing alpha[i] and alpha[j], set a constant term b for these two alpha values # b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * \ # oS.X[i, :] * oS.X[i, :].T - \ # oS.labelMat[j] * (oS.alphas[j] - alphaJold) * \ # oS.X[i, :] * oS.X[j, :].T # b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * \ # oS.X[i, :] * oS.X[j, :].T - \ # oS.labelMat[j] * (oS.alphas[j] - alphaJold) * \ # oS.X[j, :] * oS.X[j, :].T # if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): # oS.b = b1 # elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): # oS.b = b2 # else: # oS.b = (b1 + b2) / 2.0 # return 1 # else: # return 0 # Package the above code together and select the outer loop of the first alpha value def smoP(dataMatIn, classLabels, C, toler, maxIter, kTup=('lin', 0)): # oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler) oS = optStruct(mat(dataMatIn), mat(classLabels).transpose(), C, toler, kTup) iter = 0 entireSet = True; alphaPairsChanged = 0 while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)): alphaPairsChanged = 0 if entireSet: # Traverse all values for i in range(oS.m): alphaPairsChanged += innerL(i, oS) print('fullSet, iter:%d i:%d, pairs changed %d' % (iter, i, alphaPairsChanged)) iter += 1 else: # Traversal of non boundary values nonBoundIs = nonzero((oS.alphas.A > 0) * (oS.alphas.A < C))[0] for i in nonBoundIs: alphaPairsChanged += innerL(i, oS) print('non-bound, iter: %d i:%d, pairs changed %d' % (iter, i, alphaPairsChanged)) iter += 1 if entireSet: entireSet = False elif (alphaPairsChanged == 0): entireSet = True print('iteration number: %d' % iter) return oS.b, oS.alphas def calcWs(alphas, dataArr, classLabels): X = mat(dataArr) labelMat = mat(classLabels).transpose() m, n = shape(X) w = zeros((n, 1)) for i in range(m): w += multiply(alphas[i]*labelMat[i], X[i, :].T) return w '''Kernel conversion function''' def kernelTrans(X, A, kTup): # Function call: kerneleval = kerneltrans (SVS, datmat [I,:], ('rbf ', K1)) SVS is the support vector m, n = shape(X) K = mat(zeros((m, 1))) if kTup[0] == 'lin': # For determining the type of kernel function, two options are given here, which can also be extended K = X * A.T elif kTup[0] == 'rbf':# Radial basis function for j in range(m): deltaRow = X[j,:] - A K[j] = deltaRow * deltaRow.T K = exp(K/(-1*kTup[1]**2)) else: # If you encounter a tuple that cannot be recognized, throw an exception and stop the program raise NameError('Houston We Have a Problem -- \ That Kernel is not recognized') return K # Return k vector '''Using kernel functions requires innerL()and calcEk()Function. 
The modified function is as follows:''' def innerL(i, oS): Ei = calcEk(oS, i) if ((oS.labelMat[i]*Ei < -oS.tol) and (oS.alphas[i] < oS.C)) or \ ((oS.labelMat[i] * Ei > oS.tol) and (oS.alphas[i] > 0)): j, Ej = selectJ(i, oS, Ei) alphaIold = oS.alphas[i].copy() alphaJold = oS.alphas[j].copy() if (oS.labelMat[i] != oS.labelMat[j]): L = max(0, oS.alphas[j] - oS.alphas[i]) H = min(oS.C, oS.C + oS.alphas[j] - oS.alphas[i]) else: L = max(0, oS.alphas[j] + oS.alphas[i] - oS.C) H = min(oS.C, oS.alphas[j] + oS.alphas[i]) if L == H: print('L==H') return 0 eta = 2.0 * oS.K[i, j] * oS.K[i, i] - oS.K[j, j] # Modified here 1 if eta >= 0: print('eta>=0') return 0 oS.alphas[j] -= oS.labelMat[j] * (Ei - Ej) / eta oS.alphas[j] = clipAlpha(oS.alphas[j], H, L) updateEk(oS, j) if (abs(oS.alphas[j] - alphaJold) < 0.00001): # Check alpha[j] for minor changes print('j not moving enough') return 0 oS.alphas[i] += oS.labelMat[j] * oS.labelMat[i] * (\ alphaJold - oS.alphas[j]) # alpha[i] and alpha[j] are the same. The change is the same in size, but in the opposite direction (one increases and the other decreases) updateEk(oS, i) # yes alpha[i]and alpha[j]After optimization, give these two alpha Value sets a constant item b # Amendment 2 b1 = oS.b - Ei - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, i] - \ oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[i, j] b2 = oS.b - Ej - oS.labelMat[i] * (oS.alphas[i] - alphaIold) * oS.K[i, j] - \ oS.labelMat[j] * (oS.alphas[j] - alphaJold) * oS.K[j, j] if (0 < oS.alphas[i]) and (oS.C > oS.alphas[i]): oS.b = b1 elif (0 < oS.alphas[j]) and (oS.C > oS.alphas[j]): oS.b = b2 else: oS.b = (b1 + b2) / 2.0 return 1 else: return 0 def calcEk(oS, k): fXk = float(multiply(oS.alphas, oS.labelMat).T * \ oS.K[:, k] + oS.b) Ek = fXk - float(oS.labelMat[k]) return Ek '''Using kernel functions in testing''' def testRbf(k1=1.3): dataArr, labelArr = loadDataSet('testSet.txt') b, alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, ('rbf', k1)) datMat = mat(dataArr) labelMat = mat(labelArr).transpose() svInd = nonzero(alphas.A>0)[0] # alphas. A> 0 returns True or False; nonzero: returns a two-dimensional array. The first array describes the index value of non-zero elements from the row dimension, and the second array describes the index value from the column dimension # print('svInd:', svInd) #svInd: [17 39 55 57 64 70 74 94] alpha is a 100 * 1-dimensional matrix with only 0 columns, so the number of rows of non-zero alpha is returned sVs = datMat[svInd] # print('sVs:', sVs) #sVs: [[ 4.658191 3.507396]..... # Returns the element characteristic value of the row with non-zero alpha labelSV = labelMat[svInd] # print('labelSV:', labelSV) # labelSV: [[-1.]...... 
# Returns the category of the support vector print('there are %d Support Vectors' % shape(sVs)[0]) # Number of output support vectors m, n = shape(datMat) errorCount = 0 for i in range(m): # The following two sentences are the key statements for classification by kernel function, and give how to classify by kernel function kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1)) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print('the training error rate is: %f' % (float(errorCount)/m)) dataArr, labelArr = loadDataSet('testSetRBF2.txt') errorCount = 0 datMat = mat(dataArr) labelMat = mat(labelArr).transpose() m, n = shape(datMat) for i in range(m): # The following two sentences are the key statements for classification using kernel functions kernelEval = kernelTrans(sVs, datMat[i, :], ('rbf', k1)) # sVs: features of support vector # print('kernelEval:', kernelEval) predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b if sign(predict) != sign(labelArr[i]): errorCount += 1 print('the test error rate is :%f' % (float(errorCount)/m)) '''be based on SVM Handwritten numeral recognition based on''' def img2vector(filename): returnVect = zeros((1,1024)) fr = open(filename) for i in range(32): lineStr = fr.readline() for j in range(32): returnVect[0,32*i+j] = int(lineStr[j]) return returnVect def loadImages(dirName): from os import listdir hwLabels = [] trainingFileList = listdir(dirName) #load the training set m = len(trainingFileList) trainingMat = zeros((m,1024)) for i in range(m): fileNameStr = trainingFileList[i] fileStr = fileNameStr.split('.')[0] #take off .txt classNumStr = int(fileStr.split('_')[0]) if classNumStr == 9: hwLabels.append(-1) else: hwLabels.append(1) trainingMat[i,:] = img2vector('%s/%s' % (dirName, fileNameStr)) return trainingMat, hwLabels def testDigits(kTup=('rbf', 10)): dataArr,labelArr = loadImages('trainingDigits') b,alphas = smoP(dataArr, labelArr, 200, 0.0001, 10000, kTup) datMat=mat(dataArr) labelMat = mat(labelArr).transpose() svInd=nonzero(alphas.A>0)[0] sVs=datMat[svInd] labelSV = labelMat[svInd] print("there are %d Support Vectors" % shape(sVs)[0]) m,n = shape(datMat) errorCount = 0 for i in range(m): kernelEval = kernelTrans(sVs,datMat[i,:],kTup) predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b if sign(predict)!=sign(labelArr[i]): errorCount += 1 print("the training error rate is: %f" % (float(errorCount)/m)) dataArr,labelArr = loadImages('testDigits') errorCount = 0 datMat=mat(dataArr); labelMat = mat(labelArr).transpose() m,n = shape(datMat) for i in range(m): kernelEval = kernelTrans(sVs,datMat[i,:],kTup) predict=kernelEval.T * multiply(labelSV,alphas[svInd]) + b if sign(predict)!=sign(labelArr[i]): errorCount += 1 print("the test error rate is: %f" % (float(errorCount)/m)) if __name__ == '__main__': dataArr, labelArr = loadDataSet('testSet.txt') # print(dataArr) # [[3.542485, 1.977398], [3.018896, 2.556416], # print(labelArr) #Simplified SMO algorithm test # b, alphas = smoSimple(dataArr, labelArr, 0.6, 0.001, 40) # print(b) #[[-3.78586587]] # print(alphas[alphas > 0]) #[[0.14742212 0.17251816 0.04511926 0.36505954]] # Number of output support vectors # print(shape(alphas[alphas > 0])) # (1, 4) # The output is the point of the support vector # for i in range(100): # if alphas[i]>0.0: # print(dataArr[i], labelArr[i]) ''' [4.658191, 3.507396] -1.0 [3.457096, -0.082216] -1.0 [5.286862, -2.358286] 1.0 [6.080573, 0.418886] 1.03 ''' # Complete SMO algorithm test # b, alphas 
= smoP(dataArr, labelArr, 0.6, 0.001, 40) # print(b) #[[-2.89901748]] # print(alphas[alphas > 0]) #[[0.06961952 0.0169055 0.0169055 0.0272699 0.04522972 0.0272699 0.0243898 0.06140181 0.06140181]] # ws = calcWs(alphas, dataArr,labelArr) # print(ws) # datMat = mat(dataArr) # if (datMat[0] * mat(ws) + b) > 0: # print('belongs to class 1 ') # else: # print('belongs to - 1 class') # # Confirm whether the classification is correct: # print(labelArr[0]) '''The above output results belong to-1 class -1.0 ''' # testRbf(2.0) # Handwritten numeral test testDigits(('rbf', 20))
I have not fully understood everything in this chapter; I will come back to it and keep polishing these notes.