Following Teacher Wu's course, this post builds a neural network with one hidden layer.
1, Importing the dataset and plotting
Two helper files, testCases.py and planar_utils.py, are required before importing the dataset. Please refer to [data]
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from testCases import *
import sklearn
from sklearn import datasets
from sklearn import linear_model
from planar_utils import *

plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels normally
plt.rcParams['axes.unicode_minus'] = False    # Display minus signs normally
np.set_printoptions(suppress=True)            # Suppress scientific notation
np.random.seed(1)

X, Y = load_planar_dataset()
plt.figure(figsize=(12, 8))
plt.scatter(X[0, :], X[1, :], c=Y.flatten(), label='Scatter')
plt.show()
In plt.scatter(), the parameter c controls the color: it takes an iterable with the same length as the x and y arguments, and each (x, y) point is colored according to the corresponding value. Simply put, pick one array to distinguish colors by; here we want to color by the label, so passing c = Y.flatten() does the job.
2, Logistic regression test
Instead of reinventing the wheel as in the earlier machine-learning exercises, here we call sklearn directly and test logistic regression on this dataset.
# X: all samples, shape 2 * 400
# Y: all labels, shape 1 * 400 (0 | 1)
lr = linear_model.LogisticRegression()
lr.fit(X.T, Y.flatten())
# View the learned weights w
print(lr.coef_)
# View the learned intercept w0
print(lr.intercept_)
w1 = lr.coef_[0, 0]
w2 = lr.coef_[0, 1]
w0 = lr.intercept_
c = np.linspace(min(X[0, :]), max(X[0, :]), 100)
f = [(-w1 * i - w0) / w2 for i in c]
plt.plot(c, f, label='Classification curve', color='r')
plt.legend()
plt.show()
# Array of predictions
res = lr.predict(X.T)
print('Logistic regression accuracy: %f%%' % ((float(np.dot(Y, res)) + float(np.dot(1 - Y, 1 - res))) / Y.size * 100))
The way I draw the boundary differs from Teacher Wu's: he wrote a dedicated function to plot the classification line, while I plot it directly from w0, w1 and w2. Since w0 + w1*x1 + w2*x2 = 0 is just a linear function, it roughly reproduces his result.
I originally wanted to use this linear function to draw a closed circle and tried many times without success. Then it dawned on me: this is a linear function by construction, so of course it cannot produce the curve of a quadratic one.
3, Build neural network model
General method for constructing a neural network (a preview of these steps as code follows the list):
1. Define the structure of the neural network
2. Work out the dimensions of the parameters of each layer
3. Define the activation functions
4. Initialize the model parameters
5. Define the forward propagation function
6. Compute the cost
7. Define the back propagation function
8. Integrate everything into an iterative training loop
9. Predict
10. Visualize and analyze the results
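These ten steps map directly onto the functions built in the rest of this post. A rough preview sketch (using the function names defined in the sections below; nn_model wraps steps 5-8 in a loop):

# Preview: how the steps map onto the functions defined later in this post
n_x, n_h, n_y = layer_size(X, Y)                        # steps 1-2: structure and parameter dimensions
parameters = initialize_parameters(n_x, n_h, n_y)       # steps 3-4: activations fixed, w and b initialized
A2, cache = forward_propagation(X, parameters)          # step 5: forward pass
cost = compute_cost(A2, Y)                              # step 6: cross-entropy cost
grads = backward_propagation(parameters, cache, X, Y)   # step 7: gradients
parameters = update_parameters(parameters, grads)       # step 8: one gradient-descent step (nn_model loops this)
predictions = predict(parameters, X)                    # step 9: predictions; step 10 then plots the boundary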
The network model is as follows:
Based on the model diagram, work out the dimensions of the parameters (a bit of reasoning here makes the NumPy matrix multiplications later much easier):
X: 2 * 400. The dataset has 400 samples with 2 features each.
Y: the label of each sample, 1 * 400.
W of the first layer: 4 * 2, because this layer has four nodes and the previous layer has two.
b of the first layer: 4 * 1, because this layer has four nodes.
Therefore Z and A of the first layer are 4 * 400, since there are 400 samples; the activation function is tanh.
W of the second layer: 1 * 4. This layer has one node and the previous layer has four, which is similar to logistic regression.
b of the second layer: 1 * 1, a single value.
Z and A of the second layer: 1 * 400; the activation function is sigmoid.
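A quick sanity check of these shapes right after loading the data (a small sketch, using the X and Y loaded in section 1):

# Confirm the dimensions assumed above
print(X.shape)   # expected (2, 400): 2 features, 400 samples
print(Y.shape)   # expected (1, 400): one 0/1 label per sample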
4, Define model structure
Defining the model structure mainly means specifying how many units each layer contains, and also defining the sigmoid function. For tanh, np.tanh() can be used directly.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def layer_size(X, Y):
    # Input layer
    n_x = X.shape[0]
    # Hidden layer
    n_h = 4
    # Output layer
    n_y = Y.shape[0]
    return n_x, n_h, n_y
I think this is fine because the model has only a few layers; with many layers, a list could be used to store the size of each layer, as sketched below.
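For example (a hypothetical sketch, not part of this assignment), a deeper network could keep its layer sizes in a list; each W then pairs its own layer size with the previous layer's size:

# Hypothetical deeper network: input 2 -> hidden 4 -> hidden 3 -> output 1
layer_sizes = [2, 4, 3, 1]
for l in range(1, len(layer_sizes)):
    # W of layer l has shape (layer_sizes[l], layer_sizes[l-1]); b has shape (layer_sizes[l], 1)
    print('layer', l, 'W shape:', (layer_sizes[l], layer_sizes[l - 1]),
          'b shape:', (layer_sizes[l], 1))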
5, Initialize w and b
As the diagram shows, this is a two-layer model, so it needs two sets of W and b.
For W: initialize the matrix with small random values. Unlike logistic regression, the initial W cannot be an all-zero matrix, otherwise every hidden unit would compute exactly the same thing and the symmetry would never be broken.
w1 = np.random.randn(n_h,n_x)*0.01
For b: it can safely be initialized to an all-zero matrix.
b1 = np.zeros((n_h,1))
def initialize_parameters(n_x, n_h, n_y):
    '''
    Determined by the size of each layer. Because this is a two-layer model,
    W and b must be defined for the first layer and for the second layer.
    :param n_x: size of the input layer
    :param n_h: size of the hidden layer
    :param n_y: size of the output layer
    :return: dictionary containing w1, b1, w2, b2
    '''
    np.random.seed(2)
    w1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    w2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    parameters = {
        'w1': w1,
        'w2': w2,
        'b1': b1,
        'b2': b2,
    }
    return parameters
np.random.seed(2) makes the random numbers match the teacher's. I have to say this is considerate; it makes verifying the correctness of our implementation easy.
6, Forward propagation
def forward_propagation(X, parameters):
    '''
    Implement forward propagation
    :param X: sample matrix, 2*400
    :param parameters: dictionary of the initialized w and b
    :return: A2 and the cache of intermediate values
    '''
    w1 = parameters['w1']  # 4*2
    b1 = parameters['b1']  # 4*1
    w2 = parameters['w2']  # 1*4
    b2 = parameters['b2']  # 1*1
    Z1 = np.dot(w1, X) + b1   # 4*400
    A1 = np.tanh(Z1)          # 4*400
    Z2 = np.dot(w2, A1) + b2  # 1*400
    A2 = sigmoid(Z2)          # 1*400
    cache = {
        "Z1": Z1,
        "A1": A1,
        "Z2": Z2,
        "A2": A2
    }
    return A2, cache
When writing this code, it is best to think about the dimensions of the matrix produced at each step and to check whether the final output has shape (1, m), as in the sketch below.
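One simple way to do that check (a sketch, assuming X is 2 * 400 and parameters came from initialize_parameters(2, 4, 1)):

# Assert the expected shapes after one forward pass
A2, cache = forward_propagation(X, parameters)
assert cache['Z1'].shape == (4, X.shape[1])
assert cache['A1'].shape == (4, X.shape[1])
assert cache['Z2'].shape == (1, X.shape[1])
assert A2.shape == (1, X.shape[1])   # final output must be (1, m)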
7, Cost function
After the forward propagation is completed, the cost function must be calculated. The code is as follows:
def compute_cost(A2, Y):
    m = Y.shape[1]
    first = np.multiply(-Y, np.log(A2))
    second = np.multiply(1 - Y, np.log(1 - A2))
    return np.sum(first - second) / m
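For reference, the quantity this function computes is the cross-entropy cost:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log a_{2}^{(i)} + \left(1-y^{(i)}\right)\log\left(1-a_{2}^{(i)}\right)\right]$$

where m is the number of samples and a2 is the sigmoid output of the second layer.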
8, Back propagation
I think this is the hardest part. The reason is that I still have not fully derived these formulas: I can derive the single-layer case, but once the hidden layer is added I get stuck on the step involving the derivative of the activation function. I will take it slowly and discuss it with others.
def backward_propagation(parameters, cache, X, Y):
    '''
    :param parameters: contains w1 w2 b1 b2
    :param cache: contains Z1 Z2 A1 A2
    :param X: input data
    :param Y: labels
    :return: grads dictionary containing dw and db
    '''
    # X: 2*400
    # Y: 1*400
    m = X.shape[1]
    w1 = parameters['w1']  # 4*2
    w2 = parameters['w2']  # 1*4
    A1 = cache['A1']  # 4*400
    A2 = cache['A2']  # 1*400
    dz2 = A2 - Y  # 1*400
    dw2 = (1 / m) * np.dot(dz2, A1.T)  # 1*4
    db2 = (1 / m) * np.sum(dz2, axis=1, keepdims=True)  # 1*1
    # The hardest part to understand
    dz1 = np.multiply(np.dot(w2.T, dz2), 1 - np.power(A1, 2))  # 4*400
    dw1 = (1 / m) * np.dot(dz1, X.T)  # 4*2
    db1 = (1 / m) * np.sum(dz1, axis=1, keepdims=True)  # 4*1
    grads = {
        'dw1': dw1,
        'db1': db1,
        'dw2': dw2,
        'db2': db2,
    }
    return grads
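For reference, these are the formulas the code above implements (not re-derived here; the key fact is that tanh'(z) = 1 - tanh(z)^2, which is why the factor 1 - A1^2 appears):

$$dZ^{[2]} = A^{[2]} - Y,\qquad dW^{[2]} = \frac{1}{m}\,dZ^{[2]} A^{[1]\,T},\qquad db^{[2]} = \frac{1}{m}\sum_{\text{columns}} dZ^{[2]}$$

$$dZ^{[1]} = \left(W^{[2]\,T} dZ^{[2]}\right) \odot \left(1 - \left(A^{[1]}\right)^{2}\right),\qquad dW^{[1]} = \frac{1}{m}\,dZ^{[1]} X^{T},\qquad db^{[1]} = \frac{1}{m}\sum_{\text{columns}} dZ^{[1]}$$

Here $\odot$ denotes elementwise multiplication, i.e. np.multiply in the code.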
9, Iteratively update the parameters
Apply gradient descent to the parameters, updating them iteratively with the gradients computed above.
def update_parameters(parameters, grads, learning_rate=1.2):
    '''
    Update w1 w2 b1 b2
    :param parameters:
    :param grads:
    :param learning_rate:
    :return: updated parameters dictionary
    '''
    w1 = parameters['w1']
    w2 = parameters['w2']
    b1 = parameters['b1']
    b2 = parameters['b2']
    dw2 = grads['dw2']
    dw1 = grads['dw1']
    db2 = grads['db2']
    db1 = grads['db1']
    w2 = w2 - learning_rate * dw2
    b2 = b2 - learning_rate * db2
    w1 = w1 - learning_rate * dw1
    b1 = b1 - learning_rate * db1
    parameters = {
        'w1': w1,
        'w2': w2,
        'b1': b1,
        'b2': b2,
    }
    return parameters
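In formula form, each parameter follows the plain gradient-descent rule

$$W^{[l]} := W^{[l]} - \alpha\, dW^{[l]},\qquad b^{[l]} := b^{[l]} - \alpha\, db^{[l]},\quad l\in\{1,2\}$$

where $\alpha$ is learning_rate.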
10, Integrate the model
def nn_model(X, Y, n_h, num_iterations, print_cost=False):
    np.random.seed(3)
    n_x = layer_size(X, Y)[0]
    n_y = layer_size(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y)
    w1 = parameters['w1']
    w2 = parameters['w2']
    b1 = parameters['b1']
    b2 = parameters['b2']
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate=0.5)
        if print_cost:
            if i % 100 == 0:
                print("Iteration", i, ": cost is " + str(cost))
    return parameters
11, Prediction results
I have to say, this prediction function is really beautiful!
def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions
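np.round here acts as a 0.5 threshold on the sigmoid output; a tiny check (just a sketch):

# Probabilities above 0.5 become 1, below become 0
print(np.round(np.array([[0.2, 0.49, 0.51, 0.9]])))   # -> [[0. 0. 1. 1.]]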
12, Run the model
parameters = nn_model(X, Y, 4, 10000, True)
predictions = predict(parameters, X)
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
s1 = np.dot(Y, predictions.T)
s2 = np.dot(1 - Y, 1 - predictions.T)
num = Y.size
print('Accuracy: %d%%' % float((s1 + s2) / num * 100))
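The accuracy expression works because np.dot(Y, predictions.T) counts samples labelled 1 and predicted 1, while np.dot(1 - Y, 1 - predictions.T) counts samples labelled 0 and predicted 0. In formula form:

$$\text{accuracy} = \frac{Y\,P^{T} + (1-Y)\,(1-P)^{T}}{m}\times 100\%$$

where P is the 1 × m row of predictions and m = Y.size.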
13, Different hidden layer sizes
Use a for loop to observe the accuracy for different hidden layer sizes (numbers of hidden units):
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50, 100, 200, 300]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i + 1)
    plt.title('hidden_layer_size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations=5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    s1 = np.dot(Y, predictions.T)
    s2 = np.dot(1 - Y, 1 - predictions.T)
    num = Y.size
    print('hidden_layer_size{}Accuracy:{}%%'.format(n_h, float((s1 + s2) / num * 100)))
plt.show()
(After running it I noticed the printout shows one extra % sign, since '%%' is not collapsed by str.format, so I left it alone.) A hidden_layer_size of 4 works best. In an earlier run, a hidden layer size of 5 gave only 90%, so the best choice is 4 or 5 hidden units, which is slightly different from Teacher Wu's result but basically the same.
14, Change the activation function from tanh to sigmoid
The only change in the code is the activation function and the corresponding derivative, as shown below.
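Concretely, only two lines change (the appendix code below keeps them as commented-out alternatives): the hidden activation in forward_propagation and the matching derivative in backward_propagation, since sigmoid'(z) = a(1 - a):

# In forward_propagation: replace the tanh hidden activation
A1 = sigmoid(Z1)                                      # instead of A1 = np.tanh(Z1)

# In backward_propagation: replace the tanh derivative 1 - A1^2 with A1 * (1 - A1)
dz1 = np.multiply(np.dot(w2.T, dz2), A1 * (1 - A1))   # instead of 1 - np.power(A1, 2)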
The results are as follows:
It can be seen that the overall result is not as good as with the tanh activation function.
15, Change learning rate
I forgot to mention: all the experimental results above were generated with 5000 iterations and a learning rate of 0.5.
When the learning rate is changed to 0.1:
16, Summary
① : Judging from the experiments above, Teacher Wu had already tuned the code close to its best.
② : I became familiar with building the whole neural network.
③ : The line of code for the derivative in back propagation still needs more pondering.
④ : After the model is built, the key is tuning the hyperparameters so that w and b fit better.
⑤ : Next, continue with Teacher Wu's next experiment.
17, Appendix code (do not use it directly)
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from testCases import *
import sklearn
from sklearn import datasets
from sklearn import linear_model
from planar_utils import *

plt.rcParams['font.sans-serif'] = ['SimHei']  # Display Chinese labels normally
plt.rcParams['axes.unicode_minus'] = False    # Display minus signs normally
np.set_printoptions(suppress=True)            # Suppress scientific notation
np.random.seed(1)

X, Y = load_planar_dataset()

# Logistic regression
def logistic():
    plt.figure(figsize=(12, 8))
    plt.scatter(X[0, :], X[1, :], c=Y.flatten(), label='Scatter')
    # X: all samples, shape 2 * 400
    # Y: all labels, shape 1 * 400 (0 | 1)
    lr = linear_model.LogisticRegression()
    lr.fit(X.T, Y.flatten())
    # View the learned weights w
    print(lr.coef_)
    # View the learned intercept w0
    print(lr.intercept_)
    w1 = lr.coef_[0, 0]
    w2 = lr.coef_[0, 1]
    w0 = lr.intercept_
    c = np.linspace(min(X[0, :]), max(X[0, :]), 100)
    f = [(-w1 * i - w0) / w2 for i in c]
    plt.plot(c, f, label='Classification curve', color='r')
    plt.legend()
    plt.show()
    # Array of predictions
    res = lr.predict(X.T)
    print('Logistic regression accuracy: %f%%' % ((float(np.dot(Y, res)) + float(np.dot(1 - Y, 1 - res))) / Y.size * 100))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def layer_size(X, Y):
    # Input layer
    n_x = X.shape[0]
    # Hidden layer
    n_h = 4
    # Output layer
    n_y = Y.shape[0]
    return n_x, n_h, n_y

def initialize_parameters(n_x, n_h, n_y):
    '''
    Determined by the size of each layer. Because this is a two-layer model,
    W and b must be defined for the first layer and for the second layer.
    :param n_x: size of the input layer
    :param n_h: size of the hidden layer
    :param n_y: size of the output layer
    :return: dictionary containing w1, b1, w2, b2
    '''
    np.random.seed(2)
    w1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    w2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    parameters = {
        'w1': w1,
        'w2': w2,
        'b1': b1,
        'b2': b2,
    }
    return parameters

def forward_propagation(X, parameters):
    '''
    Implement forward propagation
    :param X: sample matrix, 2*400
    :param parameters: dictionary of the initialized w and b
    :return: A2 and the cache of intermediate values
    '''
    w1 = parameters['w1']  # 4*2
    b1 = parameters['b1']  # 4*1
    w2 = parameters['w2']  # 1*4
    b2 = parameters['b2']  # 1*1
    Z1 = np.dot(w1, X) + b1   # 4*400
    A1 = np.tanh(Z1)          # 4*400
    # A1 = sigmoid(Z1)        # variant: sigmoid hidden layer
    Z2 = np.dot(w2, A1) + b2  # 1*400
    A2 = sigmoid(Z2)          # 1*400
    cache = {
        "Z1": Z1,
        "A1": A1,
        "Z2": Z2,
        "A2": A2
    }
    return A2, cache

def compute_cost(A2, Y):
    m = Y.shape[1]
    first = -np.multiply(Y, np.log(A2))
    second = np.multiply(1 - Y, np.log(1 - A2))
    return np.sum(first - second) / m

def backward_propagation(parameters, cache, X, Y):
    '''
    :param parameters: contains w1 w2 b1 b2
    :param cache: contains Z1 Z2 A1 A2
    :param X: input data
    :param Y: labels
    :return: grads dictionary containing dw and db
    '''
    # X: 2*400
    # Y: 1*400
    m = X.shape[1]
    w1 = parameters['w1']  # 4*2
    w2 = parameters['w2']  # 1*4
    A1 = cache['A1']  # 4*400
    A2 = cache['A2']  # 1*400
    dz2 = A2 - Y  # 1*400
    dw2 = (1 / m) * np.dot(dz2, A1.T)  # 1*4
    db2 = (1 / m) * np.sum(dz2, axis=1, keepdims=True)  # 1*1
    # The hardest part to understand
    dz1 = np.multiply(np.dot(w2.T, dz2), 1 - np.power(A1, 2))  # 4*400
    # dz1 = np.multiply(np.dot(w2.T, dz2), A1 * (1 - A1))      # variant: sigmoid hidden layer
    dw1 = (1 / m) * np.dot(dz1, X.T)  # 4*2
    db1 = (1 / m) * np.sum(dz1, axis=1, keepdims=True)  # 4*1
    grads = {
        'dw1': dw1,
        'db1': db1,
        'dw2': dw2,
        'db2': db2,
    }
    return grads

def update_parameters(parameters, grads, learning_rate=1.2):
    '''
    Update w1 w2 b1 b2
    :param parameters:
    :param grads:
    :param learning_rate:
    :return: updated parameters dictionary
    '''
    w1 = parameters['w1']
    w2 = parameters['w2']
    b1 = parameters['b1']
    b2 = parameters['b2']
    dw2 = grads['dw2']
    dw1 = grads['dw1']
    db2 = grads['db2']
    db1 = grads['db1']
    w2 = w2 - learning_rate * dw2
    b2 = b2 - learning_rate * db2
    w1 = w1 - learning_rate * dw1
    b1 = b1 - learning_rate * db1
    parameters = {
        'w1': w1,
        'w2': w2,
        'b1': b1,
        'b2': b2,
    }
    return parameters

def nn_model(X, Y, n_h, num_iterations, print_cost=False):
    np.random.seed(3)
    n_x = layer_size(X, Y)[0]
    n_y = layer_size(X, Y)[2]
    parameters = initialize_parameters(n_x, n_h, n_y)
    w1 = parameters['w1']
    w2 = parameters['w2']
    b1 = parameters['b1']
    b2 = parameters['b2']
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate=0.1)
        if print_cost:
            if i % 100 == 0:
                print("Iteration", i, ": cost is " + str(cost))
    return parameters

def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2)
    return predictions

plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50, 100, 200, 300]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i + 1)
    plt.title('hidden_layer_size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations=5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    s1 = np.dot(Y, predictions.T)
    s2 = np.dot(1 - Y, 1 - predictions.T)
    num = Y.size
    print('hidden_layer_size{}Accuracy:{}%%'.format(n_h, float((s1 + s2) / num * 100)))
plt.show()