Deep Learning: Based on Python: Chapter 3

Posted by lvitup on Tue, 30 Nov 2021 21:32:51 +0100

Chapter 3 neural network

3.1 from perceptron to neural network

3.1.1 examples of neural networks

[Figure 3-1: an example neural network with an input layer, a hidden layer, and an output layer]

The network in Figure 3-1 consists of three layers of neurons, but only two of those layers actually have weights, so it is called a "two-layer network" here. Note that some books call the network in Figure 3-1 a "3-layer network", counting the layers that make up the network. This book names a network by the number of layers that actually have weights (the total of input, hidden, and output layers minus 1).

3.1.2 review perceptron

y = h(b + w1x1 + w2x2)

[Figure: the perceptron rewritten with an explicit activation function h, where h(x) = 1 if x > 0 and 0 otherwise]

3.1.3 Introducing the activation function

The function h(x) introduced just now converts the sum of the input signals into an output signal. Such a function is generally called an activation function.

a = b + w1x1 + w2x2

y = h(a)

[Figure: inside the neuron, the weighted sum a = b + w1x1 + w2x2 is computed first, then the activation function h converts a into the output y = h(a)]
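In code, this two-step computation looks like the following (a minimal sketch; the weights and bias are the example AND-gate values from chapter 2):

def h(x):
    # step activation: 1 if x > 0, else 0
    return int(x > 0)

b, w1, w2 = -0.7, 0.5, 0.5  # example parameters
x1, x2 = 1.0, 1.0

a = b + w1*x1 + w2*x2  # first compute the weighted sum plus bias
y = h(a)               # then let the activation function convert a into the output
print(y)  # 1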

3.2 activation function

3.2.1 sigmoid function

h(x) = 1 / (1 + exp(−x))

In neural networks, the sigmoid function is used as the activation function to convert signals, and the converted signals are transmitted to the next neurons.

3.2.2 Implementing the step function

An implementation of the step function that supports NumPy arrays:

def step_function(x):
    y = x > 0
    return y.astype(int)

The operation of this function:
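Step by step in the interpreter (standard NumPy behavior):

>>> import numpy as np
>>> x = np.array([-1.0, 1.0, 2.0])
>>> y = x > 0
>>> y
array([False,  True,  True])
>>> y.astype(int)
array([0, 1, 1])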

The astype() method converts the type of a NumPy array; the desired type is specified as its argument, here int. When a boolean is converted to an int in Python, True converts to 1 and False to 0.

3.2.3 graph of step function

Code to graph the step function:

import numpy as np
import matplotlib.pylab as plt

def step_function(x):
    return np.array(x > 0, dtype=int)

x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(x, y)
plt.ylim(-0.1, 1.1)  # specify the range of the y-axis
plt.show()

np.arange(-5.0, 5.0, 0.1) generates a NumPy array ([-5.0, -4.9, ..., 4.9]) in steps of 0.1 over the range −5.0 to 5.0.

step_function() takes the NumPy array as its argument, applies the step function to each element, and returns the result as an array.

3.2.4 implementation of sigmoid function

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
>>> x = np.array([-1.0, 1.0, 2.0])
>>> sigmoid(x)
array([ 0.26894142, 0.73105858, 0.88079708])

This implementation of the sigmoid function supports NumPy arrays thanks to NumPy's broadcasting: when an operation involves a scalar and a NumPy array, it is performed between the scalar and each element of the array.
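For example (standard NumPy broadcasting):

>>> t = np.array([1.0, 2.0, 3.0])
>>> 1.0 + t
array([ 2.,  3.,  4.])
>>> 1.0 / t
array([ 1.        ,  0.5       ,  0.33333333])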

3.2.5 comparison between sigmoid function and step function

The sigmoid function is a smooth curve whose output changes continuously with the input, while the step function switches its output abruptly at 0. Another difference is the output values: the step function can only return 0 or 1, whereas the sigmoid function can return real numbers such as 0.731... and 0.880.... This smoothness matters for neural network learning.
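Plotting the two functions on one figure makes the comparison visible (a sketch reusing the step_function and sigmoid defined above):

import numpy as np
import matplotlib.pylab as plt

x = np.arange(-5.0, 5.0, 0.1)
plt.plot(x, sigmoid(x), label="sigmoid")
plt.plot(x, step_function(x), 'k--', label="step")
plt.ylim(-0.1, 1.1)
plt.legend()
plt.show()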

3.2.6 nonlinear function

The activation function of a neural network must be a nonlinear function. If a linear function such as h(x) = cx were used, adding layers would be pointless, because a stack of linear layers can always be collapsed into an equivalent single layer (compositions of linear functions are themselves linear), as the sketch below shows.
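A quick numerical illustration (a sketch; c is an arbitrary slope):

import numpy as np

c = 2.0
def linear(x):
    return c * x  # a linear "activation" h(x) = cx

x = np.arange(-2.0, 3.0)
y_deep = linear(linear(linear(x)))  # three stacked linear layers
y_flat = (c ** 3) * x               # one equivalent linear layer
print(np.allclose(y_deep, y_flat))  # True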

3.2.7 ReLU function

def relu(x):
    return np.maximum(0, x)

[Figure: graph of the ReLU function, h(x) = x for x > 0 and 0 otherwise]
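A quick check in the interpreter:

>>> x = np.array([-1.0, 2.0, -3.0, 4.0])
>>> relu(x)
array([ 0.,  2.,  0.,  4.])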

3.3 Operations on multidimensional arrays

3.3.1 multidimensional array

>>> import numpy as np
>>> A = np.array([1, 2, 3, 4])
>>> print(A)
[1 2 3 4]
>>> np.ndim(A)
1
>>> A.shape
(4,)
>>> A.shape[0]
4

>>> B = np.array([[1,2], [3,4], [5,6]])
>>> print(B)
[[1 2]
 [3 4]
 [5 6]]
>>> np.ndim(B)
2
>>> B.shape
(3, 2)

The number of dimensions of an array can be obtained with the np.ndim() function. A is a one-dimensional array of four elements.

Note that the result of A.shape is a tuple, so that one-dimensional arrays return results in the same form as multidimensional arrays.

3.3.2 matrix multiplication

>>> A = np.array([[1,2], [3,4]])
>>> A.shape
(2, 2)
>>> B = np.array([[5,6], [7,8]])
>>> B.shape
(2, 2)
>>> np.dot(A, B)
array([[19, 22],
 [43, 50]])

Both A and B are 2 × 2 matrices, and their product can be calculated with NumPy's np.dot() function. np.dot() takes two NumPy arrays as arguments and returns their product (for matrices, the matrix product).

>>> A = np.array([[1,2,3], [4,5,6]])
>>> A.shape
(2, 3)
>>> B = np.array([[1,2], [3,4], [5,6]])
>>> B.shape
(3, 2)
>>> np.dot(A, B)
array([[22, 28],
 [49, 64]])
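The number of columns of the first matrix must equal the number of rows of the second; otherwise np.dot() raises an error. For example, with the (2, 3) matrix A from above:

>>> C = np.array([[1,2], [3,4]])
>>> C.shape
(2, 2)
>>> np.dot(A, C)
Traceback (most recent call last):
...
ValueError: shapes (2,3) and (2,2) not aligned: 3 (dim 1) != 2 (dim 0)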

3.3.3 Matrix product in a neural network

[Figure: a simple network with two inputs x1, x2 and three outputs y1, y2, y3, computed as Y = XW]

>>> X = np.array([1, 2])
>>> X.shape
(2,)
>>> W = np.array([[1, 3, 5], [2, 4, 6]])
>>> print(W)
[[1 3 5]
 [2 4 6]]
>>> W.shape
(2, 3)
>>> Y = np.dot(X, W)
>>> print(Y)
[ 5 11 17]

3.4 Implementing a 3-layer neural network

[Figure: a 3-layer network with 2 neurons in the input layer, 3 in hidden layer 1, 2 in hidden layer 2, and 2 in the output layer]

3.4.1 Notation

[Figure: notation such as w12^(1), the weight from neuron 2 of the previous layer to neuron 1 of layer 1; the superscript gives the layer, the subscripts give the destination and source neuron numbers]

3.4.2 Implementing signal transmission between layers

Signal transmission from the input layer to layer 1

[Figure: signal transmission from the input layer to layer 1]

In matrix form, the weighted sum of layer 1 is written A(1) = XW(1) + B(1), with A(1) = (a1(1) a2(1) a3(1)), X = (x1 x2), and B(1) = (b1(1) b2(1) b3(1)).

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])
print(W1.shape) # (2, 3)
print(X.shape) # (2,)
print(B1.shape) # (3,)
A1 = np.dot(X, W1) + B1

Z1 = sigmoid(A1)
print(A1) # [0.3, 0.7, 1.1]
print(Z1) # [0.57444252, 0.66818777, 0.75026011]

As shown in Fig. 3-18, the weighted sum at the hidden layer (the weighted signals plus the bias) is denoted by a, and the signal after conversion by the activation function is denoted by z. Here h() represents the activation function; the sigmoid function is used.

[Figure 3-18: signal transmission from the input layer to layer 1, with the weighted sum a converted to z by the activation function h()]

Signal transmission from layer 1 to layer 2

W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
B2 = np.array([0.1, 0.2])
print(Z1.shape) # (3,)
print(W2.shape) # (3, 2)
print(B2.shape) # (2,)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid(A2)

[Figure: signal transmission from layer 1 to layer 2]

Signal transmission from layer 2 to output layer

The implementation of the output layer is basically the same as before, but the final activation function differs from those of the hidden layers.

The output-layer activation function is written σ() (read "sigma") to distinguish it from the hidden-layer activation function h().

def identity_function(x):
    return x
W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
B3 = np.array([0.1, 0.2])
A3 = np.dot(Z2, W3) + B3
Y = identity_function(A3) # Or Y = A3

[Figure: signal transmission from layer 2 to the output layer]

3.4.3 code implementation summary

def init_network():
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['b1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['b2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['b3'] = np.array([0.1, 0.2])
    return network

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']
    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = identity_function(a3)
    return y
 
 
network = init_network()
x = np.array([1.0, 0.5])
y = forward(network, x)
print(y) # [ 0.31682708 0.69627909]

This defines the init_network() and forward() functions.

The init_network() function initializes the weights and biases and stores them in the dictionary variable network, which holds the parameters (weights and biases) needed by each layer.

The forward() function encapsulates the process of converting an input signal into an output signal.

3.5 Designing the output layer

Neural networks can be used for both classification and regression problems.

Generally, the identity function is used for regression problems and the softmax function for classification problems.

3.5.1 identity function and softmax function

Identity function:

The identity function outputs its input as is; the input is passed to the output without any change.

[Figure: the identity function passes the input signal straight through to the output]

softmax function

The softmax function is defined as

y_k = exp(a_k) / Σ_{i=1}^{n} exp(a_i)

where n is the number of neurons in the output layer and y_k is the k-th output: the numerator is the exponential of the input a_k, and the denominator is the sum of the exponentials of all the inputs.

Implementation:

def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y

Explanation:

>>> a = np.array([0.3, 2.9, 4.0])
>>>
>>> exp_a = np.exp(a) # exponential function
>>> print(exp_a)
[ 1.34985881 18.17414537 54.59815003]
>>>
>>> sum_exp_a = np.sum(exp_a) # Sum of exponential functions
>>> print(sum_exp_a)
74.1221542102
>>>
>>> y = exp_a / sum_exp_a
>>> print(y)
[ 0.01821127 0.24519181 0.73659691]

3.5.2 precautions when implementing softmax function

The softmax implementation above is prone to overflow: exponentials grow very quickly (e.g. exp(1000) returns inf), and dividing such huge values produces unstable results. The countermeasure follows from

y_k = exp(a_k) / Σ exp(a_i)
    = C exp(a_k) / (C Σ exp(a_i))
    = exp(a_k + log C) / Σ exp(a_i + log C)
    = exp(a_k + C') / Σ exp(a_i + C')

That is, adding (or subtracting) any constant C' to every input does not change the result. To prevent overflow, C' is usually taken as minus the largest input value.

Implementation with the overflow countermeasure:

def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # overflow countermeasure
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
>>> a = np.array([1010, 1000, 990])
>>> np.exp(a) / np.sum(np.exp(a)) # Operation of softmax function
array([ nan, nan, nan]) # Not calculated correctly
>>>
>>> c = np.max(a) # 1010
>>> a - c
array([ 0, -10, -20])
>>>
>>> np.exp(a - c) / np.sum(np.exp(a - c))
array([ 9.99954600e-01, 4.53978686e-05, 2.06106005e-09])

3.5.3 characteristics of softmax function

def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # overflow countermeasure
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y
>>> a = np.array([0.3, 2.9, 4.0])
>>> y = softmax(a)
>>> print(y)
[ 0.01821127 0.24519181 0.73659691]
>>> np.sum(y)
1.0

Each output of the softmax function is a real number between 0 and 1, and the outputs sum to 1; thanks to this property, the outputs can be interpreted as probabilities. Also, since exp() is monotonically increasing, softmax does not change the order of its inputs, so the output layer's softmax is often omitted during inference.

3.5.4 Number of neurons in the output layer

The number of neurons in the output layer should be set according to the problem. For classification, it is usually set to the number of classes: to classify images into the digits 0 to 9, for example, the output layer is given 10 neurons.

3.6 handwritten numeral recognition

3.6.1 MNIST dataset


The MNIST dataset consists of images of the digits 0 to 9 (Fig. 3-24): 60,000 training images and 10,000 test images, which are used for learning and inference. The typical way to use MNIST is to train a model on the training images and then measure how well the trained model classifies the test images.

The MNIST image data are 28 × 28 pixel grayscale images (1 channel), with each pixel value between 0 and 255. Each image is labeled with its digit, such as "7", "2", or "1".

The load_mnist function returns the MNIST data in the form "(training image, training label), (test image, test label)".

It can also be called as load_mnist(normalize=True, flatten=True, one_hot_label=False), with three parameters:

  • normalize sets whether to normalize the input image pixels to 0.0 ~ 1.0. If False, the pixels keep their original values of 0 ~ 255.
  • flatten sets whether to flatten each input image into a one-dimensional array. If False, each input image is a 1 × 28 × 28 three-dimensional array; if True, it is saved as a one-dimensional array of 784 elements.
  • one_hot_label sets whether to save the labels in one-hot representation, an array in which only the correct label is 1 and the rest are 0, such as [0,0,1,0,0,0,0,0,0,0]. When one_hot_label is False, the label is simply stored as the digit itself, such as 7 or 2; when True, it is stored as a one-hot array.
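A quick shape check (the shapes follow from the dataset sizes described above):

import sys, os
sys.path.append(os.pardir)  # settings for importing files from the parent directory
from dataset.mnist import load_mnist

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)
print(x_train.shape)  # (60000, 784)
print(t_train.shape)  # (60000,)
print(x_test.shape)   # (10000, 784)
print(t_test.shape)   # (10000,)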

Displaying a training image:

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # Settings for importing files from the parent directory
import numpy as np
from dataset.mnist import load_mnist
from PIL import Image


def img_show(img):
    pil_img = Image.fromarray(np.uint8(img))
    pil_img.show()

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=True, normalize=False)

img = x_train[0]   # take out the first training image
label = t_train[0]  # its label
print(label)  # 5

print(img.shape)  # (784,)
img = img.reshape(28, 28)  # Change the shape of the image to its original size
print(img.shape)  # (28, 28)

img_show(img)

[Figure: the displayed MNIST training image, the digit 5]

3.6.2 Inference with the neural network

Next, we implement inference on the MNIST dataset with a neural network. The input layer has 784 neurons and the output layer has 10 neurons. The 784 comes from the image size of 28 × 28 = 784, and the 10 comes from the 10-class classification (the digits 0 to 9, 10 classes in total). In addition, the network has two hidden layers: the first with 50 neurons and the second with 100 neurons (both numbers can be set to any value). Let's first define get_data(), init_network() and predict() (the code is in ch03/neuralnet_mnist.py).
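The definition of get_data() is not shown in this excerpt; a minimal sketch consistent with the text (inference on the test set, with inputs normalized to 0.0 ~ 1.0):

import sys, os
sys.path.append(os.pardir)
import pickle
import numpy as np
from dataset.mnist import load_mnist

def get_data():
    # use the test images and labels for inference
    (x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, flatten=True, one_hot_label=False)
    return x_test, t_test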

# init_network() reads in the learned weight parameters saved in the pickle file sample_weight.pkl.
# In this file, the weight and bias parameters are saved as dictionary variables.
# Because we assume learning has already been completed, the learned parameters were saved in sample_weight.pkl.
def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network

#Classification with predict() function
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)

    return y


x, t = get_data()
network = init_network()
accuracy_cnt = 0

#Take out the image data saved in x one by one with the for statement
for i in range(len(x)):
    y = predict(network, x[i])
    p= np.argmax(y) # Gets the index of the element with the highest probability
    if p == t[i]:
        accuracy_cnt += 1

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

After executing the above code, "Accuracy:0.9352" will be displayed. This means that 93.52% of the data are correctly classified.

3.6.3 batch processing

>>> x, _ = get_data()
>>> network = init_network()
>>> W1, W2, W3 = network['W1'], network['W2'], network['W3']
>>>
>>> x.shape
(10000, 784)
>>> x[0].shape
(784,)
>>> W1.shape
(784, 50)
>>> W2.shape
(50, 100)
>>> W3.shape
(100, 10)

[Figure: shapes through the network for a single image: X (784,) → W1 (784, 50) → W2 (50, 100) → W3 (100, 10) → Y (10,)]

[Figure: shapes for a batch of 100 images: X (100, 784) → W1 (784, 50) → W2 (50, 100) → W3 (100, 10) → Y (100, 10)]

Input data bundled into one unit like this is called a batch.
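For example, taking the first 100 images and feeding them through predict() at once (x and network as above) yields 100 rows of 10 scores:

x_batch = x[0:100]                  # shape (100, 784)
y_batch = predict(network, x_batch)
print(y_batch.shape)                # (100, 10)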

The batch version:

# init_network() reads in the learned weight parameters saved in the pickle file sample_weight.pkl.
# In this file, the weight and bias parameters are saved as dictionary variables.
# Because we assume learning has already been completed, the learned parameters were saved in sample_weight.pkl.
def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network

#Classification with predict() function
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)

    return y

x, t = get_data()
network = init_network()
batch_size = 100  # batch size
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1)  # index of the highest-scoring class for each image in the batch
    accuracy_cnt += np.sum(p == t[i:i+batch_size])
print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

Explanation:

>>> list( range(0, 10) )
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list( range(0, 10, 3) )
[0, 3, 6, 9]

>>> x = np.array([[0.1, 0.8, 0.1], [0.3, 0.1, 0.6],[0.2, 0.5, 0.3], [0.8, 0.1, 0.1]])
>>> y = np.argmax(x, axis=1) # argmax() gets the index of the largest element along axis 1, i.e. row by row
# In the batch version, this finds for each row of the 100 × 10 array the index of its largest element
>>> print(y)
[1 2 1 0]

>>> y = np.array([1, 2, 1, 0])
>>> t = np.array([1, 2, 0, 0])
>>> print(y==t)
[ True  True False  True]
>>> np.sum(y==t)
3

3.7 summary

  • The activation functions in neural networks use smoothly varying functions such as the sigmoid function or the ReLU function.
  • By making skillful use of NumPy multidimensional arrays, neural networks can be implemented efficiently.
  • Machine learning problems can be divided into regression problems and classification problems.
  • For the output-layer activation function, the identity function is generally used for regression problems and the softmax function for classification problems.
  • In classification problems, the number of neurons in the output layer is set to the number of classes to be classified.
  • A bundled collection of input data is called a batch. Performing inference in batch units makes the computation fast.

3.8 Summary (functions)

1. Activation functions

[Figure 3-18: the activation function h() converts the weighted sum a into the signal z]

sigmoid function (used as h()):

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

ReLU function (used as h()):

def relu(x):
    return np.maximum(0, x)

2. Output layer

[Figure: signal transmission from layer 2 to the output layer, with output activation σ()]

softmax function

def softmax(a):
    exp_a = np.exp(a)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y

Identity function:
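As defined in 3.4.2:

def identity_function(x):
    return x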

3. Complete example

# init_network() reads in the learned weight parameters saved in the pickle file sample_weight.pkl.
# In this file, the weight and bias parameters are saved as dictionary variables.
# Because we assume learning has already been completed, the learned parameters were saved in sample_weight.pkl.

def init_network():
    with open("sample_weight.pkl", 'rb') as f:
        network = pickle.load(f)
    return network
    
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
 
 
#Classification with predict() function
def predict(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    b1, b2, b3 = network['b1'], network['b2'], network['b3']

    a1 = np.dot(x, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    z2 = sigmoid(a2)
    a3 = np.dot(z2, W3) + b3
    y = softmax(a3)

    return y

x, t = get_data()
network = init_network()
batch_size = 100  # batch size
accuracy_cnt = 0
for i in range(0, len(x), batch_size):
    x_batch = x[i:i+batch_size]
    y_batch = predict(network, x_batch)
    p = np.argmax(y_batch, axis=1)
    accuracy_cnt += np.sum(p == t[i:i+batch_size])
print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

Topics: Python Deep Learning