Beginner's first lesson, what is deep learning

Posted by zirius on Sun, 19 Dec 2021 22:12:54 +0100

Elementary neural network

The problem we want to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into 10 categories (0 ~ 9). We will use the MNIST dataset, which contains 60000 training images and 10000 test images.

Step 1: prepare data

  1. The MNIST dataset contains 60000 training samples and 10000 test samples. Each sample consists of a picture and a label. The picture is a 28 * 28 pixel matrix, and the label is one of the 10 digits 0 ~ 9.
  2. Define train_reader and test_reader for reading the MNIST dataset, with a batch size of 128, that is, 128 images are trained or validated at a time.
  3. The paddle.dataset.mnist.train() and test() interfaces have already carried out grayscale conversion, normalization, centering and other processing for us.

#Import required packages
import numpy as np    
import paddle as paddle
import paddle.fluid as fluid
from PIL import Image   #Call the image processing library, including the image class
import matplotlib.pyplot as plt
import os

train_reader = paddle.batch(paddle.reader.shuffle(paddle.dataset.mnist.train(),
                                                  buf_size=512),
                    batch_size=128)
#train_reader is a data provider for training
test_reader = paddle.batch(paddle.dataset.mnist.test(),
                           batch_size=128)
#test_reader is a data provider for testing
#paddle.batch() groups every batch_size samples into one batch

Supplementary notes:

  1. NumPy: an open source numerical computing extension package for Python. The as keyword gives the imported module an alias for convenience.
  2. paddle is a deep learning framework. In it, fluid.data creates a data variable.
    import paddle.fluid as fluid
    
    # Define a two-dimensional data variable x with data type int64. The first dimension of x is 3, and the second dimension is unknown and is determined during program execution, so the shape of x can be specified as [3, None]
    x = fluid.data(name="x", shape=[3, None], dtype="int64")
    
    # Most networks organize data in batch mode. The batch size is uncertain when defining, so the dimension of batch (usually the first dimension) can be specified as None
    batched_x = fluid.data(name="batched_x", shape=[None, 3, None], dtype='int64')
    

    Use fluid.layers.fill_constant to create a constant

    import paddle.fluid as fluid
    data = fluid.layers.fill_constant(shape=[3, 4], value=16, dtype='int64')
    

                                                                              

  3. matplotlib.pyplot (imported as plt) is Matplotlib's plotting interface.

  4. PaddlePaddle provides interfaces to read the MNIST training set and test set: paddle.dataset.mnist.train() and paddle.dataset.mnist.test(), respectively. paddle.reader.shuffle() caches buf_size data items at a time and shuffles them.
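
To make the reader pipeline concrete, here is a minimal pure-Python sketch of what buffered shuffling followed by batching does. The helpers shuffle_reader, batch_reader and toy_reader are hypothetical stand-ins written for illustration, not Paddle's implementation, and keeping the last partial batch is an assumption about the default behavior.

```python
import random


def shuffle_reader(reader, buf_size):
    """Hypothetical stand-in for paddle.reader.shuffle: buffer, shuffle, emit."""
    def new_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                yield from buf
                buf = []
        # Emit whatever is left in the final, partially filled buffer
        random.shuffle(buf)
        yield from buf
    return new_reader


def batch_reader(reader, batch_size):
    """Hypothetical stand-in for paddle.batch: group items into lists."""
    def new_reader():
        batch = []
        for item in reader():
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # assumption: the last partial batch is kept
            yield batch
    return new_reader


def toy_reader():
    # Ten fake (image, label) samples instead of real MNIST data
    for i in range(10):
        yield ([0.0] * 784, i % 10)


train_batches = list(batch_reader(shuffle_reader(toy_reader, buf_size=4), batch_size=3)())
batch_sizes = [len(b) for b in train_batches]
labels_seen = sorted(label for b in train_batches for _, label in b)
print(batch_sizes)
```

Because shuffling only happens inside each buffer, a small buf_size mixes nearby samples only; a larger buffer gives a more thorough shuffle at the cost of memory.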

Print one sample to look at the MNIST dataset

temp_reader = paddle.batch(paddle.dataset.mnist.train(),
                           batch_size=1)
temp_data = next(temp_reader())  #Each 28 * 28 handwritten digit image is stored in vector form, as a vector of length 784
print(temp_data)

Step 2: configure network

The following code defines a simple multilayer perceptron. It has three layers in total: two hidden layers of size 100 and an output layer of size 10. Because the MNIST dataset consists of handwritten grayscale digits from 0 to 9, there are 10 categories, so the final output size is 10. The activation function of the output layer is Softmax, so the output layer acts as a classifier. Counting the input layer, the structure of the multilayer perceptron is: input layer --> hidden layer --> hidden layer --> output layer.

# Define multilayer perceptron
def multilayer_perceptron(input):
    # First fully connected layer, with ReLU as the activation function
    hidden1 = fluid.layers.fc(input=input, size=100, act='relu') #Build a fully connected layer of the neural network
    # Second fully connected layer, with ReLU as the activation function
    hidden2 = fluid.layers.fc(input=hidden1, size=100, act='relu')
    # Fully connected output layer of size 10, with softmax as the activation function
    prediction = fluid.layers.fc(input=hidden2, size=10, act='softmax')
    return prediction
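
To see what this network computes, the forward pass can be sketched in plain NumPy. This is only an illustration under stated assumptions: the weights are random rather than trained, the names mlp_forward, params and layer_sizes are made up for this sketch, and the image is assumed to be flattened to 784 values before the first layer.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    # Subtract the row maximum for numerical stability
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
layer_sizes = [784, 100, 100, 10]  # input, hidden1, hidden2, output

# Randomly initialised weights and zero biases, for illustration only
params = [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def mlp_forward(images):
    x = images.reshape(images.shape[0], -1)   # flatten [N, 1, 28, 28] -> [N, 784]
    (w1, b1), (w2, b2), (w3, b3) = params
    hidden1 = relu(x @ w1 + b1)
    hidden2 = relu(hidden1 @ w2 + b2)
    return softmax(hidden2 @ w3 + b3)         # one probability per digit class


batch = rng.uniform(-1.0, 1.0, size=(4, 1, 28, 28)).astype(np.float32)
probs = mlp_forward(batch)
n_params = sum(w.size + b.size for w, b in params)
print(probs.shape, n_params)
```

Each row of probs sums to 1, which is why the softmax output can be read as class probabilities. With these layer sizes the sketch has 89610 trainable values (weights plus biases).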

Define the input layer. The input is image data. The image is a 28 * 28 grayscale image, so the input shape is [1, 28, 28]. If the image were a 32 * 32 color image, the input shape would be [3, 32, 32], because a grayscale image has only one channel, while a color image has three RGB channels.

# Define input / output layer
image = fluid.layers.data(name='image', shape=[1, 28, 28], dtype='float32')  #Single channel, 28 * 28 pixel value
label = fluid.layers.data(name='label', shape=[1], dtype='int64')            #Picture label

Supplementary notes:

  1. About paddle.fluid.data

paddle.fluid.data() is an OP (operator) that creates a global variable which can be accessed by operators in the computation graph and serves as a placeholder for data input.
name is the name of the global variable created by paddle.fluid.data() and is the prefix identifier of the input layer's output.  
shape declares the dimension information of the global variable created by paddle.fluid.data().  
None in the shape marks a dimension whose number of elements is not yet known; it is determined during program execution.  
-1 in the shape can only appear at the front of the shape, indicating that it adapts to any batch size.
dtype is the data type of the global variable created by paddle.fluid.data(); supported types are {bool, float16, float32, float64, int8, int16, int32, int64}.  
The data fed by the user must have the same shape as the variable created by paddle.fluid.data(). Although the fed data is of unsigned byte type, softmax regression needs floating-point operations, so the data type is converted to float32.

  2. About paddle.fluid.layers.fc

paddle.fluid.layers.fc() is an OP that builds a fully connected layer. It creates a weight variable for each input Tensor, that is, a fully connected weight matrix from each input unit to each output unit.  
The FC layer multiplies each input Tensor by its corresponding weights to obtain an output Tensor of shape [M, size], where M is batch_size. If there are multiple input Tensors, the results, each of shape [M, size], are summed to give the final output.
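
The multi-input behavior described above can be sketched with NumPy. This is an illustration of the arithmetic only, not Paddle code; the variable names and sizes are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, size = 5, 3                      # batch size and number of output units

# Two input tensors with different feature widths, each with its own weights
x_a, x_b = rng.normal(size=(M, 4)), rng.normal(size=(M, 6))
w_a, w_b = rng.normal(size=(4, size)), rng.normal(size=(6, size))
bias = np.zeros(size)

# Each product has shape [M, size]; the results are summed, then the bias is added
out = x_a @ w_a + x_b @ w_b + bias

# Equivalent view: concatenate the inputs and stack the weight matrices
out_concat = np.hstack([x_a, x_b]) @ np.vstack([w_a, w_b]) + bias
print(out.shape)
```

Summing the per-input products gives the same result as one fully connected layer applied to the concatenated inputs, which is a handy way to think about the multi-input case.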


Here, call the defined network to obtain the classifier:

# Get classifier
model = multilayer_perceptron(image)

Next, define the loss function. This time we use the cross entropy loss function, which is commonly used in classification tasks. The cross entropy op returns one loss value per sample in the Batch, so we then average it to obtain a single loss value for the Batch. We also define an accuracy function, which outputs the classification accuracy during training.

# Get loss function and accuracy function
cost = fluid.layers.cross_entropy(input=model, label=label)  #The cross entropy loss function is used to describe the difference between the real sample label and the prediction probability
#Loss of one sample
avg_cost = fluid.layers.mean(cost)
#Average loss of a batch
acc = fluid.layers.accuracy(input=model, label=label)
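
As a rough illustration of what these three values mean, the same quantities can be computed by hand with NumPy. The probabilities and labels below are made-up numbers, and the sketch assumes integer class labels and top-1 accuracy.

```python
import numpy as np

# Predicted class probabilities for a batch of 3 samples and 4 classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.2, 0.3, 0.4]])
labels = np.array([0, 1, 2])            # true class index per sample

# Per-sample cross entropy: -log(probability assigned to the true class)
per_sample_cost = -np.log(probs[np.arange(len(labels)), labels])
avg_cost = per_sample_cost.mean()       # one scalar for the whole batch

# Accuracy: fraction of samples whose most probable class is the true class
accuracy = (probs.argmax(axis=1) == labels).mean()
print(per_sample_cost, avg_cost, accuracy)
```

The third sample is misclassified (class 3 has the highest probability, but the label is 2), so it contributes the largest loss and lowers the accuracy to two out of three.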

Next, we define the optimization method. This time, we use the Adam optimization method, and specify the learning rate as 0.001.

# Define optimization method
optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.001)   #Optimization using Adam algorithm
opts = optimizer.minimize(avg_cost)
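
For intuition about what the optimizer does on each step, here is a minimal sketch of the Adam update rule for a single scalar parameter. The function adam_step is hypothetical, and the values beta1=0.9, beta2=0.999 and eps=1e-8 are the commonly used defaults, assumed here rather than taken from the code above.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(p) = p**2, whose gradient is 2 * p, starting from p = 1.0
p, m, v = 1.0, 0.0, 0.0
history = [p]
for t in range(1, 4):
    p, m, v = adam_step(p, 2 * p, m, v, t)
    history.append(p)
print(history)
```

On the first step the update is close to learning_rate times the sign of the gradient, so with a learning rate of 0.001 the parameter moves by about 0.001 regardless of how large the gradient is.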

Step 3: model training & Step 4: model evaluation

Then define an executor and initialize the parameters

# Define an executor that uses the CPU
place = fluid.CPUPlace()
exe = fluid.Executor(place)
# Parameter initialization
exe.run(fluid.default_startup_program())

Supplementary notes:

  1. When CPUPlace() is used, computation runs on the CPU. If CUDAPlace() is used, it runs on the GPU.  
  2. Only an executor can run a program. There are two programs by default: default_startup_program() and default_main_program().
  3. default_startup_program() holds operations such as model parameter initialization, optimizer parameter initialization, reader initialization, etc.
  4. default_main_program() holds operations such as the neural network model, forward and backward computation, model parameter updates, optimizer parameter updates and so on.

The input data consists of the image data and the label corresponding to each image. Each category of image must correspond to a label, which is an integer value counting up from 0.

# Define input data dimensions
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
#The place parameter indicates whether data such as numpy arrays passed in from Python should be converted to a LoDTensor on the GPU or the CPU
#The feed_list parameter is a list of variables
#DataFeeder converts the data format for feeding

Finally, we can start training. We train for 5 passes this time. Since we have already defined a function to calculate accuracy, we output the current accuracy during training. The principle is simple: compare the predicted results with the true labels. After each pass of training, run a test on the test set and calculate the average Cost and accuracy.

# Start training and testing
for pass_id in range(5):  #  pass_id 0 to 4
    # Conduct training
    for batch_id, data in enumerate(train_reader()):                        #Traverse train_reader
        train_cost, train_acc = exe.run(program=fluid.default_main_program(),#Run the main program
                                        feed=feeder.feed(data),             #Feed data to the model
                                        fetch_list=[avg_cost, acc])         #fetch_list holds the variables whose results we want back: error and accuracy
        # Print information error and accuracy every 100 batches
        if batch_id % 100 == 0:
            print('Pass:%d, Batch:%d, Cost:%0.5f, Accuracy:%0.5f' %
                  (pass_id, batch_id, train_cost[0], train_acc[0]))

    # Test
    test_accs = []
    test_costs = []
    #One test per training round
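    # Note: this runs the main program, which also contains the optimizer ops, so parameters may be updated on test data too.
    # A separate test program (for example one cloned from the main program before optimizer.minimize is called) should avoid this.
    for batch_id, data in enumerate(test_reader()):                         #Traverse test_reader
        test_cost, test_acc = exe.run(program=fluid.default_main_program(), #Run the main program
                                      feed=feeder.feed(data),               #Feed data
                                      fetch_list=[avg_cost, acc])           #fetch error and accuracy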
        test_accs.append(test_acc[0])                                       #Record the accuracy of each batch
        test_costs.append(test_cost[0])                                     #Record the error of each batch
    # Average the test results
    test_cost = (sum(test_costs) / len(test_costs))                         #Average error per round
    test_acc = (sum(test_accs) / len(test_accs))                            #Average accuracy per round
    print('Test:%d, Cost:%0.5f, Accuracy:%0.5f' % (pass_id, test_cost, test_acc))
    
    #Save model
    model_save_dir = "/home/aistudio/data/hand.inference.model"
    # Create if the save path does not exist
    if not os.path.exists(model_save_dir):
        os.makedirs(model_save_dir)
    print ('save models to %s' % (model_save_dir))
    fluid.io.save_inference_model(model_save_dir,  #Path to save the inference model
                                  ['image'],      #Names of the variables that must be fed at inference time
                                  [model],        #The Variables that hold the inference results
                                  exe)            #The executor used to save the inference model

Supplementary notes:

  1. In for i, b in enumerate(a), the variables i and b are assigned at the same time: i is assigned the index of the current element, and b is assigned the current element itself.
  2. paddle.fluid.io.save_inference_model(dirname, feeded_var_names, target_vars, executor, main_program=None, model_filename=None, params_filename=None, export_for_deployment=True, program_only=False)
  3. append adds a new object to the end of a list, modifying the original list in place.
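
The use of enumerate and the per-batch averaging in the test loop can be sketched in plain Python. The batch sizes and accuracies below are made-up numbers. The sketch also assumes the reader keeps the final partial batch: with 10000 test images and a batch size of 128, that would be 78 full batches plus one batch of 16 images.

```python
# Hypothetical per-batch results: two full batches and a smaller final batch
batch_sizes = [128, 128, 16]
batch_accs = [0.90, 0.95, 0.50]

for batch_id, acc in enumerate(batch_accs):
    # batch_id is the index (0, 1, 2), acc is the element at that index
    print('Batch:%d, Accuracy:%0.5f' % (batch_id, acc))

# Plain average of per-batch values, as in the test loop above
simple_avg = sum(batch_accs) / len(batch_accs)

# Average weighted by batch size, i.e. accuracy over all individual samples
weighted_avg = sum(a * n for a, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)
print(simple_avg, weighted_avg)
```

When the last batch is smaller than the others, the plain average gives its samples more weight than the rest. For MNIST the effect is small, but weighting by batch size gives the exact accuracy over the whole test set.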

Step 5: model prediction

Before prediction, the image should be preprocessed in the same way as during training. First convert it to grayscale, then resize it to 28 * 28, then convert it into a numpy array in the shape the network expects, and finally normalize the values.

# Preprocess pictures
def load_image(file):
    im = Image.open(file).convert('L')                        #Convert to a grayscale image; 'L' means grayscale, with pixel values between 0 ~ 255
    im = im.resize((28, 28), Image.ANTIALIAS)                 #Resize the image to 28 * 28 with high-quality resampling
    im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)#Turn it into a numpy array and reshape it to match the data feed format
    # print(im)
    im = im / 255.0 * 2.0 - 1.0                               #Normalize to [-1, 1]
    print(im)
    return im

#Use the Matplotlib tool to display this image.
img = Image.open('data/data27012/6.png')
plt.imshow(img)   #Draw an image from an array
plt.show()        #Display image

Supplementary notes:

  1. The img returned by Image.open() is an Image object.
  2. img.resize((width, height), Image.ANTIALIAS)
    The first parameter, (width, height), sets the width and height of the resized picture.
    The second parameter selects the resampling filter:

    Image.NEAREST: low quality
    Image.BILINEAR: bilinear
    Image.BICUBIC: cubic spline interpolation
    Image.ANTIALIAS: high quality
  3. The astype function converts the numeric type of an array.
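
The reshape and normalization steps of load_image can be checked without an image file. The sketch below uses a synthetic 28 * 28 array in place of a real picture, which is an assumption made purely so that the example is self-contained.

```python
import numpy as np

# Synthetic 28 x 28 grayscale image with pixel values 0..255 (stand-in for a real file)
pixels = np.linspace(0, 255, 28 * 28).reshape(28, 28).astype(np.uint8)

im = pixels.reshape(1, 1, 28, 28).astype(np.float32)   # [batch, channel, height, width]
im = im / 255.0 * 2.0 - 1.0                            # map [0, 255] to [-1, 1]
print(im.shape, im.dtype, im.min(), im.max())
```

A pixel value of 0 maps to -1 and a value of 255 maps to 1, so the network sees inputs in the same range at prediction time as the tutorial's normalization produces.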
     

#Create executor for prediction
infer_exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()  #Used to get a new scope

Finally, the preprocessed image is passed to the model through the feed for prediction. The value of fetch_list is the final classifier layer of the network, so the output is the probability of each of the 10 labels, and these probabilities sum to 1.

Start prediction

Through fluid.io.load_inference_model, the predictor reads the trained model from the save directory (model_save_dir) and uses it to predict data it has never seen.

# Load data and start prediction
with fluid.scope_guard(inference_scope):   #fluid.scope_guard switches to the specified scope through the with statement
    #Get the trained model
    #Load the inference model from the specified directory
    [inference_program,                                           #Inference program
     feed_target_names,                                           #A list of str with the names of the variables that must be fed in the inference program
     fetch_targets] = fluid.io.load_inference_model(model_save_dir,#fetch_targets: a list of variables from which we get the inference results. model_save_dir: path where the model was saved
                                                    infer_exe)     #infer_exe: the executor that runs the inference model
    img = load_image('data/data27012/6.png')

    results = infer_exe.run(program=inference_program,        #Run the inference program
                            feed={feed_target_names[0]: img}, #Feed the img to be predicted
                            fetch_list=fetch_targets)         #Get the prediction results

Supplementary note: fluid.io.load_inference_model returns three elements: the inference program, the list of feed variable names, and the list of fetch targets.

After getting the probability of each label, we take the label with the highest probability and print it out.

# Obtain the label with the highest probability
lab = np.argsort(results)                               #argsort returns the indices that sort the result values from small to large
#print(lab)
print("The predicted label of this picture is: %d" % lab[0][0][-1])  #-1 reads the last element, i.e. the index with the highest probability
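
The indexing lab[0][0][-1] can look cryptic, so here is a small NumPy sketch of it. The probabilities are made up, and the sketch assumes results is a list holding one array of shape [1, 10], which matches a single fetched output for a single image.

```python
import numpy as np

# Stand-in for the fetched results: a list holding one [1, 10] probability array
results = [np.array([[0.01, 0.02, 0.03, 0.04, 0.05, 0.05, 0.70, 0.04, 0.03, 0.03]])]

lab = np.argsort(results)            # shape (1, 1, 10), indices sorted by ascending probability
predicted = int(lab[0][0][-1])       # last index = class with the highest probability
same_as_argmax = int(np.argmax(results[0][0]))
print(lab.shape, predicted, same_as_argmax)
```

Since only the top class is needed, np.argmax gives the same answer directly; argsort is useful when the full ranking of the 10 classes is of interest.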