MNIST handwritten digit recognition with PyTorch (latest, very detailed)

Posted by Vidya_tr on Sun, 23 Jan 2022 08:49:39 +0100

Introduction

This post uses the PyTorch framework to implement MNIST handwritten digit recognition with a convolutional neural network (CNN). The code runs on the CPU only.

I have used four architectures to recognize the handwritten digits in the MNIST dataset and compared their recognition accuracy: a purely linear network (Linear), a convolutional neural network (CNN), an Inception network and a residual network (Residual). This post covers the CNN; the other three have not been released yet.

I wrote this post after watching a video tutorial by a well-known uploader on Bilibili (linked at the end), and because handwritten digit recognition happened to be one of the experiments in my deep learning course.
I wrote every word of this article myself and believe it is detailed enough for beginners. Opinions and suggestions for improvement are welcome!

Import packages:

import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F

1. Dataset (MNIST)

The MNIST dataset is a classic dataset in machine learning. It consists of 60,000 training samples and 10,000 test samples; each sample is a 28 × 28 pixel grayscale image of a handwritten digit.

Download:
Official website: http://yann.lecun.com/exdb/mnist/
Cloud drive mirror: MNIST dataset (extraction code: zm7q)

There are four files in total: the training images, the training labels, the test images and the test labels.

File name                     Size       Content
train-images-idx3-ubyte.gz    9,681 KB   60,000 training images (some tutorials split them into 55,000 training and 5,000 validation images)
train-labels-idx1-ubyte.gz    29 KB      Labels of the training images
t10k-images-idx3-ubyte.gz     1,611 KB   10,000 test images
t10k-labels-idx1-ubyte.gz     5 KB       Labels of the test images

1.1 Reading the MNIST dataset

The downloaded files cannot be viewed simply by decompressing them or opening them in an application. They are not in any standard image format but store raw bytes (the IDX format), so you need a program to read them.
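To make this concrete, here is a minimal sketch of reading an IDX file by hand with NumPy. It assumes the layout described on the official MNIST page. The file starts with a 4-byte big-endian magic number: its first two bytes are 0, the third byte is the data type (0x08 = unsigned byte) and the fourth byte is the number of dimensions. Next comes one 4-byte big-endian size per dimension, followed by the raw data. `parse_idx` and `load_idx_gz` are hypothetical helpers, not part of torchvision.

```python
import gzip
import struct

import numpy as np


def parse_idx(data: bytes) -> np.ndarray:
    """Parse decompressed IDX bytes into a NumPy array (hypothetical helper).

    Only the unsigned-byte type (0x08) used by MNIST is handled here.
    """
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])  # magic number
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # One big-endian uint32 per dimension, e.g. (60000, 28, 28) for images.
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim
    return np.frombuffer(data, dtype=np.uint8, offset=offset).reshape(shape)


def load_idx_gz(path: str) -> np.ndarray:
    """Decompress a .gz IDX file and parse it (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

For example, `load_idx_gz('./data/mnist/train-images-idx3-ubyte.gz')` (the path is an assumption) should return a uint8 array of shape (60000, 28, 28). The matching label file should give an array of shape (60000,).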

The torchvision.datasets package already provides the MNIST dataset, so it can be loaded with a few lines of code. The steps are as follows:

  • Step 1: define the preprocessing. transforms.ToTensor() converts each image into a tensor with values in [0, 1]. transforms.Normalize() then standardizes every pixel as (x - mean) / std, where 0.1307 is the mean and 0.3081 the standard deviation of the MNIST training pixels.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
  • Step 2: download / load the dataset. root is the directory where the dataset is stored; train=True selects the training set and train=False the test set.
train_dataset = datasets.MNIST(root='./data/mnist', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data/mnist', train=False, download=True, transform=transform)  # train=True: training set; train=False: test set
  • Step 3: wrap each dataset in a DataLoader, which loads it in mini-batches. batch_size is a hyperparameter (see section 5), and shuffle=True shuffles the data. The training set is shuffled for training, while the test set is read in order.
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
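The constants 0.1307 and 0.3081 are the mean and standard deviation of all MNIST training pixels after they have been scaled to [0, 1]. Below is a minimal NumPy sketch of how they can be computed, and of what `ToTensor()` followed by `Normalize()` does to each pixel. The helper names are hypothetical.

```python
import numpy as np


def pixel_mean_std(images_uint8):
    """Mean and standard deviation of all pixels after scaling 0..255 to [0, 1],
    the same scaling transforms.ToTensor() applies (hypothetical helper)."""
    x = images_uint8.astype(np.float64) / 255.0
    return float(x.mean()), float(x.std())


def normalize(images_uint8, mean=0.1307, std=0.3081):
    """Per-pixel effect of ToTensor() followed by Normalize((mean,), (std,))."""
    return (images_uint8.astype(np.float32) / 255.0 - mean) / std


# Tiny stand-in for real data: one 2 x 2 "image" with two black and two white pixels.
demo = np.array([[[0, 255], [255, 0]]], dtype=np.uint8)
m, s = pixel_mean_std(demo)
print(m, s)                  # 0.5 0.5
print(normalize(demo, m, s))
```

With torchvision, `pixel_mean_std(train_dataset.data.numpy())` should return values close to (0.1307, 0.3081). Here `train_dataset.data` holds the raw uint8 training images with shape (60000, 28, 28). NumPy's `std` uses the population formula. With about 47 million pixels, this is practically identical to the sample formula.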

1.2 Displaying the MNIST dataset

The following code displays 12 images together with their labels.

fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i+1)
    plt.tight_layout()
    plt.imshow(train_dataset.data[i], cmap='gray', interpolation='none')  # .data replaces the deprecated .train_data
    plt.title("Labels: {}".format(train_dataset.targets[i].item()))  # .targets replaces the deprecated .train_labels
    plt.xticks([])
    plt.yticks([])
plt.show()

Output results:

2. Building the model (CNN)

2.1 Convolution layer

Each convolution kernel must have as many channels as the input, and the number of convolution kernels equals the number of output channels.
After convolution, C (channels) changes. W (width) and H (height) may or may not change, depending on the kernel size, stride and padding.

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

Parameters:

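  • in_channels: number of input channels
  • out_channels: number of output channels (i.e. the number of convolution kernels)
  • kernel_size: size of the convolution kernel
  • stride: step size (default 1)
  • padding: number of zeros added to each side of the input (default 0)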
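Below is a naive NumPy sketch of the cross-correlation that `torch.nn.Conv2d` computes for a single image. It illustrates both rules above: each kernel spans all input channels, and there is one kernel per output channel. It also shows the effect of padding. It is written for clarity, not speed, and assumes square kernels. The function name is hypothetical. The output size follows floor((H + 2 * padding - kernel_size) / stride) + 1.

```python
import numpy as np


def conv2d_naive(x, weight, bias=None, stride=1, padding=0):
    """Naive 2-D cross-correlation for one image (hypothetical helper).

    x:      (C_in, H, W) input
    weight: (C_out, C_in, K, K), i.e. C_out kernels, each with C_in channels
    bias:   optional (C_out,) array
    """
    c_in, h, w = x.shape
    c_out, k_c_in, k, _ = weight.shape
    assert c_in == k_c_in, "each kernel needs as many channels as the input"
    # Zero padding on the spatial axes only.
    x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):  # one output channel per kernel
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
                out[o, i, j] = np.sum(patch * weight[o])  # sum over C_in, K, K
        if bias is not None:
            out[o] += bias[o]
    return out


img = np.ones((1, 28, 28))
kernels = np.ones((10, 1, 5, 5))
print(conv2d_naive(img, kernels).shape)             # (10, 24, 24): H and W shrink
print(conv2d_naive(img, kernels, padding=2).shape)  # (10, 28, 28): padding keeps them
```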

2.2 Activation layer

The activation layer uses the ReLU activation function.
The rectified linear unit (ReLU) is an activation function commonly used in artificial neural networks. It usually refers to the ramp function f(x) = max(0, x) and its variants.

torch.nn.ReLU()
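As a minimal illustration, here is what ReLU computes element-wise, together with the derivative used during backpropagation. This NumPy version only sketches the behaviour of torch.nn.ReLU(), and the helper names are hypothetical.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x): negatives become 0, positives pass through."""
    return np.maximum(0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, otherwise 0 (taken as 0 at x == 0)."""
    return (x > 0).astype(x.dtype)


z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(relu_grad(z))  # [0. 0. 1.]
```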

2.3 Pooling layer

The pooling layer uses max pooling.
After pooling, C (channels) stays the same, while W (width) and H (height) change.

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0)

Parameters:

  • kernel_size: size of the pooling window
  • stride: step size (defaults to kernel_size)
  • padding: implicit padding added to each side of the input (default 0)
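`MaxPool2d(kernel_size=2)` uses a stride equal to the kernel size by default, and its effect can be sketched with a reshape trick in NumPy. Each non-overlapping 2 × 2 window is reduced to its maximum. The channel axis is untouched, and H and W are halved. This is an illustrative helper, not PyTorch's implementation. Rows or columns that do not fill a whole window are dropped, as in PyTorch's default floor mode.

```python
import numpy as np


def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling over the last two axes (hypothetical helper)."""
    *lead, h, w = x.shape
    h_out, w_out = h // k, w // k
    x = x[..., :h_out * k, :w_out * k]        # drop incomplete windows
    x = x.reshape(*lead, h_out, k, w_out, k)  # split H and W into (blocks, k)
    return x.max(axis=(-3, -1))               # max inside each k x k block


img = np.arange(16).reshape(1, 4, 4)  # one channel, 4 x 4
print(max_pool2d(img).tolist())       # [[[5, 7], [13, 15]]]
print(max_pool2d(np.zeros((10, 24, 24))).shape)  # (10, 12, 12)
```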

2.4 Fully connected layer

Convolution layers take and return four-dimensional tensors (B, C, H, W). The fully connected layers here take and return two-dimensional tensors (B, in_features). Therefore, after convolution, activation and pooling, the output is flattened with view before it enters the fully connected layers.
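Here is a small NumPy sketch of the flattening step. It assumes a hypothetical batch of 2 feature maps with the shape (B, C, H, W) = (B, 20, 4, 4) that the second convolution block produces. `reshape(batch, -1)` plays the role of `x.view(batch_size, -1)`. The batch axis is kept, and the remaining axes are laid out in row-major order.

```python
import numpy as np

# Hypothetical feature maps with the shape conv2 produces, for a batch of 2 images.
batch = 2
x = np.arange(batch * 20 * 4 * 4).reshape(batch, 20, 4, 4)  # (B, C, H, W)

flat = x.reshape(batch, -1)  # same idea as x.view(batch_size, -1) in PyTorch
print(flat.shape)            # (2, 320)

# Row-major flattening: element [b, c, h, w] lands at column c*16 + h*4 + w.
b, c, h, w = 1, 7, 2, 3
assert flat[b, c * 16 + h * 4 + w] == x[b, c, h, w]
```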

2.5 CNN model

The model is shown in the figure:

For example, an input image of the handwritten digit "5" has the shape (batch, 1, 28, 28): a single channel whose height and width are both 28 pixels.

  • First, a 5 × 5 convolution (followed by ReLU) changes the number of channels from 1 to 10, and the height and width become 24 pixels;
  • Then a 2 × 2 max pooling keeps the number of channels and halves the height and width, so the shape becomes (batch, 10, 12, 12);
  • Then another 5 × 5 convolution (followed by ReLU) changes the number of channels from 10 to 20, and the height and width become 8 pixels;
  • Then another 2 × 2 max pooling keeps the number of channels and halves the height and width, so the shape becomes (batch, 20, 4, 4);
  • Finally, view flattens each sample into 320 (20 × 4 × 4) values. These pass through two linear layers (320 → 50 → 10) that output one score for each of the 10 classes, i.e. the digits 0-9.
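The shape arithmetic in the list above follows the usual output-size formula out = floor((in + 2 * padding - kernel_size) / stride) + 1. This formula applies to both convolution and max pooling. The pure-Python sketch below traces the image size through the model. It also counts the trainable parameters of each layer: the weights plus one bias per output unit. The helper names are hypothetical. In PyTorch the total can be cross-checked with `sum(p.numel() for p in model.parameters())`.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel_size) // stride + 1


def conv2d_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out  # kernels + one bias per output channel


def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias vector


# Trace the height/width of a 28 x 28 input through the model.
size = 28
size = out_size(size, 5)            # conv1: 28 -> 24
size = out_size(size, 2, stride=2)  # pool:  24 -> 12
size = out_size(size, 5)            # conv2: 12 -> 8
size = out_size(size, 2, stride=2)  # pool:   8 -> 4
flat_features = 20 * size * size    # 20 channels * 4 * 4 = 320

params = {
    "conv1": conv2d_params(1, 10, 5),   # 260
    "conv2": conv2d_params(10, 20, 5),  # 5020
    "fc1": linear_params(320, 50),      # 16050
    "fc2": linear_params(50, 10),       # 510
}
print(size, flat_features, sum(params.values()))  # 4 320 21840
```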

Code:

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # Conv -> ReLU -> MaxPool: (batch, 1, 28, 28) -> (batch, 10, 12, 12); ReLU and max pooling commute, so their order does not change the result
        x = self.conv2(x)  # Same again: (batch, 10, 12, 12) -> (batch, 20, 4, 4)
        x = x.view(batch_size, -1)  # Flatten for the fully connected layers: (batch, 20, 4, 4) -> (batch, 320); -1 is inferred as 320
        x = self.fc(x)
        return x  # Output shape (batch, 10): one raw score (logit) per digit 0-9

Instantiate the model:

model = Net()

3. Loss function and optimizer

The loss function is the cross-entropy loss.
The parameters are optimized with stochastic gradient descent (SGD) with momentum.

criterion = torch.nn.CrossEntropyLoss()  # Cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr: learning rate; momentum: momentum factor
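`torch.nn.CrossEntropyLoss` takes the raw scores (logits) produced by the model, which is why `Net` ends with a linear layer and no softmax. The loss applies log-softmax itself, then takes the negative log-probability of the correct class and averages it over the batch (with the default reduction='mean'). Here is a NumPy sketch of that computation, using a hypothetical helper name:

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean cross-entropy of raw scores `logits` (batch, classes) against
    integer class labels `targets` (batch,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(targets)), targets]  # log p(correct class)
    return -picked.mean()


# With all-zero scores each of the 10 classes gets probability 1/10,
# so the loss is log(10), about 2.303.
print(cross_entropy(np.zeros((4, 10)), np.array([5, 0, 4, 1])))
```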

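4. Defining the training and test functions

4.1 Training function

  • Step 1: forward propagation: compute the model outputs and the loss

  • Step 2: backpropagation: loss.backward() computes the gradients (optimizer.zero_grad() clears the old ones first)

  • Step 3: update: optimizer.step() updates the parameters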
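In PyTorch, `loss.backward()` fills each parameter's `.grad`. Gradients accumulate across calls, which is why `optimizer.zero_grad()` is called first. `optimizer.step()` then performs the update. `torch.optim.SGD` with momentum keeps a velocity buffer v = momentum * v + grad (on the first step, v = grad) and then sets param = param - lr * v. This description assumes the defaults dampening=0, nesterov=False and weight_decay=0. The NumPy sketch below walks through the three steps with the same lr=0.01 and momentum=0.5 used in this post. The class name and the toy quadratic loss are hypothetical.

```python
import numpy as np


class MomentumSGD:
    """Sketch of the torch.optim.SGD update with momentum > 0, dampening=0,
    nesterov=False and weight_decay=0 (hypothetical class, not PyTorch's)."""

    def __init__(self, lr=0.01, momentum=0.5):
        self.lr = lr
        self.momentum = momentum
        self.velocity = None  # momentum buffer, created on the first step

    def step(self, param, grad):
        if self.velocity is None:
            self.velocity = grad.copy()  # first step: v = grad
        else:
            self.velocity = self.momentum * self.velocity + grad  # v = m * v + grad
        return param - self.lr * self.velocity  # param = param - lr * v


# Toy problem: loss(w) = 0.5 * w**2, whose gradient is simply w.
w = np.array([1.0])
opt = MomentumSGD(lr=0.01, momentum=0.5)
for step in range(2):
    loss = 0.5 * float(np.sum(w ** 2))  # step 1: forward pass computes the loss
    grad = w.copy()                     # step 2: backward pass (analytic gradient here)
    w = opt.step(w, grad)               # step 3: update the parameters
    print(step, loss, w)
```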

Training function code:

def train(epoch):
    running_loss = 0.0  # Running statistics, reset at the start of each epoch
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)

        loss.backward()
        optimizer.step()

        # Accumulate the loss; it is averaged over every 300 batches below
        running_loss += loss.item()
        # Calculate the running accuracy acc
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # Print the average loss and accuracy every 300 batches instead of after every batch
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # Reset the statistics for the next 300 batches
            running_total = 0
            running_correct = 0

4.2 Test function

The test set does not need gradients (there is no backpropagation). Each iteration reads a batch of images and labels from test_loader and runs a forward pass. The predictions are then compared with the labels to compute the accuracy after each epoch.
Test function code:

def test():
    correct = 0
    total = 0
    with torch.no_grad():  # Test sets do not count gradients
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1 is the class dimension; torch.max returns (max values, indices) and the indices are the predicted digits
            total += labels.size(0)  # Number of samples in this batch
            correct += (predicted == labels).sum().item()  # Element-wise comparison, then count the matches
    acc = correct / total
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch+1, EPOCH, 100 * acc))  # Accuracy = correct / total; epoch is the global loop variable of the main program
    return acc
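`torch.max(outputs, dim=1)` returns a pair (maximum values, indices). The indices along dim=1, the class dimension, are the predicted digits. `outputs.argmax(dim=1)` gives the same indices directly. Here is a NumPy sketch of the per-batch bookkeeping in `test()`, using a hypothetical helper name:

```python
import numpy as np


def batch_correct_total(outputs, labels):
    """Return (correct, total) for one batch.

    outputs: (batch, classes) scores, one row per image
    labels:  (batch,) true class indices
    """
    predicted = outputs.argmax(axis=1)          # index of the largest score per row
    correct = int((predicted == labels).sum())  # element-wise comparison, then count
    return correct, len(labels)


# Three images, two classes for brevity: the predictions are [1, 0, 1].
scores = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
correct, total = batch_correct_total(scores, np.array([1, 0, 0]))
print(correct / total)
```

`test()` sums `correct` and `total` over all batches before dividing. This gives the exact accuracy even when the last batch is smaller than the others.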

5. Start training

Hyperparameters: the main hyperparameters are the mini-batch size (batch_size) and the learning rate and momentum of the gradient descent algorithm; training runs for 10 epochs (EPOCH).

# Hyperparameters
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10
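With batch_size = 64, it helps to check how many mini-batches one epoch contains, because this explains the log in section 6. `DataLoader` keeps the final incomplete batch by default (drop_last=False). The 60,000 training images therefore give 938 batches, and the last one holds only 32 images. Because of the `batch_idx % 300 == 299` condition, a progress line is printed only after batches 300, 600 and 900. A quick pure-Python check:

```python
import math

batch_size = 64
train_size, test_size = 60000, 10000

train_batches = math.ceil(train_size / batch_size)          # 938
last_batch = train_size - (train_batches - 1) * batch_size  # 32 images
test_batches = math.ceil(test_size / batch_size)            # 157

# Batches after which train() prints (batch_idx % 300 == 299).
logged = [i + 1 for i in range(train_batches) if i % 300 == 299]
print(train_batches, last_batch, test_batches, logged)  # 938 32 157 [300, 600, 900]
```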

Main program: train for 10 epochs in total, testing once on the test set after each epoch.

if __name__ == '__main__':
    acc_list_test = []
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  #Test once every 10 rounds of training
        acc_test = test()
        acc_list_test.append(acc_test)

    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()

6. Results and analysis

The log below shows the loss value and recognition accuracy on the training set, and the accuracy on the test set.

In total, 10 epochs of training and testing were run. Within each epoch, the average loss and the accuracy over the most recent 300 mini-batches are printed every 300 mini-batches. After each epoch, the model is tested once and its accuracy on the test set is printed.
After 10 epochs, the training accuracy over the last 300 mini-batches reaches 98.88% and the accuracy on the test set reaches 99.0%. The test-set accuracy per epoch is shown in the figure below.

[1, 300]: loss: 0.820 , acc: 76.82 %
[1, 600]: loss: 0.237 , acc: 93.01 %
[1, 900]: loss: 0.152 , acc: 95.35 %
Accuracy on test set: 96.4 %
[2, 300]: loss: 0.126 , acc: 96.27 %
[2, 600]: loss: 0.109 , acc: 96.77 %
[2, 900]: loss: 0.094 , acc: 97.15 %
Accuracy on test set: 97.6 %
[3, 300]: loss: 0.084 , acc: 97.55 %
[3, 600]: loss: 0.080 , acc: 97.49 %
[3, 900]: loss: 0.075 , acc: 97.64 %
Accuracy on test set: 97.7 %
[4, 300]: loss: 0.072 , acc: 97.85 %
[4, 600]: loss: 0.066 , acc: 98.08 %
[4, 900]: loss: 0.060 , acc: 98.16 %
Accuracy on test set: 98.3 %
[5, 300]: loss: 0.058 , acc: 98.21 %
[5, 600]: loss: 0.060 , acc: 98.23 %
[5, 900]: loss: 0.055 , acc: 98.31 %
Accuracy on test set: 98.5 %
[6, 300]: loss: 0.047 , acc: 98.57 %
[6, 600]: loss: 0.054 , acc: 98.29 %
[6, 900]: loss: 0.053 , acc: 98.39 %
Accuracy on test set: 98.6 %
[7, 300]: loss: 0.048 , acc: 98.61 %
[7, 600]: loss: 0.044 , acc: 98.58 %
[7, 900]: loss: 0.049 , acc: 98.54 %
Accuracy on test set: 98.7 %
[8, 300]: loss: 0.045 , acc: 98.77 %
[8, 600]: loss: 0.043 , acc: 98.60 %
[8, 900]: loss: 0.043 , acc: 98.72 %
Accuracy on test set: 98.7 %
[9, 300]: loss: 0.040 , acc: 98.78 %
[9, 600]: loss: 0.037 , acc: 98.86 %
[9, 900]: loss: 0.042 , acc: 98.73 %
Accuracy on test set: 98.8 %
[10, 300]: loss: 0.038 , acc: 98.84 %
[10, 600]: loss: 0.034 , acc: 98.98 %
[10, 900]: loss: 0.037 , acc: 98.88 %
Accuracy on test set: 99.0 %

7. Complete code

import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F

"""
Convolution operation use mnist Data sets, and 10-4,11 Similar, just here: 1.Output of training wheel acc 2.Use on Model torch.nn.Sequential
"""
# Super parameter ------------------------------------------------------------------------------------
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10

# Prepare dataset ------------------------------------------------------------------------------------
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# ToTensor() scales pixels to [0, 1]; Normalize() standardizes them as (x - 0.1307) / 0.3081, the mean and std of the MNIST training pixels

train_dataset = datasets.MNIST(root='./data/mnist', train=True, transform=transform)  # Add download=True if the data is not available locally
test_dataset = datasets.MNIST(root='./data/mnist', train=False, transform=transform)  # train=True: training set; train=False: test set
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i+1)
    plt.tight_layout()
    plt.imshow(train_dataset.data[i], cmap='gray', interpolation='none')  # .data replaces the deprecated .train_data
    plt.title("Labels: {}".format(train_dataset.targets[i].item()))  # .targets replaces the deprecated .train_labels
    plt.xticks([])
    plt.yticks([])
plt.show()


# The training set is shuffled; the test set is read in order
# Design model using class ------------------------------------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # Conv -> ReLU -> MaxPool: (batch, 1, 28, 28) -> (batch, 10, 12, 12); ReLU and max pooling commute, so their order does not change the result
        x = self.conv2(x)  # Same again: (batch, 10, 12, 12) -> (batch, 20, 4, 4)
        x = x.view(batch_size, -1)  # Flatten for the fully connected layers: (batch, 20, 4, 4) -> (batch, 320); -1 is inferred as 320
        x = self.fc(x)
        return x  # Output shape (batch, 10): one raw score (logit) per digit 0-9


model = Net()


# Construct loss and optimizer ------------------------------------------------------------------------------
criterion = torch.nn.CrossEntropyLoss()  # Cross entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr learning rate, momentum impulse


# Train and test functions ---------------------------------------------------------------------------------
# Each function wraps one epoch of training or one pass over the test set
def train(epoch):
    running_loss = 0.0  # Running statistics, reset at the start of each epoch
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)

        loss.backward()
        optimizer.step()

        # Accumulate the loss; it is averaged over every 300 batches below
        running_loss += loss.item()
        # Calculate the running accuracy acc
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # Print the average loss and accuracy every 300 batches instead of after every batch
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # Reset the statistics for the next 300 batches
            running_total = 0
            running_correct = 0

    # Optionally save the model and optimizer state once per epoch (outside the batch loop):
    # torch.save(model.state_dict(), './model_Mnist.pth')
    # torch.save(optimizer.state_dict(), './optimizer_Mnist.pth')


def test():
    correct = 0
    total = 0
    with torch.no_grad():  # Test sets do not count gradients
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1 is the class dimension; torch.max returns (max values, indices) and the indices are the predicted digits
            total += labels.size(0)  # Number of samples in this batch
            correct += (predicted == labels).sum().item()  # Element-wise comparison, then count the matches
    acc = correct / total
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch+1, EPOCH, 100 * acc))  # Accuracy = correct / total; epoch is the global loop variable of the main program
    return acc


# Start training and testing ------------------------------------------------------------------------------
if __name__ == '__main__':
    acc_list_test = []
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  #Test once every 10 rounds of training
        acc_test = test()
        acc_list_test.append(acc_test)

    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()

Reference:
https://www.bilibili.com/video/BV1Y7411d7Ys?p=10
