[Pytorch] MNIST Image Classification Code-Ultra-detailed Interpretation

Posted by zackcez on Mon, 06 Dec 2021 18:21:17 +0100

Preface

Recently, interest in machine learning has grown rapidly among undergraduates, and I often see classmates working through textbooks in the study room. But for lack of experience and guidance, many of them understand the principles yet only vaguely follow the actual code, because most books do not explain the individual functions in detail. This article uses Pytorch, currently the most popular deep learning framework, to explain in detail the most basic task in image classification: classifying the MNIST dataset.
I also hope this article helps you understand the basic workflow of deep learning.

Before reading this article, you should have a basic grasp of deep learning, including CNNs (convolutional neural networks). If your foundation is missing or weak, at least read the following articles in order first:

Recommended reading order:

1. How do I implement a neural network from scratch by myself? (answer by Quantum Bits on Zhihu)
2. Convolutional Neural Network CNN Complete Guide Final Edition (1) (Zhihu article by Learn-obsessed Cakes)
3. Convolutional Neural Network CNN Complete Guide Final Edition (2) (Zhihu article by Learn-obsessed Cakes)

Once you're familiar with the above articles, you can start reading this blog!

If you run into problems with the Pytorch framework code in this article, look it up first in the quick-reference material on the Pytorch website; if you cannot find it there, search the full Pytorch documentation.

1. Code Framework

Below is the code structure I like to use, for reference.

Filename: model.py

1. Introducing packages
2. Set up related parameters
3. Processing datasets
- Define transform
- Importing datasets
- Load (DataLoader)
- Preview (optional)
4. Build a network
5. Training
6. Save the model

2. Implementation Code

1. Introducing packages

The code is as follows:

import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms

The packages and their roles:

- torch: the core package
- torch.nn: neural network modules, base classes to inherit from, and functional methods (nn.functional)
- torchvision: common datasets, models, and image-processing utilities
- torch.utils: a toolkit (torch.utils.data provides DataLoader)
- matplotlib: used to display dataset images

2. Set up related parameters

epochs = 10
batch_size = 64
lr = 0.001

The parameters and their meanings:

- epochs: number of training rounds (full passes over the training set)
- batch_size: size of each batch, i.e., the amount of data used in each training iteration
- lr: the learning rate, usually a very small value

Here is a more detailed explanation of epochs and batch_size:
-> batch_size is the amount of data used in each training iteration;
-> epochs is the number of rounds of training.

Each iteration is one weight update: a forward pass over batch_size samples computes the loss, and a backward pass updates the parameters (note that the gradients must be reset to zero during this process, which we will come back to later). One iteration therefore trains on batch_size samples. For example, training once over 256 samples requires:
-> batch_size = 64;
-> 4 iterations;
-> epochs = 1.

Normally epochs is set to more than one; like grinding flour, one pass is not enough, and more rounds give a finer result.
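
As a quick check of this arithmetic, here is a minimal, purely illustrative sketch (the numbers match the example above; it is not part of the training script):

# Purely illustrative: the batch/epoch arithmetic from the example above
num_samples = 256
batch_size = 64
iterations_per_epoch = num_samples // batch_size      # 256 / 64 = 4 iterations
epochs = 1
total_weight_updates = iterations_per_epoch * epochs  # 4 weight updates in total
print(iterations_per_epoch, total_weight_updates)     # 4 4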

3. Processing datasets

# Set up data conversion
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the data to a Tensor (pixel values scaled to [0, 1])
    transforms.Normalize(  # Normalize: subtract the mean and divide by the standard deviation
        mean=[0.5, ],  # Mean
        std=[0.5, ]  # Standard deviation
    )
])

# Training set import
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
# Test set import
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

# Data loading
# Training set loading
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
# Test set loading
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=True)

In addition to the comments in the code, some of the methods or parameters in this code are explained below.

For transform:

- transforms.ToTensor(): convert the data to a Tensor
- transforms.Normalize: normalize, i.e., subtract the mean and divide by the standard deviation
- mean: the mean
- std: the standard deviation

For datasets.MNIST:

- root: path where the dataset (here MNIST) is stored
- transform: the transform to apply
- train: selects the split; train=True for the training set, train=False for the test set
- download: whether to download the dataset (it automatically checks whether the dataset already exists under root; if it does, it will not be downloaded again when you retrain)

For DataLoader:

- dataset: the dataset to load from
- batch_size: the batch size
- shuffle: whether to shuffle the data
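
Before the optional preview below, here is a minimal sketch to check what one batch looks like, assuming the dataloader_train defined above:

# Minimal sketch: inspect one batch produced by dataloader_train
images, labels = next(iter(dataloader_train))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> [batch, channel, height, width]
print(labels.shape)  # torch.Size([64]) -> one digit label per image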

Preview (optional)

# Data Preview
images, labels = next(iter(dataloader_train))
img = make_grid(images)
img = img.numpy().transpose(1, 2, 0)
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
img = img * std + mean
print([labels[i] for i in range(16)])
plt.imshow(img)
plt.show()

The methods used above:

- iter(dataloader_train): creates an iterator over dataloader_train
- next: returns the next item of the iterator (used together with iter())
- make_grid: arranges a batch of images into a single grid image
- img.numpy().transpose(1, 2, 0): reorders the axes of the image's numpy array from [C, H, W] to [H, W, C]; the arguments 1, 2, 0 are the original axes listed in their new order. This is needed because Pytorch stores images as [C, H, W] while plt.imshow() expects [H, W, C] (C = channel, i.e. colour channel; H = height; W = width)
- plt.imshow(img) and plt.show(): display the picture
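
A tiny standalone sketch of the axis reordering, independent of the dataset code above:

import numpy as np

x = np.zeros((3, 28, 28))   # [C, H, W], the layout Pytorch uses
y = x.transpose(1, 2, 0)    # [H, W, C], the layout plt.imshow() expects
print(x.shape, y.shape)     # (3, 28, 28) (28, 28, 3)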

4. Build a network

# Constructing Convolution Neural Network
class CNN(nn.Module):  # Inherit from parent nn.Module
    def __init__(self):  # Constructor (analogous to a constructor in C++)
        # super() calls the parent-class (nn.Module) constructor; it also handles multiple inheritance cleanly
        super(CNN, self).__init__()

        # First convolution block; Sequential applies the layers inside it in order
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        # Second convolution layer
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):  # Forward Propagation
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)
        x = self.dense(x)
        return x

In addition to the comments in the code, some of the methods or parameters in this code are explained as follows:

- nn.Conv2d: 2-D convolution over images. in_channels is the number of input channels, out_channels the number of output channels, kernel_size the size of the (n * n) convolution kernel, stride the step with which the kernel moves, and padding the amount of padding around the border (these are basics; look them up if they are unfamiliar).
- nn.BatchNorm2d: Batch Normalization (BN). Normalizes a batch of feature maps to mean 0 and variance 1. Effects: faster convergence; helps control overfitting, so little or no Dropout or regularization is needed; makes the network less sensitive to weight initialization; allows a higher learning rate.
- nn.ReLU: a common activation function, not discussed further here.
- nn.MaxPool2d: max pooling over 2-D images, not discussed further here.
- nn.Linear: a fully connected (affine) layer that maps the input features to the given number of outputs.
- nn.Dropout: Dropout, used to prevent overfitting, not discussed further here.
- x2.view(-1, 7 * 7 * 128): flattens the feature maps so that their shape matches the input dimension of the first fully connected layer.
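
To see where 7 * 7 * 128 comes from, here is a minimal sketch tracing the tensor shapes through the CNN defined above for one MNIST-sized 28 x 28 input:

# Minimal sketch: trace tensor shapes through the CNN above
net = CNN()
net.eval()  # switch off training-time behaviour of BatchNorm/Dropout for this check
x = torch.randn(1, 1, 28, 28)      # one 28x28 grayscale image
x1 = net.conv1(x)                  # conv (padding=1) keeps 28x28, max pooling halves it -> [1, 64, 14, 14]
x2 = net.conv2(x1)                 # halved again -> [1, 128, 7, 7]
flat = x2.view(-1, 7 * 7 * 128)    # flatten -> [1, 6272], matching nn.Linear(7 * 7 * 128, 1024)
out = net.dense(flat)              # -> [1, 10], one score per digit class
print(x1.shape, x2.shape, out.shape)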

5. Training

See the code comments for explanations.

# Training and parameter optimization

# Helper: wrap a tensor for autograd and move it to the GPU if one is available
def get_Variable(x):
    x = torch.autograd.Variable(x)  # Variable is kept for backward compatibility; modern tensors track gradients directly
    
    # Determine if a GPU is available
    return x.cuda() if torch.cuda.is_available() else x
    
# Define Network
cnn = CNN()

# Determine if a GPU is available to speed up training
if torch.cuda.is_available():
    cnn = cnn.cuda()

# Set the loss function to CrossEntropyLoss (Cross Entropy Loss Function)
loss_F = nn.CrossEntropyLoss()

# Set optimizer to Adam optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

# train
for epoch in range(epochs):
    running_loss = 0.0  # Accumulated loss over this epoch
    running_correct = 0.0  # Number of correct predictions on the training set
    print("Epoch [{}/{}]".format(epoch, epochs))
    for data in dataloader_train:
        # The DataLoader return value is an image within a batch and the corresponding label
        X_train, y_train = data
        X_train, y_train = get_Variable(X_train), get_Variable(y_train)
        outputs = cnn(X_train)
        _, pred = torch.max(outputs.data, 1)
        # The second argument, 1, means the maximum is taken along dimension 1 (the class dimension)
        # The first return value is the maximum value, the second is its index, i.e. the predicted class
        # -------------------- standard gradient-descent-style update step --------------------
        optimizer.zero_grad()
        # Zero the gradients
        loss = loss_F(outputs, y_train)
        # Compute the loss
        loss.backward()
        # Backpropagation
        optimizer.step()
        # Update the parameters
        # --------------------------------------------------------------------------------------
        running_loss += loss.item()  # item() extracts the scalar loss value
        running_correct += torch.sum(pred == y_train.data)
        
    testing_correct = 0.0
    
    for data in dataloader_test:
        X_test, y_test = data
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
        # print(testing_correct)
    print("Loss: {:.4f}  Train Accuracy: {:.4f}%  Test Accuracy: {:.4f}%".format(
        running_loss / len(data_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
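
One optional refinement, not required for the code above to run: during the test pass the model is still in training mode, so Dropout and BatchNorm keep their training behaviour and autograd keeps tracking gradients. A minimal sketch of how the test loop inside the epoch loop could be wrapped, using the same names as above:

# Optional sketch: evaluate in eval mode and without gradient tracking
cnn.eval()
testing_correct = 0.0
with torch.no_grad():
    for X_test, y_test in dataloader_test:
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
cnn.train()  # switch back to training mode before the next epoch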

6. Save the model

torch.save(cnn, 'data/model.pth')  # Save the model to the data folder in the current directory, named model.pth

Congratulations! If you do this, all the steps of the training will be completed!

The complete MNIST image recognition code is as follows:

import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms

epochs = 10
batch_size = 64
lr = 0.001

# Data transforms

# Set up data conversion
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the data to a Tensor (pixel values scaled to [0, 1])
    transforms.Normalize(  # Normalize: subtract the mean and divide by the standard deviation
        mean=[0.5, ],  # Mean
        std=[0.5, ]  # Standard deviation
    )
])

# Training set import
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
# Test set import
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

# Data loading

# Training set loading
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
# Test set loading
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=True)

# Data Preview
images, labels = next(iter(dataloader_train))
img = make_grid(images)
img = img.numpy().transpose(1, 2, 0)
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
img = img * std + mean
print([labels[i] for i in range(16)])
plt.imshow(img)
plt.show()


# Constructing Convolution Neural Network
class CNN(nn.Module):  # Inherit from parent nn.Module
    def __init__(self):  # Constructor (analogous to a constructor in C++)
        # super() calls the parent-class (nn.Module) constructor; it also handles multiple inheritance cleanly
        super(CNN, self).__init__()

        # First convolution block; Sequential applies the layers inside it in order
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Second convolution layer
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):  # Forward Propagation
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)
        x = self.dense(x)
        return x


# Training and parameter optimization

# Helper: wrap a tensor for autograd and move it to the GPU if one is available
def get_Variable(x):
    x = torch.autograd.Variable(x)  # Variable is kept for backward compatibility; modern tensors track gradients directly

    # Determine if a GPU is available
    return x.cuda() if torch.cuda.is_available() else x


# Define Network
cnn = CNN()

# Determine if a GPU is available to speed up training
if torch.cuda.is_available():
    cnn = cnn.cuda()

# Set the loss function to CrossEntropyLoss (Cross Entropy Loss Function)
loss_F = nn.CrossEntropyLoss()

# Set optimizer to Adam optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

# train
for epoch in range(epochs):
    running_loss = 0.0  # Accumulated loss over this epoch
    running_correct = 0.0  # Number of correct predictions on the training set
    print("Epoch [{}/{}]".format(epoch, epochs))
    for data in dataloader_train:
        # The DataLoader return value is an image within a batch and the corresponding label
        X_train, y_train = data
        X_train, y_train = get_Variable(X_train), get_Variable(y_train)
        outputs = cnn(X_train)
        _, pred = torch.max(outputs.data, 1)
        # The second argument, 1, means the maximum is taken along dimension 1 (the class dimension)
        # The first return value is the maximum value, the second is its index, i.e. the predicted class
        # -------------------- standard gradient-descent-style update step --------------------
        optimizer.zero_grad()
        # Zero the gradients
        loss = loss_F(outputs, y_train)
        # Compute the loss
        loss.backward()
        # Backpropagation
        optimizer.step()
        # Update the parameters
        # --------------------------------------------------------------------------------------
        running_loss += loss.item()  # item() extracts the scalar loss value
        running_correct += torch.sum(pred == y_train.data)

    testing_correct = 0.0

    for data in dataloader_test:
        X_test, y_test = data
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
        # print(testing_correct)
    print("Loss: {:.4f}  Train Accuracy: {:.4f}%  Test Accuracy: {:.4f}%".format(
        running_loss / len(data_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

# Save Model
torch.save(cnn, 'data/model.pth')

Note: to load the saved model later for inference (for example in a separate inference.py):

# Load Model
cnn = torch.load('data/model.pth')
cnn.eval()  # Enter inference mode
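
An alternative worth knowing (a sketch, not part of the original code): save only the learned parameters via state_dict, which the Pytorch documentation recommends for portability. The file name model_state.pth below is just an example:

# Sketch: save only the parameters (state_dict); 'model_state.pth' is an example name
torch.save(cnn.state_dict(), 'data/model_state.pth')

# To load, rebuild the architecture first, then restore the weights
cnn = CNN()
cnn.load_state_dict(torch.load('data/model_state.pth'))
cnn.eval()  # enter inference mode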

3. Other

The author is a sophomore studying computer science and technology. I started machine learning in my freshman year and focus on super-resolution reconstruction. Machine learning is purely a hobby for me, and I have had almost no one to guide me, so if there are any mistakes in this article, I welcome criticism and corrections!

* Parts of this blog draw on material found online.

Topics: Python Machine Learning neural networks Pytorch Deep Learning