[Pytorch] MNIST Image Classification Code-Ultra-detailed Interpretation

Posted by zackcez on Mon, 06 Dec 2021 18:21:17 +0100

Preface

Recently, interest in machine learning has grown rapidly among undergraduates, and I often see classmates working through textbooks in the study room. But for lack of experience and guidance, many of them understand the principles yet only vaguely follow the actual code, because most books do not explain the individual functions in detail. This article uses Pytorch, currently the most popular deep learning framework, to explain in detail the most basic task in image classification: classifying the MNIST dataset.
I also hope this article helps you understand the basic workflow of deep learning.

Before reading this article, you should have a basic grasp of deep learning, including CNNs (convolutional neural networks). If your foundation is missing or weak, at least read the following articles in order first:

Recommended reading order:

1. How do I implement a neural network from scratch by myself? (answer by Quantum Bits on Zhihu)
2. Convolutional Neural Network CNN Complete Guide Final Edition (1) (Zhihu article by Learn-obsessed Cakes)
3. Convolutional Neural Network CNN Complete Guide Final Edition (2) (Zhihu article by Learn-obsessed Cakes)

Once you're familiar with the above articles, you can start reading this blog!

If you run into problems with the Pytorch framework code in this article, look it up first in the quick-reference material on the Pytorch website; if you cannot find it there, search the full Pytorch documentation.

1. Code Framework

Below is the code structure I like to use, for reference.

Filename: model.py

1. Introducing packages
2. Set up related parameters
3. Processing datasets
- Define transform
- Importing datasets
- Load (DataLoader)
- Preview (optional)
4. Build a network
5. Training
6. Save the model

2. Implementation Code

1. Introducing packages

The code is as follows:

import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms

The packages and their roles:

- torch: the core package
- torch.nn: neural network modules, base classes to inherit from, and functional methods (nn.functional)
- torchvision: common datasets, models, and image-processing utilities
- torch.utils: a toolkit (torch.utils.data provides DataLoader)
- matplotlib: used to display dataset images

2. Set up related parameters

epochs = 10
batch_size = 64
lr = 0.001

The parameters and their meanings:

- epochs: number of training rounds (full passes over the training set)
- batch_size: size of each batch, i.e., the amount of data used in each training iteration
- lr: the learning rate, usually a very small value

Here is a more detailed explanation of epochs and batch_size:
-> batch_size is the amount of data used in each training iteration;
-> epochs is the number of rounds of training.

Each iteration is one weight update: a forward pass over batch_size samples computes the loss, and a backward pass updates the parameters (note that the gradients must be reset to zero during this process, which we will come back to later). One iteration therefore trains on batch_size samples. For example, training once over 256 samples requires:
-> batch_size = 64;
-> 4 iterations;
-> epochs = 1.

Normally epochs is set to more than one; like grinding flour, one pass is not enough, and more rounds give a finer result.
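
As a quick check of this arithmetic, here is a minimal, purely illustrative sketch (the numbers match the example above; it is not part of the training script):

# Purely illustrative: the batch/epoch arithmetic from the example above
num_samples = 256
batch_size = 64
iterations_per_epoch = num_samples // batch_size      # 256 / 64 = 4 iterations
epochs = 1
total_weight_updates = iterations_per_epoch * epochs  # 4 weight updates in total
print(iterations_per_epoch, total_weight_updates)     # 4 4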

3. Processing datasets

# Set up data conversion
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the data to a Tensor (pixel values scaled to [0, 1])
    transforms.Normalize(  # Normalize: subtract the mean and divide by the standard deviation
        mean=[0.5, ],  # Mean
        std=[0.5, ]  # Standard deviation
    )
])

# Training set import
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
# Test set import
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

# Data loading
# Training set loading
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
# Test set loading
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=True)

In addition to the comments in the code, some of the methods or parameters in this code are explained below.

For transform:

- transforms.ToTensor(): convert the data to a Tensor
- transforms.Normalize: normalize, i.e., subtract the mean and divide by the standard deviation
- mean: the mean
- std: the standard deviation

For datasets.MNIST:

- root: path where the dataset (here MNIST) is stored
- transform: the transform to apply
- train: selects the split; train=True for the training set, train=False for the test set
- download: whether to download the dataset (it automatically checks whether the dataset already exists under root; if it does, it will not be downloaded again when you retrain)

For DataLoader:

- dataset: the dataset to load from
- batch_size: the batch size
- shuffle: whether to shuffle the data
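
Before the optional preview below, here is a minimal sketch to check what one batch looks like, assuming the dataloader_train defined above:

# Minimal sketch: inspect one batch produced by dataloader_train
images, labels = next(iter(dataloader_train))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -> [batch, channel, height, width]
print(labels.shape)  # torch.Size([64]) -> one digit label per image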

Preview (optional)

# Data Preview
images, labels = next(iter(dataloader_train))
img = make_grid(images)
img = img.numpy().transpose(1, 2, 0)
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
img = img * std + mean
print([labels[i] for i in range(16)])
plt.imshow(img)
plt.show()

The methods used above:

- iter(dataloader_train): creates an iterator over dataloader_train
- next: returns the next item of the iterator (used together with iter())
- make_grid: arranges a batch of images into a single grid image
- img.numpy().transpose(1, 2, 0): reorders the axes of the image's numpy array from [C, H, W] to [H, W, C]; the arguments 1, 2, 0 are the original axes listed in their new order. This is needed because Pytorch stores images as [C, H, W] while plt.imshow() expects [H, W, C] (C = channel, i.e. colour channel; H = height; W = width)
- plt.imshow(img) and plt.show(): display the picture
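
A tiny standalone sketch of the axis reordering, independent of the dataset code above:

import numpy as np

x = np.zeros((3, 28, 28))   # [C, H, W], the layout Pytorch uses
y = x.transpose(1, 2, 0)    # [H, W, C], the layout plt.imshow() expects
print(x.shape, y.shape)     # (3, 28, 28) (28, 28, 3)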

4. Build a network

# Constructing Convolution Neural Network
class CNN(nn.Module):  # Inherit from parent nn.Module
    def __init__(self):  # Constructor (analogous to a constructor in C++)
        # super() calls the parent-class (nn.Module) constructor; it also handles multiple inheritance cleanly
        super(CNN, self).__init__()

        # First convolution block; Sequential applies the layers inside it in order
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        # Second convolution layer
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):  # Forward Propagation
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)
        x = self.dense(x)
        return x

In addition to the comments in the code, some of the methods or parameters in this code are explained as follows:

- nn.Conv2d: 2-D convolution over images. in_channels is the number of input channels, out_channels the number of output channels, kernel_size the size of the (n * n) convolution kernel, stride the step with which the kernel moves, and padding the amount of padding around the border (these are basics; look them up if they are unfamiliar).
- nn.BatchNorm2d: Batch Normalization (BN). Normalizes a batch of feature maps to mean 0 and variance 1. Effects: faster convergence; helps control overfitting, so little or no Dropout or regularization is needed; makes the network less sensitive to weight initialization; allows a higher learning rate.
- nn.ReLU: a common activation function, not discussed further here.
- nn.MaxPool2d: max pooling over 2-D images, not discussed further here.
- nn.Linear: a fully connected (affine) layer that maps the input features to the given number of outputs.
- nn.Dropout: Dropout, used to prevent overfitting, not discussed further here.
- x2.view(-1, 7 * 7 * 128): flattens the feature maps so that their shape matches the input dimension of the first fully connected layer.
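
To see where 7 * 7 * 128 comes from, here is a minimal sketch tracing the tensor shapes through the CNN defined above for one MNIST-sized 28 x 28 input:

# Minimal sketch: trace tensor shapes through the CNN above
net = CNN()
net.eval()  # switch off training-time behaviour of BatchNorm/Dropout for this check
x = torch.randn(1, 1, 28, 28)      # one 28x28 grayscale image
x1 = net.conv1(x)                  # conv (padding=1) keeps 28x28, max pooling halves it -> [1, 64, 14, 14]
x2 = net.conv2(x1)                 # halved again -> [1, 128, 7, 7]
flat = x2.view(-1, 7 * 7 * 128)    # flatten -> [1, 6272], matching nn.Linear(7 * 7 * 128, 1024)
out = net.dense(flat)              # -> [1, 10], one score per digit class
print(x1.shape, x2.shape, out.shape)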

5. Training

See the code comments for explanations.

# Training and parameter optimization

# Helper: wrap a tensor for autograd and move it to the GPU if one is available
def get_Variable(x):
    x = torch.autograd.Variable(x)  # Variable is kept for backward compatibility; modern tensors track gradients directly
    
    # Determine if a GPU is available
    return x.cuda() if torch.cuda.is_available() else x
    
# Define Network
cnn = CNN()

# Determine if a GPU is available to speed up training
if torch.cuda.is_available():
    cnn = cnn.cuda()

# Set the loss function to CrossEntropyLoss (Cross Entropy Loss Function)
loss_F = nn.CrossEntropyLoss()

# Set optimizer to Adam optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

# train
for epoch in range(epochs):
    running_loss = 0.0  # Accumulated loss over this epoch
    running_correct = 0.0  # Number of correct predictions on the training set
    print("Epoch [{}/{}]".format(epoch, epochs))
    for data in dataloader_train:
        # The DataLoader return value is an image within a batch and the corresponding label
        X_train, y_train = data
        X_train, y_train = get_Variable(X_train), get_Variable(y_train)
        outputs = cnn(X_train)
        _, pred = torch.max(outputs.data, 1)
        # The second argument, 1, means the maximum is taken along dimension 1 (the class dimension)
        # The first return value is the maximum value, the second is its index, i.e. the predicted class
        # -------------------- standard gradient-descent-style update step --------------------
        optimizer.zero_grad()
        # Zero the gradients
        loss = loss_F(outputs, y_train)
        # Compute the loss
        loss.backward()
        # Backpropagation
        optimizer.step()
        # Update the parameters
        # --------------------------------------------------------------------------------------
        running_loss += loss.item()  # item() extracts the scalar loss value
        running_correct += torch.sum(pred == y_train.data)
        
    testing_correct = 0.0
    
    for data in dataloader_test:
        X_test, y_test = data
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
        # print(testing_correct)
    print("Loss: {:.4f}  Train Accuracy: {:.4f}%  Test Accuracy: {:.4f}%".format(
        running_loss / len(data_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
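
One optional refinement, not required for the code above to run: during the test pass the model is still in training mode, so Dropout and BatchNorm keep their training behaviour and autograd keeps tracking gradients. A minimal sketch of how the test loop inside the epoch loop could be wrapped, using the same names as above:

# Optional sketch: evaluate in eval mode and without gradient tracking
cnn.eval()
testing_correct = 0.0
with torch.no_grad():
    for X_test, y_test in dataloader_test:
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
cnn.train()  # switch back to training mode before the next epoch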

6. Save the model

torch.save(cnn, 'data/model.pth')  # Save the model to the data folder in the current directory, named model.pth

Congratulations! If you do this, all the steps of the training will be completed!

The complete MNIST image recognition code is as follows:

import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms

epochs = 10
batch_size = 64
lr = 0.001

# Data transforms

# Set up data conversion
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert the data to a Tensor (pixel values scaled to [0, 1])
    transforms.Normalize(  # Normalize: subtract the mean and divide by the standard deviation
        mean=[0.5, ],  # Mean
        std=[0.5, ]  # Standard deviation
    )
])

# Training set import
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
# Test set import
data_test = datasets.MNIST(root='data/', transform=transform, train=False)

# Data loading

# Training set loading
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
# Test set loading
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=True)

# Data Preview
images, labels = next(iter(dataloader_train))
img = make_grid(images)
img = img.numpy().transpose(1, 2, 0)
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
img = img * std + mean
print([labels[i] for i in range(16)])
plt.imshow(img)
plt.show()


# Constructing Convolution Neural Network
class CNN(nn.Module):  # Inherit from parent nn.Module
    def __init__(self):  # Constructor (analogous to a constructor in C++)
        # super() calls the parent-class (nn.Module) constructor; it also handles multiple inheritance cleanly
        super(CNN, self).__init__()

        # First convolution block; Sequential applies the layers inside it in order
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Second convolution layer
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):  # Forward Propagation
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)
        x = self.dense(x)
        return x


# Training and parameter optimization

# Helper: wrap a tensor for autograd and move it to the GPU if one is available
def get_Variable(x):
    x = torch.autograd.Variable(x)  # Variable is kept for backward compatibility; modern tensors track gradients directly

    # Determine if a GPU is available
    return x.cuda() if torch.cuda.is_available() else x


# Define Network
cnn = CNN()

# Determine if a GPU is available to speed up training
if torch.cuda.is_available():
    cnn = cnn.cuda()

# Set the loss function to CrossEntropyLoss (Cross Entropy Loss Function)
loss_F = nn.CrossEntropyLoss()

# Set optimizer to Adam optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

# train
for epoch in range(epochs):
    running_loss = 0.0  # Accumulated loss over this epoch
    running_correct = 0.0  # Number of correct predictions on the training set
    print("Epoch [{}/{}]".format(epoch, epochs))
    for data in dataloader_train:
        # The DataLoader return value is an image within a batch and the corresponding label
        X_train, y_train = data
        X_train, y_train = get_Variable(X_train), get_Variable(y_train)
        outputs = cnn(X_train)
        _, pred = torch.max(outputs.data, 1)
        # The second argument, 1, means the maximum is taken along dimension 1 (the class dimension)
        # The first return value is the maximum value, the second is its index, i.e. the predicted class
        # -------------------- standard gradient-descent-style update step --------------------
        optimizer.zero_grad()
        # Zero the gradients
        loss = loss_F(outputs, y_train)
        # Compute the loss
        loss.backward()
        # Backpropagation
        optimizer.step()
        # Update the parameters
        # --------------------------------------------------------------------------------------
        running_loss += loss.item()  # item() extracts the scalar loss value
        running_correct += torch.sum(pred == y_train.data)

    testing_correct = 0.0

    for data in dataloader_test:
        X_test, y_test = data
        X_test, y_test = get_Variable(X_test), get_Variable(y_test)
        outputs = cnn(X_test)
        _, pred = torch.max(outputs, 1)
        testing_correct += torch.sum(pred == y_test.data)
        # print(testing_correct)
    print("Loss: {:.4f}  Train Accuracy: {:.4f}%  Test Accuracy: {:.4f}%".format(
        running_loss / len(data_train), 100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

# Save Model
torch.save(cnn, 'data/model.pth')

Note: to load the saved model later for inference (for example in a separate inference.py):

# Load Model
cnn = torch.load('data/model.pth')
cnn.eval()  # Enter inference mode
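
An alternative worth knowing (a sketch, not part of the original code): save only the learned parameters via state_dict, which the Pytorch documentation recommends for portability. The file name model_state.pth below is just an example:

# Sketch: save only the parameters (state_dict); 'model_state.pth' is an example name
torch.save(cnn.state_dict(), 'data/model_state.pth')

# To load, rebuild the architecture first, then restore the weights
cnn = CNN()
cnn.load_state_dict(torch.load('data/model_state.pth'))
cnn.eval()  # enter inference mode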

3. Other

The author is a sophomore studying computer science and technology. I started machine learning in my freshman year and focus on super-resolution reconstruction. Machine learning is purely a hobby for me, and I have had almost no one to guide me, so if there are any mistakes in this article, I welcome criticism and corrections!

* Parts of this blog draw on material found online.

Topics: Python Machine Learning neural networks Pytorch Deep Learning