[PyTorch] MNIST Image Classification Code: An Ultra-detailed Walkthrough
Preface
Recently, interest in machine learning has grown dramatically among undergraduates, and I often see classmates poring over textbooks in the study room. However, lacking experience and guidance, many of them understand the principles but know little about the actual code, because many books do not explain the individual functions in detail. This article uses PyTorch, one of the most popular deep learning frameworks, to explain in detail the most basic image classification task: classifying the MNIST dataset.
At the same time, I hope this article helps you understand the basic ideas of deep learning.
Before reading this article, you need basic deep learning knowledge (including CNNs, convolutional neural networks). If you have little or no background, at least familiarize yourself with the following articles first:
Once you're familiar with the above articles, you can start reading this blog!
If you run into problems with the PyTorch code in this article, first look up the relevant content in the Quick Manual (PyTorch website). If you cannot find it there, search the official PyTorch website.
1. Code Framework
Below is the code structure I like to use, for reference.
Filename: model.py
1. Import packages
2. Set hyperparameters
3. Process the dataset
- Define the transform
- Import the datasets
- Load the data (DataLoader)
- Preview (optional)
4. Build the network
5. Train
6. Save the model
2. Implementation Code
1. Import packages
The code is as follows:
```python
import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms
```
Package | Purpose |
---|---|
torch | Core package |
torch.nn | Neural network building blocks: layers, the nn.Module base class to inherit from, and functional methods (nn.functional) |
torchvision | Datasets, pretrained models, and image transforms |
torch.utils | Utilities (here, DataLoader for batching the data) |
matplotlib | Displays dataset images |
2. Set hyperparameters
```python
epochs = 10
batch_size = 64
lr = 0.001
```
Parameter | Meaning |
---|---|
epochs | Number of training rounds (full passes over the training set) |
batch_size | Number of samples used in each training iteration |
lr | Learning rate; usually a small value |
Here is a more detailed explanation of epochs and batch_size:
-> batch_size is the number of samples used in each training iteration;
-> epochs is the number of times the whole training set is passed through.
Each iteration is one weight update. Each update runs a forward pass over batch_size samples to compute the loss, then a backward pass to compute the gradients and update the parameters (note that the gradients must be reset to zero before each backward pass; this is covered in the training section). In other words, one iteration trains once on batch_size samples. For example, with 256 samples, one complete pass over the data requires:
-> batch_size=64;
-> 4 iterations (256 / 64);
-> epochs=1.
Normally epochs is set to more than 1. It is like grinding flour: one pass is not enough, and more passes produce finer flour.
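The relationship between samples, batch_size, iterations, and epochs is simple arithmetic. Below is a small sketch (the helper names are hypothetical, not part of PyTorch); it uses ceiling division because DataLoader keeps a final, smaller batch by default (drop_last=False).

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of weight updates needed to see every sample once.

    The last batch may be smaller than batch_size; DataLoader keeps it
    by default (drop_last=False), hence the ceiling division.
    """
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total number of weight updates over all epochs."""
    return iterations_per_epoch(num_samples, batch_size) * epochs


print(iterations_per_epoch(256, 64))    # 4, the example above
print(iterations_per_epoch(60000, 64))  # 938 for the 60,000-image MNIST training set
print(total_iterations(60000, 64, 10))  # 9380 updates with epochs=10
```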
3. Process the dataset
```python
# Define the data transform
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert each image to a float Tensor with values in [0, 1]
    transforms.Normalize(   # Normalize: (x - mean) / std, which maps [0, 1] to [-1, 1]
        mean=[0.5],         # Mean (one value per channel; MNIST has 1 channel)
        std=[0.5]           # Standard deviation
    )
])

# Import the training set
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
# Import the test set
data_test = datasets.MNIST(root='data/', transform=transform, train=False, download=True)

# Load the data
# Training set loader (reshuffled every epoch)
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
# Test set loader (order does not matter for evaluation)
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=False)
```
In addition to the comments in the code, some of the methods or parameters in this code are explained below.
For transform:
Parameter | Meaning |
---|---|
transforms.ToTensor() | Converts a PIL image (H, W, C) with values in [0, 255] to a float Tensor (C, H, W) with values in [0, 1] |
transforms.Normalize | Normalizes each channel as (x - mean) / std. With mean=0.5 and std=0.5, pixel values in [0, 1] are mapped to [-1, 1] |
mean | Per-channel mean |
std | Per-channel standard deviation |
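To make the effect of Normalize concrete, here is a plain-Python sketch of the per-pixel arithmetic, assuming ToTensor has already scaled raw pixels from 0..255 into [0, 1]. The function names are illustrative only; the inverse, x * std + mean, is exactly what the preview code later does before displaying images.

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-pixel arithmetic of transforms.Normalize: (x - mean) / std."""
    return (pixel - mean) / std


def denormalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Inverse of normalize, used to undo normalization before plotting."""
    return value * std + mean


# ToTensor has already divided raw pixel values (0..255) by 255
for raw in (0, 128, 255):
    x = raw / 255
    print(raw, round(x, 4), round(normalize(x), 4))
```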
For datasets.MNIST:
Parameter | Meaning |
---|---|
root | Directory where the dataset (here MNIST) is stored or will be downloaded to |
transform | Transform applied to each image |
train | True loads the training set (60,000 images); False loads the test set (10,000 images) |
download | Whether to download the dataset. If the files already exist under root, they are not downloaded again |
For DataLoader:
Parameter | Meaning |
---|---|
dataset | Dataset to load from |
batch_size | Number of samples per batch |
shuffle | Whether to reshuffle the data at every epoch |
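To see what DataLoader actually yields without downloading MNIST, the sketch below uses a hypothetical stand-in dataset of 256 random 1x28x28 "images" built with TensorDataset. It shows the batch shapes and that len(loader) equals the number of iterations per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 256 random images [1, 28, 28] with labels 0-9
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset=dataset, batch_size=64, shuffle=True)

print(len(loader))       # 4 batches per epoch
X, y = next(iter(loader))
print(X.shape, y.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
```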
Preview (optional)
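```python
# Data preview
images, labels = next(iter(dataloader_train))  # Take one batch
img = make_grid(images)                        # Tile the batch into one image [C, H, W]
img = img.numpy().transpose(1, 2, 0)           # [C, H, W] -> [H, W, C] for plt.imshow()
mean = [0.5, 0.5, 0.5]                         # make_grid expands 1 channel to 3
std = [0.5, 0.5, 0.5]
img = img * std + mean                         # Undo Normalize: back to [0, 1]
print(labels[:16].tolist())                    # Labels of the first 16 images
plt.imshow(img)
plt.show()
```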
Method | Effect |
---|---|
iter(dataloader_train) | Creates an iterator over dataloader_train |
next | Returns the next item from the iterator (used together with iter()) |
make_grid | Arranges a batch of images into a single grid image |
img.numpy().transpose(1, 2, 0) | Reorders the axes of the NumPy array. The argument (1, 2, 0) means the new axes 0, 1, 2 are taken from the old axes 1, 2, 0, which converts a [C, H, W] array into [H, W, C]. This is needed because PyTorch stores images as [C, H, W], whereas plt.imshow() expects [H, W, C]. Here C = channels (color channels), H = height, W = width |
plt.imshow(img) and plt.show() | Display the image |
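The axis reordering and the un-normalization step can be checked on a tiny NumPy array. This is only an illustration with a made-up array; it also shows that the Python lists mean and std broadcast over the last (channel) axis after the transpose.

```python
import numpy as np

# A hypothetical 3-channel image in PyTorch layout [C, H, W] = [3, 2, 4]
chw = np.arange(3 * 2 * 4).reshape(3, 2, 4)

# transpose(1, 2, 0): new axis 0 <- old axis 1 (H), new axis 1 <- old axis 2 (W),
# new axis 2 <- old axis 0 (C)
hwc = chw.transpose(1, 2, 0)
print(chw.shape, "->", hwc.shape)  # (3, 2, 4) -> (2, 4, 3)

# The pixel at row 1, column 3 keeps its channel values, now in the last axis
assert (hwc[1, 3] == chw[:, 1, 3]).all()

# Undoing Normalize: the 3-element lists broadcast over the channel axis
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
normalized = np.full((2, 4, 3), -1.0)  # all-black pixels after Normalize
restored = normalized * std + mean     # back to 0.0 (black) in [0, 1]
```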
4. Build a network
```python
# Build the convolutional neural network
class CNN(nn.Module):  # Inherit from the parent class nn.Module
    def __init__(self):  # Constructor (similar to a constructor in C++)
        # super() calls the parent-class (superclass) constructor so nn.Module is set up properly
        super(CNN, self).__init__()
        # First convolution block; Sequential runs the listed layers in order
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Second convolution block
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):  # Forward propagation
        x1 = self.conv1(x)            # [N, 1, 28, 28] -> [N, 64, 14, 14]
        x2 = self.conv2(x1)           # -> [N, 128, 7, 7]
        x = x2.view(-1, 7 * 7 * 128)  # Flatten -> [N, 6272]
        x = self.dense(x)             # -> [N, 10] class scores (logits)
        return x
```
In addition to the comments in the code, some of the methods or parameters in this code are explained as follows:
Method or parameter | Meaning or function |
---|---|
nn.Conv2d | 2-D convolution. in_channels is the number of input channels, out_channels the number of output channels, kernel_size the size of the (n × n) kernel, stride the step by which the kernel moves, and padding the amount of zero padding added to each border. These are basic CNN concepts; look them up if they are unfamiliar |
nn.BatchNorm2d | Batch Normalization (BN). Normalizes each channel of a batch of feature maps to mean 0 and variance 1, followed by a learnable scale and shift. Benefits: faster convergence; a mild regularizing effect, so less (or no) Dropout and weight regularization may be needed; reduced sensitivity to weight initialization; allows a higher learning rate |
nn.ReLU | A common activation function, ReLU(x) = max(0, x) |
nn.MaxPool2d | 2-D max pooling; with kernel_size=2 and stride=2 it halves the height and width |
nn.Linear | Fully connected layer that applies y = xW^T + b. Here it maps the 7 * 7 * 128 = 6272 flattened features to 1024, then 1024 to the 10 class scores |
nn.Dropout | During training, randomly zeroes elements with probability p to reduce overfitting |
x2.view(-1, 7 * 7 * 128) | Flattens each [128, 7, 7] feature map into a vector of length 7 * 7 * 128 so it matches the input size of the first fully connected layer; -1 lets PyTorch infer the batch dimension |
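Where does 7 * 7 * 128 come from? The output side length of Conv2d and MaxPool2d (with dilation=1) is (size + 2 * padding - kernel) // stride + 1. The sketch below (conv_out is a hypothetical helper) traces a 28x28 MNIST image through the two blocks defined above.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of Conv2d / MaxPool2d with dilation=1 and floor rounding."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                             # MNIST images are 28x28
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv1: 28 -> 28
size = conv_out(size, kernel=2, stride=2)             # pool1: 28 -> 14
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv2: 14 -> 14
size = conv_out(size, kernel=2, stride=2)             # pool2: 14 -> 7
flat_features = size * size * 128                     # conv2 outputs 128 channels
print(size, flat_features)                            # 7 6272
```

If you change the kernel size, padding, or number of pooling layers, rerunning this calculation tells you the new input size for the first nn.Linear.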
5. Training
See the code comments for explanations.
```python
# Training and parameter optimization
# Use the GPU if one is available to speed up training
# (torch.autograd.Variable is deprecated; plain tensors track gradients directly)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define the network and move it to the device
cnn = CNN().to(device)
# Loss function: cross-entropy loss (takes raw logits and integer class labels)
loss_F = nn.CrossEntropyLoss()
# Optimizer: Adam
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

# Train
for epoch in range(epochs):
    cnn.train()           # Training mode: Dropout active, BatchNorm uses batch statistics
    running_loss = 0.0    # Sum of the batch losses in this epoch
    running_correct = 0   # Number of correctly classified training samples
    print("Epoch [{}/{}]".format(epoch + 1, epochs))
    for X_train, y_train in dataloader_train:
        # Each DataLoader item is a batch of images and the corresponding labels
        X_train, y_train = X_train.to(device), y_train.to(device)
        outputs = cnn(X_train)
        # torch.max over dim 1 (the class dimension) returns (max values, indices);
        # the index of the largest score is the predicted class
        _, pred = torch.max(outputs, 1)
        # ------------- one step of (mini-batch) gradient descent -------------
        optimizer.zero_grad()            # Reset gradients from the previous step
        loss = loss_F(outputs, y_train)  # Compute the loss
        loss.backward()                  # Backpropagation
        optimizer.step()                 # Update all parameters
        # ----------------------------------------------------------------------
        running_loss += loss.item()      # item() returns the batch loss as a Python float
        running_correct += (pred == y_train).sum().item()

    # Evaluate on the test set
    cnn.eval()            # Inference mode: Dropout off, BatchNorm uses running statistics
    testing_correct = 0
    with torch.no_grad():  # No gradients are needed for evaluation
        for X_test, y_test in dataloader_test:
            X_test, y_test = X_test.to(device), y_test.to(device)
            outputs = cnn(X_test)
            _, pred = torch.max(outputs, 1)
            testing_correct += (pred == y_test).sum().item()

    print("Loss: {:.4f} Train Accuracy: {:.4f}% Test Accuracy: {:.4f}%".format(
        running_loss / len(dataloader_train),  # Average batch loss
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
```
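Two details of the loop are easy to check in isolation: how torch.max turns scores into predicted classes, and why optimizer.zero_grad() is needed (backward() adds into .grad instead of overwriting it). The logits, labels, and the scalar w below are made-up values for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of 3 samples and 4 classes
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2, 0.1, 0.0],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

values, pred = torch.max(outputs, 1)  # reduce over dim 1 (classes): max scores and their indices
print(pred.tolist())                  # [1, 0, 3]
correct = (pred == labels).sum().item()
print(correct)                        # 2

loss = nn.CrossEntropyLoss()(outputs, labels)  # mean loss over the batch
print(loss.item())

# backward() ACCUMULATES gradients into .grad
w = torch.tensor(1.0, requires_grad=True)
(3 * w).backward()
print(w.grad.item())  # 3.0
(3 * w).backward()
print(w.grad.item())  # 6.0, summed over both calls
w.grad.zero_()        # similar in effect to what optimizer.zero_grad() does for each parameter
```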
6. Save the model
```python
# Save the whole model to the data folder in the current directory, named model.pth
torch.save(cnn, 'data/model.pth')
```
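Saving the whole model pickles the CNN class reference, so loading later requires the exact class definition. A commonly recommended alternative in the PyTorch documentation is to save only the state_dict (the parameter and buffer tensors). The sketch below uses a small hypothetical nn.Sequential in place of CNN and a temporary directory; for this article you would call torch.save(cnn.state_dict(), ...) with a filename of your choice.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical small model standing in for CNN
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

path = os.path.join(tempfile.mkdtemp(), 'model_state.pth')
torch.save(model.state_dict(), path)  # saves only the tensors, not the class

# To load: build the same architecture first, then copy the weights in
restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored.eval()

same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(),
                                              restored.state_dict().values()))
print(same)  # True
```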
Congratulations! If you have made it this far, you have completed every step of training!
The complete MNIST image recognition code is as follows:
```python
import torch
import torch.nn as nn
from torch.nn import Sequential
from matplotlib import pyplot as plt
import torchvision.datasets as datasets
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import torchvision.transforms as transforms

epochs = 10
batch_size = 64
lr = 0.001

# Define the data transform
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert each image to a float Tensor with values in [0, 1]
    transforms.Normalize(   # Normalize: (x - mean) / std, which maps [0, 1] to [-1, 1]
        mean=[0.5],
        std=[0.5]
    )
])

# Import the training and test sets
data_train = datasets.MNIST(root='data/', transform=transform, train=True, download=True)
data_test = datasets.MNIST(root='data/', transform=transform, train=False, download=True)

# Load the data
dataloader_train = DataLoader(dataset=data_train, batch_size=batch_size, shuffle=True)
dataloader_test = DataLoader(dataset=data_test, batch_size=batch_size, shuffle=False)

# Data preview
images, labels = next(iter(dataloader_train))
img = make_grid(images)
img = img.numpy().transpose(1, 2, 0)  # [C, H, W] -> [H, W, C]
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
img = img * std + mean                # Undo Normalize for display
print(labels[:16].tolist())
plt.imshow(img)
plt.show()


# Build the convolutional neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # First convolution block
        self.conv1 = Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Second convolution block
        self.conv2 = Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Fully connected (dense) layers
        self.dense = Sequential(
            nn.Linear(7 * 7 * 128, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 10)
        )

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x = x2.view(-1, 7 * 7 * 128)
        x = self.dense(x)
        return x


# Training and parameter optimization
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cnn = CNN().to(device)
loss_F = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)

for epoch in range(epochs):
    cnn.train()
    running_loss = 0.0
    running_correct = 0
    print("Epoch [{}/{}]".format(epoch + 1, epochs))
    for X_train, y_train in dataloader_train:
        X_train, y_train = X_train.to(device), y_train.to(device)
        outputs = cnn(X_train)
        _, pred = torch.max(outputs, 1)
        optimizer.zero_grad()            # Reset gradients
        loss = loss_F(outputs, y_train)  # Compute the loss
        loss.backward()                  # Backpropagation
        optimizer.step()                 # Update all parameters
        running_loss += loss.item()
        running_correct += (pred == y_train).sum().item()

    cnn.eval()
    testing_correct = 0
    with torch.no_grad():
        for X_test, y_test in dataloader_test:
            X_test, y_test = X_test.to(device), y_test.to(device)
            outputs = cnn(X_test)
            _, pred = torch.max(outputs, 1)
            testing_correct += (pred == y_test).sum().item()

    print("Loss: {:.4f} Train Accuracy: {:.4f}% Test Accuracy: {:.4f}%".format(
        running_loss / len(dataloader_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))

# Save the model
torch.save(cnn, 'data/model.pth')
```
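Note: to load the model in a separate script such as inference.py, use the code below. Because torch.save(cnn, ...) pickles the entire model object, the CNN class definition must also be present in that script.
```python
import torch
import torch.nn as nn
from torch.nn import Sequential

# The CNN class definition from model.py must be defined (copied) here,
# because torch.save(cnn, ...) pickled the whole model object.

# map_location='cpu' lets a GPU-trained model load on a CPU-only machine.
# weights_only=False is required on PyTorch 2.6+ to unpickle a full model
# (omit it on versions older than 1.13, which lack this argument).
cnn = torch.load('data/model.pth', map_location='cpu', weights_only=False)
cnn.eval()  # Enter inference mode: Dropout disabled, BatchNorm uses running statistics
```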
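Why does cnn.eval() matter? The network contains Dropout and BatchNorm, which behave differently in training and inference. The sketch below shows this for a standalone nn.Dropout layer (illustrative input only): in training mode each element is zeroed with probability p and survivors are scaled by 1 / (1 - p), while in eval mode the layer passes its input through unchanged. BatchNorm similarly switches from batch statistics to its running averages.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(10)

drop.train()       # training mode (cnn.train() sets this on every submodule)
y_train = drop(x)  # elements zeroed with probability 0.5, survivors scaled to 1 / (1 - 0.5) = 2.0
drop.eval()        # inference mode (cnn.eval() sets this on every submodule)
y_eval = drop(x)   # Dropout is the identity in eval mode

print(y_train)
print(y_eval)      # all ones
```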
3. Other
The author is a university sophomore majoring in computer science and technology. I started studying machine learning in my freshman year, focusing on super-resolution reconstruction. Machine learning is purely a hobby for me, and I have had almost no one to guide me. So if you find any mistakes in this article, please point them out!
* Parts of this blog are drawn from online sources.