Handwritten digit recognition with a convolutional neural network and PyTorch

Posted by rajatr on Wed, 05 Jan 2022 03:19:42 +0100

Morvan Python course
Because of PyTorch version differences, some details have to be looked up while following the course. The course is very good. (Not sponsored, shared for reference only.)

Definition

Baidu Encyclopedia
A convolutional neural network is built from three kinds of layers: convolution, activation, and pooling.

Convolution

I read many online explanations and still did not understand them. I have no formal training in this area and did not want to start from the formulas, so what follows is only my understanding of convolution as it applies to this project.
In the handwriting recognition example, a convolution kernel slides across the picture and extracts the features the network needs.
The kernel's values start out random, and the final parameters are learned by repeatedly computing the loss and updating them. At each position, the kernel multiplies the pixels it covers by the corresponding kernel values and adds the products together. The results form a feature map that is suitable for training.
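
To make the multiply-and-add description concrete, here is a minimal sketch in plain Python, without PyTorch. The helper name conv2d_valid, the tiny 4x4 image, and the hand-picked kernel are all made up for illustration. In the real network the kernel values are learned, and nn.Conv2d also handles channels, padding, and a bias term.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each covered pixel by the matching kernel value, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A hand-picked 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is large only in the column where the window sits on the edge, which is the sense in which the kernel "finds" a feature.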

Next comes the code implementation (the complete code is pasted at the end of the article).

Import related packages

import os
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
from matplotlib import cm

EPOCH = 1
BATCH_SIZE = 50
LR = 0.001
DOWNLOAD_MNIST = False

First, load the training data and the test data.
Use torchvision.datasets.MNIST and apply the corresponding transform.

train_data = torchvision.datasets.MNIST(
    # Folder where the dataset is stored
    root='./mnist/',
    # True loads the training split; False loads the test split
    train=True,
    # The network expects tensors, so convert each image to a tensor
    # (ToTensor also scales pixel values to the range [0, 1])
    transform=torchvision.transforms.ToTensor(),
    # True downloads the dataset if it is not already in root; False expects it to be there
    download=DOWNLOAD_MNIST
)

# Batch training: split the data into mini-batches.
# DataLoader wraps the dataset and yields one batch per iteration.
# The raw data has shape (60000, 28, 28); ToTensor adds a channel dimension,
# so each batch has shape (50, 1, 28, 28), i.e. 50 images at a time
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# unsqueeze adds the channel dimension so the input is four-dimensional: (N, C, H, W)
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[
         :2000] / 255.  # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)

I did not understand batch_size at first. It turned out to be simple: every batch the loader yields contains batch_size samples.
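
As a rough illustration of what batch_size means, the sketch below splits 60000 stand-in samples into shuffled batches of 50 using only the standard library. The function iterate_batches is a hypothetical helper, not DataLoader itself, and the integers simply stand in for images.

```python
import random


def iterate_batches(samples, batch_size, shuffle=True, seed=0):
    """Yield lists of at most batch_size samples, optionally in shuffled order."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


samples = list(range(60000))  # stand-ins for the 60000 MNIST training images
batches = list(iterate_batches(samples, batch_size=50))
print(len(batches))     # 1200 batches per epoch
print(len(batches[0]))  # 50 samples in each batch
```

With 60000 training images and a batch size of 50, one epoch therefore runs 1200 training steps.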

Construct the neural network

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(  # input shape (1, 28, 28)
            nn.Conv2d(
                # Number of channels in the input image: 1 for grayscale, 3 for color
                in_channels=1,  # input channels
                # Number of output channels, equal to the number of convolution kernels
                out_channels=16,  # n_filters
                kernel_size=5,  # filter size
                stride=1,  # filter movement/step
                # To keep the width and height unchanged after Conv2d with stride=1,
                # use padding=(kernel_size-1)/2.
                # The division by 2 is because padding is applied to both sides, and the value gives one side only
                padding=2,
            ),  # output shape (16, 28, 28)
            nn.ReLU(),  # activation
            # Pooling layer: the pooling window is 2x2, and the maximum value inside each window is kept
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(  # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),  # output shape (32, 14, 14)
            nn.ReLU(),  # activation
            nn.MaxPool2d(2),  # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)  # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x  # return x for visualization
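
The shapes written in the comments above can be checked with a little arithmetic. The sketch below uses the usual output-size formula floor((n + 2p - k) / s) + 1 for a convolution, and assumes the pooling stride equals the pooling window, which is the default for nn.MaxPool2d. The two helper functions are illustrative names, not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Width (or height) after a convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(size, kernel_size):
    """Width (or height) after max pooling with stride equal to the window size."""
    return (size - kernel_size) // kernel_size + 1


size = 28                                                # MNIST images are 28x28
after_conv1 = conv_output_size(size, 5, stride=1, padding=2)
after_pool1 = pool_output_size(after_conv1, 2)
after_conv2 = conv_output_size(after_pool1, 5, stride=1, padding=2)
after_pool2 = pool_output_size(after_conv2, 2)
flat_features = 32 * after_pool2 * after_pool2           # 32 channels from conv2

print(after_conv1, after_pool1, after_conv2, after_pool2)  # 28 14 14 7
print(flat_features)                                       # 1568
```

This is where the 32 * 7 * 7 in nn.Linear comes from: 32 channels of 7x7 feature maps, or 1568 values per image.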

Reference: Understanding the torch.nn.Conv2d() function and channels
The convolution layer extracts image features and discards redundant information.
The pooling layer downsamples the feature map, keeping the strongest response in each region so that the prominent features stand out.
Reference: An in-depth look at the significance of convolution and pooling layers in convolutional neural networks
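
The pooling step can be sketched in a few lines of plain Python. The helper max_pool_2x2 and the sample values are invented for illustration. It assumes a 2x2 window with stride 2, matching nn.MaxPool2d(2).

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 0, 0],
    [2, 9, 0, 1],
    [0, 0, 4, 2],
    [0, 1, 3, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[9, 1], [1, 8]]
```

The 4x4 map shrinks to 2x2, and the strong responses (9 and 8) survive while the weak ones are dropped.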
You may have some questions here:

Why can the maximum value represent the salient features?
When the convolution kernel scans the picture, only regions that resemble the feature the kernel is looking for produce a large value after being multiplied by the kernel, so the maximum marks where the feature is strongest.
Why is the convolution output flattened with x = x.view(x.size(0), -1) before it is passed to the output layer?
Because the output layer is a fully connected layer, which expects each sample to be a one-dimensional vector.
Reference: x = x.view(x.size(0), -1)
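
To show what the flattening does, here is a sketch with nested Python lists standing in for a tensor of shape (batch, channels, height, width). The helper flatten_batch is hypothetical. It mimics the effect of x.view(x.size(0), -1), which keeps the batch dimension and merges everything else into one axis.

```python
def flatten_batch(batch):
    """Flatten each sample's (channels, height, width) nested list into one flat list."""
    return [
        [value for channel in sample for row in channel for value in row]
        for sample in batch
    ]


# A toy batch: 2 samples, 3 channels, 2x2 feature maps.
# Each value encodes its position so the ordering is easy to follow.
batch = [
    [[[s * 100 + c * 10 + r * 2 + col for col in range(2)] for r in range(2)]
     for c in range(3)]
    for s in range(2)
]
flat = flatten_batch(batch)
print(len(flat), len(flat[0]))  # 2 12
print(flat[0])  # [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
```

In the network above the same idea turns a (50, 32, 7, 7) batch into (50, 1568), one row per image.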

Establish the corresponding neural network

cnn = CNN()
print(cnn)  # net architecture
# Define the optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # optimize all cnn parameters
# Define loss function
loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted
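
The comment "the target label is not one-hotted" means the loss takes a class index such as 3 rather than a vector like [0, 0, 0, 1, ...]. As a sketch of the idea for a single sample, the function below computes the negative log-softmax of the target class in plain Python. The name cross_entropy and the example scores are assumptions for illustration. nn.CrossEntropyLoss additionally averages over the batch by default.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-softmax probability of the target class for one sample."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


logits = [2.0, 0.5, -1.0]                # raw scores for 3 classes
loss_correct = cross_entropy(logits, 0)  # target is the highest-scoring class
loss_wrong = cross_entropy(logits, 2)    # target is the lowest-scoring class
print(loss_correct < loss_wrong)         # True
```

The loss is small when the network already gives the target class the highest score and large otherwise, which is what training pushes down.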

Train the network

for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):  # gives batch data, normalize x when iterate train_loader
        output = cnn(b_x)[0]  # cnn output
        loss = loss_func(output, b_y)  # cross entropy loss
        # Perform gradient clearing
        optimizer.zero_grad()  # clear gradients for this training step
        loss.backward()  # backpropagation, compute gradients
        optimizer.step()  # apply gradients
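
The three calls in the loop (clear gradients, backpropagate, apply) can be imitated by hand on a one-parameter model. This is only an analogy under simplifying assumptions: it uses plain gradient descent instead of Adam, the gradient is written out manually instead of being computed by autograd, and the data is made up.

```python
# Fit y = w * x to data generated with w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]

w = 0.0     # the single trainable parameter
grad = 0.0  # accumulated gradient, playing the role of a parameter's .grad
lr = 0.01   # learning rate

for step in range(200):
    # Forward pass: mean squared error of the current prediction.
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # "zero_grad": discard the gradient left over from the previous step.
    grad = 0.0
    # "backward": add d(loss)/dw into the gradient buffer.
    grad += sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # "step": move the parameter against the gradient.
    w -= lr * grad

print(round(w, 4))  # 3.0
```

Because backward adds into the gradient buffer rather than overwriting it, skipping the reset would mix gradients from earlier batches into the current update.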

Full code

The complete code also includes the plotting code (there is no need to copy the snippets above).

import os
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
from matplotlib import cm

EPOCH = 1
BATCH_SIZE = 50
LR = 0.001
DOWNLOAD_MNIST = False

# Determine whether the dataset exists
if not (os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,
    transform=torchvision.transforms.ToTensor(),
    download=DOWNLOAD_MNIST
)

# plot one example
print(train_data.data.size())  # (60000, 28, 28)
print(train_data.targets.size())  # (60000)
plt.imshow(train_data.data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.targets[0])
plt.show()

# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# unsqueeze adds the channel dimension that Conv2d expects
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[
         :2000] / 255.  # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
print(test_x)
test_y = test_data.targets[:2000]
# print(test_x[4].size())

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(  # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,  # input channels
                out_channels=16,  # n_filters
                kernel_size=5,  # filter size
                stride=1,  # filter movement/step
                # To keep the width and height unchanged after Conv2d with stride=1,
                # use padding=(kernel_size-1)/2.
                # The division by 2 is because padding is applied to both sides, and the value gives one side only
                padding=2,
            ),  # output shape (16, 28, 28)
            nn.ReLU(),  # activation
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(  # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),  # output shape (32, 14, 14)
            nn.ReLU(),  # activation
            nn.MaxPool2d(2),  # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)  # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x  # return x for visualization


cnn = CNN()
print(cnn)  # net architecture
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted

try:
    from sklearn.manifold import TSNE

    HAS_SK = True
except ImportError:
    HAS_SK = False
    print('Please install sklearn for layer visualization')


def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        # Map the label (0-9) to a color on the rainbow colormap
        # x and y are coordinates
        c = cm.rainbow(int(255 * s / 9))
        plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max())
    plt.ylim(Y.min(), Y.max())
    plt.title('Visualize last layer')
    plt.show()
    plt.pause(0.01)


plt.ion()
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):  # gives batch data, normalize x when iterate train_loader
        output = cnn(b_x)[0]  # cnn output
        loss = loss_func(output, b_y)  # cross entropy loss
        optimizer.zero_grad()  # clear gradients for this training step
        loss.backward()  # backpropagation, compute gradients
        optimizer.step()  # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
            if HAS_SK:
                # t-SNE converts similarities between data points into probabilities
                # perplexity is a float, n_components is the dimension of the embedding (2 here),
                # init is the embedding initialization, n_iter is the number of iterations
                # (recent scikit-learn releases name this parameter max_iter)
                tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
                plot_only = 500
                # Dimensionality reduction of data
                low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
                labels = test_y.numpy()[:plot_only]
                plot_with_labels(low_dim_embs, labels)
plt.ioff()

# print 10 predictions from test data
# Take the first ten test samples, run the model on them, and compare with the true labels
test_output, _ = cnn(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
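
The prediction and accuracy lines above boil down to "take the index of the largest score for each sample, then count matches". A sketch of that logic in plain Python follows. The helper names and the score values are made up for illustration. In the real code, torch.max(test_output, 1)[1] does the argmax step over all rows at once.

```python
def argmax(scores):
    """Index of the largest score (the first one if there is a tie)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(all_scores, labels):
    """Return the predicted classes and the fraction that match the labels."""
    predictions = [argmax(s) for s in all_scores]
    correct = sum(p == t for p, t in zip(predictions, labels))
    return predictions, correct / len(labels)


# Made-up network outputs for 4 samples and 3 classes.
scores = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1],
]
labels = [1, 0, 2, 1]
predictions, acc = accuracy(scores, labels)
print(predictions)  # [1, 0, 2, 0]
print(acc)          # 0.75
```

Three of the four predictions match their labels, so the accuracy is 0.75.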
