PyTorch deep learning tutorial for complete beginners: a one-hour quick start

Posted by grabber_grabbs on Tue, 12 Oct 2021 00:06:19 +0200

The official PyTorch website has excellent tutorials. Several short articles among them form a small series called DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ. Since reading English documentation can be difficult for some readers, the author plans to translate the series and adjust some of the content based on his own understanding. The original is published on the official PyTorch website. This post continues from Part III. OK, let's start Part IV, which is also the last part.

This part puts the earlier material into practice. After working through it, you will be able to build a simple model with PyTorch. Let's get started.

You now know how to define a neural network, compute the loss, and update the network's weights. Now you may be wondering:

What about the data?

Usually, when you have to deal with image, text, audio, or video data, you can use standard Python packages to load the data into a NumPy array. You can then convert that array into a PyTorch tensor.

(1) For images, packages such as Pillow and OpenCV are useful.

(2) For audio, packages such as scipy and librosa are useful.

(3) For text, you can load the data with plain Python or Cython, or use NLTK and SpaCy.
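
As a minimal sketch of the last step (NumPy array to PyTorch tensor), assume an image has already been loaded into an array of shape height x width x channels. The zero-filled array below is only a hypothetical stand-in for a real image, and the variable names are made up for illustration.

```python
import numpy as np
import torch

# Hypothetical stand-in for an image loaded with Pillow or OpenCV:
# height x width x channels, 8-bit values in [0, 255].
image_hwc = np.zeros((32, 32, 3), dtype=np.uint8)
image_hwc[..., 0] = 255  # fill the first (red) channel

# torch.from_numpy wraps the array without copying it.
tensor_hwc = torch.from_numpy(image_hwc)

# PyTorch image layers expect channels first, so reorder to C x H x W
# and rescale to floats in [0, 1].
tensor_chw = tensor_hwc.permute(2, 0, 1).float() / 255.0

print(tensor_chw.shape)
```

The channels-first layout is the same 3 * 32 * 32 layout that the CIFAR10 images in this tutorial use.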

For vision in particular, there is a package called torchvision. It provides data loaders for common public datasets such as ImageNet, CIFAR10, and MNIST, as well as data transformation tools for images, namely torchvision.datasets and torch.utils.data.DataLoader.

This is very convenient and avoids writing a lot of boilerplate code.

For this tutorial, we will use the CIFAR10 dataset. It has the classes 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'. The images in CIFAR10 have size 3 * 32 * 32, that is, 3 color channels (RGB) of 32 * 32 pixels each.

After this introduction, let's move on to hands-on training.

Train an image classifier

We will follow the steps below:

(1) Use torchvision to load and normalize the CIFAR10 training and test datasets.

(2) Define a convolutional neural network.

(3) Define a loss function.

(4) Train the network on the training data.

(5) Test the network on the test data.

1. Load and normalize the CIFAR10 dataset

Using torchvision, it is very simple to import CIFAR10.

import torch
import torchvision
import torchvision.transforms as transforms

The torchvision datasets output images with values in the range [0, 1]. We convert them to tensors and normalize them to the range [-1, 1].

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

If you run this on Windows and get a BrokenPipeError, try setting the num_workers argument of torch.utils.data.DataLoader() to 0.

The output of the data-loading code is:

Downloading to ./data/cifar-10-python.tar.gz
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

If the dataset is not yet on your computer, it is downloaded from the web on the first run.
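
To see why a mean of 0.5 and a standard deviation of 0.5 map the range [0, 1] onto [-1, 1], here is a small sketch of the per-channel arithmetic. The function names are made up for illustration; only the formula (value - mean) / std is what the Normalize transform applies.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by the Normalize transform."""
    return (value - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, back to the [0, 1] range used for display."""
    return value * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

With mean and std both 0.5, the inverse is value / 2 + 0.5, which is the "unnormalize" line in the imshow helper of this tutorial.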

Now let's display some of the training images.

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(batch_size)))

The output result is:

cat plane  bird  ship

2. Define a convolutional neural network

Copy the neural network from the earlier neural network part of this series and modify it to take 3-channel (RGB) images instead of the single-channel (grayscale) images it was originally defined for.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
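
The input size 16 * 5 * 5 of fc1 follows from how each layer shrinks the 32 * 32 input. The sketch below traces the spatial size through the network, assuming the standard output-size formula for convolution and pooling with no padding (the defaults used above). The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 32                    # CIFAR10 images are 32 x 32
size = conv_out(size, 5)     # conv1 with a 5 x 5 kernel
size = pool_out(size, 2, 2)  # 2 x 2 max pooling, stride 2
size = conv_out(size, 5)     # conv2 with a 5 x 5 kernel
size = pool_out(size, 2, 2)  # 2 x 2 max pooling, stride 2

flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)
```

If you change a kernel size or the input resolution, the first argument of fc1 has to be recomputed in the same way.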

3. Define a loss function and optimizer

Let's use a classification cross-entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
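
To make the loss less of a black box, here is a rough pure-Python sketch of cross-entropy for a single sample: the negative log of the softmax probability that the network assigns to the correct class. The function name and the example scores are made up for illustration. Note that nn.CrossEntropyLoss also averages over the batch by default, which this sketch leaves out.

```python
import math


def cross_entropy(scores, target):
    """Negative log of the softmax probability of the target class."""
    largest = max(scores)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(s - largest) for s in scores))
    return log_sum_exp - scores[target]


# A network that has learned nothing: the same score for all 10 classes.
uniform_scores = [0.0] * 10

# Hypothetical scores that strongly favor class 3.
confident_scores = [0.0] * 10
confident_scores[3] = 5.0

print(cross_entropy(uniform_scores, 3))    # equals ln(10)
print(cross_entropy(confident_scores, 3))  # small: confident and correct
print(cross_entropy(confident_scores, 0))  # large: confident and wrong
```

The uniform case gives ln(10), about 2.303, which is a useful reference point for judging the loss values printed during training.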

4. Train the network

This is where things start to get interesting. We simply loop over the data iterator, feed the inputs to the network, and optimize.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

The output result is:

[1,  2000] loss: 2.128
[1,  4000] loss: 1.793
[1,  6000] loss: 1.649
[1,  8000] loss: 1.555
[1, 10000] loss: 1.504
[1, 12000] loss: 1.444
[2,  2000] loss: 1.379
[2,  4000] loss: 1.344
[2,  6000] loss: 1.336
[2,  8000] loss: 1.327
[2, 10000] loss: 1.294
[2, 12000] loss: 1.280
Finished Training
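
For intuition about what the optimizer does at each step, here is a minimal sketch of the SGD-with-momentum update on a single weight. It assumes the rule documented for PyTorch's SGD with default settings (no dampening, no weight decay): the gradient is accumulated into a velocity, and the weight moves against that velocity. The function name and the toy objective are assumptions for illustration, not part of the tutorial's network.

```python
def sgd_momentum_step(weight, velocity, grad, lr=0.001, momentum=0.9):
    """One update: accumulate the gradient into the velocity, then step."""
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity


# Toy objective (assumed for illustration): f(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
weight, velocity = 0.0, 0.0
for step in range(200):
    grad = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, grad, lr=0.1)

print(weight)  # close to the minimum at 3
```

In the real training loop, loss.backward() computes the gradient for every parameter and the optimizer applies this kind of update to all of them at once.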

Let's quickly save the trained model so that we can reload its parameters later.

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

More details on saving PyTorch models can be found in the official PyTorch documentation.

5. Test the network on the test set

We have trained the network for 2 passes over the training dataset (the outer training loop ran twice), but we need to check whether the network has learned anything at all.

We check this by comparing the class label that the neural network predicts against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.

OK, first step: let's display a batch of images from the test set.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

GroundTruth:    cat  ship  ship plane

Next, let's load the model we saved earlier (saving and reloading the parameters is not actually necessary here; we do it only as a demonstration).

net = Net()
net.load_state_dict(torch.load(PATH))

Next, let's see what the neural network thinks these examples are:

outputs = net(images)

The outputs are raw scores for the 10 classes, not normalized probabilities. The higher the score for a class, the more the network thinks the image belongs to that class. So we take the index of the highest score:

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

The output result is:

Predicted:   frog  ship  ship  ship

The results seem good.
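
If you want actual probabilities rather than raw scores, you can apply a softmax to the network output. The pure-Python sketch below uses made-up scores for one image (not real network output) and shows that the softmax does not change which class has the highest value, so taking the maximum of the raw scores is enough for prediction.

```python
import math

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one image, one score per class.
scores = [1.2, 0.3, -0.8, 0.1, -1.5, -0.2, -2.0, -0.9, 3.4, 1.0]
probabilities = softmax(scores)

predicted = max(range(len(scores)), key=lambda i: scores[i])
most_probable = max(range(len(probabilities)), key=lambda i: probabilities[i])

print(classes[predicted], classes[most_probable])
```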

Let's see how the network performs on the entire dataset.

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

The result output is:

Accuracy of the network on the 10000 test images: 54 %

These results look much better than chance. With 10 classes, random guessing would give an accuracy of 10%, so the network seems to have learned something.

Let's see which categories perform better.

# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print("Accuracy for class {:5s} is: {:.1f} %".format(classname,
                                                   accuracy))

The output result is:

Accuracy for class plane is: 59.4 %
Accuracy for class car   is: 66.7 %
Accuracy for class bird  is: 22.7 %
Accuracy for class cat   is: 52.7 %
Accuracy for class deer  is: 59.1 %
Accuracy for class dog   is: 28.9 %
Accuracy for class frog  is: 70.8 %
Accuracy for class horse is: 57.6 %
Accuracy for class ship  is: 67.4 %
Accuracy for class truck is: 62.2 %

OK, so what next? We can try running the network on the GPU.

Training on GPU  

Just as you transfer a tensor onto the GPU, you transfer the neural network onto the GPU.

If CUDA is available (CUDA is NVIDIA's GPU computing platform; later tutorials will cover it in more detail), we can first define our device as the first visible CUDA device.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

The output result is:

cuda:0

The rest of this section assumes that device is a CUDA device.

Then these methods recursively traverse all modules and convert their parameters and buffers to CUDA tensors:

net.to(device)

Remember that you also have to send the inputs and targets to the GPU at every step:

inputs, labels = data[0].to(device), data[1].to(device)
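
Putting these pieces together, one training step on the chosen device could look roughly like the sketch below. To keep it short and self-contained, it uses a tiny stand-in model (a single nn.Linear layer) and random data instead of the CIFAR10 network and loader; those stand-ins and their names are assumptions for illustration. The code falls back to the CPU when no CUDA device is available.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the network: 8 input features, 3 classes.
tiny_net = nn.Linear(8, 3).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(tiny_net.parameters(), lr=0.001, momentum=0.9)

# Random stand-in batch; in the tutorial this comes from trainloader.
inputs = torch.randn(4, 8)
labels = torch.tensor([0, 1, 2, 1])

# Inputs and targets must live on the same device as the model.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
loss = criterion(tiny_net(inputs), labels)
loss.backward()
optimizer.step()

print(next(tiny_net.parameters()).device, loss.item())
```

The same pattern applies to the training loop above: move the network once before the loop, and move each batch inside the loop.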

If you don't notice a significant speedup on the GPU, that is mainly because the network is very small.

Try increasing the width of your network and see what kind of speedup you get.
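
One way to widen the network is to raise the number of output channels of the first convolution, that is, the second argument of the first nn.Conv2d. The first argument of the second nn.Conv2d must then be set to the same number. The sketch below counts how many parameters the network from this tutorial has for a given width; the helper names and the example width of 32 are assumptions for illustration.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel convolution layer."""
    return in_channels * out_channels * kernel * kernel + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def net_params(width=6):
    """Parameter count of the tutorial network; width is conv1's output channels."""
    return (conv_params(3, width, 5)
            + conv_params(width, 16, 5)
            + linear_params(16 * 5 * 5, 120)
            + linear_params(120, 84)
            + linear_params(84, 10))


print(net_params(6))   # the network as defined in this tutorial
print(net_params(32))  # a wider first convolution
```

Even the wider variant is tiny by modern standards, which is why the GPU has little work to do per batch of 4 images.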

After this tutorial, you can build a small network for image classification. From here, you can explore more of PyTorch's libraries and train more neural networks.

Topics: Python Pytorch Deep Learning