Learning AI with Li Mu, for the sake of a small paper

Posted by elite_prodigy on Fri, 12 Nov 2021 15:50:34 +0100

Today, let's work through the softmax image classification problem

Note: I have omitted the part that animates the accuracy curve during training.
torchvision.transforms is the image-preprocessing package in PyTorch. It contains many functions for transforming image data, which are essential when reading image data.

  • torchvision.transforms.Compose(): combine multiple transforms together

  • transforms.ToTensor() converts an image into a torch.FloatTensor

  • transforms.Resize(resize) resizes the image to the given resolution

  • Note the role of keepdim (see the snippet below)

import numpy as np
import torch

a = torch.ones((2, 2))

b = np.array([[1, 2, 3], [1, 1, 1]])
c = torch.from_numpy(b)

interval_0 = torch.sum(c, dim=0, keepdim=True)  # sum over rows; shape [1, 3]

interval_1 = torch.sum(c, dim=1)                # sum over columns, no keepdim; shape [2]

keepdim=True keeps the reduced dimension as size 1 instead of squeezing it away, so the summed result still lines up with (and can broadcast against) the original tensor.
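A minimal illustration of why this matters for broadcasting, using c from above:

c / torch.sum(c, dim=1, keepdim=True)   # works: shapes [2, 3] / [2, 1] broadcast row by row
# c / torch.sum(c, dim=1)               # would raise an error: shapes [2, 3] and [2] do not broadcast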

batch_size = 256

def get_dataloader_workers():  
    """Use two worker processes to read the data."""
    return 2

#OK, now we integrate all the components and define a function to get the dataset
#(this uses torchvision, transforms and torch.utils.data, which are imported in the full listing further down)
def load_data_fashion_mnist(batch_size,resize=None):
  #Download the Fashion-MNIST dataset and load it into memory
  trans = [transforms.ToTensor()] #Define a list of the transformations we want to apply to PIL-image data
  if resize:
    trans.insert(0,transforms.Resize(resize))#If we set resize, its resolution will be reset here
  trans = transforms.Compose(trans)#Combine these operations
  mnist_train=torchvision.datasets.FashionMNIST(root="./data",train=True,transform=trans,download=True)#Load data into memory
  mnist_test=torchvision.datasets.FashionMNIST(root="./data",train=False,transform=trans,download=True)#Load data into memory
  return (data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=get_dataloader_workers()),
          data.DataLoader(mnist_test, batch_size, shuffle=False, num_workers=get_dataloader_workers()))


batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)

#Initialize the parameters of the model
num_inputs = 784	#We need to flatten each 28 * 28 image into a vector of length 784, although this loses spatial information
num_outputs = 10	#Finally, our prediction is only 10 categories
W = torch.normal(0,0.01,size=(num_inputs,num_outputs),requires_grad=True)#Define our layer and record the gradient
b = torch.zeros(num_outputs,requires_grad=True)#Define our bias and record the gradient

#Define the operation of softmax
def softmax(X):
  X_exp = torch.exp(X)  #Exponentiate each element
  partition = X_exp.sum(1,keepdim=True)  #Sum over each row, keeping the dimension for broadcasting
  return X_exp/partition  #Note that the broadcasting mechanism is applied here

I have something to say about softmax here.
X = torch.normal(0,1,(2,5)) # creates a 2-row, 5-column matrix drawn from a normal distribution with mean 0 and variance 1
tensor([[ 1.3066, 0.0417, 0.6489, -0.0553, -0.9866],[-1.7921, -0.4884, 1.7815, 2.2112, -0.6010]])
X_exp = torch.exp(X) # raise e to the power of each element
partition = X_exp.sum(1,keepdim=True)
dim=1 means we sum across each row, so the result collapses to a single column. Why a column? For example, here:
partition.shape
torch.Size([2, 1])
Because for us a row represents a sample, and we want the predicted distribution within each row. Note that the final return applies broadcasting: the [2, 1] partition is stretched to a [2, 5] matrix in which every column repeats the first, so each element of X_exp is divided by its own row sum.
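As a quick sanity check with the softmax defined above, every row of the output should sum to 1:

X_prob = softmax(X)
print(X_prob)          # a [2, 5] matrix of probabilities
print(X_prob.sum(1))   # approximately tensor([1., 1.]): each row sums to 1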

Supplement
tensor.reshape(-1): when we reshape a matrix, the total number of elements must stay the same. -1 stands for an unknown size; that dimension is simply inferred from the sizes of the other dimensions.
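A minimal illustration with a small made-up tensor:

d = torch.arange(6).reshape(2, 3)   # tensor([[0, 1, 2], [3, 4, 5]])
d.reshape(-1)                       # tensor([0, 1, 2, 3, 4, 5]); the size 6 is inferred
d.reshape(3, -1)                    # shape [3, 2]; the -1 is inferred as 2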

Let me briefly describe how the cross entropy is calculated
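For a classification problem where the true label y is one-hot, the cross entropy of a single sample reduces to -log(y_hat[y]): the negative log of the probability that the model assigns to the correct class. Summed (or averaged) over the batch, that is our training loss, and it is exactly what the cross_entropy function later in this post computes with fancy indexing.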

Start again, this time with the complete code.

%matplotlib inline
import torch
import torchvision
from IPython import display
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l

These are the libraries we need to import. I've now realized that the d2l library was written specially by Mr. Li to make studying easier for us. Comrades, put your tears of gratitude on the public screen.

def get_fashion_mnist_labels(labels):
    """Return the text labels of the Fashion-MNIST dataset."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

In this function we pass in integer labels, such as labels = [1, 2, 3], and get back the corresponding text labels.
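For example:

get_fashion_mnist_labels([1, 2, 3])  # ['trouser', 'pullover', 'dress']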

def get_dataloader_workers():
    """Four processes are used to read data."""
    return 4

Data is read in parallel; the number of worker processes can be turned up or down depending on your machine.

def load_data_fashion_mnist(batch_size, resize=None):
    """Download the Fashion-MNIST dataset and load it into memory."""
    trans = [transforms.ToTensor()] # Here we specify how the data will be transformed
    if resize:  # If the resize parameter is given, we reset the image resolution
        trans.insert(0, transforms.Resize(resize))  # Prepend the Resize transform
    trans = transforms.Compose(trans) # Chain this series of operations together
    mnist_train = torchvision.datasets.FashionMNIST(    # Get the training dataset
        root="./data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(     # Get the test dataset
        root="./data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),    # In PyTorch, datasets are always wrapped in a DataLoader
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

Note that we need to perform a series of operations on the data; trans can be understood as the collection of those operations.
resize redefines the image resolution.
In PyTorch, all data gets packaged into a DataLoader: given a dataset, you specify the batch size, whether to shuffle, and how many worker processes read the data. This is used everywhere, so I won't say more.

def softmax(X):
    X_exp = torch.exp(X)  #First exponentiate the input X element-wise; X may be a vector or a matrix
    partition = X_exp.sum(1, keepdim=True)  #Sum along dim=1. Why? Because each row is a sample, we want the row sum, not the column sum
    return X_exp / partition  # The broadcasting mechanism is applied here

Let's take this example.
Our input X is a [256, 784] matrix and W is a [784, 10] matrix, so XW is a [256, 10] matrix, and we perform the softmax operation on that [256, 10] matrix.
In our example, each row represents a sample, so we need the row sum, not the column sum.
Finally, the broadcasting mechanism is applied: when we take the row sums we obtain a [256, 1] matrix, which then needs to be expanded to
a [256, 10] matrix for the element-wise division.
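A quick shape check of this pipeline (a sketch; W_demo and b_demo here are made-up stand-ins with the shapes described above):

X = torch.normal(0, 1, (256, 784))        # a fake batch of 256 flattened images
W_demo = torch.normal(0, 0.01, (784, 10))
b_demo = torch.zeros(10)
O = torch.matmul(X, W_demo) + b_demo      # logits: a [256, 10] matrix
print(softmax(O).shape)                   # torch.Size([256, 10])
print(softmax(O).sum(1)[:3])              # every one of the 256 rows sums to 1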

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)    #reshape's -1 means that dimension's size is inferred from the other dimensions
    #Here X is stretched from [256, 1, 28, 28] to [256, 784] before the matrix multiplication

Here we define our network. It can simply be understood as a linear mapping followed by a softmax.

def cross_entropy(y_hat, y):
    return - torch.log(y_hat[range(len(y_hat)), y])   #Our loss function: for classification problems we use cross entropy most of the time
We define our loss function using cross entropy; we will get to the exact formula later.
For now I can at least say:

y_hat is a [256, 10] matrix (one predicted distribution per sample)
y is a vector of 256 class indices (one label per sample)
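A minimal sketch of what y_hat[range(len(y_hat)), y] does, with small made-up numbers rather than the real data:

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y_hat[[0, 1], y]             # tensor([0.1000, 0.5000]): the probability assigned to each sample's true class
cross_entropy(y_hat, y)      # tensor([2.3026, 0.6931]): minus the log of those probabilities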

def train_epoch_ch3(net, train_iter, loss, updater):#The function that actually performs the training
    """Train the model for one epoch (see Chapter 3 for the definition)."""
    # Set the model to training mode
    #net is the network we created. Here, it is a full connection layer of softmax
    #train_iter is our training data set
    #Loss is the loss function we define
    #updater is the sgd optimization function; it simply moves each parameter a step of size lr in the negative gradient direction
    if isinstance(net, torch.nn.Module):#First put the network into training mode (this affects layers such as dropout and batch norm)
        net.train()
    # Sum of training loss, number of correct predictions, number of samples
    metric = Accumulator(3) #Create an accumulator with three running sums (Accumulator is a d2l helper)
    for X, y in train_iter:
    #According to our definition, each time we will read a label corresponding to 256 pictures from the data
    # The dimension of X is 256 * 1 * 28 * 28
    # The dimension of y is 256 * 1

        # Calculate the gradient and update the parameters
        y_hat = net(X)  #To calculate the predicted value, in fact, what comes out here is a 256 * 10 matrix
        l = loss(y_hat, y)  #To calculate the loss, cross entropy is used as the loss function
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function; this branch runs when we use the PyTorch API
            updater.zero_grad() #Clear the gradient, otherwise it will accumulate and stack on the original gradient
            l.backward()#Calculated gradient
            updater.step()#Take the calculated gradient update model
            metric.add(float(l) * len(y), accuracy(y_hat, y),
                       y.size().numel())#Accumulate total loss, number of correct predictions, and sample count
        else:
            # Using custom optimizers and loss functions
            l.sum().backward()#With our hand-written loss, l is a vector of per-sample losses, so sum it before backward()
            updater(X.shape[0]) #Update the parameters
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())#Accumulate loss, correct predictions and sample count
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]   #Return the average training loss and the training accuracy

Remember, the steps must always be: compute the loss first, then call backward() on that loss to back-propagate and compute the gradients, and only then update the model. It is always this order.
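A minimal sketch of that order, assuming a PyTorch optimizer and a scalar loss (the names net, loss and optimizer stand in for whatever you are using):

for X, y in train_iter:
    optimizer.zero_grad()   # clear the gradients left over from the previous batch
    l = loss(net(X), y)     # 1. forward pass: compute the loss
    l.backward()            # 2. back-propagation: compute the gradients
    optimizer.step()        # 3. update the parameters using those gradients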

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """Training model (see Chapter 3 for definition)."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc']) #This part is an animation function. We can put it for a while
    for epoch in range(num_epochs): #An iteration is performed according to our specified training cycle
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        #net is the network we created. Here, it is a full connection layer of softmax
        #train_iter is our training data set
        #Loss is the loss function we define
        #updater is the sgd optimization function; it simply moves each parameter a step of size lr in the negative gradient direction
        
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

lr = 0.1  #Learning rate used by the updater below
def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)
#sgd is the optimization update: each parameter takes a step of -lr * gradient / batch_size
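For reference, d2l.sgd is roughly the following minibatch SGD step (a sketch of the book's helper, not guaranteed to be line-for-line identical):

def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():                          # update the parameters without tracking gradients
        for param in params:
            param -= lr * param.grad / batch_size  # step in the negative gradient direction
            param.grad.zero_()                     # reset the gradient for the next batch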

batch_size = 256    #Here we set the batch size
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size) #Here we first get our training set and our test set
for X, y in train_iter:
  print(X.reshape(-1, 784).shape)
  break


num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True) #Create the weights from a normal distribution with mean 0 and standard deviation 0.01; this is a layer mapping 784 inputs to 10 outputs, with gradient tracking on
b = torch.zeros(num_outputs, requires_grad=True)    #Create our bias and turn on gradient recording


num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater) #In theory, you can start running

What if you use PyTorch's API?

from torch import nn   #The concise version also needs torch.nn

batch_size = 256
train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size)  #I won't say much else; the data still has to be loaded yourself, baby


net = nn.Sequential(nn.Flatten(),nn.Linear(784,10))# nn.Flatten flattens the data; by default it starts from dimension 1, unlike torch.flatten() (see the small demo after this block)

def init_weights(m):
  if type(m) == nn.Linear:
    nn.init.normal_(m.weight,std=0.01)  #Define how our weights are initialized
net.apply(init_weights)#Apply the initialization to every layer of the net

loss = nn.CrossEntropyLoss()#From now on we just call the built-in API; this is our loss function

trainer = torch.optim.SGD(net.parameters(), lr=0.1)#This is our optimization function

num_epochs = 10#Number of training epochs
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)#It's so simple and elegant to start training
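As mentioned in the nn.Flatten comment above, a tiny demo of the difference from torch.flatten():

t = torch.ones((256, 1, 28, 28))
print(nn.Flatten()(t).shape)    # torch.Size([256, 784]): flattens from dim 1 onward, keeping the batch dimension
print(torch.flatten(t).shape)   # torch.Size([200704]): flattens from dim 0 by default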

From here we can see that the essence of the so-called Linear layer is just matrix multiplication, and the so-called neural network is simply one such Linear transformation after another; it is not that hard.
I also want to point out how the pieces fit together: the optimization function is what updates the parameters once the gradients and the learning rate are determined; backward() is back-propagation, which computes the gradients; and loss specifies our loss function. The forward pass computes the loss first (operations on parameters with requires_grad are recorded for autograd), then backward() computes and stores the gradients, and finally the optimization function updates the parameters. That's it!!!! A classic workflow.

Topics: AI Pytorch Deep Learning