The core of all neural networks in PyTorch is the autograd package, which provides automatic differentiation for all operations on tensors. It is a define-by-run framework: backpropagation is determined by how the code runs, so each iteration can be different.
- torch.Tensor is the core class of this package. If you set its .requires_grad attribute to True, autograd tracks all operations on the tensor. When the computation is finished, you can call .backward() to compute all gradients automatically. The gradients for this tensor are accumulated into its .grad attribute. To stop a tensor from being tracked, call .detach(), which separates it from the computation history and prevents future computation on it from being tracked. To avoid tracking history and using memory, you can also wrap a code block in with torch.no_grad():, which is particularly useful when evaluating a model.
- Another class that is very important to the autograd implementation is Function. Tensor and Function are connected to each other and together encode a complete computation history. Each tensor has a .grad_fn attribute that references the Function that created it (for tensors created directly by the user, grad_fn is None).
- Code demonstration
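    import torch

    # Create a tensor and ask autograd to track operations on it
    x = torch.ones(2, 2, requires_grad=True)
    print(x)

    # y is the result of an operation, so it has a grad_fn
    y = x + 2
    print(y)
    print(y.grad_fn)

    # requires_grad defaults to False; requires_grad_() changes it in place
    a = torch.randn(2, 2)
    a = (a * 3) / (a - 1)
    print(a.requires_grad)
    a.requires_grad_(True)
    print(a.requires_grad)
    b = (a * a).sum()
    print(b.grad_fn)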
    tensor([[1., 1.],
            [1., 1.]], requires_grad=True)
    tensor([[3., 3.],
            [3., 3.]], grad_fn=<AddBackward0>)
    <AddBackward0 object at 0x000001F39E4BDD30>
    False
    True
    <SumBackward0 object at 0x000001F39E4BDF98>
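The demonstration above stops before any gradient is computed. As a minimal sketch of the remaining pieces described earlier (.backward(), .grad, .detach() and torch.no_grad()), the following could be run on its own; the variable names out, y_detached and z are chosen here for illustration only.

```python
import torch

# A leaf tensor that autograd should track.
x = torch.ones(2, 2, requires_grad=True)

# out = mean(3 * (x + 2)^2) is a scalar, so backward() needs no argument.
y = x + 2
out = (3 * y * y).mean()
out.backward()

# d(out)/dx = 6 * (x + 2) / 4, which is 4.5 for every element when x = 1.
print(x.grad)

# detach() returns a tensor that shares data with y but is cut off from the graph.
y_detached = y.detach()
print(y_detached.requires_grad)  # False

# Inside no_grad(), results are not tracked even though x requires gradients.
with torch.no_grad():
    z = x * 2
print(z.requires_grad)  # False
```

The gradient lands on x rather than on y or out because only leaf tensors keep a .grad value by default.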
In PyTorch, neural networks are generally built with the torch.nn package. nn relies on the autograd package to define models and differentiate them. An nn.Module contains layers and a forward(input) method that returns the output.
Typical neural network training process:
- Define a neural network with learnable parameters (weights)
- Iterate over a dataset of inputs
- Process each input through the network
- Compute the loss
- Propagate gradients back to the network's parameters
- Update the network's parameters, typically with weight = weight - learning_rate * gradient
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # 1 input image channel, 6 output channels, 5x5 convolution kernel
            self.conv1 = nn.Conv2d(1, 6, 5)
            self.conv2 = nn.Conv2d(6, 16, 5)
            # An affine operation: y = Wx + b
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            # Max pooling over a 2x2 window
            x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
            # If the window is square, a single number is enough
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            x = x.view(-1, self.num_flat_features(x))
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

        def num_flat_features(self, x):
            size = x.size()[1:]  # all dimensions except the batch dimension
            num_features = 1
            for s in size:
                num_features *= s
            return num_features


    net = Net()
    print(net)
    Net(
      (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=400, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
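The in_features=400 of fc1 comes from the 16 * 5 * 5 in the constructor, which is only correct for a 32x32 input. A short sketch that traces the tensor shape through the two convolution and pooling stages makes this visible; it uses standalone layers with the same settings as Net, and the shapes dictionary is just an illustrative helper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone layers with the same settings as conv1 and conv2 in Net.
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

shapes = {}
x = torch.randn(1, 1, 32, 32)           # one 32x32 single-channel image

x = conv1(x)
shapes["conv1"] = tuple(x.shape)        # (1, 6, 28, 28): a 5x5 kernel trims 4 pixels
x = F.max_pool2d(F.relu(x), 2)
shapes["pool1"] = tuple(x.shape)        # (1, 6, 14, 14): 2x2 pooling halves each side
x = conv2(x)
shapes["conv2"] = tuple(x.shape)        # (1, 16, 10, 10)
x = F.max_pool2d(F.relu(x), 2)
shapes["pool2"] = tuple(x.shape)        # (1, 16, 5, 5)
x = torch.flatten(x, 1)
shapes["flatten"] = tuple(x.shape)      # (1, 400) = 16 * 5 * 5

for name, shape in shapes.items():
    print(name, shape)
```

With a different input size, the flattened feature count changes and fc1 would have to be resized accordingly.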
We only have to define the forward function; the backward function, which computes the gradients, is defined automatically by autograd. Any tensor operation can be used inside forward.
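    # net.parameters() returns the learnable parameters of the model
    params = list(net.parameters())
    print(len(params))
    print(params[0].size())  # weight of conv1

    # Feed a random 32x32 input through the network
    input = torch.rand(1, 1, 32, 32)
    out = net(input)
    print(out)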
    10
    torch.Size([6, 1, 5, 5])
    tensor([[ 0.0475, -0.0317,  0.0667, -0.0221, -0.0535, -0.0424, -0.0987,  0.0406,
              0.0520, -0.1062]], grad_fn=<AddmmBackward>)
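The count of 10 is easier to interpret when each parameter is listed by name: every one of the five layers contributes a weight and a bias. The sketch below uses named_parameters() on a stand-in module, called LayersOnly here purely for illustration, that holds the same five layers as Net but omits forward.

```python
import torch.nn as nn


class LayersOnly(nn.Module):
    """Hypothetical stand-in holding the same five layers as Net."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)


layers = LayersOnly()

# Each entry is a (name, parameter) pair, in registration order.
named = [(name, tuple(p.shape)) for name, p in layers.named_parameters()]
for name, shape in named:
    print(name, shape)

print(len(named))  # 10: one weight and one bias per layer
```

The first entry is conv1.weight with shape (6, 1, 5, 5): 6 output channels, 1 input channel, and a 5x5 kernel.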
    input = torch.randn(1, 1, 32, 32)
    output = net(input)
    print(output.shape)

    target = torch.randn(10)     # dummy target, used only for this example
    target = target.view(1, -1)  # give the target the same shape as the output
    print(target.shape)

    criterion = nn.MSELoss()
    loss = criterion(output, target)
    print(loss)
    print(loss.grad_fn)
    torch.Size([1, 10])
    torch.Size([1, 10])
    tensor(1.4566, grad_fn=<MseLossBackward>)
    <MseLossBackward object at 0x000001F39E500780>
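With its default settings, nn.MSELoss averages the squared differences between output and target over all elements. As a small check of that reading, the sketch below compares the criterion with a hand-written mean; the random tensors are stand-ins for a real network output and target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the network output and the target, both of shape (1, 10).
output = torch.randn(1, 10)
target = torch.randn(1, 10)

criterion = nn.MSELoss()  # default reduction is "mean"
loss = criterion(output, target)

# The same quantity written out by hand.
manual = ((output - target) ** 2).mean()

print(torch.allclose(loss, manual))  # True
```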
At this point, if you follow loss backward through its .grad_fn attribute, you will see a computation graph like this:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
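One way to inspect such a chain is to walk it from loss.grad_fn through the next_functions attribute of each node. The sketch below does this for a much smaller model than Net (a single hypothetical nn.Linear followed by ReLU), so the chain is short; the exact node class names vary between PyTorch versions, so none are assumed here.

```python
import torch
import torch.nn as nn

# Hypothetical small model, used only to keep the graph short.
model = nn.Linear(4, 2)
output = torch.relu(model(torch.randn(1, 4)))
loss = nn.MSELoss()(output, torch.randn(1, 2))

# Walk backward from the loss, always following the first available input.
names = []
node = loss.grad_fn
while node is not None and len(names) < 20:
    names.append(type(node).__name__)
    # next_functions holds (node, index) pairs; entries for inputs
    # that need no gradient are None.
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

for name in names:
    print(name)
```

This follows only one path through the graph; nodes with several inputs, such as the linear layer, have further branches leading to their other parameters.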
When loss.backward() is called, the whole graph is differentiated with respect to the loss, and every tensor in the graph that has requires_grad=True accumulates the gradient in its .grad attribute. Note that the forward pass only builds the computation graph; no gradients are computed until loss.backward() runs.
To backpropagate the error, all we need to do is call loss.backward(). The existing gradients must be cleared first, though; otherwise the new gradients are added to the ones already stored.
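The accumulation behaviour is easy to observe on a single tensor. In this minimal sketch, w is a hypothetical parameter and the loss is simply the sum of 3 * w, so each backward pass contributes a gradient of 3 per element.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * w) is 3 for each element.
(w * 3).sum().backward()
first = w.grad.clone()
print(first)   # tensor([3., 3.])

# Second pass without clearing: the new gradient is added to the old one.
(w * 3).sum().backward()
second = w.grad.clone()
print(second)  # tensor([6., 6.])

# Clearing the gradient first gives the expected value again.
w.grad.zero_()
(w * 3).sum().backward()
third = w.grad.clone()
print(third)   # tensor([3., 3.])
```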
Now we call loss.backward() and look at the bias gradient of the conv1 layer before and after backpropagation.
    net.zero_grad()  # clear the gradient buffers of all parameters

    print('conv1.bias.grad before backward:%s' % (net.conv1.bias.grad))
    loss.backward()
    print('conv1.bias.grad after backward:%s' % (net.conv1.bias.grad))
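The output shows that no gradient has been computed before backward is called: the bias gradient is all zeros at first and holds real values only afterwards. (In recent PyTorch versions the value before backward may print as None instead of zeros.)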
    conv1.bias.grad before backward:tensor([0., 0., 0., 0., 0., 0.])
    conv1.bias.grad after backward:tensor([ 0.0237,  0.0092,  0.0009,  0.0056, -0.0152,  0.0179])
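Once the gradients are available, the update rule from the training steps listed earlier, weight = weight - learning_rate * gradient, can be applied by hand. The following is a minimal sketch on a small hypothetical nn.Linear model with fixed data, not on Net itself; the learning rate of 0.1 is an arbitrary choice for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small model and fixed data, chosen to keep the example short.
model = nn.Linear(3, 1)
x = torch.ones(1, 3)
target = torch.zeros(1, 1)
learning_rate = 0.1

# Forward and backward pass to fill the .grad of every parameter.
loss_before = F.mse_loss(model(x), target)
model.zero_grad()
loss_before.backward()

# Manual update: weight = weight - learning_rate * gradient.
# no_grad() keeps the update itself out of the computation graph.
with torch.no_grad():
    for p in model.parameters():
        p -= learning_rate * p.grad

loss_after = F.mse_loss(model(x), target)
print(loss_before.item(), loss_after.item())
```

For this particular setup a single step reduces the loss; in general the learning rate has to be small enough for that to hold.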
When using neural networks, you may want to use different update rules such as SGD, Nesterov-SGD, Adam, and RMSProp. The torch.optim package implements all of these methods and is easy to use:
    import torch.optim as optim

    # Create the optimizer
    optimizer = optim.SGD(net.parameters(), lr=0.01)

    # Inside the training loop:
    optimizer.zero_grad()  # clear the gradients, otherwise they accumulate
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()       # update the parameters
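The snippet above shows a single iteration. Repeated in a loop, the same four calls form a complete training procedure. The sketch below runs it on a small hypothetical regression problem (learning the sum of three inputs with one nn.Linear layer) instead of Net, so that it finishes instantly; the data, learning rate, and step count are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: the target is the sum of the three input features.
inputs = torch.tensor([[0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0]])
targets = inputs.sum(dim=1, keepdim=True)

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(50):
    optimizer.zero_grad()               # clear gradients from the previous step
    output = model(inputs)              # forward pass
    loss = criterion(output, targets)   # compute the loss
    loss.backward()                     # backpropagate
    optimizer.step()                    # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Swapping in another update rule only changes the line that creates the optimizer, for example optim.Adam(model.parameters(), lr=0.01).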
With the steps above, a simple neural network can be built and trained.