Neural networks are made up of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch is a subclass of nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.
In the following section, we will build a neural network to classify the images in the FashionMNIST dataset.
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
1. Get a device for training
We want to be able to train our model on a hardware accelerator such as a GPU, if one is available. Let's check whether torch.cuda is available; otherwise we continue to use the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))
Out:
Using cuda device
2. Define the class
We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
We create an instance of NeuralNetwork, move it to the device, and print its structure.
model = NeuralNetwork().to(device)
print(model)
Out:
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
To use the model, we pass it the input data. This executes the model's forward, along with some background operations. Do not call model.forward() directly!
Calling the model on the input returns a 10-dimensional tensor with raw predicted values for each class. We pass it through an instance of the nn.Softmax module to get the prediction probabilities.
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
Out:
Predicted class: tensor([2], device='cuda:0')
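The "background operations" mentioned above include dispatching any hooks registered on the model, which is one reason to always invoke the model as model(X). The sketch below is a hypothetical debugging hook (not part of the tutorial's model) that illustrates the difference: the hook fires when the model is called normally, but is bypassed if forward() is called directly.

# Hypothetical illustration: a forward hook runs only when the model is
# invoked through __call__ (i.e. model(X)), not via model.forward(X)
def debug_hook(module, inputs, output):
    print(f"{module.__class__.__name__} produced output of shape {output.shape}")

handle = model.register_forward_hook(debug_hook)
_ = model(X)          # hook fires and prints the output shape
_ = model.forward(X)  # hook is skipped -- one reason not to call forward() directly
handle.remove()       # clean up the hook when done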
3. Model layers
Let's break down the layers in the FashionMNIST model. To illustrate this, we will take a sample minibatch of three images of size 28x28 and see what happens to it as we pass it through the network.
input_image = torch.rand(3,28,28)
print(input_image.size())
Out:
torch.Size([3, 28, 28])
- nn.Flatten
We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, at dim=0, is maintained).
flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
Out:
torch.Size([3, 784])
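For intuition, flattening here is the same as reshaping each image into a 784-element vector while keeping the batch dimension. A minimal check (an illustrative aside, reusing the tensors defined above):

# nn.Flatten() with its default start_dim=1 is equivalent to reshaping to (batch, -1)
flat_manual = input_image.reshape(3, -1)
print(torch.equal(flat_image, flat_manual))  # True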
- nn.Linear
The linear layer is a module that applies a linear transformation to the input using its stored weights and biases.
layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
Out:
torch.Size([3, 20])
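To make the "stored weights and biases" concrete, the layer's output can be reproduced by hand from layer1.weight (shape [20, 784]) and layer1.bias (shape [20]). This is just a sanity check, not part of the original tutorial:

# A linear layer computes x @ W^T + b using its stored parameters
manual = flat_image @ layer1.weight.T + layer1.bias
print(torch.allclose(manual, hidden1))  # True, up to floating-point tolerance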
- nn.ReLU
Nonlinear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.
In this model, we use nn.ReLU, but there are other activation functions that can introduce nonlinearity into your model.
print(f"Before ReLU: {hidden1}\n\n") hidden1 = nn.ReLU()(hidden1) print(f"After ReLU: {hidden1}")
Out:
Before ReLU: tensor([[ 0.6527, -0.0592, 0.0030, 0.1115, -0.7055, -0.3122, 0.1380, 0.2645, 0.2227, -0.0342, 0.1871, -0.1997, -0.1490, 0.6607, 0.0505, 0.8880, -0.0214, 0.0201, -0.4516, 0.4307],
        [ 0.3041, 0.1164, 0.0963, -0.0067, -0.4396, -0.3806, -0.2247, 0.5759, 0.3805, 0.0916, 0.4540, -0.1994, -0.0649, 0.3390, -0.0996, 0.8811, -0.1655, 0.1817, -0.6419, 0.4605],
        [ 0.5195, 0.0234, 0.1066, 0.0727, -0.6756, -0.3488, 0.1052, 0.7148, -0.1316, -0.1426, -0.1310, -0.0110, 0.1333, 0.1948, -0.0153, 0.8247, -0.2263, 0.1925, -0.5722, 0.1346]], grad_fn=<AddmmBackward>)

After ReLU: tensor([[0.6527, 0.0000, 0.0030, 0.1115, 0.0000, 0.0000, 0.1380, 0.2645, 0.2227, 0.0000, 0.1871, 0.0000, 0.0000, 0.6607, 0.0505, 0.8880, 0.0000, 0.0201, 0.0000, 0.4307],
        [0.3041, 0.1164, 0.0963, 0.0000, 0.0000, 0.0000, 0.0000, 0.5759, 0.3805, 0.0916, 0.4540, 0.0000, 0.0000, 0.3390, 0.0000, 0.8811, 0.0000, 0.1817, 0.0000, 0.4605],
        [0.5195, 0.0234, 0.1066, 0.0727, 0.0000, 0.0000, 0.1052, 0.7148, 0.0000, 0.0000, 0.0000, 0.0000, 0.1333, 0.1948, 0.0000, 0.8247, 0.0000, 0.1925, 0.0000, 0.1346]], grad_fn=<ReluBackward0>)
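Functionally, nn.ReLU simply zeroes out negative values, i.e. it computes max(0, x) elementwise. A quick illustration on a fresh tensor (the values here are arbitrary):

# ReLU(x) = max(0, x), applied elementwise
sample = torch.randn(2, 4)
print(nn.ReLU()(sample))
print(torch.clamp(sample, min=0))  # same result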
- nn.Sequential
nn.Sequential is an ordered container of modules. Data is passed through all the modules in the same order as they are defined. You can use a sequential container to put together a quick network like seq_modules below.
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
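Passing data through seq_modules is equivalent to calling each module by hand in the order it was added; the sketch below simply unrolls the container to show that ordering is all nn.Sequential adds. It reuses the flatten and layer1 instances from above and indexes into the container for the final linear layer (so the same parameters are used):

# Manually chaining the same modules in the same order
x = flatten(input_image)
x = layer1(x)
x = nn.ReLU()(x)          # ReLU is stateless, so a new instance gives the same result
x = seq_modules[3](x)     # the nn.Linear(20, 10) stored inside the container
print(torch.allclose(x, logits))  # True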
- nn.Softmax
The last linear layer of the neural network returns logits, raw values in [-infty, infty], which are passed to the nn.Softmax module. The logits are scaled to values in [0, 1], representing the model's predicted probabilities for each class. The dim parameter indicates the dimension along which the values must sum to 1.
softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
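Since dim=1 tells Softmax to normalize across the 10 class scores of each sample, every row of pred_probab sums to 1. A quick illustrative check:

# Each row (one sample's class probabilities) sums to 1
print(pred_probab.sum(dim=1))     # tensor of ones, one entry per sample
print(pred_probab.argmax(dim=1))  # index of the most likely class per sample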
4. Model parameters
Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model's parameters() or named_parameters() methods.
In this example, we iterate over each parameter and print its size and a preview of its values.
print("Model structure: ", model, "\n\n") for name, param in model.named_parameters(): print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
Out:
Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0223, -0.0229, -0.0062, ..., -0.0287, 0.0203, 0.0229],
        [ 0.0346, 0.0006, -0.0277, ..., 0.0335, -0.0079, 0.0116]], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0295, -0.0277], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0148, -0.0338, 0.0335, ..., -0.0288, -0.0252, 0.0087],
        [ 0.0210, -0.0399, -0.0356, ..., 0.0247, 0.0369, -0.0389]], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0011, -0.0219], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[ 0.0128, -0.0335, -0.0053, ..., -0.0127, 0.0053, 0.0172],
        [-0.0397, 0.0174, -0.0358, ..., 0.0409, 0.0423, 0.0149]], device='cuda:0', grad_fn=<SliceBackward>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0104, 0.0200], device='cuda:0', grad_fn=<SliceBackward>)
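As a small addition (not in the original tutorial), the same parameters() iterator can be used to count the model's trainable parameters:

# Total number of trainable parameters across all layers
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total trainable parameters: {total_params}")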