[YOLO] YOLOv5 module analysis [first draft, which will be supplemented later...]

Posted by gigabyt3r on Sun, 30 Jan 2022 21:12:43 +0100

The relevant YOLOv5 modules are mainly defined in common.py.

Focus module

Function: downsampling

The Focus module slices the image, which works much like downsampling. For a 640 × 640 × 3 input, the slicing first rearranges the picture into 320 × 320 × 12; a 3 × 3 convolution with 32 output channels then turns it into a 320 × 320 × 32 feature map. This costs about four times the computation of an ordinary 3 × 3 stride-2 convolution that produces the same output size. In exchange, the slicing discards no pixels, so no information is lost in the downsampling step.

Input: 3 × 640 × 640

Output: 32 × 320 × 320
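The "four times" figure can be checked by counting multiply-accumulate operations. The sketch below is plain Python; the helper name conv_macs and the choice of baseline (a 3 × 3 stride-2 convolution applied directly to the 3-channel image) are assumptions made for illustration, not something taken from the YOLOv5 code.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a k x k convolution (bias and BN ignored)."""
    return c_in * c_out * k * k * h_out * w_out

# Focus: the slicing gives 12 channels at 320 x 320, followed by a 3 x 3 conv to 32 channels
focus_macs = conv_macs(12, 32, 3, 320, 320)

# Assumed baseline: 3 x 3 stride-2 conv on the raw 3-channel image, same 320 x 320 output
plain_macs = conv_macs(3, 32, 3, 320, 320)

ratio = focus_macs // plain_macs
print(focus_macs, plain_macs, ratio)  # 353894400 88473600 4
```

The ratio comes entirely from the input channel count: slicing multiplies it by four while the output size stays the same.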

Code implementation:

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()  # must run before submodules are assigned to an nn.Module
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        # self.contract = Contract(gain=2)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        # return self.conv(self.contract(x))
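To see why the slicing loses nothing, here is a minimal sketch of the same four strided slices on a single 4 × 4 channel, written with plain Python lists instead of tensors. The function name focus_slices is hypothetical; the slice order follows the torch.cat call in forward.

```python
def focus_slices(grid):
    """Split one 2-D channel into the four interleaved sub-grids used by Focus."""
    # (row start, column start), in the same order as the torch.cat call
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [[row[c0::2] for row in grid[r0::2]] for r0, c0 in offsets]

# A 4 x 4 "image" whose pixel values are simply 0..15
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
slices = focus_slices(grid)

for s in slices:
    print(s)
# [[0, 2], [8, 10]]
# [[4, 6], [12, 14]]
# [[1, 3], [9, 11]]
# [[5, 7], [13, 15]]

# Every original pixel appears exactly once across the four 2 x 2 slices
flat = sorted(v for s in slices for row in s for v in row)
print(flat == list(range(16)))  # True
```

Each slice has half the height and width, and stacking the four of them along the channel axis is what turns c channels into 4c.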

Conv module

Applies convolution, batch normalization (BN) and an activation function to the input feature map. In newer versions of YOLOv5 the author uses SiLU as the activation function; the listing below comes from an earlier version and still uses Hardswish.

Code implementation:

class Conv(nn.Module):
    # Standard convolution: Conv2d -> BatchNorm2d -> activation
    # ch_in, ch_out, kernel, stride, padding, groups
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        # k is the convolution kernel size and s is the stride.
        # g is the number of groups: g=1 is an ordinary convolution, g > 1 is a grouped convolution.
        # Compared with an ordinary convolution, a grouped convolution has fewer parameters and is cheaper to train.
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        # Forward pass without BN, used once BN has been fused into the convolution weights
        return self.act(self.conv(x))
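The Conv listing calls autopad, which is not shown in this post. The sketch below is an assumption about what such a helper looks like (padding of half the kernel size when none is given); the conv_out function is a hypothetical addition that applies the usual convolution output-size formula to show the effect.

```python
def autopad(k, p=None):
    # If no padding is given, use half the kernel size
    # ("same" padding for odd kernels at stride 1)
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

def conv_out(size, k, s=1, p=None):
    # Standard output-size formula for a convolution along one dimension
    return (size + 2 * autopad(k, p) - k) // s + 1

print(autopad(1), autopad(3), autopad((3, 5)))  # 0 1 [1, 2]
print(conv_out(640, 3, s=1))  # 640, spatial size preserved
print(conv_out(640, 3, s=2))  # 320, stride 2 halves the size
```

With this padding rule, only the stride changes the spatial size of the feature map, which is why the module descriptions in this post talk about channels rather than height and width.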

Bottleneck module


  1. The channels are first reduced and then expanded again (the reduction is to half by default). Concretely, a 1 × 1 convolution halves the number of channels, and a 3 × 3 convolution then doubles it again while extracting features (two standard convolution modules in total), so the number of input and output channels is unchanged.
  2. The shortcut parameter controls whether a residual connection is made (as in ResNet).
  3. In YOLOv5, the Bottleneck modules in the backbone set shortcut to True by default, while those in the head do not use the shortcut.
  4. As in ResNet, features are fused with add rather than concat, so the number of channels after fusion stays the same.

Code implementation:

class Bottleneck(nn.Module):
    # Standard bottleneck

    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        # Special parameters
        # shortcut: whether to add a shortcut connection around the bottleneck; with it, the module becomes a ResNet-style residual block.
        # e: expansion, the channel ratio of the hidden layer. It defaults to 0.5, so the hidden layer has half as many channels as the output (c_ = c2 * e).
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
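One detail of the listing is easy to miss: the residual add only happens when shortcut is True and the input and output channel counts match. The sketch below reproduces just that bookkeeping in plain Python; bottleneck_plan is a hypothetical helper, not part of YOLOv5.

```python
def bottleneck_plan(c1, c2, shortcut=True, e=0.5):
    """Return (hidden channels, whether the residual add is applied)."""
    c_ = int(c2 * e)             # same rule as in Bottleneck.__init__
    add = shortcut and c1 == c2  # x + f(x) needs matching channel counts
    return c_, add

print(bottleneck_plan(64, 64))         # (32, True)
print(bottleneck_plan(64, 128))        # (64, False)
print(bottleneck_plan(64, 64, False))  # (32, False)
```

In the second case the shortcut is silently skipped even though shortcut=True, because a 64-channel input cannot be added to a 128-channel output.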

C3 module


  1. In the new version of YOLOv5, the author replaced the BottleneckCSP (bottleneck layer) module with the C3 module. Their structure and function are basically the same; both follow the CSP architecture and differ mainly in the choice of activation function. C3 contains three standard convolution layers and several Bottleneck modules (the number is determined by the product of n and the depth_multiple parameter in the .yaml configuration file).
  2. Compared with BottleneckCSP, C3 removes the Conv module after the residual output, and the activation function in the standard convolution module after the concat is changed from LeakyReLU to SiLU (as above).
  3. This module is the main module for learning residual features. Its structure has two branches: one passes through a standard convolution followed by the stack of Bottleneck modules, the other passes through only a single standard convolution. The two branches are then concatenated and fused by a third standard convolution.
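As a rough illustration of how n and depth_multiple combine, here is a small sketch. The rounding rule (scale only when n > 1, never drop below one module) is an assumption modelled on how the model parser in the YOLOv5 repository scales depth, and scaled_depth is a hypothetical name; check the rule against the version you use.

```python
def scaled_depth(n, depth_multiple):
    # Assumed rule: only scale when n > 1, and never go below one module
    return max(round(n * depth_multiple), 1) if n > 1 else n

for n in (1, 3, 9):
    print(n, "->", scaled_depth(n, 0.33))
# 1 -> 1
# 3 -> 1
# 9 -> 3
```

With depth_multiple = 1.0 the values of n from the configuration file are used unchanged.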

Code implementation:

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
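The channel counts along the two branches can be traced without running the network. The helper below is a hypothetical plain-Python sketch that follows the constructor arguments in the listing above.

```python
def c3_channels(c1, c2, e=0.5):
    """Channel count after each stage of C3, following the listing above."""
    c_ = int(c2 * e)  # hidden channels
    return {
        "cv1 + bottlenecks": c_,  # main branch; Bottleneck(c_, c_) keeps c_ channels
        "cv2": c_,                # second branch, a single 1 x 1 convolution
        "concat": 2 * c_,         # both branches joined along the channel axis
        "cv3 (output)": c2,       # final 1 x 1 convolution restores c2 channels
    }

print(c3_channels(128, 128))
# {'cv1 + bottlenecks': 64, 'cv2': 64, 'concat': 128, 'cv3 (output)': 128}
```

Because the inner Bottleneck modules are built with equal input and output channels and e=1.0, their shortcut is active whenever shortcut=True is passed to C3.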

SPP module


  1. SPP stands for spatial pyramid pooling. It first halves the number of input channels with a standard convolution module, and then applies max pooling with kernel sizes of 5, 9 and 13 (the padding adapts to each kernel size).
  2. The results of the three max pooling operations are concatenated with the unpooled data, so the number of channels after merging is twice that of the original input.

Code implementation:

class SPP(nn.Module):
    # Spatial pyramid pooling layer used in YOLOv3-SPP
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
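Two facts make the concatenation in forward work: each pooling layer keeps the spatial size, and the merged tensor has twice the input channels. The sketch below checks both with plain arithmetic; pool_out and spp_channels are hypothetical helpers, and the 20 × 20 feature map and 512 channels are arbitrary example values.

```python
def pool_out(size, k, stride=1):
    # Output size of max pooling with padding k // 2, as in the listing above
    pad = k // 2
    return (size + 2 * pad - k) // stride + 1

def spp_channels(c1, k=(5, 9, 13)):
    c_ = c1 // 2               # channels after cv1
    return c_ * (len(k) + 1)   # unpooled branch plus one branch per kernel size

sizes = [pool_out(20, k) for k in (5, 9, 13)]
print(sizes)              # [20, 20, 20]
print(spp_channels(512))  # 1024
```

Since all branches share the same height and width, they can be concatenated along the channel axis before cv2 maps the result to c2 channels.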


  1. https://blog.csdn.net/Mr_Clutch/article/details/119912926?spm=1001.2014.3001.5502
