The relevant YOLOv5 modules are mainly defined in common.py.
Function: downsampling
The Focus module slices the image, which has an effect similar to downsampling. Taking a 640 × 640 × 3 input as an example, slicing first turns it into a 320 × 320 × 12 feature map; a 3 × 3 convolution with 32 output channels then produces a 320 × 320 × 32 feature map. Because the convolution sees 12 input channels instead of 3, it costs about four times the computation of an ordinary convolution, but the downsampling itself loses no information.
Output: 32 × 320 × 320
import torch
import torch.nn as nn

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        # self.contract = Contract(gain=2)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        # return self.conv(self.contract(x))
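The slicing in Focus.forward can be checked by hand on a tiny input. The following is a dependency-free sketch in plain Python rather than PyTorch; the helper names focus_slices and focus_output_shape are made up for this illustration and are not part of YOLOv5.

```python
def focus_slices(plane):
    """Split one 2-D channel into the four sub-sampled planes that Focus concatenates."""
    def take(row_start, col_start):
        # Every second row and every second column, starting at the given offsets
        return [row[col_start::2] for row in plane[row_start::2]]
    # Same order as the torch.cat call: (even, even), (odd, even), (even, odd), (odd, odd)
    return [take(0, 0), take(1, 0), take(0, 1), take(1, 1)]


def focus_output_shape(c, h, w):
    """Shape after slicing only (before the convolution): 4x channels, half height and width."""
    return (4 * c, h // 2, w // 2)


plane = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 grid holding 0..15
slices = focus_slices(plane)
for s in slices:
    print(s)

# Every input value appears exactly once across the four slices
flat = sorted(v for s in slices for row in s for v in row)
print(flat == list(range(16)))
print(focus_output_shape(3, 640, 640))
```

Each 2 × 2 neighbourhood of the input is spread over the four slices, which is why the spatial size halves while no pixel is dropped.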
The Conv module applies convolution, batch normalization (BN) and an activation function to the input feature map. In newer versions of YOLOv5 the author uses SiLU as the activation function; the listing here is from an earlier version and still uses nn.Hardswish.
class Conv(nn.Module):
    # Standard convolution
    # ch_in, ch_out, kernel, stride, padding, groups
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):
        # k is the convolution kernel size and s is the stride
        # g is the number of groups. When g=1 this is an ordinary convolution; when g > 1 it is a grouped convolution.
        # Compared with an ordinary convolution, a grouped convolution has fewer parameters and trains more efficiently
        super(Conv, self).__init__()
        # autopad is a helper in common.py that chooses the padding when p is None
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))
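The listing calls autopad, which is not shown in this section. The sketch below assumes it simply returns k // 2 when no padding is given (an assumption about the helper, which may differ between versions), and uses the standard convolution size and parameter-count formulas to illustrate the comments on stride and groups. conv_out_size and conv_params are hypothetical helper names.

```python
def autopad(k, p=None):
    # Assumed behaviour: 'same'-style padding of k // 2 when p is not given
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(size, k, s, p=None):
    """Output height/width of a convolution with an integer kernel (dilation 1)."""
    return (size + 2 * autopad(k, p) - k) // s + 1


def conv_params(c1, c2, k, g=1):
    """Weight count of a bias-free k x k convolution with g groups."""
    return (c1 // g) * c2 * k * k


print(conv_out_size(640, k=1, s=1))   # 1x1, stride 1: size preserved
print(conv_out_size(640, k=3, s=1))   # 3x3, stride 1: size preserved
print(conv_out_size(640, k=3, s=2))   # 3x3, stride 2: size halved
print(conv_params(64, 128, 3, g=1))   # ordinary convolution
print(conv_params(64, 128, 3, g=2))   # grouped convolution: half the weights
```

With this padding rule, only the stride changes the spatial size, and doubling the number of groups halves the weight count.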
- Bottleneck first reduces the number of channels and then expands it again (by default the hidden width is half). Specifically, a 1 × 1 convolution halves the channels, and a 3 × 3 convolution then doubles them back while extracting features (two standard Conv modules in total), so the input and output channel counts are unchanged.
- The shortcut parameter controls whether a residual connection is used (as in ResNet). The addition is applied only when the input and output channel counts match (c1 == c2).
- In YOLOv5, Bottleneck modules in the backbone use shortcut=True by default, while those in the head do not use a shortcut.
- As in ResNet, features are fused by addition rather than concatenation, so the number of channels after fusion is unchanged.
class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        # Special parameters
        # shortcut: whether to add a shortcut (residual) connection around the bottleneck, as in a ResNet block
        # e: expansion ratio of the hidden channels; the default 0.5 gives half of c2 (equal to the input channels when c1 == c2)
        super(Bottleneck, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
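The channel bookkeeping in Bottleneck, and the difference between fusing by add and by concat, can be traced without PyTorch. This is a minimal sketch: bottleneck_channels, fuse_add and fuse_concat are illustrative names, and plain lists with one number per channel stand in for feature maps.

```python
def bottleneck_channels(c1, c2, shortcut=True, e=0.5):
    """Return (hidden channels, whether the residual add is used), mirroring Bottleneck.__init__."""
    c_ = int(c2 * e)
    add = shortcut and c1 == c2
    return c_, add


def fuse_add(a, b):
    # Element-wise sum per channel: the channel count stays the same
    return [x + y for x, y in zip(a, b)]


def fuse_concat(a, b):
    # Concatenation stacks channels: the channel count is the sum of both inputs
    return a + b


print(bottleneck_channels(64, 64))                  # shortcut requested, channels match
print(bottleneck_channels(64, 128))                 # shortcut requested, channels differ
print(bottleneck_channels(64, 64, shortcut=False))  # shortcut disabled

x = [1.0, 2.0, 3.0, 4.0]   # four "channels"
y = [0.5, 0.5, 0.5, 0.5]
print(len(fuse_add(x, y)), len(fuse_concat(x, y)))
```

The second call shows that asking for a shortcut is not enough: when c1 and c2 differ, the module silently falls back to the plain two-convolution path.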
- In newer versions of YOLOv5, the author replaced the BottleneckCSP module with the C3 module. The two have basically the same structure and function: both follow the CSP architecture and differ only in a few component choices. C3 contains three standard convolution layers and several Bottleneck modules (the count is determined by the product of n and the depth_multiple parameter in the .yaml configuration file).
- Compared with BottleneckCSP, C3 removes the Conv module after the residual output, and the activation function in the standard convolution module after the concat changes from LeakyReLU to SiLU (as above).
- This module is the main one for learning residual features. It has two branches: one passes through a standard convolution followed by the stack of Bottleneck modules, the other passes through a single basic convolution module. The two branches are then concatenated and fused by the third convolution.
class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(C3, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
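How n and depth_multiple combine into the number of stacked Bottleneck modules is not shown in the listing. The sketch below assumes a rule of the form "scale, round, keep at least one" for n > 1, and uses 0.33 purely as an example depth_multiple; both are assumptions to be checked against the model parser of the version in use. scaled_depth and c3_channels are hypothetical helper names.

```python
def scaled_depth(n, depth_multiple):
    # Assumed rule: scale n by depth_multiple, round, keep at least 1; n <= 1 is left alone
    return max(round(n * depth_multiple), 1) if n > 1 else n


def c3_channels(c1, c2, e=0.5):
    """Channel widths along C3.forward: per branch, after the concat, after cv3."""
    c_ = int(c2 * e)
    return {"branch": c_, "concat": 2 * c_, "out": c2}


for n in (1, 3, 9):
    print(n, "->", scaled_depth(n, 0.33))   # 0.33 is an example depth_multiple

print(c3_channels(128, 128))
```

The channel trace shows why the concat is cheap: each branch carries only c_ channels, so the concatenated tensor is back at 2 * c_ before cv3 maps it to c2.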
- SPP stands for spatial pyramid pooling. It first halves the number of input channels with a standard convolution module, then applies max pooling with kernel sizes 5, 9 and 13 (stride 1, with padding k // 2 adapted to each kernel size so that the spatial size is preserved).
- The three pooled results are concatenated with the unpooled data, so the number of channels after merging is twice that of the original input; a final standard convolution then maps this to c2 output channels.
class SPP(nn.Module):
    # Spatial pyramid pooling layer used in YOLOv3-SPP
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super(SPP, self).__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
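Why the pooled maps can be concatenated with the unpooled one comes down to stride 1 plus k // 2 padding keeping the size fixed for odd k. The following is a one-dimensional, plain-Python sketch of that behaviour, not the PyTorch implementation; max_pool_1d and spp_channels are illustrative names, and the padding is modelled as negative infinity so it never wins the max.

```python
def max_pool_1d(values, k):
    """Stride-1 max pooling with k // 2 padding on each side (odd k)."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(values) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(values))]


def spp_channels(c1, k=(5, 9, 13)):
    """Hidden channels after cv1 and channel count after the concat in SPP."""
    c_ = c1 // 2
    return c_, c_ * (len(k) + 1)


row = [1, 3, 2, 5, 4, 0, 7]
for k in (3, 5):
    pooled = max_pool_1d(row, k)
    print(k, pooled, len(pooled) == len(row))

print(spp_channels(1024))
```

Larger kernels spread each local maximum over a wider window while the length stays the same, and the channel trace confirms that the concat carries twice the input channel count.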