nn.AvgPool2d -- two dimensional average pooling operation

Posted by jsnyder2k on Tue, 28 Sep 2021 20:52:24 +0200

PyTorch learning notes: nn.AvgPool2d - two dimensional average pooling operation

torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Function: applies 2D average pooling over an input signal composed of several input planes. Assuming the input size is (N, C, H, W), the output size is (N, C, H_out, W_out), and the pooling kernel size is (kH, kW), the output is computed as:

$$out(N_i, C_j, h, w) = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)$$

If padding is non-zero, the input is implicitly padded with zeros on both sides of each spatial dimension. The count_include_pad parameter determines whether these zeros are counted when the average is computed.
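
To make the pooling formula concrete, here is a minimal pure-Python sketch of it for a single plane without padding. The function name avg_pool2d_plane and the list-of-rows representation are assumptions made for illustration; this is not how PyTorch implements the operation.

```python
def avg_pool2d_plane(plane, kernel, stride):
    """Average-pool one 2D plane (a list of rows), with no padding."""
    k_h, k_w = kernel
    s_h, s_w = stride
    # Output size when rounding down (ceil_mode=False) and padding is 0
    h_out = (len(plane) - k_h) // s_h + 1
    w_out = (len(plane[0]) - k_w) // s_w + 1
    out = []
    for h in range(h_out):
        row = []
        for w in range(w_out):
            # Double sum over the window: m runs over rows, n over columns
            total = sum(
                plane[s_h * h + m][s_w * w + n]
                for m in range(k_h)
                for n in range(k_w)
            )
            row.append(total / (k_h * k_w))
        out.append(row)
    return out


# The 4x4 plane 0..15 that the examples in this post use
plane = [[float(4 * r + c) for c in range(4)] for r in range(4)]
result = avg_pool2d_plane(plane, kernel=(2, 2), stride=(2, 2))
print(result)  # [[2.5, 4.5], [10.5, 12.5]]
```

Each output element is the mean of one 2 × 2 window, for example (0 + 1 + 4 + 5) / 4 = 2.5 for the upper left corner.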

Parameters:

  • kernel_size: the size of the pooling window (kernel)
  • stride: the step by which the window moves; defaults to kernel_size
  • padding: the width of the implicit zero padding added on both sides
  • ceil_mode: when True, the output shape is computed by rounding up; otherwise it is rounded down
  • count_include_pad: Boolean. When True, the zero padding is included in the average; otherwise it is excluded
  • divisor_override: if specified, it is used as the divisor in place of the window size. By default, the elements in each pooling window are summed and divided by the size of the window (kernel height × kernel width); when divisor_override is set, the sum is divided by divisor_override instead.
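
As a quick check of the stride default described above, the following sketch compares a pool built without stride against one with stride=2, and against an overlapping pool with stride=1. The variable names are chosen here for illustration only.

```python
import torch
from torch import nn

img = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)

# stride omitted: the window moves by kernel_size, so windows do not overlap
default_stride = nn.AvgPool2d(2)(img)
explicit_stride = nn.AvgPool2d(2, stride=2)(img)

# stride=1: neighbouring windows overlap, which gives a larger output
overlapping = nn.AvgPool2d(2, stride=1)(img)

print(torch.equal(default_stride, explicit_stride))
print(default_stride.shape, overlapping.shape)
```

The first two results should be identical, while the stride=1 pool produces a 3 × 3 output from the same 4 × 4 input.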

Notes:

  • The parameters kernel_size, stride and padding can each be:

    • An int, in which case the same value is used for the height and width dimensions
    • A tuple of two ints, the first for the height dimension and the second for the width dimension
  • The output shape is computed as:
    $$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding[0] - kernel\_size[0]}{stride[0]} + 1 \right\rfloor$$
    $$W_{out} = \left\lfloor \frac{W_{in} + 2 \times padding[1] - kernel\_size[1]}{stride[1]} + 1 \right\rfloor$$
    where H_in and W_in are the input height and width. The result is rounded down by default; set ceil_mode=True to round up instead.

  • The padding must be at most half of the kernel size
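
The output-shape rule can be sanity-checked with a small pure-Python helper for one spatial dimension. The name pooled_size is hypothetical and not part of PyTorch; the sketch assumes the rule floor((size + 2 × padding − kernel) / stride) + 1, with ceil replacing floor when ceil_mode is True.

```python
import math


def pooled_size(size, kernel, stride, padding=0, ceil_mode=False):
    """Output length along one spatial dimension (hypothetical helper)."""
    rounding = math.ceil if ceil_mode else math.floor
    out = rounding((size + 2 * padding - kernel) / stride) + 1
    # With ceil_mode, a last window that would start inside the
    # right/bottom padding is ignored.
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


print(pooled_size(4, 2, 2))                  # 2
print(pooled_size(4, 2, 2, padding=1))       # 3
print(pooled_size(5, 2, 2))                  # 2
print(pooled_size(5, 2, 2, ceil_mode=True))  # 3
```

The four cases correspond to the shapes that appear in the examples of this post: a 4 × 4 input with and without padding, and the width 5 of the 4 × 5 input with each rounding mode.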

Code examples

General usage

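import torch
from torch import nn

# Use a floating-point tensor: averaging an integer tensor truncates the
# result, and integer input may not be supported in every PyTorch version
img = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)
# Both the pooling kernel size and the stride are 2
pool = nn.AvgPool2d(2, stride=2)
img_2 = pool(img)
print(img)
print(img_2)

output

# Original image
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])
# The height and width of the pooled image are half of the original
tensor([[[[ 2.5000,  4.5000],
          [10.5000, 12.5000]]]])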

The difference between setting ceil_mode to True and False

import torch
from torch import nn
img=torch.arange(20,dtype=torch.float).reshape(1,1,4,5)
pool_f=nn.AvgPool2d(2,stride=2,padding=0,ceil_mode=False)
pool_t=nn.AvgPool2d(2,stride=2,padding=0,ceil_mode=True)
img_2=pool_f(img)
img_3=pool_t(img)
print(img)
print(img_2)
print(img_3)

output

# Original image
tensor([[[[ 0.,  1.,  2.,  3.,  4.],
          [ 5.,  6.,  7.,  8.,  9.],
          [10., 11., 12., 13., 14.],
          [15., 16., 17., 18., 19.]]]])
# By default, ceil_mode is False
tensor([[[[ 3.,  5.],
          [13., 15.]]]])
# ceil_mode is True
tensor([[[[ 3.0000,  5.0000,  6.5000],
          [13.0000, 15.0000, 16.5000]]]])
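# The width 5 is not divisible by 2, so rounding down gives 2 output columns and rounding up gives 3;
# the extra column averages only the elements that lie inside the input, e.g. (4 + 9) / 2 = 6.5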

The difference between setting and not setting padding

import torch
from torch import nn
img=torch.arange(16,dtype=torch.float).reshape(1,1,4,4)
pool_t=nn.AvgPool2d(2,stride=2,padding=1)
pool_f=nn.AvgPool2d(2,stride=2)
img_2=pool_t(img)
img_3=pool_f(img)
print(img)
print(img_2)
print(img_3)

output

# Original image
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])
# The padding width is 1; by default the padded zeros are included in the average
tensor([[[[0.0000, 0.7500, 0.7500],
          [3.0000, 7.5000, 4.5000],
          [3.0000, 6.7500, 3.7500]]]])
# Result without padding
tensor([[[[ 2.5000,  4.5000],
          [10.5000, 12.5000]]]])
# The size of the padded result follows from the output shape formula above: a 4 × 4 input with padding 1 gives a 3 × 3 output

The difference between setting count_include_pad to True and False

import torch
from torch import nn
img=torch.arange(16,dtype=torch.float).reshape(1,1,4,4)
pool_t=nn.AvgPool2d(2,stride=2,padding=1,count_include_pad=True)
pool_f=nn.AvgPool2d(2,stride=2,padding=1,count_include_pad=False)
img_2=pool_t(img)
img_3=pool_f(img)
print(img)
print(img_2)
print(img_3)

output

# Original image
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])
# The padding width is 1; count_include_pad is True by default
# The padded zeros are included in the average
tensor([[[[0.0000, 0.7500, 0.7500],
          [3.0000, 7.5000, 4.5000],
          [3.0000, 6.7500, 3.7500]]]])
# The padded zeros are excluded from the average
tensor([[[[ 0.0000,  1.5000,  3.0000],
          [ 6.0000,  7.5000,  9.0000],
          [12.0000, 13.5000, 15.0000]]]])
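
One way to see how the two results relate is to pool a tensor of ones with the same settings: this gives, for each window, the fraction of positions that hold real input rather than padding. The sketch below uses illustrative variable names (coverage, rescaled, excluded) and assumes the same 4 × 4 input as above.

```python
import torch
from torch import nn

img = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)
pool = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=True)

# Pooling a tensor of ones gives, per window, the fraction of its
# positions that hold real input rather than zero padding.
coverage = pool(torch.ones_like(img))

# Dividing by that fraction rescales each average so that only the
# real input elements are counted.
rescaled = pool(img) / coverage

excluded = nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(img)
print(coverage)
print(torch.allclose(rescaled, excluded))
```

For this input the corner windows contain one real element (coverage 0.25), the edge windows two (0.5), and the centre window four (1.0), which is why only the centre value 7.5 is the same in both outputs.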

The difference between setting and not setting divisor_override

import torch
from torch import nn
img=torch.arange(16,dtype=torch.float).reshape(1,1,4,4)
pool_1=nn.AvgPool2d(2,stride=2)
pool_d1=nn.AvgPool2d(2,stride=2,divisor_override=2)
pool_d2=nn.AvgPool2d(2,stride=2,divisor_override=3)
img_1=pool_1(img)
img_2=pool_d1(img)
img_3=pool_d2(img)
print(img)
print(img_1)
print(img_2)
print(img_3)

output

# Original image
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])
# divisor_override is not set: this is ordinary average pooling
tensor([[[[ 2.5000,  4.5000],
          [10.5000, 12.5000]]]])
# divisor_override is set to 2. Taking the four elements in the upper left corner as an example,
# the first pooled element is their sum divided by 2: (0 + 1 + 4 + 5) / 2 = 5
tensor([[[[ 5.,  9.],
          [21., 25.]]]])
# divisor_override is set to 3,
# That is, the sum of the four elements divided by 3
tensor([[[[ 3.3333,  6.0000],
          [14.0000, 16.6667]]]])
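
A consequence of this behaviour is that divisor_override=1 leaves each window sum unchanged, so the layer acts as sum pooling. The following is a minimal sketch of that idea with illustrative variable names, cross-checked against the ordinary average.

```python
import torch
from torch import nn

img = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)

# Dividing every window sum by 1 leaves the sum itself
sum_pool = nn.AvgPool2d(2, stride=2, divisor_override=1)
window_sums = sum_pool(img)

# Cross-check: ordinary average times the number of window elements (2 * 2)
average = nn.AvgPool2d(2, stride=2)(img)
print(window_sums)
print(torch.allclose(window_sums, average * 4))
```

For the 4 × 4 input above, the window sums are 10, 18, 42 and 50.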

Official documentation

torch.nn.AvgPool2d(): https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html?highlight=avgpool2d#torch.nn.AvgPool2d

Topics: PyTorch, Deep Learning