PyTorch personal learning notes: detailed explanation and usage of the Normalize() parameters

Posted by webaddict on Sun, 05 Sep 2021 18:45:21 +0200

The reason for writing this up is that T.Normalize is sometimes called with every parameter fixed at 0.5, while in other code the parameters are the computed per-channel mean and standard deviation, as the function definition suggests.

1, What the function does (quick start)

T.Normalize(mean, std)

The input is a tensor of shape (channel, height, width), and the parameters are the mean and standard deviation for each channel. The function uses these two parameters to standardize each channel and returns the result; when they are the channel's actual mean and standard deviation, the output has mean 0 and variance 1. Namely:

$$output[channel] = \frac{input[channel] - mean[channel]}{std[channel]}$$
Example (see the parameter details in section 2 for the explanation):

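import torch
from torchvision import transforms as T

# A 2-channel "image" of shape (channel, height, width) = (2, 2, 2)
x = torch.tensor([[[1, 1],
                   [3, 3]],
                  [[2, 2],
                   [4, 4]]])
tr = T.Normalize([2, 3], [1, 1])
tr(x.float())  # Normalize expects a float tensor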
'''
Output:
tensor([[[-1., -1.],
         [ 1.,  1.]],
        [[-1., -1.],
         [ 1.,  1.]]])
'''

2, Detailed explanation and example of the transforms.Normalize parameters

T.Normalize(mean, std)

Parameter details:

  • mean: (list) one value per input channel, so its length equals the number of channels; each value is the average of all values in that channel.
  • std: (list) one value per input channel, so its length equals the number of channels; each value is the standard deviation of all values in that channel.
import torch

x=torch.tensor([[[1,1],[3,3]],[[2,2],[4,4]]])
x
'''
tensor([[[1, 1],
         [3, 3]],  # The mean value of the first channel is 2 and the standard deviation is 1
         
        [[2, 2],
         [4, 4]]]) # The mean value of the second channel is 3 and the standard deviation is 1
'''

# Therefore, the mean to pass for this tensor is [2, 3], and the std is [1, 1] (population standard deviation)
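
These per-channel statistics can be computed rather than read off by eye. A minimal sketch, assuming a (C, H, W) float tensor: the value 1 quoted above is the population standard deviation, so unbiased=False is passed here (torch.std defaults to the sample estimate, which would give about 1.1547 for this data).

```python
import torch

x = torch.tensor([[[1, 1], [3, 3]],
                  [[2, 2], [4, 4]]]).float()

# Reduce over height and width (dims 1 and 2), keeping one value per channel.
mean = x.mean(dim=(1, 2))
std = x.std(dim=(1, 2), unbiased=False)  # population standard deviation

print(mean)  # tensor([2., 3.])
print(std)   # tensor([1., 1.])
```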

Passing the mean and standard deviation of each channel as the parameters standardizes each channel, that is, gives it mean 0 and variance 1:

from torchvision import transforms as T
tr = T.Normalize([2,3], [1,1])
tr(x.float())
'''
tensor([[[-1., -1.],
         [ 1.,  1.]],
        [[-1., -1.],
         [ 1.,  1.]]])
'''

3, Common usage (explains why the parameters are sometimes fixed at 0.5)

1. As in the example above, compute the mean and standard deviation of each channel and pass them as the parameters, so that each channel is standardized.

Reflection:

Question: once the input is fixed, the mean and standard deviation of each channel are fixed too, so why does this function require the two parameters to be passed in manually?

Answer: when the input images are large, computing the channel mean and standard deviation is relatively time-consuming. Passing them manually means the two values can be computed once in advance or estimated by sampling, which is more flexible.
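
A minimal sketch of the sampling idea, under assumptions that are not in the original notes: the "dataset" is a random tensor of shape (N, C, H, W) standing in for real images, and the sample size of 100 is arbitrary.

```python
import torch

torch.manual_seed(0)

# Stand-in dataset (assumption): 1000 fake RGB images of size 32x32
# with values drawn uniformly from [0, 1].
images = torch.rand(1000, 3, 32, 32)

# Draw a random subset instead of scanning every image.
idx = torch.randperm(images.shape[0])[:100]
sample = images[idx]

# Reduce over batch, height, and width; keep one value per channel.
mean_est = sample.mean(dim=(0, 2, 3))
std_est = sample.std(dim=(0, 2, 3))

print(mean_est)  # roughly 0.5 per channel for uniform data
print(std_est)   # roughly 0.2887 (1/sqrt(12)) per channel for uniform data
```

The estimates could then be passed on as T.Normalize(mean_est.tolist(), std_est.tolist()).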

2. Set both parameters to 0.5 and use the function together with transforms.ToTensor() to force the data into the [-1, 1] interval. (Standardization alone only guarantees that most of the data lies within about 3σ of 0, by the 3σ rule.)

In fact, this usage is not what the function was originally intended for; it simply borrows the formula (x - mean) / std.

First, ToTensor scales the image values from [0, 255] to [0, 1] through out = in/255. Then the values are mapped to a new interval with the interval-scaling (min-max) formula:
$$X_{new} = a + (b - a)\frac{X - Min}{Max - Min}$$

  • This formula maps X from the range [Min, Max] to the range [a, b]

Scaling [0, 1] to [-1, 1]: substitute Min=0, Max=1, a=-1, b=1 into the formula above to get x_new = -1 + 2x = (x - 0.5)/0.5, that is, both mean and std are set to 0.5

from torchvision import transforms as T
import numpy as np
x = np.array([[[253, 179],
               [102, 17]],
              [[45, 99],
               [4, 223]]])
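x_pil = T.ToPILImage()(x.astype('uint8'))
# ToTensor only divides by 255 when the input is of type uint8.
# ToPILImage reads the array as (height, width, channel), so ToTensor returns
# the channels [[253, 102], [45, 4]] and [[179, 17], [99, 223]].
# ToPILImage may be explained in detail in a later post.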
x_t = T.ToTensor()(x_pil)
T.Normalize([0.5,0.5], [0.5,0.5])(x_t)
'''
tensor([[[ 0.9843, -0.2000],
         [-0.6471, -0.9686]],
        [[ 0.4039, -0.8667],
         [-0.2235,  0.7490]]])
'''
