Using transforms in PyTorch

Posted by ssidellq on Wed, 27 Oct 2021 03:27:36 +0200

1. ToTensor: converting the image format

torchvision.transforms is a module (its classes are defined in transforms.py) that contains many classes, including ToTensor. Note that ToTensor is a class, not a function: first create an instance, then call that instance to convert an image from PIL format or NumPy array format into a Tensor. The code is as follows:

from PIL import Image
from torchvision import transforms

img_path = "data/train/ants_image/0013035.jpg"
img_PIL = Image.open(img_path)

tensor_trans = transforms.ToTensor()
tensor_img = tensor_trans(img_PIL)

print(tensor_img)

The code above converts a PIL image into a Tensor. ToTensor also accepts an image in NumPy format. Besides converting a PIL image with numpy.array, you can read an image directly as a NumPy array with OpenCV's cv2.imread function. The code is as follows:

import cv2

# Returns a numpy.ndarray of shape (H, W, C); OpenCV stores channels in BGR order.
img_cv = cv2.imread(img_path)
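
For an 8-bit image, ToTensor rescales the values from 0..255 to 0.0..1.0 and moves the channel axis to the front (H x W x C becomes C x H x W). The block below is a plain-Python sketch of that arithmetic on nested lists. It is not torchvision's implementation, and the helper name to_chw_float and the tiny example image are made up for illustration.

```python
# Plain-Python sketch (not torchvision's implementation) of what ToTensor
# does to an 8-bit image: scale 0..255 to 0.0..1.0 and move channels first.

def to_chw_float(image_hwc):
    """Convert a nested list shaped [H][W][C] of 0..255 ints to [C][H][W] floats."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [
            [image_hwc[y][x][c] / 255.0 for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


# Hypothetical 1 x 2 RGB image: one pure red pixel, one mid-gray pixel.
tiny_image = [[[255, 0, 0], [128, 128, 128]]]
chw = to_chw_float(tiny_image)

# Channels, height, width of the converted image.
print(len(chw), len(chw[0]), len(chw[0][0]))
# First channel (red) of the single row.
print(chw[0][0])
```

This is why the examples in this post index a tensor as tensor[channel][row][column].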

2. Normalize: standardizing the image

The Normalize class standardizes an image tensor channel by channel. The usage is object = transforms.Normalize(mean, std), where both arguments hold one value per channel. The first argument is the per-channel mean and the second is the per-channel standard deviation (not the variance). Here n is the number of channels, generally 3, so the first argument is a one-dimensional array of three numbers: the means of the first, second, and third channels. Each value is transformed as output = (input - mean) / std. The code is as follows:

# img_tensor is the output of ToTensor (see the complete code in section 6)
print(img_tensor[0][0][0])
trans_nor = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_norm = trans_nor(img_tensor)
print(img_norm[0][0][0])

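The value 0.5 is an arbitrary choice here, not the real statistics of the image. With mean 0.5 and standard deviation 0.5, each value x becomes (x - 0.5) / 0.5 = 2x - 1, so the [0, 1] range produced by ToTensor is mapped to [-1, 1]. The printed value of the first channel of the first pixel therefore differs clearly from its original value.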
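
The per-channel formula (input - mean) / std can be checked by hand. The following is a minimal plain-Python sketch for one channel; the function name normalize_value is hypothetical and only illustrates the arithmetic, not the library code.

```python
def normalize_value(value, mean, std):
    """Apply the per-channel formula used by Normalize: (value - mean) / std."""
    return (value - mean) / std


mean, std = 0.5, 0.5
normalized = [normalize_value(v, mean, std) for v in (0.0, 0.5, 1.0)]
print(normalized)  # [-1.0, 0.0, 1.0]
```

With these parameters the smallest input maps to -1, the midpoint to 0, and the largest input to 1.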

3. Resize: scaling the image size

The Resize class scales an image to a specified size. In this example the input is a PIL image, so the output is also a PIL image (recent torchvision versions also accept tensors). To perform further tensor operations on the result, convert it with ToTensor as before. The usage is object = transforms.Resize(size), where size is a sequence of two numbers (height, width). The code is as follows:

print(img.size)
trans_resize = transforms.Resize((512, 512))
img_resize = trans_resize(img)
print(img_resize)

The first print shows the original size, and the second shows a PIL image object whose size is 512 x 512.
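
Resize accepts either a (height, width) pair or a single integer, and the two forms give different output sizes. The sketch below computes the expected output shape in plain Python. The function name resized_shape and the 512 x 768 input size are assumptions for illustration, and the exact rounding of the longer edge may differ between torchvision versions.

```python
def resized_shape(height, width, size):
    """Return (new_height, new_width) for a Resize-style size argument."""
    if isinstance(size, int):
        # Single int: the shorter edge becomes `size`, aspect ratio is kept.
        if height <= width:
            return size, int(size * width / height)
        return int(size * height / width), size
    # (h, w) pair: both edges are set directly, so the aspect ratio may change.
    new_height, new_width = size
    return new_height, new_width


print(resized_shape(512, 768, (512, 512)))  # (512, 512)
print(resized_shape(512, 768, 256))         # (256, 384)
print(resized_shape(768, 512, 256))         # (384, 256)
```

Note that PIL's img.size reports (width, height), while Resize takes its pair as (height, width).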

4. Compose: chaining the above classes

The Compose class chains several of the transforms above, or any other transforms, so that a single call applies all of them in order. The usage is object = transforms.Compose([transform_1, transform_2, ...]). The output type of each transform must be a valid input type for the next one, so that the image passes through the whole sequence in order. For example, use Compose to combine the two classes above, Resize and ToTensor. Here Resize is created in a second way, with a single integer instead of a pair. In this form the shorter edge of the image is scaled to that value and the aspect ratio is preserved. The code is:

trans_resize_2 = transforms.Resize(512)
trans_compose = transforms.Compose([trans_resize_2, trans_totensor])
img_resize_2 = trans_compose(img)
writer.add_image("Resize", img_resize_2, 1)
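
The chaining idea behind Compose fits in a few lines. The following is a minimal sketch, not torchvision's class: SimpleCompose, scale_to_unit and center_and_scale are hypothetical stand-ins that work on a flat list of pixel values instead of an image.

```python
class SimpleCompose:
    """Chain callables: the output of each step is the input of the next."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, value):
        for step in self.steps:
            value = step(value)
        return value


def scale_to_unit(pixels):
    # Stand-in for ToTensor's rescaling: 0..255 -> 0.0..1.0.
    return [p / 255.0 for p in pixels]


def center_and_scale(pixels):
    # Stand-in for Normalize with mean 0.5 and std 0.5.
    return [(p - 0.5) / 0.5 for p in pixels]


pipeline = SimpleCompose([scale_to_unit, center_and_scale])
result = pipeline([0, 255])
print(result)  # [-1.0, 1.0]
```

The order of the list matters: swapping the two steps would apply the centering to raw 0..255 values and give a different result.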

5. Other classes, such as RandomCrop for random cropping

RandomCrop crops a region of a specified size at a random position in the image. The usage is object = transforms.RandomCrop((h, w)) or object = transforms.RandomCrop(n). The former crops a region h pixels high and w pixels wide, and the latter crops an n x n square. Each call picks a new random position, so repeated calls on the same image generally return different regions. The code is:

trans_random = transforms.RandomCrop((100, 200))
trans_compose_2 = transforms.Compose([trans_random, trans_totensor])
for i in range(10):
    img_crop = trans_compose_2(img)
    writer.add_image("RandomCrop", img_crop, i)
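
Choosing the crop position amounts to drawing a random top-left corner that keeps the crop inside the image. The sketch below shows this in plain Python under stated assumptions: random_crop_box is a hypothetical helper, the 512 x 768 image size is made up, and the random source is seeded only to make the run repeatable. It is not torchvision's implementation.

```python
import random


def random_crop_box(image_height, image_width, crop_height, crop_width, rng=random):
    """Pick a random (top, left, bottom, right) box that fits inside the image."""
    if crop_height > image_height or crop_width > image_width:
        raise ValueError("crop size must not exceed the image size")
    # randint is inclusive at both ends, so the crop can touch the far edge.
    top = rng.randint(0, image_height - crop_height)
    left = rng.randint(0, image_width - crop_width)
    return top, left, top + crop_height, left + crop_width


rng = random.Random(0)  # seeded only so that repeated runs match
boxes = [random_crop_box(512, 768, 100, 200, rng) for _ in range(10)]
for box in boxes:
    print(box)
```

Each box is 100 pixels high and 200 pixels wide, matching RandomCrop((100, 200)), and the positions are drawn independently on every call.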

6. Complete code for the above classes

from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

writer = SummaryWriter("logs")
img = Image.open("data/train/ants_image/0013035.jpg")
print(img)

# ToTensor
trans_totensor = transforms.ToTensor()
img_tensor = trans_totensor(img)

# Normalize
print(img_tensor[0][0][0])
trans_nor = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
img_norm = trans_nor(img_tensor)
print(img_norm[0][0][0])
writer.add_image("Normalize", img_norm)

# Resize
print(img.size)
trans_resize = transforms.Resize((512, 512))
img_resize = trans_resize(img)
img_resize = trans_totensor(img_resize)
print(img_resize)
writer.add_image("Resize", img_resize)

# Compose
trans_resize_2 = transforms.Resize(512)
trans_compose = transforms.Compose([trans_resize_2, trans_totensor])
img_resize_2 = trans_compose(img)
writer.add_image("Resize", img_resize_2, 1)

# RandomCrop
trans_random = transforms.RandomCrop((100, 200))
trans_compose_2 = transforms.Compose([trans_random, trans_totensor])
for i in range(10):
    img_crop = trans_compose_2(img)
    writer.add_image("RandomCrop", img_crop, i)

writer.close()
