Image Classification (AlexNet): Code Notes

Posted by JNorman on Thu, 20 Jan 2022 08:28:31 +0100

AlexNet Preface Knowledge

The Macro Framework of Deep Learning: training and inference and Their Application Scenarios
json.loads() method of loading json data: used to convert str-type data to dict.

  • Training

For example, now you want to train a model that can distinguish between an apple and an orange. You need to search for some pictures of the apple and an orange. These pictures together are called training dataset. The training dataset has labels. The labels of the apple pictures are apples and the same is true of oranges. An initial neural network makes itself more accurate by constantly optimizing its own parameters. It may start with 10 pictures of apples, only 5 of which are considered apples by the network, and 5 of which are mistaken. At this time, by optimizing the parameters, the other 5 errors will become right. This whole process is called training.

  • Inference

You've trained a model and it works well in the training dataset, but we expect it to recognize pictures it hasn't seen before. If you take a new picture and throw it into the network for judgment, it is called live data. If the accuracy of the field data is very high, then your network training is very good. We take the trained model out for a walk and call it Inference.

  • deployment

To apply a trained neural network model, you need to put it on a hardware platform and make sure it runs. This process is called deployment.

(1) alexnet_inference.py analysis

1.1 transforms.Compose()

# hard code
    # The mean and std of three RGB channels are obtained by ImageNet data statistics.
    norm_mean = [0.485, 0.456, 0.406]  # Through the ImageNet statistics, it is RGB three channels(mean and std).
    norm_std = [0.229, 0.224, 0.225]
    # Transforms. Compose (): Image preprocessing package in Pytorch that combines multiple steps
    inference_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(norm_mean, norm_std),
    ])

transforms.Compose()
The following describes the functions in transforms:

  • Resize: Resize a given picture to given size
  • Normalize: Normalized an tensor image with mean and standard deviation
  • ToTensor: Converts a PIL image or an array of shapes [H, W, C] whose range of values is [0,225] into a tensor of shapes [C, H, W], whose range of values is [0, 1.0] (torch.FloadTensor)
  • ToPILImage: convert a tensor to PIL image
  • Scale: Not used anymore, Resize is recommended
  • CenterCrop: Clip in the middle of the image
  • RandomCrop: Clipping at a random location
  • RandomHorizontalFlip: Flips a given PIL image at a 0.5 probability level
  • RandomVerticalFlip: Flips a given PIL image vertically with a probability of 0.5
  • RandomResizedCrop: Clips the PIL image to any size and aspect ratio, and then transforms it to a given size
  • Grayscale: Convert an image to a grayscale image
  • RandomGrayscale: Converts an image to a grayscale with a certain probability
  • FiceCrop: Cut the image into four corners and one center
  • TenCrop
  • Pad: Fill all edges of the image with the given pad value
  • ColorJitter: Randomly change the brightness contrast and saturation of the image.

1.2 unsqueeze_()

    # path --> img
    # Loading pictures 
    #If not used. convert('RGB') converts to read an image with RGBA four channels and A channel transparent
    img_rgb = Image.open(path_img).convert('RGB') 

    # img --> tensor
    img_tensor = img_transform(img_rgb, inference_transform)
    img_tensor.unsqueeze_(0)        # [c,h,w] increase to [b,c,h,w]
    # to(device): the tensor variable copy at the beginning of reading the data is copied to the GPU specified by device, and subsequent operations are performed on the GPU.
    img_tensor = img_tensor.to(device)

    return img_tensor, img_rgb

PIL Image Processing Standard Library - convert('RGB')
unsqueeze_in pytorch Usage of (increasing dimension)
Pytorch Learn Notes - to(device) usage

1.3 AlexNet(nn.Module)

nn.AdaptiveAvgPool2d()
AlexNet in Pytorch changes the number of convolution cores in the original paper

class AlexNet(nn.Module):

    def __init__(self, num_classes: int = 1000) -> None:
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # conv1 
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # 96 --> 64
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2), # 3 > 2 
            
			# conv2
            nn.Conv2d(64, 192, kernel_size=5, padding=2),# 384 --> 256
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            
			# conv3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            
            # conv4
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            
            # conv5
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Adaptive Average Pooling, Specified Output (H, W)
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6)) # trick 
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

1.4 Loading pretrained_model parameter for pre-training State_ Dict

Model in Pytorch. The role of Eval
Using summary in torchsummary on Pytorch

def get_model(path_state_dict, vis_model=False):
    """
    Create a model, load parameters
    :param path_state_dict:
    :return:
    """
    model = models.alexnet()
    pretrained_state_dict = torch.load(path_state_dict)
    model.load_state_dict(pretrained_state_dict)  # Parameters of Import Pre-training Model
    model.eval() # Because we are not training, we use this to keep the model parameters unchanged.

    if vis_model:
        from torchsummary import summary
        # Summy Print Shows Network Structure and Parameters
        summary(model, input_size=(3, 224, 224), device="cpu")

    model.to(device)
    return model

1.5 inference

torch.no_grad and validation mode

    # 3/5 inference  tensor --> vector
    torch.no_grad():# By adding this to forward propagation, you can speed up operations and reduce memory consumption, since forward propagation does not require calculating gradients
    with torch.no_grad():
        time_tic = time.time()
        outputs = alexnet_model(img_tensor) # Pass pictures directly to the model
        time_toc = time.time()

1.6 Output Top 1 and Top5 probabilities

# 4/5 index to class names
    _, pred_int = torch.max(outputs.data, 1) # Pred_ Index corresponding to int:Top-1
    _, top5_idx = torch.topk(outputs.data, 5, dim=1) # Top5_ Index corresponding to idx:Top-5

    pred_idx = int(pred_int.cpu().numpy()) 
    pred_str, pred_cn = cls_n[pred_idx], cls_n_cn[pred_idx] # Find pred_ Name of the label corresponding to the idx:Top-5 index
    print("img: {} is: {}\n{}".format(os.path.basename(path_img), pred_str, pred_cn))
    print("time consuming:{:.2f}s".format(time_toc - time_tic))

1.7 Image Display

# 5/5 visualization
    plt.imshow(img_rgb)
    plt.title("predict:{}".format(pred_str))
    top5_num = top5_idx.cpu().numpy().squeeze()
    text_str = [cls_n[t] for t in top5_num]
    for idx in range(len(top5_num)):
        plt.text(5, 15+idx*30, "top {}:{}".format(idx+1, text_str[idx]), bbox=dict(fc='yellow'))
    plt.show()

Output As Following:

Topics: Python AI Deep Learning matplotlib