Interpretation of the crowd counting [CAN] (Context-Aware Crowd Counting) code

Posted by biannucci on Fri, 04 Mar 2022 00:06:12 +0100

I have already reproduced the code and obtained results on the ShanghaiTech part_A_final and part_B_final datasets. Now I will walk through the code in detail to deepen understanding. If anything is wrong, please point it out!!
(This article assumes you have read the paper and have some familiarity with Python syntax.)

code analysis

The repository contains eight Python files: create_json.py, dataset.py, image.py, make_dataset.py, model.py, test.py, train.py, and utils.py. We will go through them in order of execution.

1. image.py

image.py implements the processing and transformation of images. The paper proposes to randomly crop image patches of 1/4 the original size at different positions during training. Only one function, load_data(), is defined in this file; it reads the image and its density map from the h5py file and performs the cropping.

import random
import os
from PIL import Image
import numpy as np
import h5py
import cv2


#Implements image loading, cropping and flipping for training and testing
def load_data(img_path, train=True):
    #Derive the ground-truth h5 file name from the image path: replace() swaps the old substring for the new one
    gt_path = img_path.replace('.jpg', '.h5').replace('images', 'ground_truth')
    #Read the image; convert('RGB') converts it to three RGB channels.
    #Without convert('RGB'), some images load as four-channel RGBA, and the alpha (transparency) channel is useless for training.
    img = Image.open(img_path).convert('RGB')
    #Open the image's ground-truth h5py file (the with-block closes it automatically)
    with h5py.File(gt_path, 'r') as gt_file:
        #Read the density map corresponding to the image and convert it to a numpy array
        target = np.asarray(gt_file['density'])
    #During training, randomly crop a patch of half the width and half the height (1/4 of the area) at one of four positions
    if train:
        ratio = 0.5
        crop_size = (int(img.size[0]*ratio), int(img.size[1]*ratio))
        #Generate a random number to pick one of the four quadrants
        rdn_value = random.random()
        if rdn_value < 0.25:
            dx = 0
            dy = 0
        elif rdn_value < 0.5:
            dx = int(img.size[0]*ratio)
            dy = 0
        elif rdn_value < 0.75:
            dx = 0
            dy = int(img.size[1]*ratio)
        else:
            dx = int(img.size[0]*ratio)
            dy = int(img.size[1]*ratio)

        img = img.crop((dx, dy, crop_size[0]+dx, crop_size[1]+dy))
        target = target[dy:(crop_size[1]+dy), dx:(crop_size[0]+dx)]
        if random.random() > 0.8:
            #With probability 0.2, flip horizontally: np.fliplr() flips the density map left to right
            #(.copy() materializes the flipped view as a contiguous array)
            target = np.fliplr(target).copy()
            #Image.transpose(FLIP_LEFT_RIGHT) mirrors the image the same way (a horizontal flip, not a matrix transpose)
            img = img.transpose(Image.FLIP_LEFT_RIGHT)

    #Resize the target to 1/8 of its width and height, because CAN's output is 1/8 the size of the input image,
    #using INTER_CUBIC interpolation, then multiply by 64 (= 8 x 8) so the sum over the density map (the count) is preserved.
    target = cv2.resize(target, (target.shape[1]//8, target.shape[0]//8), interpolation=cv2.INTER_CUBIC)*64

    return img, target
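
A quick sanity check of that last step (my own sketch, not part of the repo): downsampling the density map to 1/8 of its width and height and then multiplying by 64 keeps the total sum, i.e. the count, approximately unchanged.

import numpy as np
import cv2
from scipy.ndimage import gaussian_filter

#One synthetic "person" in the middle of the map, blurred the way make_dataset.py does
density = np.zeros((256, 384), dtype=np.float32)
density[128, 192] = 1
density = gaussian_filter(density, 15)
small = cv2.resize(density, (density.shape[1]//8, density.shape[0]//8),
                   interpolation=cv2.INTER_CUBIC) * 64
print(density.sum(), small.sum())  #both sums are close to 1.0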

2. dataset.py

The dataset file implements the creation of the dataset. Its __getitem__() method calls the load_data() function from image.py to get the processed image and density map.

import os
import random
import torch
import numpy as np
from torch.utils.data import Dataset
from PIL import Image
from image import *
import torchvision.transforms.functional as F

#Dataset file mainly realizes the creation of dataset
class listDataset(Dataset):
    #root is the list of image paths used for training; shuffle indicates whether root should be shuffled;
    #train indicates whether this dataset is used for training; batch_size is the size of a training batch
    def __init__(self, root, shape=None, shuffle=True, transform=None,  train=False, seen=0, batch_size=1, num_workers=4):
        #Disrupt the path
        random.shuffle(root)
        #Initialization of variable members
        self.nSamples = len(root)  #Calculate the number of pictures
        self.lines = root
        self.transform = transform
        self.train = train
        self.shape = shape
        self.seen = seen
        self.batch_size = batch_size
        self.num_workers = num_workers
        
    #__len__() simply returns self.nSamples, which the constructor set to the length of the root list.
    def __len__(self):
        return self.nSamples

    #Get pictures and density maps
    def __getitem__(self, index):
        # assert declares that a condition is True: nothing happens when it holds, and an AssertionError is raised when it does not. Here it guards against an out-of-range index.
        assert index < len(self), 'index range error'
        #Read the image path from self.lines
        img_path = self.lines[index]
        #load_data() is defined in image.py; it takes the image path and returns the training image and its density map as numpy data
        img, target = load_data(img_path, self.train)

        #Finally, apply the optional transform and return the image together with its density map.
        #If transform is None this step is skipped; train.py always passes a ToTensor + Normalize transform.
        if self.transform is not None:
            img = self.transform(img)
        return img, target
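
A minimal usage sketch of listDataset (my own, assuming a train.json produced by create_json.py already exists): it plugs straight into a standard DataLoader, just as train.py will do later.

import json
import torch
from torchvision import transforms
import dataset

with open('train.json', 'r') as f:
    train_list = json.load(f)

loader = torch.utils.data.DataLoader(
    dataset.listDataset(train_list, shuffle=True, train=True,
                        transform=transforms.ToTensor()),
    batch_size=1)  #images differ in size, so larger batches would not collate

img, target = next(iter(loader))
print(img.shape)     #(1, 3, H/2, W/2): the random quarter crop halves each side
print(target.shape)  #(1, H/16, W/16): the density map is 1/8 of the cropped image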

3. make_dataset.py

Run make_dataset.py first. It generates the hdf5 target files by passing the head annotations from the image's .mat file through a Gaussian kernel; after the script finishes, the hdf5 files (suffix .h5) appear in the dataset's ground_truth folder.

import h5py
import scipy.io as io
import PIL.Image as Image
import numpy as np
import os
import glob
from matplotlib import pyplot as plt
from scipy.ndimage import gaussian_filter
from matplotlib import cm as CM
from image import *

#make_dataset.py creates the h5py target files:
#the head annotations from the .mat files are convolved with a Gaussian kernel to produce the density maps.
#After it runs successfully, the h5py files (suffix .h5) sit in the dataset's ground_truth folders.

# Path to ShanghaiTech dataset
root = './data/'

#Build the paths with os.path.join(), which joins two or more pathname components
#(equivalent to writing part_B_final_train = './data/part_B_final/train_data/images' directly)
part_B_final_train = os.path.join(root, 'part_B_final/train_data', 'images')
part_B_final_test = os.path.join(root, 'part_B_final/test_data', 'images')
path_sets = [part_B_final_train, part_B_final_test]


img_paths = []
#glob.glob() returns a list of all file paths matching the pattern;
#append() adds a new item to the end of a list.
#Together they collect every image path into img_paths.
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)

#Traverse the processed image and generate h5 file
for img_path in img_paths:
    print(img_path)
    #scipy.io's loadmat and savemat functions read and write MATLAB .mat files
    #replace(old, new) swaps substrings to derive the .mat annotation path from the image path
    mat = io.loadmat(img_path.replace('.jpg', '.mat').replace('images', 'ground_truth').replace('IMG_', 'GT_IMG_'))
    #The imread() function in the pyplot module of the matplotlib library is used to read the images in the file into an array.
    img = plt.imread(img_path)
    #img.shape gives the array dimensions: shape[0] is the number of rows (the height), shape[1] the number of columns (the width)
    #Create an all-zero matrix of the same size as img
    k = np.zeros((img.shape[0], img.shape[1]))
    #This indexing into image_info extracts the Nx2 array of annotated head coordinates (x, y)
    gt = mat["image_info"][0, 0][0, 0][0]

    # Put a 1 at every head position that lies inside the image; the Gaussian kernel below turns these points into the density map
    for i in range(0, len(gt)):
        if int(gt[i][1]) < img.shape[0] and int(gt[i][0]) < img.shape[1]:
            k[int(gt[i][1]), int(gt[i][0])] = 1
    #gaussian_filter() is a linear smoothing filter; here (sigma = 15) it spreads each head point into a smooth blob, turning the point map into a density map
    k = gaussian_filter(k, 15)
    #Write the result to an h5py file (suffix .h5) in the dataset's ground_truth folder
    with h5py.File(img_path.replace('.jpg', '.h5').replace('images', 'ground_truth'), 'w') as hf:
        hf['density'] = k
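
A small check of the idea (my own sketch): Gaussian filtering spreads each head point into a blob whose values sum to 1, so the sum over the whole density map stays equal to the number of annotated heads (up to slight truncation at the image borders).

import numpy as np
from scipy.ndimage import gaussian_filter

k = np.zeros((512, 512))
for y, x in [(100, 100), (250, 300), (400, 120)]:  #three synthetic "heads"
    k[y, x] = 1
d = gaussian_filter(k, 15)
print(d.sum())  #approximately 3.0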
            

4. create_json.py

Next, run create_json.py to generate the JSON files: train.json and val.json contain the paths of the training and test images respectively. It is quite simple, so I won't say much.

import json
from os.path import join
import glob

#Generate the JSON files: train.json and val.json hold the paths of all training and test images respectively
if __name__ == '__main__':
    # The path to the folder containing the image
    #img_folder = './data/part_B_final/train_data/images'
    img_folder = './data/part_B_final/test_data/images'

    # The path to the final json file
    #output_json = './train.json'
    output_json = './val.json'

    img_list = []

    # glob.glob() returns a list of all file paths matching the pattern;
    # append() adds each path to the end of img_list.
    for img_path in glob.glob(join(img_folder, '*.jpg')):
        img_list.append(img_path)

    #Write image path to json file
    with open(output_json, 'w') as f:
        json.dump(img_list, f)

5. utils.py

Three functions are defined here. The first two, save_net() and load_net(), save and load the network's state_dict to and from an h5py file, but they are not used anywhere in this project, so only the third function is explained.

import h5py
import torch
import shutil
import numpy as np


def save_net(fname, net):
    with h5py.File(fname, 'w') as h5f:
        for k, v in net.state_dict().items():
            h5f.create_dataset(k, data=v.cpu().numpy())

def load_net(fname, net):
    with h5py.File(fname, 'r') as h5f:
        for k, v in net.state_dict().items():        
            param = torch.from_numpy(np.asarray(h5f[k]))         
            v.copy_(param)

#The first two functions are unused; the focus is on this one.
#After each epoch a model is saved as checkpoint.pth.tar. If is_best says this is the best model so far, it is also copied to model_best.pth.tar.
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    # torch.save() usage: save model parameters
    torch.save(state, filename)
    if is_best:
        #shutil.copyfile() copy file
        shutil.copyfile(filename, 'model_best.pth.tar')            
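
A usage sketch (my own; CANNet comes from model.py, explained in the next section): save_checkpoint() writes checkpoint.pth.tar and, when is_best is True, copies it to model_best.pth.tar; restoring is the mirror image with torch.load() plus load_state_dict().

import torch
from model import CANNet
from utils import save_checkpoint

model = CANNet(load_weights=True)  #load_weights=True skips downloading the pretrained VGG-16
save_checkpoint({'state_dict': model.state_dict()}, is_best=True)

restored = CANNet(load_weights=True)
checkpoint = torch.load('checkpoint.pth.tar')
restored.load_state_dict(checkpoint['state_dict'])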

6. model.py

model.py implements the definition of the network model (the most complex file here). First, some basics:
a. Custom operations in PyTorch generally inherit from the nn.Module class;
b. Layers with learnable parameters (such as fully connected and convolution layers) are normally placed in the constructor __init__(), though parameter-free layers may be put there too;
c. Layers without learnable parameters (such as ReLU, dropout and batch normalization) may be placed in __init__() or not; if not, they can be invoked through nn.functional in the forward method (see the toy example below).
This file implements the module that extracts multi-scale contextual information and the full network model. (Many of the computations are described and motivated in the paper.)
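
As mentioned in point c, here is a toy module (my own example, not from the repo): the convolution layer has learnable parameters, so it lives in __init__(), while the parameter-free ReLU is replaced by nn.functional.relu inside forward().

import torch
import torch.nn as nn
from torch.nn import functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  #learnable parameters

    def forward(self, x):
        return F.relu(self.conv(x))  #parameter-free activation via nn.functional

print(TinyNet()(torch.randn(1, 3, 16, 16)).shape)  #torch.Size([1, 8, 16, 16])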

import torch.nn as nn
import torch
from torch.nn import functional as F
from torchvision import models

#model.py mainly realizes the definition of network model

#Model definition for extracting multi-scale context information
class ContextualModule(nn.Module):
    #The paper uses S = 4 different scales, with block sizes k(j) ∈ {1, 2, 3, 6}, as this performed better than other settings.
    #features is the number of channels of each input sample; out_features the number of output channels; sizes are the pooling scales
    def __init__(self, features, out_features=512, sizes=(1, 2, 3, 6)):
        super(ContextualModule, self).__init__()
        #One average-pooling + 1x1 convolution branch per scale computes the scale-aware features
        self.scales = nn.ModuleList([self._make_scale(features, size) for size in sizes])
        #The bottleneck reduces the number of channels in a reasonable way without discarding key features:
        #nn.Conv2d(in_channels, out_channels, kernel_size) is a 2-D convolution; the input has features*2 channels
        #(contextual features concatenated with the input features) and the kernel size is 1x1
        self.bottleneck = nn.Conv2d(features * 2, out_features, kernel_size=1)
        #nn.ReLU() is the activation function
        self.relu = nn.ReLU()
        #A 1x1 convolution that produces the scale-aware weight map for each scale
        self.weight_net = nn.Conv2d(features, features, kernel_size=1)

    #Compute the prediction weights w
    def __make_weight(self, feature, scale_feature):
        #Contrast features as given in the paper: C_j = S_j - f_j
        weight_feature = feature - scale_feature
        #The sigmoid squashes the weights into (0, 1); since it appears in both the numerator and
        #the denominator of the weighted average in forward(), it also avoids division by zero
        #self.weight_net() further learns the scale-aware feature weights
        return torch.sigmoid(self.weight_net(weight_feature))

    #Build one scale branch
    def _make_scale(self, features, size):
        #nn.AdaptiveAvgPool2d() pools the feature map down to size x size
        prior = nn.AdaptiveAvgPool2d(output_size=(size, size))
        #followed by a 1x1 two-dimensional convolution
        conv = nn.Conv2d(features, features, kernel_size=1, bias=False)
        #nn.Sequential() is an ordered container: modules are executed in the order they were passed to the constructor
        return nn.Sequential(prior, conv)

    #Forward pass: compute the contextual features fed to the back-end network
    def forward(self, feats):
        h, w = feats.size(2), feats.size(3)
        #Compute each scale feature and upsample it back to the input resolution
        #(F.upsample is a deprecated alias of F.interpolate in newer PyTorch)
        multi_scales = [F.upsample(input=stage(feats), size=(h, w), mode='bilinear') for stage in self.scales]
        #The weight is calculated according to the scale obtained by adaptive processing
        weights = [self.__make_weight(feats, scale_feature) for scale_feature in multi_scales]
        #Use the weight to calculate the context feature (formula (5) of the original text)
        overall_features = [(multi_scales[0]*weights[0]+multi_scales[1]*weights[1]+multi_scales[2]*weights[2]+multi_scales[3]*weights[3])/(weights[0]+weights[1]+weights[2]+weights[3])]+ [feats]
        #Concatenate the contextual features with the input features and fuse them with the 1x1 bottleneck convolution
        bottle = self.bottleneck(torch.cat(overall_features, 1))
        return self.relu(bottle)

#The CAN network model proposed in the paper
class CANNet(nn.Module):
    def __init__(self, load_weights=False):
        super(CANNet, self).__init__()
        self.seen = 0
        #Use the module defined above to extract multi-scale contextual information
        self.context = ContextualModule(512, 512)
        #Network front end: 'M' marks a MaxPooling layer; frontend_feat lists the first 10 convolution layers of VGG-16
        self.frontend_feat = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512]
        #Network back end: backend_feat is built with dilated convolutions
        self.backend_feat = [512, 512, 512, 256, 128, 64]
        #Both front end and back end are created by the custom make_layers() function
        self.frontend = make_layers(self.frontend_feat)
        self.backend = make_layers(self.backend_feat, in_channels=512, batch_norm=True, dilation=True)
        #The final output layer uses a 1x1 convolution to reduce the number of feature maps to 1
        self.output_layer = nn.Conv2d(64, 1, kernel_size=1)
        #load_weights defaults to False, so this branch always runs here
        if not load_weights:
            #Load the pretrained VGG-16 whose first 10 convolution layers will be copied
            mod = models.vgg16(pretrained=True)
            #First call self._initialize_weights() to initialize all layers manually
            self._initialize_weights()
            #net.state_dict() is a dictionary of all network parameters: the keys are the parameter names of each layer, the values the parameter tensors
            #Iterating over its items() and copying .data entry by entry transfers the VGG-16 weights into the frontend
            for i in range(len(self.frontend.state_dict().items())):
                list(self.frontend.state_dict().items())[i][1].data[:] = list(mod.state_dict().items())[i][1].data[:]

    #Forward pass of the network
    def forward(self, x):
        x = self.frontend(x)
        x = self.context(x)
        x = self.backend(x)
        x = self.output_layer(x)
        return x

    #self.modules() iterates over all sub-modules; each module m's layer type is checked and the matching nn.init function completes the initialization.
    def _initialize_weights(self):
        for m in self.modules():
            #Normal initialization: nn.init.normal_(tensor, mean=0, std=1)
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.01)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

#The network-building helper is defined outside the class
def make_layers(cfg, in_channels = 3, batch_norm=False, dilation = False):
    #The dilation flag selects the dilation rate: False builds the network front end (rate 1), True the back end (rate 2)
    if dilation:
        d_rate = 2
    else:
        d_rate = 1
    layers = []
    #Iterate over the passed-in frontend_feat or backend_feat to determine each layer's type: 'M' is a MaxPooling layer;
    #a number is a convolution block of Conv2d (+ optional BatchNorm) and ReLU. After the loop, the front end or back end is complete.
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=d_rate, dilation=d_rate)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                #The front end is built with batch_norm=False and takes this branch; the back end passes batch_norm=True and takes the branch above
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
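
A shape check for the whole model (my own sketch; load_weights=True skips downloading the pretrained VGG-16): the three MaxPooling layers in the frontend each halve the resolution, so the output density map is 1/8 of the input in both height and width, matching the 1/8 resize in image.py.

import torch
from model import CANNet

net = CANNet(load_weights=True)
net.eval()
with torch.no_grad():
    out = net(torch.randn(1, 3, 256, 256))
print(out.shape)  #torch.Size([1, 1, 32, 32])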

7. train.py

train.py trains the model so that, given an input image, it outputs an increasingly accurate density map.

import sys
import os

import warnings

from model import CANNet

from utils import save_checkpoint

import torch
import torch.nn as nn
#torch.autograd provides classes and functions for deriving arbitrary scalar functions.
from torch.autograd import Variable
from torchvision import datasets, transforms

import numpy as np
import argparse
import json
import cv2
import dataset
import time

#The argparse module lets the script receive parameters when it is launched from the command line
#argparse.ArgumentParser() creates the parser object
parser = argparse.ArgumentParser(description='PyTorch CANNet')

#parser.add_argument() registers a command line parameter; metavar changes the displayed name, and help is shown when -h or --help is typed
parser.add_argument('train_json', metavar='TRAIN',
                    help='path to train json')
parser.add_argument('val_json', metavar='VAL',
                    help='path to val json')

def main():

    global args, best_prec1

    best_prec1 = 1e6

    args = parser.parse_args()
    args.lr = 1e-4
    args.batch_size = 1

    args.decay = 5*1e-4
    args.start_epoch = 0
    args.epochs = 1000
    args.workers = 4
    args.seed = int(time.time())
    args.print_freq = 4
    #json.load() reads the JSON content from a file
    #Here it reads the lists of image paths
    with open(args.train_json, 'r') as outfile:
        train_list = json.load(outfile)
    with open(args.val_json, 'r') as outfile:
        val_list = json.load(outfile)

    #Seed the random number generator of the current GPU (torch.cuda.manual_seed_all() would be needed for multiple GPUs)
    torch.cuda.manual_seed(args.seed)

    #Define network object model
    model = CANNet()

    #Transfer the model to GPU
    model = model.cuda()

    #Define the loss as summed MSE (mean squared error); reduction='sum' is equivalent to the deprecated size_average=False
    criterion = nn.MSELoss(reduction='sum').cuda()

    #Define the optimizer: Adam with the given learning rate and weight decay
    optimizer = torch.optim.Adam(model.parameters(), args.lr,
                                    weight_decay=args.decay)

    #Loop, step by step training network
    for epoch in range(args.start_epoch, args.epochs):
        #Network forward propagation and error back propagation
        train(train_list, model, criterion, optimizer, epoch)
        #Accuracy detection function
        prec1 = validate(val_list, model, criterion)

        #Determine whether the MAE returned by validate is optimal
        is_best = prec1 < best_prec1
        #Keep the best MAE and print it to the screen
        best_prec1 = min(prec1, best_prec1)
        print(' * best MAE {mae:.3f} '
              .format(mae=best_prec1))
        #The checkpoint dictionary holds the network's state_dict, a mapping from parameter names to tensors (only the model weights are saved here)
        save_checkpoint({
            'state_dict': model.state_dict(),
        }, is_best)

#train() is the core of training: it builds the batched training data, then runs the network forward pass and error backpropagation
def train(train_list, model, criterion, optimizer, epoch):

    losses = AverageMeter()
    batch_time = AverageMeter()
    data_time = AverageMeter()

    #torch.utils.data.DataLoader() creates the training batches; its first argument is the dataset, already defined in dataset.py
    #train_list holds the training image paths, shuffle is True, and the transform converts the image to a tensor and Normalizes it
    train_loader = torch.utils.data.DataLoader(
        dataset.listDataset(train_list,
                       shuffle=True,
                       transform=transforms.Compose([
                       transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
                   ]),
                       train=True,
                       seen=model.seen,
                       batch_size=args.batch_size,
                       num_workers=args.workers),
        batch_size=args.batch_size)
    print('epoch %d, processed %d samples, lr %.10f' % (epoch, epoch * len(train_loader.dataset), args.lr))

    #The network contains BN layers, so model.train() must be declared before training
    model.train()
    #time.time() returns the timestamp of the current time (the number of floating-point seconds elapsed after the 1970 era).
    end = time.time()

    #Use enumerate to read the training batch data together with the subscript. img is the picture and target is the real density map (end-to-end)
    for i, (img, target) in enumerate(train_loader):
        data_time.update(time.time() - end)

        #Forward and reverse propagation process
        #Transfer img to GPU
        img = img.cuda()
        #Wrap img in a Variable so autograd can track it (in modern PyTorch, Variable is a no-op and plain tensors work directly)
        img = Variable(img)
        #img is introduced into the network and the predicted density map is obtained
        output = model(img)[:, 0, :, :]

        #True density map
        target = target.type(torch.FloatTensor).cuda()
        target = Variable(target)

        #Compare output with target to get loss (loss function)
        loss = criterion(output, target)

        #Record this batch's loss
        losses.update(loss.item(), img.size(0))
        #Gradient cleaning
        optimizer.zero_grad()
        #Gradient calculation
        loss.backward()
        #Update network parameters according to gradient
        optimizer.step()
        #Calculate the time of this training
        batch_time.update(time.time() - end)
        #Record end time
        end = time.time()

        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  .format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses))

#Build the validation batches and evaluate the model on them
def validate(val_list, model, criterion):
    print('begin val')
    val_loader = torch.utils.data.DataLoader(
    dataset.listDataset(val_list,
                   shuffle=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
                   ]),  train=False),
    batch_size=1)

    #Switch the model to evaluation mode with model.eval()
    model.eval()
    # Initialize mae
    mae = 0

    #Loop over the validation set
    for i, (img, target) in enumerate(val_loader):
        #img.shape[2:4] gives the height and width of the image
        h, w = img.shape[2:4]
        #Split the image into four quadrants
        h_d = h//2
        w_d = w//2
        #Wrap each quadrant in a Variable and move it to the GPU
        img_1 = Variable(img[:, :, :h_d, :w_d].cuda())
        img_2 = Variable(img[:, :, :h_d, w_d:].cuda())
        img_3 = Variable(img[:, :, h_d:, :w_d].cuda())
        img_4 = Variable(img[:, :, h_d:, w_d:].cuda())
        #Find the density map of each block
        density_1 = model(img_1).data.cpu().numpy()
        density_2 = model(img_2).data.cpu().numpy()
        density_3 = model(img_3).data.cpu().numpy()
        density_4 = model(img_4).data.cpu().numpy()

        #The predicted count is the sum over the four quadrant density maps
        pred_sum = density_1.sum()+density_2.sum()+density_3.sum()+density_4.sum()

        #Accumulate the absolute error between predicted and true counts
        mae += abs(pred_sum - target.sum().item())

    mae = mae/len(val_loader)
    print(' * MAE {mae:.3f} '
              .format(mae=mae))

    return mae

#Helper class that stores and updates running statistics
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        #most recent value
        self.val = 0
        #running average
        self.avg = 0
        #running sum
        self.sum = 0
        #number of values seen
        self.count = 0

    #update() refreshes all of these statistics with a new value
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

if __name__ == '__main__':
    main()
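
A quick demonstration (my own) of what AverageMeter tracks: val is the latest value, sum and count accumulate, and avg is the running mean.

from train import AverageMeter

meter = AverageMeter()
for v in [2.0, 4.0, 6.0]:
    meter.update(v)
print(meter.val, meter.sum, meter.avg)  #6.0 12.0 4.0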

8. test.py

Finally, the last file. test.py generates density maps with the trained model and then computes the head counts, reporting MAE and RMSE over the test set.

import h5py
import PIL.Image as Image
import numpy as np
import os
import glob
import scipy
from image import *
from model import CANNet
import torch
from torch.autograd import Variable

from sklearn.metrics import mean_squared_error,mean_absolute_error

from torchvision import transforms

#torchvision.transforms.Compose() class, which is mainly used to concatenate multiple image transformations.
transform = transforms.Compose([
                       transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
                   ])

# the folder contains all the test images
img_folder = './data/part_B_final/test_data/images'
img_paths = []

#glob.glob() returns a list of all matching file paths; collect every test image path into img_paths
for img_path in glob.glob(os.path.join(img_folder, '*.jpg')):
    img_paths.append(img_path)

model = CANNet()

model = model.cuda()

#Import the best model file
checkpoint = torch.load('model_best.pth.tar')

#model.load_state_dict() copies the saved parameter values from the checkpoint dictionary into the model's own parameter tensors
model.load_state_dict(checkpoint['state_dict'])

#Switch the model to evaluation mode with model.eval()
model.eval()

pred= []
gt = []

#Traverse test image
for i in range(len(img_paths)):
    #Load the image, apply the transform, and move it to the GPU
    img = transform(Image.open(img_paths[i]).convert('RGB')).cuda()
    #Add a dimension: conv2d expects four-dimensional input (batch, channel, height, width),
    #while a single transformed image is only three-dimensional (channel, height, width),
    #so unsqueeze(0) inserts the missing batch dimension at position 0
    img = img.unsqueeze(0)
    # img.shape[2:4] gives the height and width of the image
    h, w = img.shape[2:4]
    h_d = h//2
    w_d = w//2
    img_1 = Variable(img[:, :, :h_d, :w_d].cuda())
    img_2 = Variable(img[:, :, :h_d, w_d:].cuda())
    img_3 = Variable(img[:, :, h_d:, :w_d].cuda())
    img_4 = Variable(img[:, :, h_d:, w_d:].cuda())
    density_1 = model(img_1).data.cpu().numpy()
    density_2 = model(img_2).data.cpu().numpy()
    density_3 = model(img_3).data.cpu().numpy()
    density_4 = model(img_4).data.cpu().numpy()

    #os.path.splitext() splits the path and returns the tuple of the path name and file extension
    pure_name = os.path.splitext(os.path.basename(img_paths[i]))[0]
    # Open the image's ground-truth h5py file as gt_file
    gt_file = h5py.File(img_paths[i].replace('.jpg', '.h5').replace('images', 'ground_truth'), 'r')
    #Read the density map corresponding to the picture in the file and convert it to numpy format
    groundtruth = np.asarray(gt_file['density'])
    #Predicted head count: each density map integrates to a count of people, so the sums of the four quadrant maps add up to the prediction
    pred_sum = density_1.sum()+density_2.sum()+density_3.sum()+density_4.sum()
    pred.append(pred_sum)
    #Sum the ground-truth density map to get the true count
    gt.append(np.sum(groundtruth))

#Mean absolute error
mae = mean_absolute_error(pred, gt)
#Root mean squared error
rmse = np.sqrt(mean_squared_error(pred, gt))

print('pred:', pred)
print('gt:', gt)
print('MAE: ', mae)
print('RMSE: ', rmse)
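
A worked example with made-up numbers (my own, just to show what the two metrics compute): MAE = mean(|pred - gt|) and RMSE = sqrt(mean((pred - gt)^2)), the same quantities the sklearn calls above produce.

import numpy as np

pred_demo = np.array([100.0, 210.0, 55.0])
gt_demo = np.array([95.0, 200.0, 60.0])
print(np.mean(np.abs(pred_demo - gt_demo)))          #MAE: (5+10+5)/3 = 6.667
print(np.sqrt(np.mean((pred_demo - gt_demo) ** 2)))  #RMSE: sqrt((25+100+25)/3) = 7.071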

Postscript

I feel I understand the basic code of model.py, but I still don't fully grasp what is going on; mainly, I don't understand the convolution operations.

Topics: Python Pycharm neural networks Deep Learning