I finished reproducing the code some time ago and got results on both the ShanghaiTech part_A_final and part_B_final datasets. Now I will go through the code in detail to deepen my understanding. If anything is wrong, please point it out and share your advice!!
(This article assumes you have read the paper and have a basic grasp of Python syntax.)
Code analysis
The repository contains eight Python files: create_json.py, dataset.py, image.py, make_dataset.py, model.py, test.py, train.py and utils.py. We will go through the code in its order of execution.
1. image.py
image.py implements the loading and augmentation of images. The paper proposes randomly cropping 1/4-size patches at different positions of the original image during training. The file defines a single function, load_data(), which locates the .h5 ground-truth file for an image, reads the image and its density map, and performs the random crop.
```python
import random
import os
from PIL import Image
import numpy as np
import h5py
import cv2

# image.py implements the loading and augmentation of images.
def load_data(img_path, train=True):
    # Derive the ground-truth .h5 file name from the image path
    # (replace() substitutes the old substring with the new one).
    gt_path = img_path.replace('.jpg', '.h5').replace('images', 'ground_truth')
    # Open the image and convert it to RGB. Without convert('RGB') the image may
    # come back as four-channel RGBA, and the alpha (transparency) channel is of
    # no use when training the model.
    img = Image.open(img_path).convert('RGB')
    # Open the h5py file holding the ground truth for this image.
    gt_file = h5py.File(gt_path, 'r')
    # Read the density map for this image and convert it to a numpy array.
    target = np.asarray(gt_file['density'])
    # During training, randomly crop the image at one of four positions.
    if train:
        ratio = 0.5
        crop_size = (int(img.size[0]*ratio), int(img.size[1]*ratio))
        # Draw a random number to pick one of the four quadrants.
        rdn_value = random.random()
        if rdn_value < 0.25:
            dx = 0
            dy = 0
        elif rdn_value < 0.5:
            dx = int(img.size[0]*ratio)
            dy = 0
        elif rdn_value < 0.75:
            dx = 0
            dy = int(img.size[1]*ratio)
        else:
            dx = int(img.size[0]*ratio)
            dy = int(img.size[1]*ratio)
        img = img.crop((dx, dy, crop_size[0]+dx, crop_size[1]+dy))
        target = target[dy:(crop_size[1]+dy), dx:(crop_size[0]+dx)]
        # With probability 0.2, flip horizontally for augmentation:
        if random.random() > 0.8:
            # np.fliplr() flips the density map left-right...
            target = np.fliplr(target)
            # ...and transpose() mirrors the image the same way.
            img = img.transpose(Image.FLIP_LEFT_RIGHT)
    # The target is reduced to 1/8 of its width and height because the output of
    # CAN is 1/8 the size of the input. INTER_CUBIC interpolation is used, and
    # the result is multiplied by 64 so that the sum of the density map (i.e.
    # the count) stays unchanged.
    target = cv2.resize(target, (target.shape[1]//8, target.shape[0]//8), interpolation=cv2.INTER_CUBIC)*64
    return img, target
```
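To see why multiplying by 64 keeps the count unchanged, here is a minimal sketch with a synthetic density map (the two spikes stand in for two annotated heads; the numbers are illustrative, not from the dataset):

```python
import numpy as np
import cv2

# Synthetic 64x64 density map: two "heads", so the count (the sum) is 2.
density = np.zeros((64, 64), dtype=np.float32)
density[10, 20] = 1.0
density[40, 40] = 1.0
print(density.sum())  # 2.0

# Downsampling each side by 8 averages over 8*8 = 64 pixels, which shrinks the
# sum by roughly 64x; multiplying by 64 restores it (up to interpolation error).
small = cv2.resize(density, (64 // 8, 64 // 8), interpolation=cv2.INTER_CUBIC) * 64
print(small.sum())  # approximately 2.0
```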
2. dataset.py
dataset.py implements the dataset class. Its __getitem__() function calls the load_data() function from image.py to obtain the processed image and density map.
```python
import os
import random
import torch
import numpy as np
from torch.utils.data import Dataset
from PIL import Image
from image import *
import torchvision.transforms.functional as F

# dataset.py implements the creation of the dataset.
class listDataset(Dataset):
    # root is a list of paths to the training images; shuffle says whether the
    # order of root should be randomized; train says whether the dataset is used
    # for training; batch_size is the size of a training batch.
    def __init__(self, root, shape=None, shuffle=True, transform=None, train=False,
                 seen=0, batch_size=1, num_workers=4):
        # Shuffle the paths.
        random.shuffle(root)
        # Initialize the member variables.
        self.nSamples = len(root)  # number of images
        self.lines = root
        self.transform = transform
        self.train = train
        self.shape = shape
        self.seen = seen
        self.batch_size = batch_size
        self.num_workers = num_workers

    # self.nSamples was set to the length of the root list in the constructor,
    # so __len__() simply returns it.
    def __len__(self):
        return self.nSamples

    # Fetch one image and its density map.
    def __getitem__(self, index):
        # assert declares that a condition is True: nothing happens when it
        # holds and an exception is raised when it does not. Here it guards
        # against an out-of-range index.
        assert index <= len(self), 'index range error'
        # Read the image path from self.lines.
        img_path = self.lines[index]
        # load_data() is defined in image.py: given the image path, it returns
        # the training image and the corresponding density map.
        img, target = load_data(img_path, self.train)
        # Apply the transform if one was given (train.py passes ToTensor plus
        # Normalize), then return the image and its density map.
        if self.transform is not None:
            img = self.transform(img)
        return img, target
```
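A quick way to try the class by hand, assuming make_dataset.py and create_json.py (described below) have already been run so that the .h5 files and train.json exist, is a sketch like:

```python
import json
from torchvision import transforms
import dataset

# Load the training image paths and fetch one sample directly.
with open('train.json') as f:
    img_list = json.load(f)

ds = dataset.listDataset(img_list, shuffle=True, train=True,
                         transform=transforms.ToTensor())
img, target = ds[0]
print(img.shape)     # (3, H/2, W/2): the random crop halves each side
print(target.shape)  # (H/16, W/16): crop to 1/2, then downsample by 8
```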
3. make_dataset.py
Run make_dataset.py first. It passes the images and their .mat annotation files through a Gaussian kernel to generate the HDF5 target files; after the script runs, the .h5 files appear in the dataset's ground_truth folders.
```python
import h5py
import scipy.io as io
import PIL.Image as Image
import numpy as np
import os
import glob
from matplotlib import pyplot as plt
from scipy.ndimage.filters import gaussian_filter
from matplotlib import cm as CM
from image import *

# make_dataset.py creates the h5py target files: the images and .mat
# annotations go through a Gaussian kernel to produce the density maps. After
# it runs, .h5 files appear in the dataset's ground_truth folders.

# Path to the ShanghaiTech dataset.
root = './data/'

# The paths could be assigned directly, e.g.
#   part_B_final_train = './part_B_final/train_data/images'
# but os.path.join() joins two or more path components.
part_B_final_train = os.path.join(root, 'part_B_final/train_data', 'images')
part_B_final_test = os.path.join(root, 'part_B_final/test_data', 'images')
path_sets = [part_B_final_train, part_B_final_test]

img_paths = []
# glob.glob() returns a list of all file paths matching a pattern;
# append() adds a new object to the end of a list.
# Collect all image paths into img_paths.
for path in path_sets:
    for img_path in glob.glob(os.path.join(path, '*.jpg')):
        img_paths.append(img_path)

# Iterate over the images and generate the .h5 files.
for img_path in img_paths:
    print(img_path)
    # scipy.io's loadmat and savemat read and write .mat data from Python.
    # replace(old, new[, max]) substitutes old with new, at most max times.
    mat = io.loadmat(img_path.replace('.jpg', '.mat').replace('images', 'ground_truth').replace('IMG_', 'GT_IMG_'))
    # matplotlib.pyplot.imread() reads the image file into an array.
    img = plt.imread(img_path)
    # shape[0] is the length of the first dimension (number of rows) and
    # shape[1] the number of columns. Create an all-zero matrix with the same
    # spatial size as img.
    k = np.zeros((img.shape[0], img.shape[1]))
    # image_info holds the annotated head coordinates of the image.
    gt = mat["image_info"][0, 0][0, 0][0]
    # range(start, end) excludes end. Place a 1 at each annotated head position
    # that lies inside the image.
    for i in range(0, len(gt)):
        if int(gt[i][1]) < img.shape[0] and int(gt[i][0]) < img.shape[1]:
            k[int(gt[i][1]), int(gt[i][0])] = 1
    # gaussian_filter is a linear smoothing filter that removes Gaussian noise;
    # it softens sharp changes in intensity, i.e. it blurs the image. Here it
    # spreads each head annotation into a blob, producing the density map.
    k = gaussian_filter(k, 15)
    # Write the resulting .h5 file into the dataset's ground_truth folder.
    with h5py.File(img_path.replace('.jpg', '.h5').replace('images', 'ground_truth'), 'w') as hf:
        hf['density'] = k
```
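The key property of the Gaussian blur is that it redistributes each annotation without changing its total mass, so the density map still sums to the number of people. A minimal sketch with one synthetic annotation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# One annotated head: a single 1 in an otherwise empty map.
k = np.zeros((100, 100))
k[50, 50] = 1

# gaussian_filter spreads the point into a blob; the default 'reflect' boundary
# mode keeps the total mass, so the blob still sums to ~1 person.
blurred = gaussian_filter(k, 15)
print(blurred.sum())  # ~1.0
```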
4. create_json.py
Next, run create_json.py to generate the JSON files: train.json and val.json hold the paths of the training and test images respectively. It is fairly simple, so I won't say much about it.
```python
import json
from os.path import join
import glob

# Generate the JSON files: train.json and val.json hold the paths of all the
# training and test images respectively.
if __name__ == '__main__':
    # The path of the folder containing the images.
    #img_folder = './data/part_B_final/train_data/images'
    img_folder = './data/part_B_final/test_data/images'
    # The path of the output json file.
    #output_json = './train.json'
    output_json = './val.json'
    img_list = []
    # glob.glob() returns a list of all matching file paths;
    # append() adds a new object to the end of the list.
    for img_path in glob.glob(join(img_folder, '*.jpg')):
        img_list.append(img_path)
    # Write the image paths to the json file.
    with open(output_json, 'w') as f:
        json.dump(img_list, f)
```
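After running it once per split (editing img_folder and output_json accordingly), each JSON file is just a flat list of image paths. A quick check, with an illustrative path in the comment:

```python
import json

# Assumes create_json.py has already produced val.json.
with open('val.json') as f:
    print(json.load(f)[:2])
# e.g. ['./data/part_B_final/test_data/images/IMG_1.jpg', ...]
```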
5. utils.py
utils.py defines three functions. The first two are not used by the training and test scripts, so only the third is explained in detail; a short note on what the first two actually do follows the code below.
```python
import h5py
import torch
import shutil
import numpy as np

def save_net(fname, net):
    with h5py.File(fname, 'w') as h5f:
        for k, v in net.state_dict().items():
            h5f.create_dataset(k, data=v.cpu().numpy())

def load_net(fname, net):
    with h5py.File(fname, 'r') as h5f:
        for k, v in net.state_dict().items():
            param = torch.from_numpy(np.asarray(h5f[k]))
            v.copy_(param)

# The first two functions are not used; the focus is on this one. After each
# epoch a checkpoint.pth.tar file is written. This function saves the model
# and, if it is the best model so far, copies it to model_best.pth.tar.
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    # torch.save() serializes the model parameters.
    torch.save(state, filename)
    if is_best:
        # shutil.copyfile() copies the file.
        shutil.copyfile(filename, 'model_best.pth.tar')
```
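As for what the first two functions do: save_net() writes every entry of the network's state_dict into an HDF5 file, and load_net() copies them back into the network. A round trip like the following sketch (with a made-up tiny network) should leave both networks with identical parameters:

```python
import torch
import torch.nn as nn
from utils import save_net, load_net

# Hypothetical round trip through the HDF5 format.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
save_net('tiny.h5', net)

net2 = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())  # freshly initialized
load_net('tiny.h5', net2)
print(torch.equal(list(net.parameters())[0], list(net2.parameters())[0]))  # True
```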
6. model.py
model.py defines the network model (the most complex file here). First, some basics:
a. Custom operations in PyTorch basically all inherit from the nn.Module class.
b. Layers with learnable parameters (fully connected layers, convolution layers, and so on) are usually placed in the constructor __init__(); layers without parameters may of course be placed there too.
c. Layers without learnable parameters (such as ReLU and dropout) may be placed in the constructor __init__() or not; if not, they can be invoked in the forward method through nn.functional instead, as the sketch below shows.
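A minimal sketch of point c (both toy networks are my own illustration, not part of the repository):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# NetA registers ReLU as a module in __init__; NetB calls it functionally in
# forward(). ReLU has no learnable parameters, so the two are equivalent.
class NetA(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)  # learnable, belongs in __init__
        self.relu = nn.ReLU()           # parameter-free, module form
    def forward(self, x):
        return self.relu(self.conv(x))

class NetB(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
    def forward(self, x):
        return F.relu(self.conv(x))     # same result via nn.functional

x = torch.randn(1, 3, 16, 16)
print(NetA()(x).shape, NetB()(x).shape)  # both torch.Size([1, 8, 14, 14])
```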
This file implements both the module that extracts multi-scale contextual information and the overall network model. (Many of these computations are described and motivated in the paper.)
```python
import torch.nn as nn
import torch
from torch.nn import functional as F
from torchvision import models

# model.py defines the network model.

# Module that extracts multi-scale contextual information.
class ContextualModule(nn.Module):
    # S = 4 different scales are used, with block sizes k(j) in {1, 2, 3, 6},
    # because this setting performed better than the alternatives.
    # features is the size of each input sample; out_features the size of each
    # output sample; sizes are the scales.
    def __init__(self, features, out_features=512, sizes=(1, 2, 3, 6)):
        super(ContextualModule, self).__init__()
        self.scales = []
        # A feature pyramid computes the scale-aware features.
        self.scales = nn.ModuleList([self._make_scale(features, size) for size in sizes])
        # The bottleneck reduces the number of parameters in a reasonable way
        # while keeping the key features.
        # nn.Conv2d(in_channels, out_channels, kernel_size) is a 2D convolution;
        # features * 2 is the number of input channels and the kernel is 1x1.
        self.bottleneck = nn.Conv2d(features * 2, out_features, kernel_size=1)
        # nn.ReLU() is the activation function.
        self.relu = nn.ReLU()
        # This network outputs a weight map for each scale.
        self.weight_net = nn.Conv2d(features, features, kernel_size=1)

    # Compute the prediction weights w.
    def __make_weight(self, feature, scale_feature):
        # The contrast features from the paper: C_j = S_j - f_j.
        weight_feature = feature - scale_feature
        # The sigmoid is an S-shaped function often used for classification;
        # here it also avoids division by zero. self.weight_net() further learns
        # the weights of the scale-aware features.
        return F.sigmoid(self.weight_net(weight_feature))

    # Build one scale branch.
    def _make_scale(self, features, size):
        # nn.AdaptiveAvgPool2d() is an adaptive average pooling layer.
        prior = nn.AdaptiveAvgPool2d(output_size=(size, size))
        # 2D convolution.
        conv = nn.Conv2d(features, features, kernel_size=1, bias=False)
        # nn.Sequential() is an ordered container: the modules run in the order
        # they were passed to the constructor.
        return nn.Sequential(prior, conv)

    # Forward pass of the contextual module.
    def forward(self, feats):
        h, w = feats.size(2), feats.size(3)
        # Compute the scale-aware features and upsample them to the input size.
        multi_scales = [F.upsample(input=stage(feats), size=(h, w), mode='bilinear') for stage in self.scales]
        # Compute the weights from the scale-aware features.
        weights = [self.__make_weight(feats, scale_feature) for scale_feature in multi_scales]
        # Combine the scales with their weights to obtain the contextual
        # features (Eq. (5) of the paper).
        overall_features = [(multi_scales[0]*weights[0]+multi_scales[1]*weights[1]+multi_scales[2]*weights[2]+multi_scales[3]*weights[3])/(weights[0]+weights[1]+weights[2]+weights[3])]+[feats]
        # 1x1 convolution over the concatenated features.
        bottle = self.bottleneck(torch.cat(overall_features, 1))
        return self.relu(bottle)

# The network model proposed in the paper.
class CANNet(nn.Module):
    def __init__(self, load_weights=False):
        super(CANNet, self).__init__()
        self.seen = 0
        # The module defined above extracts multi-scale contextual information.
        self.context = ContextualModule(512, 512)
        # Network front end: 'M' marks a MaxPooling layer; frontend_feat is the
        # first 10 convolutional layers of VGG-16.
        self.frontend_feat = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512]
        # Network back end: backend_feat uses dilated convolutions.
        self.backend_feat = [512, 512, 512, 256, 128, 64]
        # Both the front end and the back end are built with make_layers().
        self.frontend = make_layers(self.frontend_feat)
        self.backend = make_layers(self.backend_feat, in_channels=512, batch_norm=True, dilation=True)
        # The output layer is a 1x1 convolution that reduces the feature planes to 1.
        self.output_layer = nn.Conv2d(64, 1, kernel_size=1)
        # load_weights defaults to False, so this branch is executed.
        if not load_weights:
            # Load pretrained VGG-16 to copy its first 10 convolutional layers.
            mod = models.vgg16(pretrained=True)
            # First initialize all weights manually.
            self._initialize_weights()
            # net.state_dict() is a dictionary of all network parameters: the
            # keys are the parameter names of each layer and the values are the
            # parameter tensors. dict.items() yields (key, value) tuples;
            # iterating over them copies every front-end parameter from the
            # pretrained model.
            for i in range(len(self.frontend.state_dict().items())):
                list(self.frontend.state_dict().items())[i][1].data[:] = list(mod.state_dict().items())[i][1].data[:]

    # Forward pass of the network.
    def forward(self, x):
        x = self.frontend(x)
        x = self.context(x)
        x = self.backend(x)
        x = self.output_layer(x)
        return x

    # net.modules() lists the network layers; m iterates over them, the layer
    # type is checked, and the functions under nn.init do the initialization.
    def _initialize_weights(self):
        for m in self.modules():
            # Normal initialization: nn.init.normal_(tensor, mean=0, std=1).
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.01)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

# The layer-building function is defined outside the class.
def make_layers(cfg, in_channels=3, batch_norm=False, dilation=False):
    # dilation selects the dilation rate: False builds the front end (rate 1),
    # True builds the back end (rate 2).
    if dilation:
        d_rate = 2
    else:
        d_rate = 1
    layers = []
    # Iterate over frontend_feat or backend_feat to determine each layer's type:
    # 'M' is a MaxPooling layer; a number is a convolution group (Conv2d,
    # optionally BatchNorm, then ReLU). After the loop the front end or back
    # end is complete.
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=d_rate, dilation=d_rate)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                # The front end is built with batch_norm=False, so this runs.
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)
```
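A quick sanity check on the geometry, assuming a PyTorch version that still accepts the deprecated F.upsample/F.sigmoid calls used above (load_weights=True skips downloading the VGG-16 weights, which is enough for a shape test):

```python
import torch
from model import CANNet

net = CANNet(load_weights=True)
x = torch.randn(1, 3, 256, 256)
y = net(x)
# The three 2x2 max-pools in the front end reduce each side by 8, matching the
# target = cv2.resize(..., //8) step in image.py; the dilated back end and the
# 1x1 output layer keep that size.
print(y.shape)  # torch.Size([1, 1, 32, 32])
```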
7. train.py
train.py trains the model so that, given an input image, it outputs an increasingly accurate density map.
```python
import sys
import os
import warnings
from model import CANNet
from utils import save_checkpoint
import torch
import torch.nn as nn
# torch.autograd provides classes and functions for differentiating arbitrary
# scalar functions.
from torch.autograd import Variable
from torchvision import datasets, transforms
import numpy as np
import argparse
import json
import cv2
import dataset
import time

# The argparse module receives arguments from the command line at startup.
# argparse.ArgumentParser() creates the parser object.
parser = argparse.ArgumentParser(description='PyTorch CANNet')
# parser.add_argument() registers a command-line argument. metavar changes the
# displayed name; help is shown when -h or --help is typed on the command line.
parser.add_argument('train_json', metavar='TRAIN', help='path to train json')
parser.add_argument('val_json', metavar='VAL', help='path to val json')

def main():
    global args, best_prec1
    best_prec1 = 1e6
    args = parser.parse_args()
    args.lr = 1e-4
    args.batch_size = 1
    args.decay = 5*1e-4
    args.start_epoch = 0
    args.epochs = 1000
    args.workers = 4
    args.seed = int(time.time())
    args.print_freq = 4
    # json.load() reads JSON from a file: here, the lists of image paths.
    with open(args.train_json, 'r') as outfile:
        train_list = json.load(outfile)
    with open(args.val_json, 'r') as outfile:
        val_list = json.load(outfile)
    # Seed the current GPU's random number generator.
    torch.cuda.manual_seed(args.seed)
    # Create the network object and move it to the GPU.
    model = CANNet()
    model = model.cuda()
    # The loss is MSE (mean squared error).
    criterion = nn.MSELoss(size_average=False).cuda()
    # The optimizer is Adam, given the learning rate and the weight decay.
    optimizer = torch.optim.Adam(model.parameters(), args.lr,
                                 weight_decay=args.decay)
    # Train the network epoch by epoch.
    for epoch in range(args.start_epoch, args.epochs):
        # Forward propagation and error backpropagation.
        train(train_list, model, criterion, optimizer, epoch)
        # Evaluate the accuracy on the validation set.
        prec1 = validate(val_list, model, criterion)
        # Check whether the MAE returned by validate() is the best so far.
        is_best = prec1 < best_prec1
        # Keep the best MAE and print it.
        best_prec1 = min(prec1, best_prec1)
        print(' * best MAE {mae:.3f} '.format(mae=best_prec1))
        # Save the network's state_dict (and copy it to model_best.pth.tar if
        # it is the best model so far).
        save_checkpoint({
            'state_dict': model.state_dict(),
        }, is_best)

# The core of training: builds the training batches, runs the forward pass and
# backpropagates the error.
def train(train_list, model, criterion, optimizer, epoch):
    losses = AverageMeter()
    batch_time = AverageMeter()
    data_time = AverageMeter()
    # torch.utils.data.DataLoader() creates the training batches. The first
    # argument is the dataset, already defined in dataset.py: train_list holds
    # the training image paths, shuffle is True, and the transform converts to
    # a tensor and normalizes.
    train_loader = torch.utils.data.DataLoader(
        dataset.listDataset(train_list,
                            shuffle=True,
                            transform=transforms.Compose([
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                     std=[0.229, 0.224, 0.225]),
                            ]),
                            train=True,
                            seen=model.seen,
                            batch_size=args.batch_size,
                            num_workers=args.workers),
        batch_size=args.batch_size)
    print('epoch %d, processed %d samples, lr %.10f' % (epoch, epoch * len(train_loader.dataset), args.lr))
    # The network contains BatchNorm, so model.train() must be declared first.
    model.train()
    # time.time() returns the current timestamp (floating-point seconds since
    # the 1970 epoch).
    end = time.time()
    # enumerate yields the training batches with their index; img is the image
    # and target the ground-truth density map (end-to-end).
    for i, (img, target) in enumerate(train_loader):
        data_time.update(time.time() - end)
        # Forward and backward pass. Move img to the GPU and wrap it in a
        # Variable: wrapping tensors in Variable enables automatic
        # differentiation.
        img = img.cuda()
        img = Variable(img)
        # Feed img through the network to get the predicted density map.
        output = model(img)[:, 0, :, :]
        # Ground-truth density map.
        target = target.type(torch.FloatTensor).cuda()
        target = Variable(target)
        # Compare output with target to get the loss.
        loss = criterion(output, target)
        # Update the running loss.
        losses.update(loss.item(), img.size(0))
        # Clear gradients, compute new gradients, update the parameters.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Measure the time of this batch and record the end time.
        batch_time.update(time.time() - end)
        end = time.time()
        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  .format(
                      epoch, i, len(train_loader),
                      batch_time=batch_time,
                      data_time=data_time, loss=losses))

# Build the validation batches and evaluate.
def validate(val_list, model, criterion):
    print('begin val')
    val_loader = torch.utils.data.DataLoader(
        dataset.listDataset(val_list,
                            shuffle=False,
                            transform=transforms.Compose([
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                     std=[0.229, 0.224, 0.225]),
                            ]),
                            train=False),
        batch_size=1)
    # Declare evaluation mode.
    model.eval()
    # Initialize the MAE.
    mae = 0
    for i, (img, target) in enumerate(val_loader):
        # img.shape gives the height and width of the image.
        h, w = img.shape[2:4]
        # Cut the image into four quadrants.
        h_d = h//2
        w_d = w//2
        # Wrap each quadrant in a Variable on the GPU.
        img_1 = Variable(img[:, :, :h_d, :w_d].cuda())
        img_2 = Variable(img[:, :, :h_d, w_d:].cuda())
        img_3 = Variable(img[:, :, h_d:, :w_d].cuda())
        img_4 = Variable(img[:, :, h_d:, w_d:].cuda())
        # Predict the density map of each quadrant.
        density_1 = model(img_1).data.cpu().numpy()
        density_2 = model(img_2).data.cpu().numpy()
        density_3 = model(img_3).data.cpu().numpy()
        density_4 = model(img_4).data.cpu().numpy()
        # The predicted count is the sum of the four quadrant predictions.
        pred_sum = density_1.sum()+density_2.sum()+density_3.sum()+density_4.sum()
        # Accumulate the absolute error.
        mae += abs(pred_sum-target.sum())
    mae = mae/len(val_loader)
    print(' * MAE {mae:.3f} '.format(mae=mae))
    return mae

# Encapsulates running statistics.
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0    # current value
        self.avg = 0    # running average
        self.sum = 0    # running sum
        self.count = 0  # number of values

    # update() maintains these quantities.
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

if __name__ == '__main__':
    main()
```
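To make the bookkeeping concrete, here is a small sketch of AverageMeter (importing it from train.py is safe because main() is guarded by __name__, so nothing starts training):

```python
from train import AverageMeter

meter = AverageMeter()
for batch_loss in [4.0, 2.0, 3.0]:
    meter.update(batch_loss, n=1)  # n is the batch size
print(meter.val, meter.sum, meter.count, meter.avg)  # 3.0 9.0 3 3.0
```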
8. test.py
Finally, the last file. test.py uses the trained model to generate density maps and then computes the head counts from them.
```python
import h5py
import PIL.Image as Image
import numpy as np
import os
import glob
import scipy
from image import *
from model import CANNet
import torch
from torch.autograd import Variable
from sklearn.metrics import mean_squared_error, mean_absolute_error
from torchvision import transforms

# torchvision.transforms.Compose() chains several image transforms together.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# The folder containing all the test images.
img_folder = './data/part_B_final/test_data/images'

img_paths = []
# glob.glob() returns a list of all matching file paths;
# append() adds a new object to the end of the list.
for img_path in glob.glob(os.path.join(img_folder, '*.jpg')):
    img_paths.append(img_path)

model = CANNet()
model = model.cuda()
# Load the best model file.
checkpoint = torch.load('model_best.pth.tar')
# model.state_dict() is a shallow copy: it copies the outermost values and
# pointers but not the deeper objects, i.e. only the parent object.
# model.load_state_dict() is a deep copy: it copies values, pointers and the
# memory they point to, i.e. the parent object and its children.
model.load_state_dict(checkpoint['state_dict'])
# Declare evaluation mode.
model.eval()

pred = []
gt = []
# Iterate over the test images.
for i in range(len(img_paths)):
    # Transform the image and move it to the GPU.
    img = transform(Image.open(img_paths[i]).convert('RGB')).cuda()
    # Add a batch dimension: conv2d expects four-dimensional input
    # (batch, channel, height, width), while a single image tensor is only
    # three-dimensional (channel, height, width), so unsqueeze(0) inserts the
    # batch dimension at position 0.
    img = img.unsqueeze(0)
    # img.shape gives the height and width of the image.
    h, w = img.shape[2:4]
    h_d = h//2
    w_d = w//2
    # Wrap each quadrant in a Variable on the GPU.
    img_1 = Variable(img[:, :, :h_d, :w_d].cuda())
    img_2 = Variable(img[:, :, :h_d, w_d:].cuda())
    img_3 = Variable(img[:, :, h_d:, :w_d].cuda())
    img_4 = Variable(img[:, :, h_d:, w_d:].cuda())
    density_1 = model(img_1).data.cpu().numpy()
    density_2 = model(img_2).data.cpu().numpy()
    density_3 = model(img_3).data.cpu().numpy()
    density_4 = model(img_4).data.cpu().numpy()
    # os.path.splitext() splits a path into (name, extension).
    pure_name = os.path.splitext(os.path.basename(img_paths[i]))[0]
    # Open the ground-truth h5py file for this image.
    gt_file = h5py.File(img_paths[i].replace('.jpg', '.h5').replace('images', 'ground_truth'), 'r')
    # Read the density map and convert it to a numpy array.
    groundtruth = np.asarray(gt_file['density'])
    # The predicted count: each person contributes a unit of mass to the
    # density map, so summing all elements of the four quadrant predictions
    # gives the head count.
    pred_sum = density_1.sum()+density_2.sum()+density_3.sum()+density_4.sum()
    pred.append(pred_sum)
    # Sum the ground-truth density map directly.
    gt.append(np.sum(groundtruth))

# Mean absolute error.
mae = mean_absolute_error(pred, gt)
# Root mean squared error.
rmse = np.sqrt(mean_squared_error(pred, gt))
print('pred:', pred)
print('gt:', gt)
print('MAE: ', mae)
print('RMSE: ', rmse)
```
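As a worked example of the two metrics on made-up counts:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

pred = [10.0, 22.0, 30.0]  # hypothetical predicted counts
gt = [12.0, 20.0, 33.0]    # hypothetical ground-truth counts
print(mean_absolute_error(pred, gt))          # (2+2+3)/3 ≈ 2.333
print(np.sqrt(mean_squared_error(pred, gt)))  # sqrt((4+4+9)/3) ≈ 2.380
```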
Postscript
I feel I have understood the basic code of model.py, but I still don't fully grasp everything that is going on; the convolution operations are what I mainly need to study further.