Preface: This article shares the feature map visualization code I use.
Welcome to the official account CV technical guide, which focuses on technical summaries of computer vision, tracking of the latest techniques, interpretations of classic papers, and CV recruitment information.
A few words up front
Feature map visualization is something many papers need to do. It can serve to demonstrate that a method works, or simply to pad the paper's workload and word count.
Concretely, you visualize the feature maps before and after applying the new method, compare the two, and then write up what the difference says about what the new method does.
To be honest, I often cannot follow what authors read into these figures. I can see that the visualization has changed, but not what the change is supposed to demonstrate. The authors nevertheless force it into the story of their new method, rather like the first-grade exercise of looking at a picture and writing a composition about it.
I remember a question that was once very hot: if I make only a small improvement on the baseline, but it works remarkably well, can I write a paper about it?
The biggest problem in that situation is how to fill more than seven pages. A small improvement may need less than one page for the idea, the formula derivation, the figures, and so on. What about the rest? Visualize the feature maps!!!
I have seen this in many papers. I could not understand the visualizations they presented, yet the authors managed to say a great deal about them; clearly the figures were there to pad the word count and apparent workload.
In short, feature map visualization is important work, and it is best to know how to do it.
Initial setup
This part covers loading the data, modifying the network, defining the network, and loading the pretrained model.
Load and preprocess the data
Only one image is loaded here, so there is no need to go through a Dataset class; the Dataset class exists for large amounts of data, producing an iterator that feeds images to the network batch by batch. However, we still need to perform the preprocessing that the Dataset class would normally handle.
The essential preprocessing steps are resizing, conversion to a Tensor, and normalization. Other data augmentation or preprocessing operations can be added as needed.
import torch
from PIL import Image
from torchvision import transforms

def image_preprocess(img_path):
    img = Image.open(img_path).convert('RGB')   # ensure 3 channels
    data_transforms = transforms.Compose([
        transforms.Resize((384, 384), interpolation=3),   # 3 = PIL bicubic
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    data = data_transforms(img)
    data = torch.unsqueeze(data, 0)   # add a batch dimension
    return data
Since only one image is loaded, torch.unsqueeze is used at the end to turn the three-dimensional tensor into a four-dimensional one.
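For clarity, here is a tiny sketch of what that unsqueeze does to the shape (the sizes are only illustrative):

import torch

data = torch.randn(3, 384, 384)   # C x H x W: a single preprocessed image
data = torch.unsqueeze(data, 0)   # insert a batch dimension at position 0
print(data.shape)                 # torch.Size([1, 3, 384, 384]), i.e. N x C x H x W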
Modify the network
To visualize the feature map of a given layer, that layer's output has to be returned, so the network's forward function must be modified first. The modification looks like this.
def forward(self, x):
    x = self.model.conv1(x)
    x = self.model.bn1(x)
    x = self.model.relu(x)
    x = self.model.maxpool(x)
    feature = self.model.layer1(x)   # keep layer1's output so it can be returned
    x = self.model.layer2(feature)
    x = self.model.layer3(x)
    x = self.model.layer4(x)
    return feature, x
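Alternatively, if you prefer not to touch forward at all, PyTorch's register_forward_hook can capture the same intermediate output. A minimal sketch, assuming a plain torchvision resnet50 (adapt the attribute path for your own model):

import torch
from torchvision.models import resnet50

features = {}

def hook(module, inputs, output):
    features['layer1'] = output   # store the intermediate feature map

model = resnet50()
handle = model.layer1.register_forward_hook(hook)
_ = model(torch.randn(1, 3, 384, 384))
handle.remove()
print(features['layer1'].shape)   # e.g. torch.Size([1, 256, 96, 96])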
Define the network and load the pretrained model
import os
import torch

def Init_Setting(epoch):
    dirname = '/mnt/share/VideoReID/share/models/Methods5_trial1'
    model = siamese_resnet50(701, stride=1, pool='avg')
    trained_path = os.path.join(dirname, 'net_%03d.pth' % epoch)
    print("load %03d.pth" % epoch)
    model.load_state_dict(torch.load(trained_path))
    model = model.cuda().eval()   # move to GPU and switch to inference mode
    return model
The line worth explaining here is the last one, which moves the model to the GPU and switches it to inference mode.
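As a side note, eval() changes the behavior of layers such as Dropout and BatchNorm but does not turn off gradient tracking; a minimal sketch (the toy network is purely illustrative):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
net.eval()              # Dropout becomes a no-op; BatchNorm would use running stats
x = torch.randn(1, 8)
y = net(x)              # gradients are still tracked in eval mode
with torch.no_grad():   # wrap in no_grad to also skip gradient tracking
    y = net(x)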
Visualize the feature map
This part converts each channel of the feature map into an image for visualization.
import numpy as np
import torch
import matplotlib.pyplot as plt

def visualize_feature_map(img_batch, out_path, type, BI):
    feature_map = torch.squeeze(img_batch)            # (1, C, H, W) -> (C, H, W)
    feature_map = feature_map.detach().cpu().numpy()

    feature_map_sum = feature_map[0, :, :]
    feature_map_sum = np.expand_dims(feature_map_sum, axis=2)
    for i in range(feature_map.shape[0]):             # iterate over all channels
        feature_map_split = feature_map[i, :, :]
        feature_map_split = np.expand_dims(feature_map_split, axis=2)
        if i > 0:
            feature_map_sum += feature_map_split      # element-wise sum over channels
        feature_map_split = BI.transform(feature_map_split)

        plt.imshow(feature_map_split[:, :, 0])        # drop the channel axis for imshow
        plt.axis('off')
        plt.savefig(out_path + str(i) + "_{}.jpg".format(type))

    feature_map_sum = BI.transform(feature_map_sum)
    plt.imshow(feature_map_sum[:, :, 0])
    plt.axis('off')
    plt.savefig(out_path + "sum_{}.jpg".format(type))
    print("save sum_{}.jpg".format(type))
Let's explain it line by line.
1. The parameter img_batch is the feature map passed out from some layer of the network. BI is a bilinear interpolation object; it is user-defined and discussed below.
2. Since only one image is visualized, img_batch is four-dimensional with a batch dimension of 1. It is first squeezed, then detached, moved from the GPU to the CPU, and converted to numpy format.
3. The rest converts each channel into an image, and also accumulates the element-wise sum over all channels and saves it (a vectorized alternative is sketched below).
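As a side note, the channel-wise accumulation in the loop can be done with a single numpy call; a minimal equivalent sketch (the shape below is hypothetical):

import numpy as np

# hypothetical example: 256 channels of a 96x96 feature map
feature_map = np.random.rand(256, 96, 96).astype(np.float32)

# element-wise sum over all channels, equivalent to the loop's accumulation
feature_map_sum = np.expand_dims(feature_map.sum(axis=0), axis=2)  # (96, 96, 1)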
Bilinear interpolation
Because the network downsamples repeatedly, feature maps from later layers are often only 7x7 or 16x16 in size, which is too small to see once visualized, so they need to be upsampled. The upsampling method used here is bilinear interpolation, implemented below.
import logging
import numpy as np
from math import ceil

class BilinearInterpolation(object):
    def __init__(self, w_rate: float, h_rate: float, *, align='center'):
        if align not in ['center', 'left']:
            logging.exception(f'{align} is not a valid align parameter')
            align = 'center'
        self.align = align
        self.w_rate = w_rate
        self.h_rate = h_rate

    def set_rate(self, w_rate: float, h_rate: float):
        self.w_rate = w_rate    # scaling rate of w
        self.h_rate = h_rate    # scaling rate of h

    # map an output pixel coordinate back to the source image (height axis)
    def get_src_h(self, dst_i, source_h, goal_h) -> float:
        if self.align == 'left':
            # align the top-left corners
            src_i = float(dst_i * (source_h / goal_h))
        elif self.align == 'center':
            # make the geometric centers of the two images coincide
            src_i = float((dst_i + 0.5) * (source_h / goal_h) - 0.5)
        src_i += 0.001
        src_i = max(0.0, src_i)
        src_i = min(float(source_h - 1), src_i)
        return src_i

    # map an output pixel coordinate back to the source image (width axis)
    def get_src_w(self, dst_j, source_w, goal_w) -> float:
        if self.align == 'left':
            # align the top-left corners
            src_j = float(dst_j * (source_w / goal_w))
        elif self.align == 'center':
            # make the geometric centers of the two images coincide
            src_j = float((dst_j + 0.5) * (source_w / goal_w) - 0.5)
        src_j += 0.001
        src_j = max(0.0, src_j)
        src_j = min((source_w - 1), src_j)
        return src_j

    def transform(self, img):
        source_h, source_w, source_c = img.shape  # e.g. (235, 234, 3)
        goal_h, goal_w = round(source_h * self.h_rate), round(source_w * self.w_rate)
        # keep the input dtype so float feature maps are not truncated to uint8
        new_img = np.zeros((goal_h, goal_w, source_c), dtype=img.dtype)
        for i in range(new_img.shape[0]):  # h
            src_i = self.get_src_h(i, source_h, goal_h)
            for j in range(new_img.shape[1]):
                src_j = self.get_src_w(j, source_w, goal_w)
                i2 = ceil(src_i)
                i1 = int(src_i)
                j2 = ceil(src_j)
                j1 = int(src_j)
                x2_x = j2 - src_j
                x_x1 = src_j - j1
                y2_y = i2 - src_i
                y_y1 = src_i - i1
                new_img[i, j] = img[i1, j1]*x2_x*y2_y + img[i1, j2] * \
                    x_x1*y2_y + img[i2, j1]*x2_x*y_y1 + img[i2, j2]*x_x1*y_y1
        return new_img

# usage
BI = BilinearInterpolation(8, 8)
feature_map = BI.transform(feature_map)
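For reference, if a handwritten implementation is not required, PyTorch's built-in bilinear upsampling does the same job in one call; a minimal sketch on the 4D tensor before squeezing (the shape is illustrative, and the scale factor of 8 matches the usage above):

import torch
import torch.nn.functional as F

img_batch = torch.randn(1, 256, 12, 12)   # (N, C, H, W) feature map
upsampled = F.interpolate(img_batch, scale_factor=8,
                          mode='bilinear', align_corners=False)
print(upsampled.shape)   # torch.Size([1, 256, 96, 96])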
Main flow
The code for each part has been introduced above; below is the overall flow, which is fairly simple.
imgs_path = "/path/to/imgs/"
save_path = "/save/path/to/output/"
model = Init_Setting(120)
BI = BilinearInterpolation(8, 8)

data = image_preprocess(imgs_path + "0836.jpg")
data = data.cuda()
output, _ = model(data)
visualize_feature_map(output, save_path, "drone", BI)
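Since this is pure inference, the forward pass can also be wrapped in torch.no_grad() to save memory; a minimal variant of the flow above:

import torch

with torch.no_grad():   # no gradients are needed for visualization
    data = image_preprocess(imgs_path + "0836.jpg").cuda()
    output, _ = model(data)
visualize_feature_map(output, save_path, "drone", BI)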
Visualization results
In the official account, you can get an introductory guide to computer vision.
CV technical guide runs a group with a good communication atmosphere; apart from overly niche questions, almost every question gets answered. Follow the official account and add the editor's WeChat to be invited into the exchange group.
Other articles
Summary of bounding box optimization in object detection
Tutorial on building a PyTorch model from scratch (I): data loading
Overview of autoencoders: concepts, illustrations and applications
ICLR2022 | cosFormer: Rethinking softmax in attention
ICLR2022 | ViDT: an effective and efficient pure-transformer object detector
Some personal habits of mind and reflections on quickly learning a new technology or field
Panoptic SegFormer: a general framework for end-to-end Transformer panoptic segmentation
CVPR2021 | TrivialAugment: SOTA data augmentation without tuning
AAAI2021 | Dynamic Anchor Learning for arbitrary-oriented object detection
ICCV2021 | Learning Spatio-Temporal Transformer for visual tracking
ICCV2021 | Vision Transformer with progressive sampling
MobileViT: a lightweight vision Transformer + mobile deployment
ICCV2021 | SOTR: Segmenting Objects with Transformers
ICCV2021 | PnP-DETR: efficient visual analysis with Transformers
ICCV2021 | Rethinking and improving relative position encoding in Vision Transformer
ICCV2021 | Rethinking the spatial dimensions of vision transformers
CVPR2021 | TransCenter: a transformer for multi-object tracking
CVPR2021 | YOLOF: a new take on feature pyramids
CVPR2021 | Rethinking "Batch" in BatchNorm