A feature map visualization code

Posted by stephenlk on Wed, 09 Mar 2022 05:23:09 +0100

Preface: This post shares the feature map visualization code that I use.

 

Welcome to follow the official account CV technical guide, which focuses on summaries of computer vision techniques, tracking of the latest technology, interpretations of classic papers, and CV job information.

 

A few words up front

Feature map visualization is something many papers need to do. It can be used to demonstrate that a method is effective, or simply to pad the workload and word count of a paper.

Specifically, you visualize the feature maps before and after applying the new method, then look at the difference and write up an explanation of what the new method does.

As a reader, I often cannot tell what the author is getting at: the visualized feature map does change, but it is not clear what that change demonstrates. The author nonetheless spins a story about the new method from it, rather like a primary-school composition exercise: look at a picture and write an essay.

I remember a very popular question being discussed: if I make only a small improvement on the baseline but it turns out to work very well, can I write a paper about it?

The biggest problem in that situation is how to fill more than seven pages. A small improvement may take less than a page to cover the idea, the derivation, and the figures. What about the rest? Feature map visualization!!!

I have seen this in many papers: I could not make sense of the visualization, yet the author managed to say a great deal about it. Presumably it was there to pad the word count and the apparent workload.

In short, feature map visualization is important work, and it is well worth knowing how to do.

 

Initial setup

This part covers loading the data, modifying the network, defining the network, and loading the pretrained model.

Load data and preprocess

Only one image is loaded here, so there is no need to go through a Dataset class; a Dataset is meant for large amounts of data and produces an iterator that feeds images to the network batch by batch. However, we still need the preprocessing that a Dataset would normally perform.

The essential preprocessing operations are resizing, converting to a Tensor, and normalization. Other data augmentation or preprocessing operations can be added as needed.

from PIL import Image
import torch
from torchvision import transforms

def image_proprecess(img_path):
    img = Image.open(img_path)
    data_transforms = transforms.Compose([
        transforms.Resize((384, 384), interpolation=3),  # 3 == bicubic in older torchvision
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    data = data_transforms(img)
    data = torch.unsqueeze(data, 0)  # add the batch dimension: (C, H, W) -> (1, C, H, W)
    return data

Since only one image is loaded here, torch.unsqueeze is used at the end to turn the three-dimensional tensor into a four-dimensional one by adding a batch dimension.
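As a quick illustration (this snippet is mine, not part of the original code), the shape change looks like this:

import torch

img = torch.rand(3, 384, 384)      # what the transforms above produce: (C, H, W)
batch = torch.unsqueeze(img, 0)    # (1, 3, 384, 384): batch dimension added
print(img.shape, batch.shape)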

 

Modify network

To visualize the feature map of a particular layer, that layer's output has to be returned, so the network's forward function needs to be modified first. The modification looks like this.

def forward(self, x):
    x = self.model.conv1(x)
    x = self.model.bn1(x)
    x = self.model.relu(x)
    x = self.model.maxpool(x)
    feature = self.model.layer1(x)   # keep this layer's output for visualization
    x = self.model.layer2(feature)
    x = self.model.layer3(x)
    x = self.model.layer4(x)
    return feature, x                # return the chosen feature map alongside the final output
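If you would rather not edit forward, a common alternative is to register a forward hook on the layer of interest. This is not what the rest of the post uses; the sketch below assumes a plain torchvision resnet50 purely for illustration:

import torch
from torchvision.models import resnet50

features = {}

def save_feature(module, inputs, output):
    # store the layer's output; detach so the computation graph is not kept alive
    features['layer1'] = output.detach()

backbone = resnet50()                               # illustrative backbone, not the model used in this post
backbone.layer1.register_forward_hook(save_feature)

x = torch.rand(1, 3, 384, 384)
_ = backbone(x)
print(features['layer1'].shape)                     # torch.Size([1, 256, 96, 96])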

 

Define the network and load the pretrained model

import os
import torch

def Init_Setting(epoch):
    dirname = '/mnt/share/VideoReID/share/models/Methods5_trial1'
    model = siamese_resnet50(701, stride=1, pool='avg')   # model definition (defined elsewhere in this project)
    trained_path = os.path.join(dirname, 'net_%03d.pth' % epoch)
    print("load %03d.pth" % epoch)
    model.load_state_dict(torch.load(trained_path))
    model = model.cuda().eval()                           # move to GPU and switch to inference mode
    return model

The line worth noting here is model = model.cuda().eval(), which moves the model to the GPU and puts it in inference (eval) mode.

 

Visualize the feature map

This part turns each channel of the feature map into an image for visualization.

import numpy as np
import torch
import matplotlib.pyplot as plt

def visualize_feature_map(img_batch, out_path, type, BI):
    feature_map = torch.squeeze(img_batch)               # (1, C, H, W) -> (C, H, W)
    feature_map = feature_map.detach().cpu().numpy()

    feature_map_sum = feature_map[0, :, :]
    feature_map_sum = np.expand_dims(feature_map_sum, axis=2)
    for i in range(feature_map.shape[0]):                 # iterate over all channels
        feature_map_split = feature_map[i, :, :]
        feature_map_split = np.expand_dims(feature_map_split, axis=2)
        if i > 0:
            feature_map_sum += feature_map_split          # element-wise sum over channels
        feature_map_split = BI.transform(feature_map_split)  # upsample for display

        plt.imshow(feature_map_split.squeeze(axis=2))     # drop the trailing channel axis for imshow
        plt.xticks([])
        plt.yticks([])
        plt.axis('off')
        plt.savefig(out_path + str(i) + "_{}.jpg".format(type))

    feature_map_sum = BI.transform(feature_map_sum)
    plt.imshow(feature_map_sum.squeeze(axis=2))
    plt.savefig(out_path + "sum_{}.jpg".format(type))
    print("save sum_{}.jpg".format(type))

Let's explain it line by line.

1. The parameter img_batch is the feature map returned from a chosen layer of the network. BI is a user-defined bilinear interpolation function, discussed below.

2. Since only one image is visualized, img_batch is four-dimensional with a batch dimension of 1. The squeeze removes that dimension, and the next line moves the tensor from the GPU to the CPU and converts it to a NumPy array.

3. The rest of the function turns each channel into an image, and also sums all channels element-wise and saves that summed map.
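One optional tweak, not in the original code: a single channel often has a small value range, so normalizing it to [0, 1] before imshow can make the saved image easier to read. A minimal helper might look like this (the name normalize_for_display is mine):

import numpy as np

def normalize_for_display(fm):
    # scale a single-channel feature map to [0, 1] so imshow uses the full colour range
    fm = fm - fm.min()
    return fm / (fm.max() + 1e-8)

# e.g. plt.imshow(normalize_for_display(feature_map_split.squeeze(axis=2)))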

 

Bilinear interpolation

Because the network downsamples repeatedly, the feature maps of later layers are often only 7x7 or 16x16, which is too small to visualize directly, so they need to be upsampled. Bilinear interpolation is used here; the code is given below.

import logging
import numpy as np

class BilinearInterpolation(object):
    def __init__(self, w_rate: float, h_rate: float, *, align='center'):
        if align not in ['center', 'left']:
            logging.error(f'{align} is not a valid align parameter')
            align = 'center'
        self.align = align
        self.w_rate = w_rate
        self.h_rate = h_rate

    def set_rate(self, w_rate: float, h_rate: float):
        self.w_rate = w_rate    # scaling rate along w
        self.h_rate = h_rate    # scaling rate along h

    # Map a row index of the output image back to a (fractional) row in the source image
    def get_src_h(self, dst_i, source_h, goal_h) -> float:
        if self.align == 'left':
            # align the top-left corners
            src_i = float(dst_i * (source_h / goal_h))
        elif self.align == 'center':
            # make the geometric centres of the two images coincide
            src_i = float((dst_i + 0.5) * (source_h / goal_h) - 0.5)
        src_i += 0.001
        src_i = max(0.0, src_i)
        src_i = min(float(source_h - 1), src_i)
        return src_i

    # Map a column index of the output image back to a (fractional) column in the source image
    def get_src_w(self, dst_j, source_w, goal_w) -> float:
        if self.align == 'left':
            # align the top-left corners
            src_j = float(dst_j * (source_w / goal_w))
        elif self.align == 'center':
            # make the geometric centres of the two images coincide
            src_j = float((dst_j + 0.5) * (source_w / goal_w) - 0.5)
        src_j += 0.001
        src_j = max(0.0, src_j)
        src_j = min(float(source_w - 1), src_j)
        return src_j

    def transform(self, img):
        source_h, source_w, source_c = img.shape  # e.g. (235, 234, 3)
        goal_h, goal_w = round(
            source_h * self.h_rate), round(source_w * self.w_rate)
        # keep the input dtype so float feature maps are not truncated to uint8
        new_img = np.zeros((goal_h, goal_w, source_c), dtype=img.dtype)

        for i in range(new_img.shape[0]):       # rows (h)
            src_i = self.get_src_h(i, source_h, goal_h)
            for j in range(new_img.shape[1]):   # columns (w)
                src_j = self.get_src_w(j, source_w, goal_w)
                # the four neighbouring pixels in the source image
                i1, j1 = int(src_i), int(src_j)
                i2 = min(i1 + 1, source_h - 1)
                j2 = min(j1 + 1, source_w - 1)
                # interpolation weights (stay valid at the image border)
                y_y1 = src_i - i1
                y2_y = 1.0 - y_y1
                x_x1 = src_j - j1
                x2_x = 1.0 - x_x1
                new_img[i, j] = img[i1, j1]*x2_x*y2_y + img[i1, j2] * \
                    x_x1*y2_y + img[i2, j1]*x2_x*y_y1 + img[i2, j2]*x_x1*y_y1
        return new_img

# usage
BI = BilinearInterpolation(8, 8)
feature_map = BI.transform(feature_map)
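For reference (not part of the original post's code), if you upsample the tensor before converting it to NumPy, PyTorch's built-in bilinear upsampling does the same job; a minimal sketch:

import torch
import torch.nn.functional as F

feature = torch.rand(1, 256, 24, 24)   # illustrative (N, C, H, W) feature map
upsampled = F.interpolate(feature, scale_factor=8, mode='bilinear', align_corners=False)
print(upsampled.shape)                 # torch.Size([1, 256, 192, 192])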

 

Main flow

The code for each part has been introduced above; what follows is the overall flow, which is fairly simple.

imgs_path = "/path/to/imgs/"
save_path = "/save/path/to/output/"
model = Init_Setting(120)
BI = BilinearInterpolation(8, 8)

data = image_proprecess(imgs_path + "0836.jpg")   # load and preprocess a single image
data = data.cuda()
output, _ = model(data)                           # 'output' is the feature map returned by forward
visualize_feature_map(output, save_path, "drone", BI)
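One optional detail not in the flow above: since this is pure inference, running the forward pass under torch.no_grad() avoids storing gradients and saves GPU memory:

with torch.no_grad():
    output, _ = model(data)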

 

Visualization results

In the official account, you can get an introductory guide to computer vision.

CV technical guide has created a group with a good atmosphere for discussion. Apart from overly obscure questions, almost every question gets answered. Follow the official account and add the editor's WeChat to be invited into the group.

 

Other articles

Summary of bounding-box optimization in object detection

Tutorial on building a PyTorch model from scratch (I): data loading

Overview of autoencoders: concepts, illustrations and applications

To solve real-world deployment problems in image segmentation, the Chinese University of Hong Kong proposes open-world entity segmentation

Summary of anchor-free methods for object detection, instance segmentation and multi-object tracking

ICLR2022 | cosFormer: Rethinking softmax in attention

ICLR2022 | ViDT: an effective and efficient pure-Transformer object detector

Some personal habits and thoughts on how to quickly learn a new technology or field

Panoptic SegFormer: a general framework for end-to-end Transformer panoptic segmentation

CVPR2021 | TrivialAugment: SOTA data augmentation strategy without tuning

ICCV2021 | A simple and effective long-tailed visual recognition scheme: self-supervision to distillation (SSD)

AAAI2021 | Dynamic anchor learning for arbitrary-oriented object detection

ICCV2021 | Learning spatio-temporal Transformer for visual tracking

ICCV2021 | Vision Transformer with progressive sampling

MobileViT: a lightweight vision Transformer + mobile deployment

ICCV2021 | SOTR: segmenting objects with Transformers

ICCV2021 | PnP-DETR: efficient visual analysis with Transformers

ICCV2021 | Rethinking and improving relative position encoding in Vision Transformer

ICCV2021 | Rethinking the spatial dimensions of vision transformers

CVPR2021 | TransCenter: Transformers for multi-object tracking

CVPR2021 | YOLOF: a new way of doing feature pyramids

CVPR2021 | Rethinking "Batch" in BatchNorm