YOLOv4 project record 4 - test process

Posted by py343 on Mon, 03 Jan 2022 23:57:01 +0100

catalogue

1, Overview

2, Test process

1. Parameter preparation

2. Define the model

3. Obtain necessary data

4. Input model

5. Anchor box screening

① Filter by object confidence

② Get the categories

③ Sort by object confidence

④ Non-maximum suppression

6. Draw the bounding boxes

3, Code summary

1, Overview

Before training, let's first load the pretrained weights and feed one image through the network to see what the concrete test process looks like. I will show the code used in blocks here, and put it all together at the end.

2, Test process

1. Parameter preparation

In the class we need to define the model, the pretrained parameters, the loading functions, the decoding layer, and the other methods that will be used. We can prepare the keyword arguments in advance and pass them into the model.

The first three are the pretrained weight file, the anchor box sizes and the file listing all categories. model_image_size is the size of the input image; confidence is the threshold used to filter detections by object confidence: only boxes whose confidence is greater than this threshold are kept. cuda is the device flag.

    params = {
        "model_path": 'pth/yolo4_weights_my.pth',
        "anchors_path": 'work_dir/yolo_anchors_coco.txt',
        "classes_path": 'work_dir/coco_classes.txt',
        "model_image_size": (608, 608, 3),
        "confidence": 0.4,
        "cuda": True
    }

    model = Inference(**params)
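
For reference, the anchors file is read later (see get_anchors in the code summary) as a single comma-separated line of width,height values. Assuming the standard YOLOv4 COCO anchors for a 608 input, a plausible content of work_dir/yolo_anchors_coco.txt would be the following; the actual file in this project may differ:

12,16,19,36,40,28,36,75,76,55,72,146,142,110,192,243,459,401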

2. Define the model

After preparing the parameters, we define the Inference class as the model. In this class we need to load the pretrained parameters and initialize the YOLO model and the YOLO decoding layers that will be used.

YoloBody is the backbone + neck + head of our model. We only need to provide the number of input channels and the number of output categories. The final output consists of three feature maps, namely (1, 255, 19, 19), (1, 255, 38, 38) and (1, 255, 76, 76).

The 255 in the middle is 3 * (4 + 1 + 80), because we have 80 categories; this was recorded in the previous article, so the details are not repeated here. We will decode these three outputs.
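
A quick check of those numbers (a minimal sketch, assuming a 608 x 608 input and the usual YOLOv4 strides of 32, 16 and 8):

num_classes = 80
channels = 3 * (4 + 1 + num_classes)  # 3 anchors per cell, each with 4 box values, 1 objectness and 80 class scores
for stride in (32, 16, 8):
    print((1, channels, 608 // stride, 608 // stride))
# (1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)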

YoloLayer is our decoding layer, which has also been recorded earlier; we complete its initialization by passing in the required parameters. So inside __init__ we obtain self.net and self.yolo_decodes, and we will use these two to run detection and decoding. We only need to define functions below that call them.

The contents to be filled in are: the image size, the anchor box masks (used to select the anchor boxes for each scale), the number of categories, the prior anchor box sizes, the number of prior anchor boxes, and the scaling factor.

class Inference(object):
    # ---------------------------------------------------#
    #   Initialize the model and parameters and import the trained weights
    # ---------------------------------------------------#
    def __init__(self, **kwargs):
        self.model_path = kwargs['model_path']
        self.anchors_path = kwargs['anchors_path']
        self.classes_path = kwargs['classes_path']
        self.model_image_size = kwargs['model_image_size']
        self.confidence = kwargs['confidence']
        self.cuda = kwargs['cuda']

        self.class_names = self.get_class()
        self.anchors = self.get_anchors()
        print(self.anchors)
        # =================Here is the initialization model
        self.net = YoloBody(3, len(self.class_names)).eval()
        self.load_model_pth(self.net, self.model_path)

        if self.cuda:
            self.net = self.net.cuda()
            self.net.eval()

        print('Finished!')

        self.yolo_decodes = []
        anchor_masks = [[0,1,2],[3,4,5],[6,7,8]]
        # =================Here is the initialization decoding part. Since there are three outputs, three decoding models are required
        for i in range(3):
            head = YoloLayer(self.model_image_size, anchor_masks, len(self.class_names),
                                               self.anchors, len(self.anchors)//2).eval()
            self.yolo_decodes.append(head)


        print('{} model, anchors, and classes loaded.'.format(self.model_path))
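
For intuition, each YoloLayer essentially performs the standard YOLO decoding for its scale. Below is a minimal sketch of that decoding for a single grid cell and anchor, assuming the usual YOLOv4 formulation (sigmoid offsets for the center, exponential scaling of the prior anchor); the actual YoloLayer implementation in this project may differ, and decode_one is only an illustrative helper.

import torch

def decode_one(raw, cell_xy, anchor_wh, stride, img_size=608):
    tx, ty, tw, th = raw                                  # raw network outputs for this box
    cx, cy = cell_xy                                      # grid cell indices
    pw, ph = anchor_wh                                    # prior anchor size in pixels
    bx = (torch.sigmoid(tx) + cx) * stride / img_size     # normalized center x
    by = (torch.sigmoid(ty) + cy) * stride / img_size     # normalized center y
    bw = pw * torch.exp(tw) / img_size                    # normalized width
    bh = ph * torch.exp(th) / img_size                    # normalized height
    return bx, by, bw, bh

print(decode_one(torch.tensor([0.2, -0.1, 0.3, 0.0]), (9, 9), (142, 110), stride=32))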

3. Obtain necessary data

Since we want to test, we must feed in image data and preprocess it, e.g. resize it to 608 * 608 or another multiple of 32 required by the model (because the model contains no fully connected layers, only convolutions, there is no hard requirement on a fixed size).

To predict categories, we need to prepare category names in advance.

The following two functions obtain the category data and the image data respectively. The image data is fed into the model for prediction, and the category data is used for filtering and labelling.

def load_class_names(namesfile):
    class_names = []
    with open(namesfile, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.rstrip()
        class_names.append(line)
    return class_names

def detect_image(self, image_src):
    h, w, _ = image_src.shape
    image = cv2.resize(image_src, (608, 608))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img = np.array(image, dtype=np.float32)
    img = np.transpose(img / 255.0, (2, 0, 1))
    images = np.asarray([img])

4. Input model

with torch.no_grad():
    images = torch.from_numpy(images)
    if self.cuda:
        images = images.cuda()
    outputs = self.net(images)

With the model outputs, we can feed them into the decoding modules to obtain all anchor box information. After decoding we concatenate everything into a single tensor of shape (1, 22743, 85): more than 20,000 anchor boxes together with the information each carries, and these must now be screened.
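
The 22743 is simply the total number of boxes over the three scales, with three anchors per grid cell:

print(sum(3 * s * s for s in (19, 38, 76)))  # 3 * (361 + 1444 + 5776) = 22743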

output_list = []
for i in range(3):
    output_list.append(self.yolo_decodes[i](outputs[i]))
output = torch.cat(output_list, 1)
print(output.shape)

5. Anchor box screening

① Filter by object confidence

The object confidence is the value at index 4 of the last dimension. It is compared with the configured threshold; here it is 0.5, and only anchor boxes whose object confidence is greater than 0.5 are kept. After this filtering only 17 anchor boxes remain, so image_pred here has shape (17, 85).

def non_max_suppression(prediction, num_classes, conf_thres=0.5, nms_thres=0.4):
    # Find the upper left corner and lower right corner
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # First round screening using Object Confidence
        conf_mask = (image_pred[:, 4] >= conf_thres).squeeze()
        # ================Get the filtered anchor box
        image_pred = image_pred[conf_mask]

        if not image_pred.size(0):
            continue

        # Get the class confidence value and the corresponding class index for each box
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)

② Get the categories

Multiple objects may be predicted in one image, i.e. there may be several categories, each corresponding to several anchor boxes. Therefore we need to obtain all categories contained in the predictions, and then filter the anchor boxes again for each category.

In the last line of the code above, torch.max is taken over the class scores; it returns both the maximum value and its index, i.e. for each of the 17 anchor boxes we get the category index class_pred with the highest class score and the corresponding confidence class_conf.

Here the obtained indices and scores are concatenated with the earlier anchor box information to give a 7-dimensional vector. As in the code above, the values of the first four dimensions have already been replaced with the top-left coordinates x1, y1 and the bottom-right coordinates x2, y2, so what we get is (x1, y1, x2, y2, obj_conf, class_conf, class_pred).

The last dimension, class_pred, is the category index. Applying .unique() to it gives all predicted categories; here there are three, namely (1, 7, 16).

# The obtained contents are (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)

# Get the categories
unique_labels = detections[:, -1].cpu().unique()

if prediction.is_cuda:
    unique_labels = unique_labels.cuda()
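
As a small illustration with made-up values: if the surviving boxes predicted the class indices below, .unique() collapses them to the three categories mentioned above.

import torch

class_pred_column = torch.tensor([16., 16., 1., 7., 16., 7., 1.])  # hypothetical last column of detections
print(class_pred_column.unique())  # tensor([ 1.,  7., 16.])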

③ Sort by object confidence

First get all prediction results of a given class, and then sort them by object confidence. torch.sort returns the sorted confidences and their indices, and we use those indices to reorder all prediction results of this category.

Here there are four prediction results for the first category, so the shape is (4, 7).

for c in unique_labels:
    # Get all predictions of this category that passed the first round of filtering
    detections_class = detections[detections[:, -1] == c]
    # Sort by object confidence (descending)
    _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True)
    detections_class = detections_class[conf_sort_index]

④ Non-maximum suppression

These four boxes all predict the same category, so there must be redundant boxes. Take the box with the highest confidence first and compute the IoU with the remaining three. We set nms_thres to 0.4: if the IoU is greater than this threshold the box is redundant and is removed, keeping only the highest-confidence box; if it is smaller, the two boxes barely overlap, so the box is kept.

Select the final anchor boxes and the related information and put them into the output. The output here is (3, 7), i.e. three categories, with the anchor box information corresponding to each.

max_detections = []
while detections_class.size(0):
    # Take the box with the highest confidence, then drop any remaining box whose IoU with it exceeds nms_thres
    max_detections.append(detections_class[0].unsqueeze(0))
    if len(detections_class) == 1:
        break
    ious = bbox_iou(max_detections[-1], detections_class[1:])
    detections_class = detections_class[1:][ious < nms_thres]

# Stack
max_detections = torch.cat(max_detections).data
# Add max detections to outputs
output[image_i] = max_detections if output[image_i] is None else torch.cat(
    (output[image_i], max_detections))

IoU calculation process:

We already have the top-left corner x1, y1 and the bottom-right corner x2, y2 of each box.

Because we want the area of the intersection, and the origin of the image coordinate system is in the top-left corner, we take the maximum of the two boxes' top-left corners and the minimum of their bottom-right corners, which gives the top-left and bottom-right corners of the intersection.

However, directly subtracting the intersection's top-left corner from its bottom-right corner may give negative values, so torch.clamp is used to limit the difference to a minimum of 0: if the intersection's x2 minus x1 is negative, the two boxes do not intersect along x and the width becomes 0. The same goes for y.

Multiplying the two gives the area of the intersection. The area of the union is the sum of the two box areas minus the area of the intersection.

Intersection / union then gives the IoU between the first box and each of the three remaining boxes.

def bbox_iou(box1, box2, x1y1x2y2=True):
    """
        calculation IOU
    """
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1e-3, min=0) * \
                 torch.clamp(inter_rect_y2 - inter_rect_y1 + 1e-3, min=0)

    b1_area = (b1_x2 - b1_x1 + 1e-3) * (b1_y2 - b1_y1 + 1e-3)
    b2_area = (b2_x2 - b2_x1 + 1e-3) * (b2_y2 - b2_y1 + 1e-3)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou
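
A quick sanity check of bbox_iou with made-up boxes: two 10 x 10 boxes overlapping by half should give an IoU of roughly 1/3 (ignoring the small smoothing terms).

import torch

a = torch.tensor([[0., 0., 10., 10.]])
b = torch.tensor([[5., 0., 15., 10.]])
print(bbox_iou(a, b))  # ~0.33: intersection 50, union 100 + 100 - 50 = 150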

6. Draw the bounding boxes

Pass the image, the anchor box information, the category names and the save file name into the function, which draws and saves the bounding boxes using the relevant cv2 calls.

def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
    img = np.copy(img)
    colors = np.array([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]], dtype=np.float32)

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        x1 = int(box[0] * width)
        y1 = int(box[1] * height)
        x2 = int(box[2] * width)
        y2 = int(box[3] * height)

        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if len(box) >= 7 and class_names:
            cls_conf = box[5]
            cls_id = box[6]
            # print('%s: %f' % (class_names[cls_id], cls_conf))
            classes = len(class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, class_names[int(cls_id)], (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 3)
    if savename:
        print("save plot results to %s" % savename)
        cv2.imwrite(savename, img)
    return img

3, Code summary

So far we have completed the testing process. The full code is split into two parts, the model and the tools, as follows. First the model part:

class Inference(object):
    # ---------------------------------------------------#
    #   Initialize the model and parameters and import the trained weights
    # ---------------------------------------------------#
    def __init__(self, **kwargs):
        self.model_path = kwargs['model_path']
        self.anchors_path = kwargs['anchors_path']
        self.classes_path = kwargs['classes_path']
        self.model_image_size = kwargs['model_image_size']
        self.confidence = kwargs['confidence']
        self.cuda = kwargs['cuda']

        self.class_names = self.get_class()
        self.anchors = self.get_anchors()
        print(self.anchors)
        self.net = YoloBody(3, len(self.class_names)).eval()
        self.load_model_pth(self.net, self.model_path)

        if self.cuda:
            self.net = self.net.cuda()
            self.net.eval()

        print('Finished!')

        self.yolo_decodes = []
        anchor_masks = [[0,1,2],[3,4,5],[6,7,8]]
        for i in range(3):
            head = YoloLayer(self.model_image_size, anchor_masks, len(self.class_names),
                                               self.anchors, len(self.anchors)//2).eval()
            self.yolo_decodes.append(head)


        print('{} model, anchors, and classes loaded.'.format(self.model_path))

    def load_model_pth(self, model, pth):
        print('Loading weights into state dict, name: %s' % (pth))
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model_dict = model.state_dict()
        pretrained_dict = torch.load(pth, map_location=device)
        matched_dict = {}

        with open('pretrained_.txt', 'w') as f:
            for k, v in pretrained_dict.items():
                f.write(k+'\n')
        with open('myparams_.txt', 'w') as f:
            for k, v in model_dict.items():
                f.write(k+'\n')


        for k, v in pretrained_dict.items():
            if np.shape(model_dict[k]) == np.shape(v):
                matched_dict[k] = v
            else:
                print('un matched layers: %s' % k)
        print(len(model_dict.keys()), len(pretrained_dict.keys()))
        print('%d layers matched,  %d layers miss' % (
        len(matched_dict.keys()), len(model_dict) - len(matched_dict.keys())))
        model_dict.update(matched_dict)
        model.load_state_dict(model_dict)  # load the merged dict so unmatched layers keep their original weights
        print('Finished!')
        return model

    # ---------------------------------------------------#
    #   Get all categories
    # ---------------------------------------------------#
    def get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    # ---------------------------------------------------#
    #   Get all a priori boxes
    # ---------------------------------------------------#
    def get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return anchors
        #return np.array(anchors).reshape([-1, 3, 2])[::-1, :, :]


    # ---------------------------------------------------#
    #   Detection picture
    # ---------------------------------------------------#
    def detect_image(self, image_src):
        h, w, _ = image_src.shape
        image = cv2.resize(image_src, (608, 608))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img = np.array(image, dtype=np.float32)
        img = np.transpose(img / 255.0, (2, 0, 1))
        images = np.asarray([img])

        with torch.no_grad():
            images = torch.from_numpy(images)
            if self.cuda:
                images = images.cuda()
            outputs = self.net(images)

        output_list = []
        for i in range(3):
            output_list.append(self.yolo_decodes[i](outputs[i]))
        output = torch.cat(output_list, 1)
        print(output.shape)
        batch_detections = non_max_suppression(output, len(self.class_names),
                                               conf_thres=self.confidence,
                                               nms_thres=0.1)
        boxes = [box.cpu().numpy() for box in batch_detections]
        print(boxes[0])
        return boxes[0]


if __name__ == '__main__':
    params = {
        "model_path": 'pth/yolo4_weights_my.pth',
        "anchors_path": 'work_dir/yolo_anchors_coco.txt',
        "classes_path": 'work_dir/coco_classes.txt',
        "model_image_size": (608, 608, 3),
        "confidence": 0.4,
        "cuda": True
    }

    model = Inference(**params)
    class_names = load_class_names(params['classes_path'])
    image_src = cv2.imread('dog.jpg')
    boxes = model.detect_image(image_src)
    plot_boxes_cv2(image_src, boxes, savename='output3.jpg', class_names=class_names)

The tool part:

import torch
import numpy as np
import math
import cv2


def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
    img = np.copy(img)
    colors = np.array([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]], dtype=np.float32)

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        x1 = int(box[0] * width)
        y1 = int(box[1] * height)
        x2 = int(box[2] * width)
        y2 = int(box[3] * height)

        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if len(box) >= 7 and class_names:
            cls_conf = box[5]
            cls_id = box[6]
            # print('%s: %f' % (class_names[cls_id], cls_conf))
            classes = len(class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, class_names[int(cls_id)], (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 3)
    if savename:
        print("save plot results to %s" % savename)
        cv2.imwrite(savename, img)
    return img


def load_class_names(namesfile):
    class_names = []
    with open(namesfile, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.rstrip()
        class_names.append(line)
    return class_names


def bbox_iou1(box1, box2, x1y1x2y2=True):
    # print('iou box1:', box1)
    # print('iou box2:', box2)

    if x1y1x2y2:
        mx = min(box1[0], box2[0])
        Mx = max(box1[2], box2[2])
        my = min(box1[1], box2[1])
        My = max(box1[3], box2[3])
        w1 = box1[2] - box1[0]
        h1 = box1[3] - box1[1]
        w2 = box2[2] - box2[0]
        h2 = box2[3] - box2[1]
    else:
        w1 = box1[2]
        h1 = box1[3]
        w2 = box2[2]
        h2 = box2[3]

        mx = min(box1[0], box2[0])
        Mx = max(box1[0] + w1, box2[0] + w2)
        my = min(box1[1], box2[1])
        My = max(box1[1] + h1, box2[1] + h2)
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    carea = 0
    if cw <= 0 or ch <= 0:
        return 0.0

    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    uarea = area1 + area2 - carea
    return carea / uarea


def bbox_iou(box1, box2, x1y1x2y2=True):
    """
        calculation IOU
    """
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1e-3, min=0) * \
                 torch.clamp(inter_rect_y2 - inter_rect_y1 + 1e-3, min=0)

    b1_area = (b1_x2 - b1_x1 + 1e-3) * (b1_y2 - b1_y1 + 1e-3)
    b2_area = (b2_x2 - b2_x1 + 1e-3) * (b2_y2 - b2_y1 + 1e-3)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou


def non_max_suppression(prediction, num_classes, conf_thres=0.5, nms_thres=0.4):
    # Find the upper left corner and lower right corner
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # First round screening using Object Confidence
        conf_mask = (image_pred[:, 4] >= conf_thres).squeeze()
        image_pred = image_pred[conf_mask]

        if not image_pred.size(0):
            continue

        # Get the class confidence value and the corresponding class index for each box
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)

        # The obtained contents are (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
        detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)

        # Get the categories
        unique_labels = detections[:, -1].cpu().unique()

        if prediction.is_cuda:
            unique_labels = unique_labels.cuda()

        for c in unique_labels:
            # Get all predictions of this category that passed the first round of filtering
            detections_class = detections[detections[:, -1] == c]
            # Sort by object confidence (descending)
            _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True)
            detections_class = detections_class[conf_sort_index]
            # Non-maximum suppression
            max_detections = []
            while detections_class.size(0):
                # Take the box with the highest confidence, then drop any remaining box whose IoU with it exceeds nms_thres
                max_detections.append(detections_class[0].unsqueeze(0))
                if len(detections_class) == 1:
                    break
                ious = bbox_iou(max_detections[-1], detections_class[1:])
                detections_class = detections_class[1:][ious < nms_thres]

            # Stack
            max_detections = torch.cat(max_detections).data
            # Add max detections to outputs
            output[image_i] = max_detections if output[image_i] is None else torch.cat(
                (output[image_i], max_detections))
    return output
