Contents
1, Overview
2, Test process
3, Code summary
1, Overview
Before training, let's first take the pre-trained weights, feed in an image, and walk through the whole detection (inference) process. I'll show the code used in blocks along the way, and put it all together at the end.
2, Test process
1. Parameter preparation
In this class we need to define the model, load the pre-trained weights, and set up the loading functions, the decoding layers and the other methods we will use. We prepare the keyword arguments in advance and pass them in when constructing the model.
The first three are the paths of the pre-trained weights, the anchor box sizes and the file listing all categories. model_image_size is the size of the input image; confidence is the threshold used to filter detections by object confidence, and only boxes whose confidence is greater than this threshold are kept. cuda is the device flag.
params = {
    "model_path": 'pth/yolo4_weights_my.pth',
    "anchors_path": 'work_dir/yolo_anchors_coco.txt',
    "classes_path": 'work_dir/coco_classes.txt',
    "model_image_size": (608, 608, 3),
    "confidence": 0.4,
    "cuda": True
}

model = Inference(**params)
2. Define the model
After preparing the parameters, we define the Inference class as the model. In this class we need to load the pre-trained weights and initialize the YOLO model and the YOLO decoding layers that will be used.
YoloBody is the backbone + neck + head of our model. We only need to provide the number of input channels and the number of output categories. The final output consists of three feature maps, namely (1, 255, 19, 19), (1, 255, 38, 38) and (1, 255, 76, 76).
The 255 in the channel dimension is 3 * (4 + 1 + 80): 3 anchors per grid cell, each carrying 4 box offsets, 1 object confidence and 80 class scores, because we have 80 categories. This was covered in the previous article, so the details are not repeated here. We will decode these three outputs.
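As a quick sanity check, here is a minimal arithmetic sketch (independent of the project code) of where the 255 comes from:

# Minimal sketch: the 255 channels of each output head.
# Assumes the standard YOLOv4 setup: 3 anchors per scale and 80 COCO classes.
num_anchors_per_scale = 3
num_classes = 80
channels = num_anchors_per_scale * (4 + 1 + num_classes)  # 4 box offsets + 1 objectness + 80 class scores
print(channels)  # 255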
YoloLayer is our decoding layer, which was also covered earlier. We complete its initialization in __init__ by passing in the required parameters. So inside __init__ we obtain self.net and self.yolo_decodes, and we will use these two to run detection and decoding. We only need to define functions below that call them.
The arguments to fill in are: the image size, the anchor masks (used to select which anchors each head uses), the number of categories, the prior anchor box sizes, the number of prior anchor boxes, and the scaling factor.
class Inference(object):
    # ---------------------------------------------------#
    #   Initialize the model and parameters and import the trained weights
    # ---------------------------------------------------#
    def __init__(self, **kwargs):
        self.model_path = kwargs['model_path']
        self.anchors_path = kwargs['anchors_path']
        self.classes_path = kwargs['classes_path']
        self.model_image_size = kwargs['model_image_size']
        self.confidence = kwargs['confidence']
        self.cuda = kwargs['cuda']
        self.class_names = self.get_class()
        self.anchors = self.get_anchors()
        print(self.anchors)

        # ================= Here is the initialization of the model
        self.net = YoloBody(3, len(self.class_names)).eval()
        self.load_model_pth(self.net, self.model_path)

        if self.cuda:
            self.net = self.net.cuda()
            self.net.eval()
        print('Finished!')

        self.yolo_decodes = []
        anchor_masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
        # ================= Here is the initialization of the decoding part.
        # Since there are three outputs, three decoding models are required
        for i in range(3):
            head = YoloLayer(self.model_image_size, anchor_masks, len(self.class_names),
                             self.anchors, len(self.anchors) // 2).eval()
            self.yolo_decodes.append(head)

        print('{} model, anchors, and classes loaded.'.format(self.model_path))
3. Obtain necessary data
Since we want to run detection, we must input image data and preprocess it. For example, the image is resized to 608 * 608, or another size that is a multiple of 32, as the model requires (since the model uses no fully connected layers, only convolutions, the input size is not fixed, but it must be divisible by the downsampling stride).
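For illustration, here is a small sketch (assuming the usual YOLOv4 strides of 32, 16 and 8, not taken from the project code) of how the three grid sizes follow from the input size; a size that is not a multiple of 32 would not divide evenly:

# Minimal sketch: feature-map (grid) sizes for a given input size.
# Assumes strides of 32, 16 and 8 for the three YOLOv4 output heads.
def grid_sizes(input_size, strides=(32, 16, 8)):
    assert input_size % 32 == 0, "input size should be a multiple of 32"
    return [input_size // s for s in strides]

print(grid_sizes(608))  # [19, 38, 76]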
To predict categories, we need to prepare category names in advance.
The following two functions obtain the category data and the image data respectively. The image data is fed into the model for prediction, and the category data is used for filtering and labelling.
def load_class_names(namesfile):
    class_names = []
    with open(namesfile, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.rstrip()
        class_names.append(line)
    return class_names


def detect_image(self, image_src):
    h, w, _ = image_src.shape
    image = cv2.resize(image_src, (608, 608))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img = np.array(image, dtype=np.float32)
    img = np.transpose(img / 255.0, (2, 0, 1))
    images = np.asarray([img])
4. Run the model
with torch.no_grad():
    images = torch.from_numpy(images)
    if self.cuda:
        images = images.cuda()
    outputs = self.net(images)
With the network outputs in hand, we can feed them into the decoding modules to obtain all the anchor box information. After decoding, we concatenate everything into a single tensor of shape (1, 22743, 85): more than 20,000 anchor boxes and the information each one carries, which now have to be screened.
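The 22743 can be checked by hand: each cell on each of the three grids predicts 3 boxes. A short sketch (plain arithmetic, independent of the project code):

# Minimal sketch: total number of decoded boxes for a 608x608 input.
grids = [19, 38, 76]                   # grid sizes of the three output heads
total = sum(3 * g * g for g in grids)  # 3 anchors per cell on each scale
print(total)                           # 22743 = 1083 + 4332 + 17328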
output_list = []
for i in range(3):
    output_list.append(self.yolo_decodes[i](outputs[i]))
output = torch.cat(output_list, 1)
print(output.shape)
5. Anchor box screening
① Screening by Object Confidence
Object confidence is the number at index 4 in the last dimension. It is compared with the set threshold; here 0.5 is used, and only anchor boxes whose object confidence is greater than 0.5 are kept. After this filtering only 17 anchor boxes remain, so the image_pred shape is (17, 85).
def non_max_suppression(prediction, num_classes, conf_thres=0.5, nms_thres=0.4):
    # Find the upper-left and lower-right corners
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # First round of screening using object confidence
        conf_mask = (image_pred[:, 4] >= conf_thres).squeeze()
        # ================ Get the filtered anchor boxes
        image_pred = image_pred[conf_mask]
        if not image_pred.size(0):
            continue
        # Obtain the class confidence value and the corresponding class index
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
② Get the classes
A picture may contain multiple predicted objects, that is, multiple categories corresponding to multiple anchor boxes. Therefore we need to obtain all categories contained in the predictions, and then filter the anchor boxes again for each category.
In the last line of the code above, torch.max takes the maximum over all class scores and returns both the maximum value and its index: for each of the 17 anchor boxes we get the index of the highest-scoring category, class_pred, and the corresponding confidence, class_conf.
The obtained indexes and scores are then concatenated with the previous anchor box information to obtain a 7-dimensional vector. As the code above shows, the first four values have already been replaced by the upper-left corner coordinates x1, y1 and the lower-right corner coordinates x2, y2, so what we get here is (x1, y1, x2, y2, obj_conf, class_conf, class_pred).
The last dimension, class_pred, is the class index. Applying unique() to it gives all predicted categories; here there are three, namely (1, 7, 16).
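To make these two steps concrete, here is a small self-contained toy example (the scores are made up, not taken from the real predictions) of extracting the best class per box and the set of predicted classes:

import torch

# Toy example: 4 boxes, 6 classes (made-up scores).
class_scores = torch.tensor([
    [0.1, 0.7, 0.1, 0.0, 0.0, 0.1],
    [0.0, 0.8, 0.1, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.1, 0.6, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.5, 0.0, 0.1],
])

# Best class score and its index for every box (what class_conf / class_pred hold)
class_conf, class_pred = torch.max(class_scores, 1, keepdim=True)
print(class_conf.squeeze())  # tensor([0.7000, 0.8000, 0.6000, 0.5000])
print(class_pred.squeeze())  # tensor([1, 1, 3, 3])

# All distinct predicted classes (what unique() gives us)
print(class_pred.squeeze().unique())  # tensor([1, 3])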
# The obtained contents are (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)

# Get the predicted classes
unique_labels = detections[:, -1].cpu().unique()
if prediction.is_cuda:
    unique_labels = unique_labels.cuda()
③ Sort by Object Confidence
First get all the prediction results for a given class, then sort them by object confidence. torch.sort returns the sorted confidences and their indices, and we use the indices to reorder all prediction results of this class.
Here the first class has four prediction results, so the shape is (4, 7).
for c in unique_labels:
    # Obtain all the prediction results of this class after the preliminary screening
    detections_class = detections[detections[:, -1] == c]
    # Sort by object confidence
    _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True)
    detections_class = detections_class[conf_sort_index]
④ Non-maximum suppression
These four boxes all predict the same category, so there are bound to be redundant boxes. We first take the box with the highest confidence and compute the IOU with the remaining three. nms_thres is set to 0.4: any box whose IOU is greater than this threshold is considered redundant and removed, keeping only the highest-confidence box; a box with IOU below the threshold overlaps only slightly with it, so it is kept.
The final anchor boxes and their information are selected and put into the output. Here the output is (3, 7): three categories, with the anchor box information corresponding to each.
max_detections = []
while detections_class.size(0):
    # Take the box with the highest confidence, then check the remaining boxes one by one;
    # any box whose overlap is greater than nms_thres is removed
    max_detections.append(detections_class[0].unsqueeze(0))
    if len(detections_class) == 1:
        break
    ious = bbox_iou(max_detections[-1], detections_class[1:])
    detections_class = detections_class[1:][ious < nms_thres]

# Stack
max_detections = torch.cat(max_detections).data
# Add max detections to outputs
output[image_i] = max_detections if output[image_i] is None else torch.cat(
    (output[image_i], max_detections))
IOU calculation process:
We already have x1,y1 in the upper left corner and x2,y2 in the lower right corner.
Because we want the area of the intersection, and the origin of the coordinate system is in the upper left corner, we take the maximum of the two boxes' upper-left corners and the minimum of their lower-right corners, which gives the upper-left and lower-right corners of the intersection.
However, directly subtracting the intersection's upper-left corner from its lower-right corner may give a negative number, so torch.clamp is used to set the minimum of the difference to 0: if the maximum x of one box minus the minimum x of the other is negative, the two boxes do not intersect and the result is 0. The same applies to y.
Multiplying the two gives the area of the intersection. The union area is the sum of the two box areas minus the intersection area.
Dividing intersection by union gives the IOU between the first box and each of the three remaining boxes.
def bbox_iou(box1, box2, x1y1x2y2=True):
    """Calculate IOU"""
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1e-3, min=0) * \
                 torch.clamp(inter_rect_y2 - inter_rect_y1 + 1e-3, min=0)

    b1_area = (b1_x2 - b1_x1 + 1e-3) * (b1_y2 - b1_y1 + 1e-3)
    b2_area = (b2_x2 - b2_x1 + 1e-3) * (b2_y2 - b2_y1 + 1e-3)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
    return iou
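As a quick usage check of bbox_iou, here is a small worked example with two made-up boxes (the coordinates are arbitrary and chosen only to make the arithmetic easy to follow; the small 1e-3 padding in the function shifts the result slightly):

import torch

# Two made-up boxes in (x1, y1, x2, y2) format.
box_a = torch.tensor([[0.0, 0.0, 2.0, 2.0]])  # area 4
box_b = torch.tensor([[1.0, 1.0, 3.0, 3.0]])  # area 4, overlapping box_a in a 1x1 square

# Intersection = 1, union = 4 + 4 - 1 = 7, so the IOU is roughly 1/7 ≈ 0.143
print(bbox_iou(box_a, box_b))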
6. Draw the bounding boxes
Pass the image, the anchor box information, the class names and the save filename into the function, which draws and saves the bounding boxes through the relevant cv2 interfaces.
def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
    img = np.copy(img)
    colors = np.array([[1, 0, 1], [0, 0, 1], [0, 1, 1],
                       [0, 1, 0], [1, 1, 0], [1, 0, 0]], dtype=np.float32)

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        x1 = int(box[0] * width)
        y1 = int(box[1] * height)
        x2 = int(box[2] * width)
        y2 = int(box[3] * height)

        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if len(box) >= 7 and class_names:
            cls_conf = box[5]
            cls_id = box[6]
            # print('%s: %f' % (class_names[cls_id], cls_conf))
            classes = len(class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, class_names[int(cls_id)], (x1, y1),
                              cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 3)
    if savename:
        print("save plot results to %s" % savename)
        cv2.imwrite(savename, img)
    return img
3, Code summary
So far we have completed the testing process. All the code is divided into two parts, the model and the tools, as follows.
# Standard imports; YoloBody and YoloLayer come from the project's own model definition module,
# and non_max_suppression, load_class_names and plot_boxes_cv2 from the tool file below
# (import paths depend on your file layout).
import os

import cv2
import numpy as np
import torch


class Inference(object):
    # ---------------------------------------------------#
    #   Initialize the model and parameters and import the trained weights
    # ---------------------------------------------------#
    def __init__(self, **kwargs):
        self.model_path = kwargs['model_path']
        self.anchors_path = kwargs['anchors_path']
        self.classes_path = kwargs['classes_path']
        self.model_image_size = kwargs['model_image_size']
        self.confidence = kwargs['confidence']
        self.cuda = kwargs['cuda']
        self.class_names = self.get_class()
        self.anchors = self.get_anchors()
        print(self.anchors)

        self.net = YoloBody(3, len(self.class_names)).eval()
        self.load_model_pth(self.net, self.model_path)

        if self.cuda:
            self.net = self.net.cuda()
            self.net.eval()
        print('Finished!')

        self.yolo_decodes = []
        anchor_masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
        for i in range(3):
            head = YoloLayer(self.model_image_size, anchor_masks, len(self.class_names),
                             self.anchors, len(self.anchors) // 2).eval()
            self.yolo_decodes.append(head)

        print('{} model, anchors, and classes loaded.'.format(self.model_path))

    def load_model_pth(self, model, pth):
        print('Loading weights into state dict, name: %s' % (pth))
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model_dict = model.state_dict()
        pretrained_dict = torch.load(pth, map_location=device)
        matched_dict = {}

        with open('pretrained_.txt', 'w') as f:
            for k, v in pretrained_dict.items():
                f.write(k + '\n')
        with open('myparams_.txt', 'w') as f:
            for k, v in model_dict.items():
                f.write(k + '\n')

        for k, v in pretrained_dict.items():
            if np.shape(model_dict[k]) == np.shape(v):
                matched_dict[k] = v
            else:
                print('un matched layers: %s' % k)
        print(len(model_dict.keys()), len(pretrained_dict.keys()))
        print('%d layers matched, %d layers miss' % (
            len(matched_dict.keys()), len(model_dict) - len(matched_dict.keys())))
        model_dict.update(matched_dict)
        # Load the state dict with the matched pre-trained layers merged in
        model.load_state_dict(model_dict)
        print('Finished!')
        return model

    # ---------------------------------------------------#
    #   Get all categories
    # ---------------------------------------------------#
    def get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path) as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    # ---------------------------------------------------#
    #   Get all prior anchor boxes
    # ---------------------------------------------------#
    def get_anchors(self):
        anchors_path = os.path.expanduser(self.anchors_path)
        with open(anchors_path) as f:
            anchors = f.readline()
        anchors = [float(x) for x in anchors.split(',')]
        return anchors
        # return np.array(anchors).reshape([-1, 3, 2])[::-1, :, :]

    # ---------------------------------------------------#
    #   Detect a picture
    # ---------------------------------------------------#
    def detect_image(self, image_src):
        h, w, _ = image_src.shape
        image = cv2.resize(image_src, (608, 608))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img = np.array(image, dtype=np.float32)
        img = np.transpose(img / 255.0, (2, 0, 1))
        images = np.asarray([img])

        with torch.no_grad():
            images = torch.from_numpy(images)
            if self.cuda:
                images = images.cuda()
            outputs = self.net(images)

        output_list = []
        for i in range(3):
            output_list.append(self.yolo_decodes[i](outputs[i]))
        output = torch.cat(output_list, 1)
        print(output.shape)
        batch_detections = non_max_suppression(output, len(self.class_names),
                                               conf_thres=self.confidence,
                                               nms_thres=0.1)
        boxes = [box.cpu().numpy() for box in batch_detections]
        print(boxes[0])
        return boxes[0]


if __name__ == '__main__':
    params = {
        "model_path": 'pth/yolo4_weights_my.pth',
        "anchors_path": 'work_dir/yolo_anchors_coco.txt',
        "classes_path": 'work_dir/coco_classes.txt',
        "model_image_size": (608, 608, 3),
        "confidence": 0.4,
        "cuda": True
    }

    model = Inference(**params)
    class_names = load_class_names(params['classes_path'])
    image_src = cv2.imread('dog.jpg')
    boxes = model.detect_image(image_src)
    plot_boxes_cv2(image_src, boxes, savename='output3.jpg', class_names=class_names)
import math

import cv2
import numpy as np
import torch


def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
    img = np.copy(img)
    colors = np.array([[1, 0, 1], [0, 0, 1], [0, 1, 1],
                       [0, 1, 0], [1, 1, 0], [1, 0, 0]], dtype=np.float32)

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        x1 = int(box[0] * width)
        y1 = int(box[1] * height)
        x2 = int(box[2] * width)
        y2 = int(box[3] * height)

        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if len(box) >= 7 and class_names:
            cls_conf = box[5]
            cls_id = box[6]
            # print('%s: %f' % (class_names[cls_id], cls_conf))
            classes = len(class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, class_names[int(cls_id)], (x1, y1),
                              cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 3)
    if savename:
        print("save plot results to %s" % savename)
        cv2.imwrite(savename, img)
    return img


def load_class_names(namesfile):
    class_names = []
    with open(namesfile, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.rstrip()
        class_names.append(line)
    return class_names


def bbox_iou1(box1, box2, x1y1x2y2=True):
    # print('iou box1:', box1)
    # print('iou box2:', box2)
    if x1y1x2y2:
        mx = min(box1[0], box2[0])
        Mx = max(box1[2], box2[2])
        my = min(box1[1], box2[1])
        My = max(box1[3], box2[3])
        w1 = box1[2] - box1[0]
        h1 = box1[3] - box1[1]
        w2 = box2[2] - box2[0]
        h2 = box2[3] - box2[1]
    else:
        w1 = box1[2]
        h1 = box1[3]
        w2 = box2[2]
        h2 = box2[3]
        mx = min(box1[0], box2[0])
        Mx = max(box1[0] + w1, box2[0] + w2)
        my = min(box1[1], box2[1])
        My = max(box1[1] + h1, box2[1] + h2)
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    carea = 0
    if cw <= 0 or ch <= 0:
        return 0.0

    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    uarea = area1 + area2 - carea
    return carea / uarea


def bbox_iou(box1, box2, x1y1x2y2=True):
    """Calculate IOU"""
    if not x1y1x2y2:
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)

    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1e-3, min=0) * \
                 torch.clamp(inter_rect_y2 - inter_rect_y1 + 1e-3, min=0)

    b1_area = (b1_x2 - b1_x1 + 1e-3) * (b1_y2 - b1_y1 + 1e-3)
    b2_area = (b2_x2 - b2_x1 + 1e-3) * (b2_y2 - b2_y1 + 1e-3)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
    return iou


def non_max_suppression(prediction, num_classes, conf_thres=0.5, nms_thres=0.4):
    # Find the upper-left and lower-right corners
    box_corner = prediction.new(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # First round of screening using object confidence
        conf_mask = (image_pred[:, 4] >= conf_thres).squeeze()
        image_pred = image_pred[conf_mask]
        if not image_pred.size(0):
            continue
        # Obtain the class confidence value and the corresponding class index
        class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        # The obtained contents are (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
        detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)
        # Get the predicted classes
        unique_labels = detections[:, -1].cpu().unique()
        if prediction.is_cuda:
            unique_labels = unique_labels.cuda()

        for c in unique_labels:
            # Obtain all the prediction results of this class after the preliminary screening
            detections_class = detections[detections[:, -1] == c]
            # Sort by object confidence
            _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True)
            detections_class = detections_class[conf_sort_index]
            # Non-maximum suppression
            max_detections = []
            while detections_class.size(0):
                # Take the box with the highest confidence; any remaining box whose
                # overlap is greater than nms_thres is removed
                max_detections.append(detections_class[0].unsqueeze(0))
                if len(detections_class) == 1:
                    break
                ious = bbox_iou(max_detections[-1], detections_class[1:])
                detections_class = detections_class[1:][ious < nms_thres]
            # Stack
            max_detections = torch.cat(max_detections).data
            # Add max detections to outputs
            output[image_i] = max_detections if output[image_i] is None else torch.cat(
                (output[image_i], max_detections))

    return output