Object detection practice: pedestrian detection using Opencv built-in method

Posted by Canabalooza on Tue, 21 Dec 2021 10:17:15 +0100

Did you know that OpenCV has a built-in method to perform pedestrian detection?

OpenCV comes with a pre trained HOG + linear SVM model that can be used to perform pedestrian detection in images and video streams.

Today, we use Opencv's own model to detect pedestrians in the video stream. Just open a new file and name it detect Py, and then add the code:

# import the necessary packages
from __future__ import print_function
import numpy as np
import argparse
import cv2
import os

Import the required packages, and then define the methods required by the project.

def nms(boxes, probs=None, overlapThresh=0.3):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats -- this
    # is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # compute the area of the bounding boxes and grab the indexes to sort
    # (in the case that no probabilities are provided, simply sort on the
    # bottom-left y-coordinate)
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = y2

    # if probabilities are provided, sort on them instead
    if probs is not None:
        idxs = probs

    # sort the indexes
    idxs = np.argsort(idxs)

    # keep looping while some indexes still remain in the indexes list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the index value
        # to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of the bounding
        # box and the smallest (x, y) coordinates for the end of the bounding
        # box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have overlap greater
        # than the provided overlap threshold
        idxs = np.delete(idxs, np.concatenate(([last],
                                               np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked
    return boxes[pick].astype("int")
image_types = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")
def list_images(basePath, contains=None):
    # return the set of files that are valid
    return list_files(basePath, validExts=image_types, contains=contains)


def list_files(basePath, validExts=None, contains=None):
    # loop over the directory structure
    for (rootDir, dirNames, filenames) in os.walk(basePath):
        # loop over the filenames in the current directory
        for filename in filenames:
            # if the contains string is not none and the filename does not contain
            # the supplied string, then ignore the file
            if contains is not None and filename.find(contains) == -1:
                continue
            # determine the file extension of the current file
            ext = filename[filename.rfind("."):].lower()
            # check to see if the file is an image and should be processed
            if validExts is None or ext.endswith(validExts):
                # construct the path to the image and yield it
                imagePath = os.path.join(rootDir, filename)
                yield imagePath
def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    # If the height and width are None, return directly
    if width is None and height is None:
        return image
    # Check if the width is None
    if width is None:
        # Calculate the proportion of height and calculate the width according to the proportion
        r = height / float(h)
        dim = (int(w * r), height)
    # High to None
    else:
        # Calculate the width scale and calculate the height
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    # return the resized image
    return resized

nms function: non maximum suppression.

list_images: read images.

resize: changes the size proportionally.

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", default='test1', help="path to images directory")
args = vars(ap.parse_args())
# Initialize HOG descriptor / person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

Defines the folder path for the input pictures.

Initialize the HOG detector.

# loop over the image paths
for imagePath in list_images(args["images"]):
    # Load and resize the image to
    # (1) Reduce detection time
    # (2) Improve detection accuracy
    image = cv2.imread(imagePath)
    image = resize(image, width=min(400, image.shape[1]))
    orig = image.copy()
    print(image)
    # detect people in the image
    (rects, weights) = hog.detectMultiScale(image, winStride=(4, 4),
                                            padding=(8, 8), scale=1.05)
    # draw the original bounding boxes
    print(rects)
    for (x, y, w, h) in rects:
        cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 0, 255), 2)
    # Non maximum suppression is applied to the bounding box using a considerable overlap threshold in an attempt to maintain the overlapping box that is still human
    rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
    pick = nms(rects, probs=None, overlapThresh=0.65)
    # draw the final bounding boxes
    for (xA, yA, xB, yB) in pick:
        cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)
    # show some information on the number of bounding boxes
    filename = imagePath[imagePath.rfind("/") + 1:]
    print("[INFO] {}: {} original boxes, {} after suppression".format(
        filename, len(rects), len(pick)))
    # show the output images
    cv2.imshow("Before NMS", orig)
    cv2.imshow("After NMS", image)
    cv2.waitKey(0)

Traverse the images in the -- images directory.

Then, adjust the image to a maximum width of 400 pixels. There are two reasons to try to reduce the image size:

  • Reducing the image size can ensure that there are fewer sliding windows in the image pyramid to be evaluated (i.e. extracting the HOG feature from the linear SVM and then passing it to the linear SVM), so as to reduce the detection time (and improve the overall detection throughput).
  • Adjusting our image size also improves the overall accuracy of our pedestrian detection (i.e. less false positives).

The pedestrian in the image is detected by calling the detectMultiScale method of the hog descriptor. The detectMultiScale method constructs an image pyramid with a scale of 1.05, and the sliding window steps are (4,4) pixels in the x and y directions respectively.

The size of the sliding window is fixed at 64 x 128 pixels, as suggested in the groundbreaking Dalal and triggers paper directional gradient histogram for human detection. The detectMultiScale function returns the 2-tuple of rects, or the bounding box (x, y) coordinates and weights of each person in the image. SVM returns the confidence value for each detection.

A larger scale size will evaluate fewer layers in the image pyramid, which can make the algorithm run faster. However, Too big (that is, the number of layers in the image pyramid is small) will make pedestrians undetectable. Similarly, a too small scale will significantly increase the number of image pyramid layers to be evaluated. This will not only cause computational waste, but also significantly increase the number of false positives detected by the pedestrian detector. That is, when performing pedestrian detection, the proportion is one of the most important parameters to be adjusted One. I will review each parameter more thoroughly in future blog posts to detect multi-scale.

Get the initial bounding boxes and draw them on the image.

However, for some images, you will notice that each person detects multiple overlapping bounding boxes.

In this case, we have two options. We can detect whether one bounding box is completely contained in another. Alternatively, we can apply non maximum suppression and suppress bounding boxes that overlap important thresholds.

After applying non maximum suppression, the final bounding box is obtained, and then the image is output.

Operation results:

Before nms:

After nms:


Conclusion:
Compared with the current deep learning methods, the accuracy of machine learning is much lower.

Topics: OpenCV AI Computer Vision