HOG+SVM custom object recognition based on OpenCV

Posted by LiamBailey on Sat, 08 Jan 2022 12:26:14 +0100

1. Preface

This case comes from an online course I bought a long time ago; I only recently remembered to write it up. Now that deep learning has become mainstream, HOG+SVM custom object recognition may not see much use. In a fixed scene, however, where the size of the custom object in the image is relatively constant, this method needs only a small dataset to achieve good recognition results. An algorithm should be evaluated in the context of a specific scene; judging it without one is not objective.

2. HOG features

Difference between HOG and SIFT
Both HOG and SIFT are descriptors. Because they share many similar steps, many people mistakenly believe that HOG is a kind of SIFT. In fact, the two differ considerably in purpose and in processing details. The main differences are as follows:
(1) SIFT describes an image with feature vectors computed at key points.
(2) HOG divides the image evenly into adjacent small regions (cells) and computes a histogram of gradient orientations within each one.
(3) SIFT needs to find extreme points in the scale space of the image; HOG does not.
(4) SIFT generally has two steps, the first of which extracts feature points from the image; HOG does not extract feature points.
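
To make point (2) more concrete, here is a minimal sketch of the histogram for a single cell. The function name `cell_histogram` and the image representation (a vector of rows) are assumptions made for illustration, and the voting is simplified: each pixel votes into one bin only, whereas full HOG implementations also interpolate between neighbouring bins.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not from the original post): builds a histogram of
// unsigned gradient orientations (0 to 180 degrees) for one cell of a
// grayscale image. Each pixel votes into a single bin, weighted by its
// gradient magnitude.
std::vector<float> cell_histogram(const std::vector<std::vector<float>>& gray,
                                  int x0, int y0, int cell = 8, int bins = 9) {
    const int rows = static_cast<int>(gray.size());
    assert(rows > 0 && cell > 0 && bins > 0);
    const int cols = static_cast<int>(gray[0].size());
    assert(x0 >= 0 && y0 >= 0 && x0 + cell <= cols && y0 + cell <= rows);

    const double kPi = 3.14159265358979323846;
    std::vector<float> hist(bins, 0.0f);
    for (int y = y0; y < y0 + cell; ++y) {
        for (int x = x0; x < x0 + cell; ++x) {
            // Central differences, clamped at the image border.
            const int xl = x > 0 ? x - 1 : x;
            const int xr = x < cols - 1 ? x + 1 : x;
            const int yu = y > 0 ? y - 1 : y;
            const int yd = y < rows - 1 ? y + 1 : y;
            const float gx = gray[y][xr] - gray[y][xl];
            const float gy = gray[yd][x] - gray[yu][x];
            const float mag = std::sqrt(gx * gx + gy * gy);
            if (mag == 0.0f) continue;  // flat pixel, no vote

            // Fold the signed angle into the unsigned range [0, 180).
            double angle = std::atan2(gy, gx) * 180.0 / kPi;
            if (angle < 0.0) angle += 180.0;
            if (angle >= 180.0) angle -= 180.0;

            int bin = static_cast<int>(angle / (180.0 / bins));
            if (bin >= bins) bin = bins - 1;
            hist[bin] += mag;
        }
    }
    return hist;
}
```

For an image whose brightness only changes from left to right, all votes land in the first bin, because every gradient points along the x axis.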

Advantages and disadvantages of HOG
Advantages:
(1) HOG represents the structure of edges (gradients), so it can describe local shape information;
(2) Quantizing position and orientation suppresses the influence of translation and rotation to a certain extent;
(3) Normalizing the histogram within a local region partially offsets the influence of illumination changes;
(4) Because the influence of illumination and color is ignored to a certain extent, the dimensionality of the data needed to represent the image is reduced;
(5) The block and cell structure characterizes the relationships between neighbouring pixels well.
Disadvantages:
(1) Generating the descriptor is a lengthy process, so it is slow and real-time performance is poor;
(2) It is difficult to deal with occlusion;
(3) Because of the nature of gradients, the descriptor is quite sensitive to noise.
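
The point about normalization offsetting illumination changes can be illustrated with a small sketch. It assumes plain L2 normalization over one block's values and a hypothetical helper name `l2_normalize`; OpenCV's HOGDescriptor uses a clipped variant (L2-Hys), so this is an illustration of the idea rather than the library's exact behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: plain L2 normalization of the histogram values of one
// block. The small eps (an assumed value) avoids division by zero on flat
// regions.
std::vector<float> l2_normalize(const std::vector<float>& block, float eps = 1e-6f) {
    assert(eps > 0.0f);
    double sum_sq = 0.0;
    for (float v : block) sum_sq += static_cast<double>(v) * v;
    const double norm = std::sqrt(sum_sq + static_cast<double>(eps) * eps);

    std::vector<float> out;
    out.reserve(block.size());
    for (float v : block) out.push_back(static_cast<float>(v / norm));
    return out;
}
```

If the lighting doubles the contrast, every gradient magnitude in the block doubles as well, and the normalized values stay essentially the same.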


3. Case description + code implementation

The training data is divided into a positive sample set and a negative sample set. In the positive samples, the electricity meter basically fills the image, as shown below:

The negative sample images do not contain the meter.

When computing HOG, each positive sample image is converted to grayscale and resized to 64×128 (width × height), and then the descriptor is computed.
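
The length of the descriptor for such a window follows directly from the cell and block layout. The sketch below assumes the default layout of OpenCV's HOGDescriptor (8×8 cells, 2×2 cells per block, blocks sliding one cell at a time, 9 orientation bins); the function name `hog_descriptor_length` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of floats in the HOG descriptor of one window,
// assuming blocks of block_cells x block_cells cells that slide one cell at a time.
std::size_t hog_descriptor_length(int win_w, int win_h, int cell = 8,
                                  int block_cells = 2, int bins = 9) {
    assert(cell > 0 && block_cells > 0 && bins > 0);
    assert(win_w % cell == 0 && win_h % cell == 0);
    const int cells_x = win_w / cell;   // 64 / 8  = 8
    const int cells_y = win_h / cell;   // 128 / 8 = 16
    assert(cells_x >= block_cells && cells_y >= block_cells);

    const int blocks_x = cells_x - block_cells + 1;  // 7 for a 64-pixel width
    const int blocks_y = cells_y - block_cells + 1;  // 15 for a 128-pixel height
    const int values_per_block = block_cells * block_cells * bins;  // 4 * 9 = 36
    return static_cast<std::size_t>(blocks_x) * blocks_y * values_per_block;
}
```

With the 64×128 window this gives 7 * 15 * 36 = 3780 values per sample.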

During detection, a sliding window of size 64×128 is moved across the test image, and each window position is classified.
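
To get a feeling for the cost of the sliding window, the sketch below counts the window positions for a given image size. It mirrors the loop bounds of the listing in this post (window centres start half a window from the top-left border and advance 4 pixels at a time); the function name `count_windows` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the window positions visited by a sliding-window
// loop over the window centre (cx, cy), using the same bounds as the listing.
std::size_t count_windows(int img_w, int img_h, int win_w = 64, int win_h = 128,
                          int stride = 4) {
    assert(stride > 0 && win_w > 0 && win_h > 0);
    std::size_t n = 0;
    for (int cy = win_h / 2; cy < img_h - win_h / 2; cy += stride) {
        for (int cx = win_w / 2; cx < img_w - win_w / 2; cx += stride) {
            ++n;  // one HOG computation and one SVM prediction per position
        }
    }
    return n;
}
```

Every position costs one descriptor computation and one SVM prediction, which is why the listing first shrinks the test image to 20% of its size. Note that with these strict bounds an image that is exactly 64×128 yields no window at all.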

Let's go directly to the code. Pay attention to the comments. There is plenty of room for optimization, such as applying NMS (non-maximum suppression) to the prediction results. If the test image contains two or more meters, the code cannot detect them properly: it simply outputs the average of the coordinates of all positive windows. This needs to be solved, for example by grouping detections with IoU (intersection over union). I am not posting the dataset for now.
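
As a rough idea of what the NMS step could look like, here is a minimal sketch of IoU and greedy non-maximum suppression. The `Box` struct, the function names and the 0.3 threshold are assumptions for illustration, not part of the original course code, and the scores are assumed to be real-valued confidences (for example SVM decision values) rather than class labels.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection record: top-left corner, size and confidence score.
struct Box {
    int x, y, w, h;
    float score;
};

// Intersection over union of two axis-aligned boxes, in [0, 1].
float iou(const Box& a, const Box& b) {
    const int x1 = std::max(a.x, b.x);
    const int y1 = std::max(a.y, b.y);
    const int x2 = std::min(a.x + a.w, b.x + b.w);
    const int y2 = std::min(a.y + a.h, b.y + b.h);
    const int iw = std::max(0, x2 - x1);
    const int ih = std::max(0, y2 - y1);
    const float inter = static_cast<float>(iw) * ih;
    const float uni = static_cast<float>(a.w) * a.h + static_cast<float>(b.w) * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only if
// it does not overlap an already kept box by more than iou_threshold.
std::vector<Box> nms(std::vector<Box> boxes, float iou_threshold = 0.3f) {
    assert(iou_threshold >= 0.0f && iou_threshold <= 1.0f);
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > iou_threshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

Unlike averaging, this keeps one box per cluster of overlapping windows, so two meters that are far apart in the image would produce two separate boxes.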

Code (please note the comments):

// testOpencv.cpp: defines the entry point for the console application.

#include "stdafx.h"
#include <windows.h>
#include <time.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

using namespace cv;
using namespace cv::ml;
using namespace std;

//Relative path of positive sample picture
string positive_dir = "elec_watch/positive";

//Negative sample picture relative path
string negative_dir = "elec_watch/negative";

//Calculate HOG descriptor
void get_hog_descriptor(const Mat &image, vector<float> &desc);
//Generate training data sets and labels
void generate_dataset(Mat &trainData, Mat &labels);
//SVM classifier training
void svm_train(Mat &trainData, Mat &labels);

#define		FIXED_WIDTH			64
#define		FIXED_HEIGHT		128

struct PredictResult
{
	Rect rect;
	float score;

	//Comparator for sort(): orders results by ascending score
	static bool LessThan(const PredictResult &a, const PredictResult &b)
	{
		return a.score < b.score;
	}
};

int main(int argc, char** argv)
{
	// read data and generate dataset
	// Where does 3780 come from?
	//The window size is 64 * 128 and the cell size is 8 * 8, so the window is divided into (64 / 8) * (128 / 8) cells.
	//Every 4 (2 * 2) cells form a block, and blocks slide one cell at a time, giving (64 / 8 - 1) * (128 / 8 - 1) blocks.
	//Each block generates 4 cells * 9 bins = 36 values, so the total descriptor size is 7 * 15 * 36 = 3780
	//Mat trainData = Mat::zeros(Size((FIXED_WIDTH / 8 - 1)*(FIXED_HEIGHT / 8 - 1)*36, 26), CV_32FC1);//26 pictures (10 positive samples and 16 negative samples)
	//Mat labels = Mat::zeros(Size(1, 26), CV_32SC1);
	//generate_dataset(trainData, labels);
	// SVM train, and save model
	//svm_train(trainData, labels);

	// load model
	Ptr<SVM> svm = SVM::load("hog_elec.xml");

	// load the test picture in which to detect the custom object
	Mat test = imread("elec_watch/test/scene_06-02.jpg");
	if (test.empty())
	{
		printf("could not load test image...\n");
		return -1;
	}

	resize(test, test, Size(0, 0), 0.2, 0.2);
	imshow("input", test);
	Rect winRect;
	winRect.width = FIXED_WIDTH;
	winRect.height = FIXED_HEIGHT;
	int sum_x = 0;
	int sum_y = 0;
	int count = 0;

	vector<PredictResult> predictResultVec;

	// Sliding window detection. The window size is 64(w)*128(h); (col, row) is the window center
	for (int row = FIXED_HEIGHT / 2; row < test.rows - FIXED_HEIGHT / 2; row += 4)
	{
		for (int col = FIXED_WIDTH / 2; col < test.cols - FIXED_WIDTH / 2; col += 4)
		{
			winRect.x = col - FIXED_WIDTH  / 2;//Top left corner x
			winRect.y = row - FIXED_HEIGHT / 2;//Top left corner y

			//Get HOG descriptor
			vector<float> fv;
			get_hog_descriptor(test(winRect), fv);

			//Copy the HOG descriptor into a 1 x N float row, the format expected by predict()
			Mat one_row = Mat::zeros(Size((int)fv.size(), 1), CV_32FC1);
			for (int i = 0; i < (int)fv.size(); i++)
			{
				one_row.at<float>(0, i) = fv[i];
			}

			//Without extra flags, predict() returns the class label (+1 or -1), not a confidence
			float result = svm->predict(one_row);
			if (result > 0)
			{
				//rectangle(test, winRect, Scalar(0, 0, 255), 1, 8, 0);
				count += 1;
				sum_x += winRect.x;
				sum_y += winRect.y;
				PredictResult tmp;
				tmp.rect = winRect;
				tmp.score = result;
				predictResultVec.push_back(tmp);
			}
		}
	}

	sort(predictResultVec.begin(), predictResultVec.end(), PredictResult::LessThan);

	// Display box (average of all positive windows); skipped when nothing was detected
	if (count > 0)
	{
		winRect.x = sum_x / count;
		winRect.y = sum_y / count;
		rectangle(test, winRect, Scalar(255, 0, 0), 2, 8, 0);
	}

	// Alternative: draw the highest-scoring window instead of the average
	// (the sort above is ascending, so the best result is at the back)
	//if (!predictResultVec.empty()) rectangle(test, predictResultVec.back().rect, Scalar(255, 0, 0), 2, 8, 0);
	imshow("object detection result", test);
	//imwrite("D:/case02.png", test);
	waitKey(0);
	return 0;
}

void get_hog_descriptor(const Mat &image, vector<float> &desc)
{
	HOGDescriptor hog;
	int h = image.rows;
	int w = image.cols;

	//Scale to the fixed width while keeping the aspect ratio (no distortion)
	float rate = FIXED_WIDTH*1.0 / w;
	Mat img, gray;
	resize(image, img, Size(FIXED_WIDTH, int(rate*h)));

	cvtColor(img, gray, COLOR_BGR2GRAY);
	//64 * 128 canvas filled with mid-gray (127)
	Mat result = Mat::zeros(Size(FIXED_WIDTH, FIXED_HEIGHT), CV_8UC1);
	result = Scalar(127);
	//Center the resized image vertically on the canvas
	//(this assumes the resized height does not exceed FIXED_HEIGHT)
	Rect roi;
	roi.x = 0;
	roi.width = FIXED_WIDTH;
	roi.y = (FIXED_HEIGHT - gray.rows) / 2;
	roi.height = gray.rows;
	gray.copyTo(result(roi));
	hog.compute(result, desc, Size(8, 8), Size(0, 0));
	// printf("desc len : %d \n", desc.size());
}

void generate_dataset(Mat &trainData, Mat &labels)
{
	vector<cv::String> images;

	//glob() saves the file names found under positive_dir (positive samples) in images
	glob(positive_dir, images);
	int pos_num = images.size();
	for (int i = 0; i < images.size(); i++)
	{
		Mat image = imread(images[i].c_str());
		vector<float> fv;
		get_hog_descriptor(image, fv);
		for (int j = 0; j < fv.size(); j++)
		{
			trainData.at<float>(i, j) = fv[j];
		}
		labels.at<int>(i, 0) = 1;				//Positive sample
	}

	//glob() saves the file names found under negative_dir (negative samples) in images
	glob(negative_dir, images);
	for (int i = 0; i < images.size(); i++)
	{
		Mat image = imread(images[i].c_str());
		vector<float> fv;
		get_hog_descriptor(image, fv);
		for (int j = 0; j < fv.size(); j++)
		{
			//Negative rows are stored after the pos_num positive rows
			trainData.at<float>(i + pos_num, j) = fv[j];
		}
		labels.at<int>(i + pos_num, 0) = -1;    //Negative sample
	}
}

void svm_train(Mat &trainData, Mat &labels)
{
	printf("\n start SVM training... \n");
	Ptr<SVM> svm = SVM::create();
	svm->setKernel(SVM::LINEAR);   //Linear SVM kernel
	svm->train(trainData, ROW_SAMPLE, labels);
	clog << "....[Done]" << endl;
	printf("end train...\n");

	// save xml (the same file that main() loads)
	svm->save("hog_elec.xml");
}
