As an undergraduate majoring in artificial intelligence, I first came into contact with face detection during my freshman year, and with the help of my teachers I successfully used code to detect faces in a series of pictures and train a model. Here I want to share the problems I encountered while training the model and the results I obtained.
Face detection and face recognition are among the most important applications of deep learning.
Face detection is everywhere in today's society, and for us it is very easy: to meet the needs of social life, our brains have a dedicated face-processing module that is highly sensitive to faces. For a computer, however, face detection is a comparatively difficult problem. Although the structure of the human face is fixed (eyebrows, eyes, nose, and mouth) and is approximately rigid, changes in pose and expression, appearance differences between people, and the effects of illumination and occlusion make it hard for a computer to detect faces accurately under all conditions.
During the epidemic, masks have become a necessity for going out, so recognizing faces while masks are worn is very important. In dense places such as shopping malls, it is unrealistic to ask customers to take off their masks in public; people will certainly refuse. So how can we recognize masked faces? The biggest difficulty here is how to detect a face region that is partially blocked by an obstacle.
Reference source code link:
code:
""" This code uses the pytorch model to detect faces from live video or camera. """ import argparse import sys import cv2 from vision.ssd.config.fd_config import define_img_size parser = argparse.ArgumentParser( description='detect_video') parser.add_argument('--net_type', default="RFB", type=str, help='The network architecture ,optional: RFB (higher precision) or slim (faster)') parser.add_argument('--input_size', default=480, type=int, help='define network input size,default optional value 128/160/320/480/640/1280') parser.add_argument('--threshold', default=0.7, type=float, help='score threshold') parser.add_argument('--candidate_size', default=1000, type=int, help='nms candidate size') parser.add_argument('--path', default="imgs", type=str, help='imgs dir') parser.add_argument('--test_device', default="cpu", type=str, help='cuda:0 or cpu') parser.add_argument('--video_path', default="/home/linzai/Videos/video/16_1.MP4", type=str, help='path of video') args = parser.parse_args() input_img_size = args.input_size define_img_size(input_img_size) # must put define_img_size() before 'import create_mb_tiny_fd, create_mb_tiny_fd_predictor' from vision.ssd.mb_tiny_fd import create_mb_tiny_fd, create_mb_tiny_fd_predictor from vision.ssd.mb_tiny_RFB_fd import create_Mb_Tiny_RFB_fd, create_Mb_Tiny_RFB_fd_predictor from vision.utils.misc import Timer label_path = "./models/voc-model-labels.txt" net_type = args.net_type #cap = cv2.VideoCapture(args.video_path) # capture from video cap = cv2.VideoCapture(0) # capture from camera class_names = [name.strip() for name in open(label_path).readlines()] num_classes = len(class_names) test_device = args.test_device candidate_size = args.candidate_size threshold = args.threshold if net_type == 'slim': model_path = "models/pretrained/version-slim-320.pth" # model_path = "models/pretrained/version-slim-640.pth" net = create_mb_tiny_fd(len(class_names), is_test=True, device=test_device) predictor = create_mb_tiny_fd_predictor(net, 
candidate_size=candidate_size, device=test_device) elif net_type == 'RFB': model_path = "models/pretrained/version-RFB-320.pth" # model_path = "models/pretrained/version-RFB-640.pth" net = create_Mb_Tiny_RFB_fd(len(class_names), is_test=True, device=test_device) predictor = create_Mb_Tiny_RFB_fd_predictor(net, candidate_size=candidate_size, device=test_device) else: print("The net type is wrong!") sys.exit(1) net.load(model_path) timer = Timer() sum = 0 while True: ret, orig_image = cap.read() if orig_image is None: print("end") break image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB) timer.start() boxes, labels, probs = predictor.predict(image, candidate_size / 2, threshold) interval = timer.end() print('Time: {:.6f}s, Detect Objects: {:d}.'.format(interval, labels.size(0))) for i in range(boxes.size(0)): box = boxes[i, :] label = f" {probs[i]:.2f}" cv2.rectangle(orig_image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 4) # cv2.putText(orig_image, label, # (box[0], box[1] - 10), # cv2.FONT_HERSHEY_SIMPLEX, # 0.5, # font scale # (0, 0, 255), # 2) # line type orig_image = cv2.resize(orig_image, None, None, fx=0.8, fy=0.8) sum += boxes.size(0) cv2.imshow('annotated', orig_image) if cv2.waitKey(1) & 0xFF == ord('q'): break cap.release() cv2.destroyAllWindows() print("all face num:{}".format(sum))
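The loop above prints the inference time for each frame. If you want an overall frames-per-second figure for the whole run, one simple way is to collect the per-frame intervals and average them. The helper below is a hypothetical sketch, not part of the original script:

```python
def average_fps(intervals):
    """Average frames per second over a list of per-frame times in seconds."""
    total = sum(intervals)
    if total <= 0:
        raise ValueError("need a positive total time")
    return len(intervals) / total

# e.g. two frames taking 0.5 s each average out to 2.0 FPS
print(average_fps([0.5, 0.5]))
```

In the script you would append each `interval` returned by `timer.end()` to a list and call `average_fps` once the loop finishes.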
1: Preparatory work
Preparation in advance:
Install Python 3
Set up a Python virtual environment and development environment under Windows
Download and install the OpenCV library
Download the above code and the training dataset
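Before running, it can save time to verify that the required packages are importable. The small checker below is only an illustrative sketch; the module list is my assumption based on the imports in the code above:

```python
import importlib.util


def missing_modules(modules=("cv2", "torch")):
    """Return the names of required modules that cannot be found."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```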
2: Run
First of all, I chose a lightweight face detection model; its main features are a very small model size and very fast inference. After creating the new project, try running the program first.
CUDA may report errors. In that case, change the '--test_device' value from 'cuda:0' to 'cpu'.
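Instead of editing the device string by hand each time, you could fall back to CPU automatically. The helper below is a hypothetical sketch; in the real script you would call it as pick_device(args.test_device, torch.cuda.is_available()):

```python
def pick_device(requested: str, cuda_available: bool) -> str:
    """Use the requested device, but fall back to CPU when CUDA is unavailable."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested
```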
cv2 may raise an error such as "cv2.error: OpenCV(4.5.4) :-1: error" from cv2.rectangle(). This happens when the box coordinates are floating-point values; fix it by casting them to integers, i.e. cv2.rectangle(orig_image, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 4).
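The cast can be wrapped in a small helper so that every drawing call gets the integer pixel coordinates cv2.rectangle expects. int_box is a hypothetical helper, not part of the original code:

```python
def int_box(box):
    """Truncate float box coordinates to the integer corner points
    (pt1, pt2) that cv2.rectangle expects."""
    x1, y1, x2, y2 = box
    return (int(x1), int(y1)), (int(x2), int(y2))
```

With it, the drawing call becomes cv2.rectangle(orig_image, *int_box(box), (0, 255, 0), 4).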
3: Experimental results
Face detection