How to detect an object in real time and track it automatically, instead of the user having to draw a bounding box around the object to be tracked?

Problem Description

I have the following code where the user can press p to pause the video, draw a bounding box around the object to be tracked, and then press Enter (carriage return) to track that object in the video feed:

import cv2
import sys

major_ver, minor_ver, subminor_ver = cv2.__version__.split('.')

if __name__ == '__main__':

    # Set up tracker.
    tracker_types = ['BOOSTING', 'MIL','KCF', 'TLD', 'MEDIANFLOW', 'GOTURN', 'MOSSE', 'CSRT']
    tracker_type = tracker_types[1]

    # cv2.Tracker_create only exists in old OpenCV builds; also check the major
    # version so this branch isn't taken on OpenCV 4.x
    if int(major_ver) < 4 and int(minor_ver) < 3:
        tracker = cv2.Tracker_create(tracker_type)
    else:
        if tracker_type == 'BOOSTING':
            tracker = cv2.TrackerBoosting_create()
        if tracker_type == 'MIL':
            tracker = cv2.TrackerMIL_create()
        if tracker_type == 'KCF':
            tracker = cv2.TrackerKCF_create()
        if tracker_type == 'TLD':
            tracker = cv2.TrackerTLD_create()
        if tracker_type == 'MEDIANFLOW':
            tracker = cv2.TrackerMedianFlow_create()
        if tracker_type == 'GOTURN':
            tracker = cv2.TrackerGOTURN_create()
        if tracker_type == 'MOSSE':
            tracker = cv2.TrackerMOSSE_create()
        if tracker_type == "CSRT":
            tracker = cv2.TrackerCSRT_create()

    # Read video
    video = cv2.VideoCapture(0) # 0 means webcam; to use a video file instead, replace 0 with "video_file.MOV"

    # Exit if video not opened.
    if not video.isOpened():
        print ("Could not open video")
        sys.exit()

    while True:

        # Read first frame.
        ok, frame = video.read()
        if not ok:
            print('Cannot read video file')
            sys.exit()
        
        # Retrieve an image and Display it.
        if (0xFF & cv2.waitKey(10)) == ord('p'): # press `p` to pause the video and start tracking
            break
        cv2.namedWindow("Image", cv2.WINDOW_NORMAL)
        cv2.imshow("Image", frame)
    cv2.destroyWindow("Image")

    # default bounding box (x, y, w, h)
    bbox = (287, 23, 86, 320)

    # let the user select a bounding box instead (comment this out to keep the default above)
    bbox = cv2.selectROI(frame, False)

    # Initialize tracker with first frame and bounding box
    ok = tracker.init(frame, bbox)

    while True:
        # Read a new frame
        ok, frame = video.read()
        if not ok:
            break
        
        # Start timer
        timer = cv2.getTickCount()

        # Update tracker
        ok, bbox = tracker.update(frame)

        # Calculate Frames per second (FPS)
        fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)

        # Draw bounding box
        if ok:
            # Tracking success
            p1 = (int(bbox[0]), int(bbox[1]))
            p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
            cv2.rectangle(frame, p1, p2, (255,0,0), 2, 1)
        else :
            # Tracking failure
            cv2.putText(frame, "Tracking failure detected", (100,80), cv2.FONT_HERSHEY_SIMPLEX, 0.75,(0,0,255),2)

        # Display tracker type on frame
        cv2.putText(frame, tracker_type + " Tracker", (100,20), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50,170,50), 2)

        # Display FPS on frame
        cv2.putText(frame, "FPS : " + str(int(fps)), (100,50), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (50,170,50), 2)

        # Display result
        cv2.imshow("Tracking", frame)

        # Exit if ESC pressed
        k = cv2.waitKey(1) & 0xff
        if k == 27 : break

Now, instead of having the user pause the video and draw the bounding box around the object, how do I make it automatically detect the particular object I am interested in (a toothbrush, in my case) whenever it is introduced into the video feed, and then track it?

I found this article which talks about how we can detect objects in video using ImageAI and Yolo.

from imageai.Detection import VideoObjectDetection
import os
import cv2

execution_path = os.getcwd()

camera = cv2.VideoCapture(0) 

detector = VideoObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath(os.path.join(execution_path , "yolo.h5"))
detector.loadModel()

video_path = detector.detectObjectsFromVideo(camera_input=camera,
                                             output_file_path=os.path.join(execution_path, "camera_detected_1"),
                                             frames_per_second=29, log_progress=True)
print(video_path)

Now, Yolo does detect toothbrushes - a toothbrush is among the 80-odd objects it can detect by default. However, two points about this article make it not the ideal solution for me:

  1. This method first analyses each video frame (about 1-2 seconds per frame, so roughly 1 minute to analyse a 2-3 second video stream from the webcam) and saves the detected video in a separate video file, whereas I want to detect the toothbrush in the webcam video feed in real time. Is there a solution for this?

  2. The Yolo v3 model being used can detect all 80 objects, but I want only 2 or 3 objects detected - the toothbrush, the person holding the toothbrush, and possibly the background, if needed at all. So, is there a way to reduce the model weight by selecting only these 2 or 3 objects to detect?

Recommended Answer

If you want a quick and easy solution, you can use one of the more lightweight Yolo files. You can get the weights and config files (they come in pairs and must be used together) from this website: https://pjreddie.com/darknet/yolo/ (don't worry, it looks sketchy, but it's fine).
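
If it helps, here's a small download sketch for the tiny variant. The exact URLs are an assumption based on where these files usually live, so double-check them against the page above:

import urllib.request

# NOTE: assumed URLs -- verify against https://pjreddie.com/darknet/yolo/
files = {
    "yolov3-tiny.weights": "https://pjreddie.com/media/files/yolov3-tiny.weights",
    "yolov3-tiny.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3-tiny.cfg",
}

for filename, url in files.items():
    urllib.request.urlretrieve(url, filename)
    print("downloaded", filename)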

Using a smaller network will get you much higher fps, but also worse accuracy. If that's a tradeoff you're willing to accept then this is the easiest thing to do.

Here's some code for detecting toothbrushes. The first file is just a class file to help make using the Yolo network more seamless. The second is the "main" file that opens up a VideoCapture and feeds images to the network.

yolo.py

import cv2
import numpy as np

class Yolo:
    def __init__(self, cfg, weights, names, conf_thresh, nms_thresh, use_cuda=False):
        # save thresholds
        self.ct = conf_thresh
        self.nmst = nms_thresh

        # create net
        self.net = cv2.dnn.readNet(weights, cfg)
        print("Finished: " + str(weights))

        # load the class names, one per line
        self.classes = []
        with open(names, 'r') as file:
            for line in file:
                self.classes.append(line.strip())

        # use gpu + CUDA to speed up detections
        if use_cuda:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

        # get output layer names (getUnconnectedOutLayers returns 1-based indices;
        # flatten() keeps this working whether your OpenCV version returns them as
        # 1-element arrays or as plain ints)
        layer_names = self.net.getLayerNames()
        self.output_layers = [layer_names[i - 1] for i in np.array(self.net.getUnconnectedOutLayers()).flatten()]

    # runs detection on the image and draws on it
    def detect(self, img, target_id):
        # get detection data
        b, c, ids, idxs = self.get_detection_data(img, target_id)

        # draw result
        img = self.draw(img, b, c, ids, idxs)
        return img, len(idxs)

    # returns boxes, confidences, class_ids, and indices
    def get_detection_data(self, img, target_id):
        # get output
        layer_outputs = self.get_inf(img)

        # get dims
        height, width = img.shape[:2]

        # filter by thresholds and target id
        b, c, ids, idxs = self.thresh(layer_outputs, width, height, target_id)
        return b, c, ids, idxs

    # runs the network on an image
    def get_inf(self, img):
        # construct a blob
        blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)

        # get response
        self.net.setInput(blob)
        layer_outputs = self.net.forward(self.output_layers)
        return layer_outputs

    # filters the layer output by confidence, NMS, and target id
    def thresh(self, layer_outputs, width, height, target_id):
        boxes = []
        confidences = []
        class_ids = []

        # each layer's outputs
        for output in layer_outputs:
            for detection in output:
                # get id and confidence
                scores = detection[5:]
                class_id = np.argmax(scores)
                confidence = scores[class_id]

                # filter out low confidence and other classes
                if confidence > self.ct and class_id == target_id:
                    # scale bounding box back to the image size
                    box = detection[0:4] * np.array([width, height, width, height])
                    (cx, cy, w, h) = box.astype('int')

                    # grab the top-left corner of the box
                    tx = int(cx - (w / 2))
                    ty = int(cy - (h / 2))

                    # update lists
                    boxes.append([tx, ty, int(w), int(h)])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)

        # apply NMS
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, self.ct, self.nmst)
        return boxes, confidences, class_ids, idxs

    # draw detections on image
    def draw(self, img, boxes, confidences, class_ids, idxs):
        # check for zero detections
        if len(idxs) > 0:
            # loop over the kept indices
            for i in np.array(idxs).flatten():
                # extract the bounding box coords
                (x, y) = (boxes[i][0], boxes[i][1])
                (w, h) = (boxes[i][2], boxes[i][3])

                # draw a box
                cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)

                # draw text
                text = "{}: {:.4}".format(self.classes[class_ids[i]], confidences[i])
                cv2.putText(img, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        return img

main.py

import cv2
import numpy as np

# this is the "yolo.py" file, I assume it's in the same folder as this program
from yolo import Yolo

# these are the filepaths of the yolo files
weights = "yolov3-tiny.weights"
config = "yolov3-tiny.cfg"
labels = "yolov3.txt"

# init yolo network
target_class_id = 79 # toothbrush
conf_thresh = 0.4 # less == more boxes (but more false positives)
nms_thresh = 0.4 # less == more boxes (but more overlap)
net = Yolo(config, weights, labels, conf_thresh, nms_thresh)

# open video capture
cap = cv2.VideoCapture(0)

# loop
done = False
while not done:
    # get frame
    ret, frame = cap.read()
    if not ret:
        done = cv2.waitKey(1) == ord('q')
        continue

    # do detection
    frame, _ = net.detect(frame, target_class_id)

    # show
    cv2.imshow("Marked", frame)
    done = cv2.waitKey(1) == ord('q')
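
(For reference, class id 79 is the toothbrush entry - the last line of the standard 80-class COCO names list these Yolo models were trained on, counting from 0.)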

There are a few options for you if you don't want to use a lighter weights file.

If you have an Nvidia GPU you can use CUDA to drastically increase your fps. Even modest Nvidia GPUs are several times faster than running solely on the CPU.
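
The Yolo class above already has a hook for this: construct it with use_cuda=True, which switches the cv2.dnn backend and target to CUDA. Note this only works if your OpenCV build was compiled with CUDA support (the stock pip wheels usually aren't):

from yolo import Yolo

# same files as in main.py; use_cuda=True switches the cv2.dnn backend/target to CUDA
# (requires an OpenCV build compiled with CUDA support -- stock pip wheels are CPU-only)
net = Yolo("yolov3-tiny.cfg", "yolov3-tiny.weights", "yolov3.txt", 0.4, 0.4, use_cuda=True)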

A common strategy to bypass the cost of constantly running detection is to only use it to initially acquire a target. You can use the detection from the neural net to initialize your object tracker, similar to a person drawing a bounding box around the object. Object trackers are way faster and there's no need to constantly do a full detection every frame.
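
Here's a minimal sketch of that strategy, reusing the Yolo class above together with one of the OpenCV trackers from your first script (CSRT here; it needs the opencv-contrib-python build, and on some newer OpenCV versions a few trackers live under cv2.legacy instead). It runs the slow detector only when there's no target, or when the tracker loses it:

import cv2
import numpy as np

from yolo import Yolo

# same yolo files as in main.py
net = Yolo("yolov3-tiny.cfg", "yolov3-tiny.weights", "yolov3.txt", 0.4, 0.4)
target_class_id = 79 # toothbrush

cap = cv2.VideoCapture(0)
tracker = None

while True:
    ret, frame = cap.read()
    if not ret:
        break

    if tracker is None:
        # no target yet: run the (slow) detector to acquire one
        boxes, confs, ids, idxs = net.get_detection_data(frame, target_class_id)
        if len(idxs) > 0:
            x, y, w, h = boxes[np.array(idxs).flatten()[0]]
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, (x, y, w, h))
    else:
        # target acquired: just run the (fast) tracker
        ok, bbox = tracker.update(frame)
        if ok:
            p1 = (int(bbox[0]), int(bbox[1]))
            p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
            cv2.rectangle(frame, p1, p2, (255, 0, 0), 2)
        else:
            # lost the object: fall back to detection on the next frame
            tracker = None

    cv2.imshow("Detect + Track", frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()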

If you run Yolo and object tracking in separate threads then you can run as fast as your camera is capable of. You'll need to store a history of frames so that when the Yolo thread finishes a frame, you can check the older frames to see whether you're already tracking the object, start the object tracker on the corresponding frame, and fast-forward it to let it catch up. This program isn't simple and you'll need to make sure you're managing data between your threads correctly. It's a good exercise for becoming comfortable with multithreading though, which is a huge step in programming.
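
For completeness, here's a heavily simplified sketch of the threaded layout, assuming the Yolo class above. It skips the frame-history/fast-forward bookkeeping described above and simply shares the newest frame and newest detection between the two threads:

import threading
import time

import cv2
import numpy as np

from yolo import Yolo

# same yolo files as in main.py
net = Yolo("yolov3-tiny.cfg", "yolov3-tiny.weights", "yolov3.txt", 0.4, 0.4)
target_class_id = 79 # toothbrush

latest_frame = None # newest camera frame, shared with the detection thread
latest_box = None   # newest detection result, shared with the main loop
lock = threading.Lock()
running = True

def detect_worker():
    # runs detection as fast as it can on whatever frame is newest
    global latest_box
    while running:
        with lock:
            frame = None if latest_frame is None else latest_frame.copy()
        if frame is None:
            time.sleep(0.01)
            continue
        boxes, confs, ids, idxs = net.get_detection_data(frame, target_class_id)
        with lock:
            latest_box = boxes[np.array(idxs).flatten()[0]] if len(idxs) > 0 else None

threading.Thread(target=detect_worker, daemon=True).start()

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # hand the frame to the worker and grab its latest result
    with lock:
        latest_frame = frame.copy()
        box = latest_box

    # draw whatever the detector last saw (it may lag a few frames behind)
    if box is not None:
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

    cv2.imshow("Threaded", frame)
    if cv2.waitKey(1) == ord('q'):
        break

running = False
cap.release()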
