How to perform RCNN object detection on a custom dataset?

Question

I'm trying to perform object detection with R-CNN on my own dataset, following the tutorial on the MATLAB website. Based on the picture below:

I'm supposed to put the image paths in the first column and the bounding box of each object in the following columns. But each of my images contains more than one object of each kind; for example, one image has 20 vehicles. How should I deal with that? Should I create a separate row for each vehicle instance in an image?

Recommended answer

The example found on the website finds the pixel neighbourhood with the largest score and draws a bounding box around that region in the image. Having multiple objects complicates things. There are two approaches that you can use to find multiple objects.

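  1. Find all bounding boxes whose scores surpass some global threshold.
  2. Find the bounding box with the largest score, then find those bounding boxes whose scores surpass a percentage of that largest score. This percentage is arbitrary, but from experience and from what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if you submit a query image containing objects the classifier was not trained to detect, so you will have to implement some more post-processing logic on your end.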

An alternative approach is to choose some value k and display the top k bounding boxes, that is, those with the k highest scores. This of course requires that you know the value of k beforehand, and, like the second approach, it always assumes that there is at least one object in the image.
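
A minimal sketch of the top-k selection, using made-up boxes and scores in place of the real output of detect. The names k, order and keep are illustrative only and do not come from the tutorial:

```matlab
% Made-up detector outputs standing in for [bboxes, score] from detect
bboxes = [10 10 50 50; 200 40 60 60; 15 12 48 52; 300 300 40 40];
score  = [0.91; 0.55; 0.88; 0.20];

k = 2; % Assumed number of objects expected in the image

% Sort scores from highest to lowest and keep the first k indices
[~, order] = sort(score, 'descend');
keep = order(1 : min(k, numel(score))); % Guard against k > number of boxes

topBoxes  = bboxes(keep, :);
topScores = score(keep);
```

Here the two boxes with scores 0.91 and 0.88 are kept. The min guard only matters when the detector returns fewer than k boxes.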

In addition to the above logic, the approach you describe, creating a separate row for each instance of a vehicle in the image, is correct. This means that if you have multiple instances of an object in a single image, you need to introduce one row per instance while keeping the image filename the same. For example, if you had 20 vehicles in one image, you would create 20 rows in your table that all share the same filename, each with a single bounding box specification for one distinct object in that image.
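
As a rough sketch of the one-row-per-instance layout described above, the table could be built as follows. The file name, the column name vehicle and the box coordinates are all hypothetical, and each box is assumed to use the [x y width height] format:

```matlab
% Hypothetical file name and boxes, each box given as [x y width height]
imageFilename = {'image001.jpg'; 'image001.jpg'; 'image001.jpg'};
vehicle = [ 12  40  80  60;
           150  42  78  58;
           300 120  90  70];

% One row per vehicle; the file name repeats on every row
trainingData = table(imageFilename, vehicle);
disp(trainingData)
```

This only illustrates the layout. Some toolbox releases may instead expect all boxes of an image packed into a single M-by-4 matrix in one row, so it is worth checking the training function's documentation for the release you use before training.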

Once you have done this, assuming that you have already trained the R-CNN detector and want to use it, the original code to detect objects, taken from the website, is the following:

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)

This only works for one object which has the highest score. If you wanted to do this for multiple objects, you would use the score that is output from the detect method and find those locations that either accommodate situation 1 or situation 2.

If you had situation 1, you would modify it to look like the following.

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % Change    
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % New - Choose the right box
end

figure
imshow(outputImage)

Note that I've kept the original bounding boxes, labels and scores in their original variables, while the subset that surpassed the threshold is stored in separate variables in case you want to cross-reference between the two. If you wanted to accommodate situation 2, the code remains the same as for situation 1, except for how the threshold is defined.

The code from:

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;
% [score, idx] = max(score);

... would now change to:

% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum score
T = perc * max(score); % Define threshold here
idx = score >= T;
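
As a worked example of the relative threshold, here it is applied to made-up scores. The absolute floor minScore is an assumption added for illustration, one possible form of the extra post-processing mentioned for approach 2, for images that contain none of the trained objects:

```matlab
% Made-up scores standing in for the output of detect
score = [0.95; 0.90; 0.60; 0.82];

perc = 0.85;            % Fraction of the best score
minScore = 0.5;         % Assumed absolute floor (hypothetical extra check)

T = perc * max(score);  % 0.85 * 0.95 = 0.8075
idx = score >= T;       % Boxes 1, 2 and 4 pass; box 3 (0.60) does not

% Reject everything if even the best box is weak
if max(score) < minScore
    idx = false(size(score));
end
```

Without the floor, the best-scoring box always passes, even when its score is very low, which is why the relative threshold alone cannot report that an image contains no objects.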


The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.
