How to perform RCNN object detection on custom dataset?


Problem description


I'm trying to perform object detection with RCNN on my own dataset, following the tutorial on the MATLAB webpage. Based on the training-data table pictured there:

I'm supposed to put image paths in the first column and the bounding box of each object in the following columns. But in each of my images, there is more than one object of each kind. For example there are 20 vehicles in one image. How should I deal with that? Should I create a separate row for each instance of vehicle in an image?

Solution

The example found on the website finds the pixel neighbourhood with the largest score and draws a bounding box around that region in the image. When you have multiple objects now, that complicates things. There are two approaches that you can use to facilitate finding multiple objects.

  1. Find all bounding boxes with scores that surpass some global threshold.
  2. Find the bounding box with the largest score, then find those bounding boxes whose scores surpass a percentage of this maximum. The percentage is arbitrary, but from experience and what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if you submit a query image containing objects the classifier was not trained to detect, but then you will have to implement some more post-processing logic on your end.

An alternative approach would be to choose some value k and display the top k bounding boxes associated with the k highest scores. This of course requires that you know the value of k beforehand, and like the second approach it always assumes that you have found an object in the image.
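As a minimal sketch of the top-k idea (assuming `bboxes`, `score` and `label` come from `detect` as in the code further below; the value of `k` is an arbitrary choice):

```matlab
k = 5; % arbitrary: keep the 5 highest-scoring detections
[~, order] = sort(score, 'descend');      % rank detections by score
topIdx = order(1 : min(k, numel(order))); % guard against fewer than k boxes
topBoxes  = bboxes(topIdx, :);
topScores = score(topIdx);
topLabels = label(topIdx);
```

The `min(k, numel(order))` guard simply avoids an indexing error when the detector returns fewer than k boxes.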


In addition to the above logic, the approach that you state where you need to create a separate row for each instance of vehicle in the image is correct. This means that if you have multiple candidates of an object in a single image, you would need to introduce one row per instance while keeping the image filename the same. Therefore, if you had for example 20 vehicles in one image, you would need to create 20 rows in your table where the filename is all the same and you would have a single bounding box specification for each distinct object in that image.
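As a hedged sketch of what such a table could look like in MATLAB, following the one-row-per-instance layout described above (the file names and box coordinates here are made up; boxes use the [x y width height] convention from the tutorial):

```matlab
% One row per object instance; the same file name repeats for each
% vehicle found in that image.
imageFilename = {'scene01.jpg'; 'scene01.jpg'; 'scene02.jpg'};
vehicle = {[ 50  60 100  80];   % first vehicle in scene01.jpg
           [210  45  95  70];   % second vehicle in the same image
           [ 30 120 110  90]};  % single vehicle in scene02.jpg
trainingData = table(imageFilename, vehicle);
```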

Once you have done this, assuming that you have already trained the R-CNN detector and you want to use it, the original code from the website to detect objects is the following:

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)

This only works for one object which has the highest score. If you wanted to do this for multiple objects, you would use the score that is output from the detect method and find those locations that either accommodate situation 1 or situation 2.

If you had situation 1, you would modify it to look like the following.

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % Changed - use the thresholded labels and scores
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % Changed - annotate each surviving box in turn
end

figure
imshow(outputImage)

Note that I've kept the original bounding boxes, labels and scores in their original variables, while the subset that surpassed the threshold is stored in separate variables, in case you want to cross-reference between the two. If you wanted to accommodate situation 2, the code remains the same as situation 1 with the exception of how the threshold is defined.

The code from:

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;
% [score, idx] = max(score);

... would now change to:

% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum score
T = perc * max(score); % Define threshold here
idx = score >= T;


The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.
