Determining position of anchor boxes in original image using downsampled feature map


Question

From what I have read, I understand that methods used in Faster R-CNN and SSD involve generating a set of anchor boxes. We first downsample the training image using a CNN, and for every pixel in the downsampled feature map (which will form the center of our anchor boxes) we project it back onto the training image. We then draw the anchor boxes centered around that pixel using our pre-determined scales and ratios. What I don't understand is why we don't directly assume the centers of our anchor boxes on the training image with a suitable stride and use the CNN to only output the classification and regression values. What are we gaining by using the CNN to determine the centers of our anchor boxes, which are ultimately going to be distributed evenly on the training image?

To put it more clearly:

Where will the centers of our anchor boxes be on the training image before our first prediction of the offset values, and how do we decide those?
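To make the question concrete, the anchor centers before any prediction come straight from the feature-map geometry. The sketch below shows one common convention (assumed here; exact offsets vary between implementations): cell (i, j) of the downsampled feature map corresponds to a stride × stride patch of the input image, and the anchor center is placed at the middle of that patch.

```python
import numpy as np

def anchor_centers(feat_h, feat_w, stride):
    """Map every feature-map cell back to its center pixel in the input image."""
    # Cell (i, j) covers a stride x stride patch of the input;
    # its center is offset by stride / 2 from the patch corner.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    return np.stack([cx.ravel(), cy.ravel()], axis=1)  # shape (H*W, 2)

# e.g. a 4x4 feature map produced from a 64x64 image (stride 16)
centers = anchor_centers(4, 4, 16)
```

With these numbers the centers form an even 4 × 4 grid starting at (8, 8) and ending at (56, 56); the pre-determined scales and ratios are then applied at each of these centers.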

Answer

I think the confusion stems from this:

What are we gaining by using the CNN to determine the centers of our anchor boxes, which are ultimately going to be distributed evenly on the training image?

The network usually doesn't predict centers but corrections to a prior belief. The initial anchor centers are distributed evenly across the image, and as such don't fit the objects in the scene tightly enough. Those anchors just constitute a prior in the probabilistic sense. Exactly what your network outputs is implementation-dependent, but it will likely just be updates, i.e. corrections to those initial priors. This means that the centers predicted by your network are some delta_x, delta_y that adjust the bounding boxes.
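As a minimal sketch of that idea (the function name and the (cx, cy, w, h) box layout are my own choices, not from any particular library): the network's raw delta_x, delta_y outputs are simply added to the fixed prior centers.

```python
import numpy as np

def apply_center_deltas(anchors, deltas):
    """Shift each anchor center by the network's predicted correction.

    anchors: (N, 4) boxes as (cx, cy, w, h); deltas: (N, 2) as (dx, dy).
    """
    corrected = anchors.copy()
    corrected[:, 0] += deltas[:, 0]  # cx + delta_x
    corrected[:, 1] += deltas[:, 1]  # cy + delta_y
    return corrected

anchor = np.array([[32.0, 32.0, 16.0, 16.0]])  # evenly placed prior
delta = np.array([[3.5, -2.0]])                # network output
# the prior centered at (32, 32) moves to (35.5, 30.0); w, h are untouched
```

The key point is that the grid of priors is fixed and shared; only the small corrections are learned per image.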

Regarding this part:

why we don't directly assume the centers of our anchor boxes on the training image with a suitable stride and use the CNN to only output the classification and regression values

The regression values should still contain sufficient information to determine a bounding box in a unique way. Predicting width, height, and center offsets (corrections) is a straightforward way to do it, but it's certainly not the only way. For example, you could modify the network to predict, for each pixel, the distance vector to its nearest object center, or you could use parametric curves. However, crude, fixed anchor centers are not a good idea, since they will also cause problems in classification, as you use them to pool features that are representative of the object.
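The "width, height and center offsets" scheme is usually implemented with the parameterization from the Faster R-CNN paper: center offsets are scaled by the anchor size and width/height corrections are predicted in log space, so that all-zero regression outputs reproduce the anchor exactly. A sketch of the decode step (box layout (cx, cy, w, h) again assumed):

```python
import numpy as np

def decode(anchors, t):
    """Decode regression outputs t = (tx, ty, tw, th) against anchors (cx, cy, w, h).

    Faster R-CNN parameterization: offsets are relative to the anchor size,
    and width/height corrections live in log space.
    """
    cx = anchors[:, 0] + t[:, 0] * anchors[:, 2]   # cx_a + tx * w_a
    cy = anchors[:, 1] + t[:, 1] * anchors[:, 3]   # cy_a + ty * h_a
    w = anchors[:, 2] * np.exp(t[:, 2])            # w_a * exp(tw)
    h = anchors[:, 3] * np.exp(t[:, 3])            # h_a * exp(th)
    return np.stack([cx, cy, w, h], axis=1)
```

For example, t = (0.5, 0, log 2, 0) against a 16 × 16 anchor at (32, 32) shifts the center half an anchor-width right and doubles the width, giving (40, 32, 32, 16).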
