SSD anchors in Tensorflow detection API


Problem description


I want to train an SSD detector on a custom dataset of N by N images. So I dug into the Tensorflow object detection API and found a pretrained SSD300x300 model on COCO based on MobileNet v2.

When looking at the config file used for training, the anchor_generator field looks like this (which follows the paper):

anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2
    max_scale: 0.9
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
    aspect_ratios: 3.0
    aspect_ratios: 0.33
    }
}

When looking at the SSD anchor generator proto, am I correct in assuming that therefore base_anchor_height = base_anchor_width = 1?

If yes, from reading Multiple Grid anchors generator, I assume the resulting anchors (if the image is a 300x300 square) range in size from 0.2*300 = 60 pixels (60x60) up to 0.9*300 = 270 pixels (270x270), with different aspect ratios?

Hence if one wanted to train on NxN images by fixing the field:

fixed_shape_resizer {
  height: N
  width: N
}

Would one get, using the same config file, anchors ranging from (0.2*N, 0.2*N) pixels to (0.9*N, 0.9*N) pixels (with different aspect ratios)?

I did a lot of assuming because the code is hard to grasp and there seems to be close to no documentation yet. Am I correct? Is there an easy way to visualize the anchors used without training a model?

Solution

Here are some functions which can be used to generate and visualize the anchor box co-ordinates without training the model. All we are doing here is calling the relevant operations which are used in the graph during training/inference.

First we need to know the resolution (shape) of the feature maps which make up our object detection layers for an input image of a given size.

import tensorflow as tf 
from object_detection.anchor_generators.multiple_grid_anchor_generator import create_ssd_anchors
from object_detection.models.ssd_mobilenet_v2_feature_extractor_test import SsdMobilenetV2FeatureExtractorTest

def get_feature_map_shapes(image_height, image_width):
    """
    :param image_height: height in pixels
    :param image_width: width in pixels
    :returns: list of tuples containing feature map resolutions
    """
    feature_extractor = SsdMobilenetV2FeatureExtractorTest()._create_feature_extractor(
        depth_multiplier=1,
        pad_to_multiple=1,
    )
    image_batch_tensor = tf.zeros([1, image_height, image_width, 1])

    return [tuple(feature_map.get_shape().as_list()[1:3])
            for feature_map in feature_extractor.extract_features(image_batch_tensor)]

This will return a list of feature map shapes, for example [(19,19), (10,10), (5,5), (3,3), (2,2), (1,1)] which you can pass to a second function which returns the co-ordinates of the anchor boxes.

def get_feature_map_anchor_boxes(feature_map_shape_list, **anchor_kwargs):
    """
    :param feature_map_shape_list: list of tuples containing feature map resolutions
    :returns: dict with feature map shape tuple as key and list of [ymin, xmin, ymax, xmax] box co-ordinates
    """
    anchor_generator = create_ssd_anchors(**anchor_kwargs)

    anchor_box_lists = anchor_generator.generate(feature_map_shape_list)

    feature_map_boxes = {}

    with tf.Session() as sess:
        for shape, box_list in zip(feature_map_shape_list, anchor_box_lists):
            feature_map_boxes[shape] = sess.run(box_list.data['boxes'])

    return feature_map_boxes

In your example you can call it like this:

boxes = get_feature_map_anchor_boxes(
    min_scale=0.2,
    max_scale=0.9,
    feature_map_shape_list=get_feature_map_shapes(300, 300)
)
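
As a quick sanity check (this snippet is not from the original gist, just an assumed way you might verify the output), you can count how many anchors each layer contributes:

# Count the anchors produced for each feature map resolution.
for shape, layer_boxes in boxes.items():
    print(shape, layer_boxes.shape[0])

# With the default settings you would expect something like:
# (19, 19) 1083   -> 19*19*3 (the lowest layer only gets 3 boxes per cell, see below)
# (10, 10) 600    -> 10*10*6
# ...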

You do not need to specify the aspect ratios as the ones in your config are identical to the defaults of create_ssd_anchors.
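
If they ever differed, you could pass them explicitly, since any extra keyword arguments are forwarded straight to create_ssd_anchors (a sketch; the ratios shown are just the values from your config):

boxes_explicit = get_feature_map_anchor_boxes(
    min_scale=0.2,
    max_scale=0.9,
    aspect_ratios=(1.0, 2.0, 0.5, 3.0, 0.33),
    feature_map_shape_list=get_feature_map_shapes(300, 300),
)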

Lastly we plot the anchor boxes on a grid that reflects the resolution of a given layer. Note that the co-ordinates of the anchor boxes and the prediction boxes from the model are normalized between 0 and 1.
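
As an aside (not in the original answer), because the boxes are normalized you can recover pixel units by simply scaling with the input size, e.g. for the 300x300 example:

image_size = 300  # assumed square input, matching the example above

# boxes are [ymin, xmin, ymax, xmax] in [0, 1]; multiplying by the input size
# gives pixel co-ordinates (use separate height/width factors for non-square inputs)
pixel_boxes = boxes[(3, 3)] * image_size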

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def draw_boxes(boxes, figsize, nrows, ncols, grid=(0,0)):

    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) 

    for ax, box in zip(axes.flat, boxes):
        ymin, xmin, ymax, xmax = box
        ax.add_patch(patches.Rectangle((xmin, ymin), xmax-xmin, ymax-ymin, 
                                fill=False, edgecolor='red', lw=2))

        # add gridlines to represent feature map cells
        ax.set_xticks(np.linspace(0, 1, grid[0] + 1), minor=True)
        ax.set_yticks(np.linspace(0, 1, grid[1] + 1), minor=True)
        ax.grid(True, which='minor', axis='both')

    fig.tight_layout()

    return fig

If we were to take the fourth layer, which has a 3x3 feature map, as an example:

draw_boxes(boxes[(3,3)], figsize=(12,16), nrows=9, ncols=6, grid=(3,3))

In the image above each row represents a different cell in the 3x3 feature map, whilst each column represents a specific aspect ratio.

Your initial assumptions were correct, for example the anchor box with aspect 1.0 in the highest layer (with the lowest resolution feature map) will have a height/width equal to 0.9 of the input image size, whilst those in the lowest layer will have a height/width equal to 0.2 of the input image size. The anchor sizes of the layers in the middle are linearly interpolated between those limits.
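
To make that interpolation concrete, here is a minimal sketch of the scale calculation (this mirrors the linear rule used by create_ssd_anchors with the default settings; it is not a call into the library itself):

min_scale, max_scale, num_layers = 0.2, 0.9, 6

scales = [min_scale + (max_scale - min_scale) * k / (num_layers - 1)
          for k in range(num_layers)]
# -> [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

# For a 300x300 input, the aspect-ratio-1.0 anchor at each layer is roughly
# scale * 300 pixels on a side, i.e. about 60px in the lowest layer up to
# 270px in the highest layer.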

However, there are a few subtleties regarding the TensorFlow anchor generation that are worth being aware of:

  1. Note in the image example we have 6 anchors per grid cell, but we only specify 5 aspect ratios. This is because an additional anchor is added for each layer, with a size mid-way between the anchor size of the current layer and the anchor size of the next layer. This can be modified (or removed) by using the interpolated_scale_aspect_ratio parameter in anchor_kwargs above, or likewise in your config (see the sketch after this list).
  2. By default the list of pre-specified aspect ratios is ignored in the lowest layer of your object detection feature maps (the one with the finest resolution), and replaced with just 3 aspect ratios. This can be overridden with the reduce_boxes_in_lowest_layer boolean parameter.
  3. As you correctly pointed out, by default base_anchor_height = base_anchor_width = 1. However if your input image was not square and was reshaped during pre-processing, then a "square" anchor with aspect 1.0 will not actually be optimized for anchoring objects which were square in the original image (although of course it can learn to predict these shapes during training).
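
For illustration, here is how the interpolated_scale_aspect_ratio and reduce_boxes_in_lowest_layer parameters from points 1 and 2 could be passed through the helpers above (a sketch using the functions defined earlier; the values are only examples, and the assumption that a non-positive interpolated_scale_aspect_ratio removes the extra anchor follows from point 1):

boxes_modified = get_feature_map_anchor_boxes(
    feature_map_shape_list=get_feature_map_shapes(300, 300),
    min_scale=0.2,
    max_scale=0.9,
    interpolated_scale_aspect_ratio=0.0,   # assumed: disables the extra per-layer anchor
    reduce_boxes_in_lowest_layer=False,    # keep all 5 aspect ratios in the lowest layer
)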

The full gist can be found here.
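
To come back to the original N by N question, the same helpers can be reused for any square input; a minimal end-to-end sketch (N = 512 is just an assumed example size):

N = 512  # assumed example input size

feature_map_shapes = get_feature_map_shapes(N, N)
anchor_boxes = get_feature_map_anchor_boxes(
    feature_map_shape_list=feature_map_shapes,
    min_scale=0.2,
    max_scale=0.9,
)

# The anchors are normalized, so multiplying by N gives pixel sizes: the largest
# square anchor is roughly 0.9 * N pixels on a side and the smallest roughly
# 0.2 * N pixels, matching the reasoning in the question.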
