What is the impact of different image resizer dimensions when using the default config of the Object Detection API?


Question

I am trying to use TensorFlow's Object Detection API to train a model, starting from the sample config for Faster R-CNN ResNet-101 (https://github.com/tensorflow/models/blob/master/object_detection/samples/configs/faster_rcnn_resnet101_voc07.config).
The following part of the config file is what I don't quite understand:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

My questions are:

  1. What exactly do min_dimension and max_dimension mean? Do they mean the input image will be resized to 600x1024 or 1024x600?
  2. If my images have different sizes, and some of them are considerably larger than 600x1024 (or 1024x600), can I, or should I, increase the values of min_dimension and max_dimension?

The reason I ask comes from this post: TensorFlow Object Detection API Weird Behaviour

In that post, the author answered their own question:

Then I decided to crop the input image and provide that as an input. Just to see if the results improve and it did!
It turns out that the dimensions of the input image were much larger than the 600 x 1024 that is accepted by the model. So, it was scaling down these images to 600 x 1024 which meant that the cigarette boxes were losing their details :)

That post used the same config as mine. I am not sure whether I can change these parameters, given that they are the default or recommended settings for this particular model, faster_rcnn_resnet101.

Recommended answer

After some tests, I think I found the answer. Please correct me if anything is wrong.

In the .config file:

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
  }
}

The image resizer is set up in 'object_detection/builders/image_resizer_builder.py':

if image_resizer_config.WhichOneof(
    'image_resizer_oneof') == 'keep_aspect_ratio_resizer':
  keep_aspect_ratio_config = image_resizer_config.keep_aspect_ratio_resizer
  if not (keep_aspect_ratio_config.min_dimension
          <= keep_aspect_ratio_config.max_dimension):
    raise ValueError('min_dimension > max_dimension')
  return functools.partial(
      preprocessor.resize_to_range,
      min_dimension=keep_aspect_ratio_config.min_dimension,
      max_dimension=keep_aspect_ratio_config.max_dimension)
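
To make the builder's role concrete, here is a minimal sketch without TensorFlow: it checks the two values and returns a function that already has them bound. `build_resizer` and `fake_resize_to_range` are hypothetical stand-ins invented for this illustration; they are not part of the Object Detection API.

```python
import functools


def fake_resize_to_range(image, min_dimension=None, max_dimension=None):
    # Hypothetical stand-in for preprocessor.resize_to_range: it only reports
    # the limits it was called with instead of resizing anything.
    return (image, min_dimension, max_dimension)


def build_resizer(min_dimension, max_dimension):
    # Same sanity check as in the builder excerpt above.
    if not (min_dimension <= max_dimension):
        raise ValueError('min_dimension > max_dimension')
    # Bind both limits now; the caller later passes only the image.
    return functools.partial(
        fake_resize_to_range,
        min_dimension=min_dimension,
        max_dimension=max_dimension)


resizer = build_resizer(600, 1024)
print(resizer('some_image'))
```

So the two numbers in the config are not a target shape; they are simply passed through as limits to the resize function.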

It then calls the 'resize_to_range' function in 'object_detection/core/preprocessor.py':

  with tf.name_scope('ResizeToRange', values=[image, min_dimension]):
    image_shape = tf.shape(image)
    orig_height = tf.to_float(image_shape[0])
    orig_width = tf.to_float(image_shape[1])
    orig_min_dim = tf.minimum(orig_height, orig_width)

    # Calculates the larger of the possible sizes
    min_dimension = tf.constant(min_dimension, dtype=tf.float32)
    large_scale_factor = min_dimension / orig_min_dim
    # Scaling orig_(height|width) by large_scale_factor will make the smaller
    # dimension equal to min_dimension, save for floating point rounding errors.
    # For reasonably-sized images, taking the nearest integer will reliably
    # eliminate this error.
    large_height = tf.to_int32(tf.round(orig_height * large_scale_factor))
    large_width = tf.to_int32(tf.round(orig_width * large_scale_factor))
    large_size = tf.stack([large_height, large_width])

    if max_dimension:
      # Calculates the smaller of the possible sizes, use that if the larger
      # is too big.
      orig_max_dim = tf.maximum(orig_height, orig_width)
      max_dimension = tf.constant(max_dimension, dtype=tf.float32)
      small_scale_factor = max_dimension / orig_max_dim
      # Scaling orig_(height|width) by small_scale_factor will make the larger
      # dimension equal to max_dimension, save for floating point rounding
      # errors. For reasonably-sized images, taking the nearest integer will
      # reliably eliminate this error.
      small_height = tf.to_int32(tf.round(orig_height * small_scale_factor))
      small_width = tf.to_int32(tf.round(orig_width * small_scale_factor))
      small_size = tf.stack([small_height, small_width])

      new_size = tf.cond(
          tf.to_float(tf.reduce_max(large_size)) > max_dimension,
          lambda: small_size, lambda: large_size)
    else:
      new_size = large_size

    new_image = tf.image.resize_images(image, new_size,
                                       align_corners=align_corners)

From the code above, an image of size 800*1000 ends up as 600*750: the shorter side is scaled to min_dimension (600), and because the resulting longer side (750) does not exceed max_dimension (1024), that size is kept.

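That is, this image resizer always resizes your input image according to 'min_dimension' and 'max_dimension' while keeping its aspect ratio: the shorter side is scaled to 'min_dimension', unless that would push the longer side past 'max_dimension', in which case the longer side is scaled to 'max_dimension' instead. The output is therefore not forced to exactly 600x1024.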
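
The size calculation can be reproduced in plain Python 3 to check a few cases by hand. This is only a sketch of the arithmetic in the 'resize_to_range' excerpt quoted above, not the API itself; `resized_shape` is a hypothetical helper name, and it uses Python floats where the original uses float32 tensors.

```python
def resized_shape(height, width, min_dimension=600, max_dimension=1024):
    """Return (new_height, new_width) following the quoted resize logic."""
    # First candidate: scale so that the shorter side equals min_dimension.
    large_scale = min_dimension / min(height, width)
    large = (round(height * large_scale), round(width * large_scale))
    if max_dimension and max(large) > max_dimension:
        # Too big: scale so that the longer side equals max_dimension instead.
        small_scale = max_dimension / max(height, width)
        return (round(height * small_scale), round(width * small_scale))
    return large


for shape in [(800, 1000), (600, 2000), (300, 400)]:
    print(shape, '->', resized_shape(*shape))
```

With the default limits this gives 600x750 for the 800x1000 image, 307x1024 for a 600x2000 image (the max_dimension branch), and 600x800 for a 300x400 image, so small images are scaled up as well as large ones scaled down.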
