TensorFlow Object Detection API using image crops as training dataset

Question

I want to train an ssd-inception-v2 model with the TensorFlow Object Detection API. The training dataset I want to use is a set of cropped images of different sizes with no bounding-box annotations, since each crop is itself the bounding box.

I followed the create_pascal_tf_record.py example, replacing the bounding-box and classification portions, to generate the TFRecords as follows:

import hashlib
import os

import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


def dict_to_tf_example(imagepath, label):
    """Build a tf.train.Example for one cropped image; return None for non-JPEG files."""
    image = Image.open(imagepath)
    if image.format != 'JPEG':
        print("Skipping file: " + imagepath)
        return None
    # PIL reports the size as (width, height).
    width, height = image.size
    with tf.gfile.GFile(imagepath, 'rb') as fid:
        encoded_jpg = fid.read()
    key = hashlib.sha256(encoded_jpg).hexdigest()

    # Hard-coded box in normalized coordinates: the whole crop minus a 5% margin.
    xmin = [5.0/100.0]
    ymin = [5.0/100.0]
    xmax = [95.0/100.0]
    ymax = [95.0/100.0]
    # The class comes from the label-map entry of the directory the image is in.
    class_text = [label['name'].encode('utf8')]
    classes = [label['id']]
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/class/text': dataset_util.bytes_list_feature(class_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax)
    }))

    return example


def main(_):
  # FLAGS.data_dir, FLAGS.output_path and FLAGS.label_map_path are assumed
  # to be defined elsewhere in the script.
  data_dir = FLAGS.data_dir
  output_path = os.path.join(data_dir, FLAGS.output_path + '.record')
  writer = tf.python_io.TFRecordWriter(output_path)
  label_map = label_map_util.load_labelmap(FLAGS.label_map_path)
  categories = label_map_util.convert_label_map_to_categories(
      label_map, max_num_classes=80, use_display_name=True)
  # Each subdirectory of data_dir is named after a category and holds its crops.
  category_list = os.listdir(data_dir)
  for category in categories:
    if category['name'] not in category_list:
      continue
    examples_path = os.path.join(data_dir, category['name'])
    for example in os.listdir(examples_path):
      imagepath = os.path.join(examples_path, example)
      tf_example = dict_to_tf_example(imagepath, category)
      if tf_example is None:
        # Non-JPEG files are skipped by dict_to_tf_example.
        continue
      writer.write(tf_example.SerializeToString())

  writer.close()

The bounding box is hard-coded to cover nearly the whole image. Labels are assigned according to the directory each image is in. I am using mscoco_label_map.pbtxt for the labels and ssd_inception_v2_pets.config as the base for my pipeline.

I trained and froze the model to use it with the Jupyter notebook example. However, the final result is a single box surrounding the whole image. Any idea what went wrong?

Recommended answer

Object detection networks typically work by predicting the location of a bounding box as well as its class, so the training data needs to contain bounding-box annotations. If you feed the model training data in which the bounding box is always the size of the image, you will likely get garbage predictions, including a box that always outlines the whole image.
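One way to spot this situation before training is to measure how many boxes in a dataset cover almost the entire frame. The following is only a rough sketch: the helper names and the 0.8 threshold are illustrative assumptions, not part of the Object Detection API.

```python
def box_area_fraction(xmin, ymin, xmax, ymax):
    """Area of a normalized box as a fraction of the image area."""
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def share_of_full_frame_boxes(boxes, threshold=0.8):
    """Share of boxes whose area exceeds `threshold` of the image area.

    `threshold` is an arbitrary illustrative cut-off, not a library default.
    """
    if not boxes:
        return 0.0
    large = sum(1 for box in boxes if box_area_fraction(*box) > threshold)
    return large / float(len(boxes))


# The hard-coded box from the question: a 5% margin on every side.
crop_style_boxes = [(0.05, 0.05, 0.95, 0.95)] * 4
# Hypothetical annotations taken from full scenes, for comparison.
scene_style_boxes = [(0.10, 0.20, 0.35, 0.60), (0.55, 0.40, 0.90, 0.85)]

print(share_of_full_frame_boxes(crop_style_boxes))
print(share_of_full_frame_boxes(scene_style_boxes))
```

A share close to 1 means the box targets carry almost no location information, which matches the symptom described in the question.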

This sounds like a problem with your training data. You shouldn't provide cropped images but full images/scenes with your objects annotated. At this point you are basically training a classifier.
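As a minimal sketch of what annotated full images would feed into the TFRecord, the helper below converts pixel-coordinate boxes into the normalized lists that the 'image/object/bbox/*' features in the question's code expect. The function name `normalize_boxes` and the sample numbers are hypothetical; where the pixel boxes come from (for example an annotation tool) is outside its scope.

```python
def normalize_boxes(boxes_px, image_width, image_height):
    """Convert (xmin, ymin, xmax, ymax) pixel boxes to normalized lists.

    Returns a dict with one list per coordinate, each value in [0, 1].
    """
    result = {'xmin': [], 'ymin': [], 'xmax': [], 'ymax': []}
    for xmin, ymin, xmax, ymax in boxes_px:
        inside_x = 0 <= xmin < xmax <= image_width
        inside_y = 0 <= ymin < ymax <= image_height
        if not (inside_x and inside_y):
            raise ValueError(
                "empty box or box outside image: %r" % ((xmin, ymin, xmax, ymax),))
        # float() keeps the division exact under Python 2 as well.
        result['xmin'].append(xmin / float(image_width))
        result['ymin'].append(ymin / float(image_height))
        result['xmax'].append(xmax / float(image_width))
        result['ymax'].append(ymax / float(image_height))
    return result


# Hypothetical example: one object annotated in a 640x480 scene.
normalized = normalize_boxes([(64, 48, 320, 240)], 640, 480)
print(normalized)
```

Each list then holds one entry per object in the image, so a scene with several objects yields several boxes and class labels in a single example.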

Try training with the right kind of images, that is, uncropped images with annotated objects, and see how you get on.
