TensorFlow Object Detection API using image crops as training dataset

Problem description

I want to train an ssd-inception-v2 model from the TensorFlow Object Detection API. The training dataset I want to use is a set of cropped images of varying sizes without bounding boxes, as each crop is itself the bounding box.

I followed the create_pascal_tf_record.py example, replacing the bounding-box and classification portions accordingly, to generate the TFRecords as follows:

import hashlib
import os

import numpy as np
import tensorflow as tf
from PIL import Image

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


def dict_to_tf_example(imagepath, label):
    image = Image.open(imagepath)
    if image.format != 'JPEG':
        print("Skipping file: " + imagepath)
        return None
    img = np.array(image)
    with tf.gfile.GFile(imagepath, 'rb') as fid:
        encoded_jpg = fid.read()
    # Store the image dimensions so the serialized string can later be
    # decoded back into an array of the shape the image used to have.
    height = img.shape[0]
    width = img.shape[1]
    key = hashlib.sha256(encoded_jpg).hexdigest()

    # The bounding box is hard coded to span (almost) the entire image,
    # since each crop is treated as a single object.
    xmin = [5.0 / 100.0]
    ymin = [5.0 / 100.0]
    xmax = [95.0 / 100.0]
    ymax = [95.0 / 100.0]
    class_text = [label['name'].encode('utf8')]
    classes = [label['id']]
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/class/text': dataset_util.bytes_list_feature(class_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
    }))

    return example


# Command-line flags (definitions added here for completeness; the
# defaults are placeholders).
flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory with one sub-directory of crops per class.')
flags.DEFINE_string('output_path', 'train', 'Basename for the output TFRecord file.')
flags.DEFINE_string('label_map_path', 'mscoco_label_map.pbtxt', 'Path to the label map proto.')
FLAGS = flags.FLAGS


def main(_):
  data_dir = FLAGS.data_dir
  output_path = os.path.join(data_dir, FLAGS.output_path + '.record')
  writer = tf.python_io.TFRecordWriter(output_path)
  label_map = label_map_util.load_labelmap(FLAGS.label_map_path)
  categories = label_map_util.convert_label_map_to_categories(
      label_map, max_num_classes=80, use_display_name=True)
  category_index = label_map_util.create_category_index(categories)
  # Each sub-directory of data_dir is named after a category and holds
  # the cropped images belonging to that class.
  category_list = os.listdir(data_dir)
  gen = (category for category in categories if category['name'] in category_list)
  for category in gen:
    examples_path = os.path.join(data_dir, category['name'])
    examples_list = os.listdir(examples_path)
    for example in examples_list:
        imagepath = os.path.join(examples_path, example)
        tf_example = dict_to_tf_example(imagepath, category)
        if tf_example is not None:  # dict_to_tf_example skips non-JPEG files
            writer.write(tf_example.SerializeToString())

  writer.close()


if __name__ == '__main__':
  tf.app.run()
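
For a quick sanity check, the generated record can be read back and the stored features printed. A minimal sketch, assuming the same TF 1.x APIs as above; 'train.record' is a placeholder path:

import tensorflow as tf

# Iterate over the serialized tf.train.Example protos in the TFRecord
# and print the stored filename, box coordinates, and class labels.
for serialized in tf.python_io.tf_record_iterator('train.record'):  # placeholder path
    example = tf.train.Example()
    example.ParseFromString(serialized)
    feature = example.features.feature
    print(feature['image/filename'].bytes_list.value[0].decode('utf8'),
          list(feature['image/object/bbox/xmin'].float_list.value),
          list(feature['image/object/class/label'].int64_list.value))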

The bounding box is hard-coded to encompass the whole image. Each label is assigned according to the image's directory. I am using mscoco_label_map.pbtxt for labeling and ssd_inception_v2_pets.config as the base for my pipeline.

I trained and froze the model to use with the Jupyter notebook example. However, the final result is a single box surrounding the whole image. Any idea what went wrong?

Answer

Object detection algorithms/networks often work by predicting the location of a bounding box as well as the class. For this reason the training data usually needs to contain bounding-box data. By feeding your model training data whose bounding box is always the size of the image, you will likely get garbage predictions out, including a box that always outlines the whole image.
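
For contrast, here is a minimal sketch of how per-object annotations on a full scene would normally be turned into the normalized coordinate lists that dict_to_tf_example hard-codes above. The annotations list, its pixel values, and the label ids are hypothetical placeholders, made up for illustration:

# Hypothetical per-object annotations in pixel coordinates, e.g. parsed
# from PASCAL VOC XML files; the values are invented for this example.
width, height = 500, 375  # dimensions of the full scene, not a crop
annotations = [
    {'name': 'cat', 'id': 17, 'xmin': 208, 'ymin': 12, 'xmax': 492, 'ymax': 360},
    {'name': 'dog', 'id': 18, 'xmin': 48, 'ymin': 140, 'xmax': 195, 'ymax': 271},
]

# One entry per object, normalized to [0, 1] relative to the full image,
# instead of a single hard-coded box that always spans the image.
xmin = [a['xmin'] / float(width) for a in annotations]
ymin = [a['ymin'] / float(height) for a in annotations]
xmax = [a['xmax'] / float(width) for a in annotations]
ymax = [a['ymax'] / float(height) for a in annotations]
class_text = [a['name'].encode('utf8') for a in annotations]
classes = [a['id'] for a in annotations]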

This sounds like a problem with your training data. You shouldn't provide cropped images, but rather full images/scenes with your objects annotated. At this point you're essentially training a classifier.

Try training with the correct style of images, ones that are not cropped, and see how you get on.
