Shuffling the training dataset with the TensorFlow Object Detection API

Problem description

I'm working on a logo detection algorithm using the Faster-RCNN model with the Tensorflow object detection API. My dataset is alphabetically ordered (so there are a hundred adidas logos, then a hundred apple logos, etc.), and I would like it to be shuffled while training.

I've put some values in the config file:

train_input_reader: {
  shuffle: true
  queue_capacity: some value
  min_after_dequeue: some other value
}

However, whatever values I put in, the algorithm first trains on all of the logos starting with 'a' (adidas, apple and so on), and only after a while starts to see the logos starting with 'b' (bmw etc.), then 'c', and so on.

Of course I could just shuffle my input dataset directly, but I would like to understand the logic behind it.

PS: I've seen this post about shuffling and min_after_dequeue, but I still don't quite get it. My batch size is 1, so it shouldn't be using tf.train.shuffle_batch() but only tf.RandomShuffleQueue.

My training dataset size is 5000, and if I write min_after_dequeue: 4000 or 5000 it is still not shuffled properly. Why?
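
For intuition about what min_after_dequeue controls, here is a minimal pure-Python sketch of a bounded shuffle buffer reading a sorted stream. It is a simplified model under assumptions, not TensorFlow's implementation: shuffle_stream and the toy dataset (50 classes of 100 examples) are hypothetical names made up for illustration. It does not model the readers or the rest of the input pipeline, so it cannot by itself explain the behaviour described above.

```python
import random


def shuffle_stream(items, min_after_dequeue, seed=0):
    """Simplified model of a bounded shuffle queue (not TensorFlow's code).

    Items enter the buffer in input order. Once the buffer holds more than
    `min_after_dequeue` items, one random item is removed and yielded.
    When the input is exhausted, the remainder is drained in random order.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > min_after_dequeue:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


# Hypothetical sorted dataset: 50 classes with 100 examples each.
dataset = [class_id for class_id in range(50) for _ in range(100)]

small = list(shuffle_stream(dataset, min_after_dequeue=200))
large = list(shuffle_stream(dataset, min_after_dequeue=len(dataset)))

# Number of distinct classes among the first 100 outputs of each run.
classes_small = len(set(small[:100]))
classes_large = len(set(large[:100]))
print(classes_small, classes_large)
```

In this model, a buffer of 200 can only emit examples from the first few classes during the first 100 steps, because nothing else has entered the buffer yet. A buffer as large as the whole dataset yields a full shuffle.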

Update: @AllenLavoie It's a bit hard for me to check, as there are a lot of dependencies and I'm new to Tensorflow. But in the end the queue is constructed by tf.contrib.slim.parallel_reader.parallel_read():

    _, string_tensor = parallel_reader.parallel_read(
        config.input_path,
        reader_class=tf.TFRecordReader,
        num_epochs=(input_reader_config.num_epochs
                    if input_reader_config.num_epochs else None),
        num_readers=input_reader_config.num_readers,
        shuffle=input_reader_config.shuffle,
        dtypes=[tf.string, tf.string],
        capacity=input_reader_config.queue_capacity,
        min_after_dequeue=input_reader_config.min_after_dequeue)

It seems that when I put num_readers = 1 in the config file, the dataset is finally shuffled as I want (at least in the beginning), but with more readers the logos at the start somehow come out in alphabetical order.

Recommended answer

I recommend shuffling the dataset prior to training. The way shuffling currently happens is imperfect. My guess at what is happening is that the queue starts off empty at the beginning and only gets examples that start with 'A'. After a while it may be more shuffled, but there is no getting around the beginning part, when the queue hasn't been filled yet.
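
One way to shuffle before training is to randomize the order of the examples before they are written to the TFRecord file. The sketch below is illustrative and uses only the Python standard library. The helper shuffled_copy, the file names, and the (file, label) tuple layout are assumptions for this example, not part of the object detection API.

```python
import random


def shuffled_copy(examples, seed=0):
    """Return a new list with the examples in random order.

    The input list is left untouched; a fixed seed keeps the order
    reproducible between runs.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical alphabetically ordered annotations: (image file, logo label).
examples = [
    (f"{label}_{i:03d}.jpg", label)
    for label in ("adidas", "apple", "bmw")
    for i in range(100)
]

shuffled = shuffled_copy(examples, seed=42)

# Write `shuffled` (not `examples`) to the TFRecord file, in this order.
print(shuffled[:5])
```

With the records already mixed on disk, training no longer depends on the input queue to break up the alphabetical ordering.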
