Tensorflow Queues - Switching between train and validation data


Question

I am trying to make use of queues for loading data from files in Tensorflow.

I would like to run the graph with validation data at the end of each epoch to get a better feel for how the training is going.

That is where I am running into problems. I can't seem to figure out how to switch between training data and validation data when using queues.

I have stripped my code down to a bare-minimum toy example to make it easier to get help. Instead of including all the code that loads the image files and performs inference and training, I have cut it off at the point where the filenames are loaded into the queue.

import tensorflow as tf

#  DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)
    
    # Create a queue consisting of the strings in `file_list`
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)
    
    # Create batch of items.
    x = q.dequeue_many(batch_size)
    
    # Inference, train op, and accuracy calculation after this point
    # ...


# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    
    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    
    try:
        for epoch in range(epochs):
            print("-"*60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))
    
            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {} \n".format(valid_batch))
    
    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

Variations and experiments

Trying different values for num_epochs

num_epochs=None

If I set the num_epochs argument in tf.train.string_input_producer() to None, it gives me the following output, which shows that it is running two epochs as intended, but it is using data from the training set when running evaluation.

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']

VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']
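
A likely reading of this output: in the code above the queue is built from train_items when the graph is constructed, and the file_list placeholder is never used as an input to the queue, so feeding it changes nothing about what x dequeues. The following is a minimal plain-Python sketch of that behaviour, not TensorFlow code; string_producer and dequeue_many are hypothetical stand-ins for a non-shuffled tf.train.string_input_producer and q.dequeue_many.

```python
import itertools

def string_producer(items):
    """Stand-in for a non-shuffled producer with num_epochs=None: cycles forever."""
    return itertools.cycle(items)

def dequeue_many(producer, n):
    """Stand-in for q.dequeue_many(n): take the next n items in FIFO order."""
    return list(itertools.islice(producer, n))

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]  # never reaches the queue

# The queue only ever sees train_items, exactly as in the posted graph.
q = string_producer(train_items)

log = []
for epoch in range(2):
    for step in range(2):
        log.append(("TRAIN", dequeue_many(q, 3)))
    # The "validation" step pulls from the same queue, so it gets training files.
    log.append(("VALID", dequeue_many(q, 3)))

for kind, batch in log:
    print(kind, batch)
```

Each validation step simply consumes the next three training filenames, which is also why the training batches in the second epoch start at train_file_3 rather than train_file_0.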

num_epochs=2

If I set the num_epochs argument in tf.train.string_input_producer() to 2, it gives me the following output, which shows that it does not even get through both training batches of the second epoch (and evaluation is still using training data).

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']

------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

num_epochs=1

If I set the num_epochs argument in tf.train.string_input_producer() to 1, in the hope that it will flush any additional training data out of the queue so that it can make use of the validation data, I get the following output, which shows that it terminates as soon as it gets through one epoch of training data and never gets to load the evaluation data.

------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
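
The two finite num_epochs runs are consistent with simple counting: num_epochs passes over 6 files yield 6 * num_epochs / 3 full batches, while the loop asks for 2 epochs x (2 training + 1 validation) = 6 batches. Below is a small plain-Python sketch of that arithmetic, not TensorFlow code. OutOfRange, take_batch and simulate are hypothetical names; OutOfRange stands in for the error TensorFlow 1.x raises when a closed queue cannot supply a full batch (tf.errors.OutOfRangeError, to the best of my knowledge).

```python
import itertools

class OutOfRange(Exception):
    """Hypothetical stand-in for the error raised when a closed queue runs dry."""

def take_batch(stream, batch_size):
    """Take one full batch from the finite stream or signal exhaustion."""
    batch = list(itertools.islice(stream, batch_size))
    if len(batch) < batch_size:
        raise OutOfRange("not enough items left for a full batch")
    return batch

def simulate(num_epochs, epochs=2, batches_per_epoch=2, batch_size=3):
    """Replay the training loop against a queue holding num_epochs passes of data."""
    train_items = ["train_file_{}".format(i) for i in range(6)]
    stream = itertools.chain.from_iterable(itertools.repeat(train_items, num_epochs))
    log = []
    try:
        for _ in range(epochs):
            for _ in range(batches_per_epoch):
                log.append("TRAIN " + take_batch(stream, batch_size)[0])
            # The validation step draws from the same stream of training files.
            log.append("VALID " + take_batch(stream, batch_size)[0])
    except OutOfRange:
        log.append("STOPPED")
    return log

print(simulate(num_epochs=2))
print(simulate(num_epochs=1))
```

With num_epochs=2 only 4 batches exist, so the run stops after the first training batch of the second epoch; with num_epochs=1 only 2 batches exist, so it stops before the first validation step. Only the first filename of each batch is logged to keep the output short.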

Setting capacity argument to various values

I have also tried setting the capacity argument in tf.train.string_input_producer() to small values, such as 3 and 1, but these had no effect on the results.

What other approach could I take to switch between training and validation data? Would I have to create separate queues? I am at a loss as to how to get that to work. Would I have to create additional coordinators and queue runners as well?

Recommended answer

I am compiling a list of potential approaches that might solve this issue here. Most of these are just vague suggestions, with no actual code examples to show how to make use of them.

Suggested here

Suggested here

Also suggested by sygi on this very Stack Overflow thread. link

Suggested here

Suggested here and here

Suggested by sygi in this very Stack Overflow thread (link). This might be the same as the make_template() method.

Suggested here with sample code here. Code adapted to my problem here on this thread. link

Suggested here
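
As a rough picture of the separate-queues idea raised in the question, here is a plain-Python stand-in that runs without TensorFlow. It uses queue.Queue and threading from the standard library: each queue gets its own feeder thread (the role a queue runner plays), and a single threading.Event acts as the shared stop signal (the role a Coordinator plays). start_producer and dequeue_many are hypothetical helpers, and this is an analogy under those assumptions, not the thread's actual solution. As far as I know, tf.train.start_queue_runners() in TensorFlow 1.x starts every queue runner registered in the graph, so a second queue would not by itself call for a second coordinator.

```python
import itertools
import queue
import threading

def start_producer(items, capacity, stop_event):
    """Start a daemon thread that keeps a bounded FIFO queue filled with items."""
    q = queue.Queue(maxsize=capacity)

    def fill():
        for item in itertools.cycle(items):
            # Retry with a timeout so the thread notices the stop signal.
            while not stop_event.is_set():
                try:
                    q.put(item, timeout=0.05)
                    break
                except queue.Full:
                    continue
            if stop_event.is_set():
                return

    thread = threading.Thread(target=fill, daemon=True)
    thread.start()
    return q, thread

def dequeue_many(q, n):
    """Block until n items are available and return them in FIFO order."""
    return [q.get() for _ in range(n)]

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

stop = threading.Event()  # one shared stop signal for both feeder threads
train_q, train_thread = start_producer(train_items, capacity=6, stop_event=stop)
valid_q, valid_thread = start_producer(valid_items, capacity=3, stop_event=stop)

log = []
for epoch in range(2):
    for step in range(2):
        log.append(("TRAIN", dequeue_many(train_q, 3)))
    # Validation reads from its own queue, so training order is undisturbed.
    log.append(("VALID", dequeue_many(valid_q, 3)))

stop.set()
train_thread.join()
valid_thread.join()

for kind, batch in log:
    print(kind, batch)
```

Because the validation batches come from their own queue, every training epoch starts at train_file_0 again and every validation batch contains only validation files.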
