How to use TensorFlow reader and queue to read two files at the same time?


Question

My training set contains two kinds of files: training images with file names like "1.png" and label files with names like "1.label.txt".

I found some usage of Queue and Reader in tutorials like this:

filename_queue = tf.train.string_input_producer(filenames)
result.key, value = reader.read(filename_queue)

However, because my training set contains two kinds of files with a one-to-one correspondence between them, how can I make use of Queue and Reader like the code above?

Edit

I am thinking about using one queue containing the base names to feed two other queues, one for images and one for labels. Code like this:

import tensorflow as tf
from tensorflow.python.ops import data_flow_ops  # needed for FIFOQueue below

with tf.Session() as sess:
  base_name_queue = tf.train.string_input_producer(['image_names'], num_epochs=20)
  base_name = base_name_queue.dequeue()
  image_name = base_name + ".png"
  image_name_queue = data_flow_ops.FIFOQueue(32, image_name.dtype.base_dtype)
  image_name_queue.enqueue([image_name])
  x = image_name_queue.dequeue()
  print_op = tf.Print(image_name, [image_name])

  qr = tf.train.QueueRunner(base_name_queue, [base_name_queue] * 4)
  coord = tf.train.Coordinator()
  enqueue_threads = qr.create_threads(sess, coord=coord, start=True)

  for step in range(1000000):
    if coord.should_stop():
      break
    print(sess.run(print_op))

  coord.request_stop()
  coord.join(enqueue_threads)

But running this code results in an error:

TypeError: Fetch argument of has invalid type , must be a string or Tensor. (Can not convert a FIFOQueue into a Tensor or Operation.)

And the error points to this line:

coord.join(enqueue_threads)

I think I must be misunderstanding how TensorFlow queues work.

Answer

I have found the solution to my problem. I am posting the answer here instead of deleting my question, in the hope that it helps people new to TensorFlow.

The answer contains two parts.

Part 1: The solution is simple:

  1. Use 2 queues to store the two sets of files. Note that the two sets must be ordered in the same way.
  2. Do some preprocessing on each respectively using dequeue.
  3. Combine the two preprocessed tensors into one list and pass the list to shuffle_batch.
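The ordering requirement in step 1 is the crux: since the two queues are dequeued in lockstep, files pair up purely by position. Here is a plain-Python sketch of why the order matters (not TensorFlow; the base names are made up for illustration):

```python
# Two lists derived from the same ordered base names stay aligned,
# so pairing by position is safe only when neither list is shuffled.
import random

base_names = ["file1", "file2", "file3"]  # hypothetical base names

image_files = [name + ".png" for name in base_names]
label_files = [name + ".label.txt" for name in base_names]

# Pairing by position works because both lists preserve the same order.
pairs = list(zip(image_files, label_files))
assert pairs[0] == ("file1.png", "file1.label.txt")

# If either list were shuffled independently (like shuffle=True on only
# one string_input_producer), the pairing would break:
shuffled = image_files[:]
random.shuffle(shuffled)  # order no longer guaranteed to match label_files
```

This is why both `string_input_producer` calls below set `shuffle=False`; shuffling is deferred to `shuffle_batch`, which shuffles image/label pairs together.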

The code is here:

base_names = ['file1', 'file2']
base_tensor = tf.convert_to_tensor(base_names)
image_name_queue = tf.train.string_input_producer(
  base_tensor + '.png',
  shuffle=False  # Note: must set shuffle to False
)
label_queue = tf.train.string_input_producer(
  base_tensor + '.label.txt',
  shuffle=False  # Note: must set shuffle to False
)

# use readers to read the files
image_reader = tf.WholeFileReader()
image_key, image_raw = image_reader.read(image_name_queue)
image = tf.image.decode_png(image_raw)
label_reader = tf.WholeFileReader()
label_key, label_raw = label_reader.read(label_queue)
label = tf.decode_raw(label_raw, tf.uint8)

# preprocess image
processed_image = tf.image.per_image_whitening(image)
batch = tf.train.shuffle_batch([processed_image, label], 10, 100, 100)

# print a batch (assumes a Session `sess` exists and queue_runner
# refers to tf.train.queue_runner)
queue_threads = queue_runner.start_queue_runners()
print(sess.run(batch))

Part 2: Queue, QueueRunner, Coordinator and helper functions

Queue is really a queue (the statement seems meaningless, but it is what it is). A queue has two methods: enqueue and dequeue. The input of enqueue is a Tensor (well, you can enqueue normal data, but it will be converted to a Tensor internally). The return value of dequeue is also a Tensor. So you can make a pipeline of queues like this:

q1 = tf.FIFOQueue(32, tf.int32)
q2 = tf.FIFOQueue(32, tf.int32)
enq1 = q1.enqueue_many([[1, 2, 3, 4, 5]])  # enqueue takes one element; use enqueue_many for several
v1 = q1.dequeue()
enq2 = q2.enqueue(v1)
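The same chaining idea can be sketched with Python's standard `queue` module (a rough analogy only, not TensorFlow): a value taken out of one queue feeds the next stage.

```python
# A two-stage pipeline: dequeue from q1, transform, enqueue into q2.
from queue import Queue

q1 = Queue(maxsize=32)
q2 = Queue(maxsize=32)

# Stage 1: enqueue raw values.
for v in [1, 2, 3, 4, 5]:
    q1.put(v)

# Stage 2: dequeue from q1, apply some per-element processing,
# and enqueue the result into q2.
while not q1.empty():
    q2.put(q1.get() * 10)

results = [q2.get() for _ in range(5)]
print(results)  # [10, 20, 30, 40, 50]
```

In TensorFlow the stages would run in separate threads, which is exactly what QueueRunner arranges, as described next.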

The benefit of using queues in TensorFlow is to load data asynchronously, which improves performance and saves memory. The code above is not runnable on its own, because no thread is running those operations. QueueRunner is designed to describe how to enqueue data in parallel, so it is initialized with a queue and a list of its enqueue operations.
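What a QueueRunner does is close to this plain-Python sketch (an analogy only, with made-up producer logic): several producer threads keep a bounded queue filled while the main thread consumes.

```python
# Producer threads fill a bounded queue while the main thread consumes,
# so loading and consumption overlap -- the QueueRunner idea in miniature.
import threading
from queue import Queue, Full

q = Queue(maxsize=8)
stop = threading.Event()

def producer():
    i = 0
    while not stop.is_set():
        try:
            q.put(i, timeout=0.1)
            i += 1
        except Full:
            pass  # queue full; retry until asked to stop

threads = [threading.Thread(target=producer) for _ in range(4)]
for t in threads:
    t.start()

# Main thread consumes items while producers run in parallel.
items = [q.get() for _ in range(20)]

stop.set()
for t in threads:
    t.join()

print(len(items))  # 20
```

The bounded `maxsize` plays the role of the queue capacity argument (the `32` in `FIFOQueue(32, ...)`): it throttles producers so memory use stays constant.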

After setting up all the QueueRunners, you have to start all the threads. One way is to start them when creating them:

enqueue_threads = qr.create_threads(sess, coord=coord, start=True)

Or, you can start all the threads after all the setup work is done:

# add a queue runner (queue_runner refers to tf.train.queue_runner)
queue_runner.add_queue_runner(queue_runner.QueueRunner(q, [enq]))

# start all queue runners
queue_threads = queue_runner.start_queue_runners()

When all the threads have started, you have to decide when to exit. The Coordinator is there to do this. A Coordinator is like a shared flag between all the running threads: if one of them finishes or runs into an error, it calls coord.request_stop(), and then every thread gets True when calling coord.should_stop(). So the pattern for using a Coordinator is:

coord = tf.train.Coordinator()

for step in range(1000000):
  if coord.should_stop():
    break
  print(sess.run(print_op))

coord.request_stop()
coord.join(enqueue_threads)
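The essence of this pattern is a shared stop flag checked by every thread. A minimal plain-Python analogue using threading.Event (illustrative only; the worker logic is made up):

```python
# A shared Event plays the role of the Coordinator's stop flag.
import threading

coord_stop = threading.Event()

def worker(results, n):
    for step in range(n):
        if coord_stop.is_set():     # like coord.should_stop()
            break
        results.append(step)
    coord_stop.set()                # like coord.request_stop():
                                    # one thread finishing stops the rest

results = []
threads = [threading.Thread(target=worker, args=(results, 5))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:                   # like coord.join(threads)
    t.join()

print(coord_stop.is_set())  # True
```

The real Coordinator adds error propagation on top of this: an exception raised in one thread can be re-raised in the thread that calls coord.join().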

