Accessing filenames from a file queue in TensorFlow
Question
I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like 'train/001.jpg', and the labeling file looks like:
train/001.jpg 1
train/002.jpg 2
...
I can easily load images from the image directory in TensorFlow by creating a file queue from the filenames:
filequeue = tf.train.string_input_producer(filenames)
reader = tf.WholeFileReader()
img = reader.read(filequeue)
But I'm at a loss for how to couple these files with the labels from the labeling file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have the filename, I need to be able to look up the label keyed by that filename. It seems like a standard Python dictionary wouldn't work, because these computations need to happen at each step in the graph.
Recommended answer
Given that your data is not too large for you to supply the list of filenames as a Python list, I'd suggest just doing the preprocessing in Python. Create two lists (in the same order) of the filenames and the labels, insert those into either a RandomShuffleQueue or an ordinary queue, and dequeue from that. If you want the "loops infinitely" behavior of string_input_producer, you could re-run the enqueue op at the start of every epoch.
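The Python-side preprocessing described above could look something like the following. This is only a minimal sketch: the helper name `read_label_file` is hypothetical, and it assumes each line of the labeling file holds a filename followed by an integer label, separated by whitespace, as in the question's example.

```python
def read_label_file(lines):
    """Split 'filename label' lines into two parallel lists (hypothetical helper)."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is the final field.
        name, label = line.rsplit(None, 1)
        filenames.append(name)
        labels.append(int(label))  # assumption: labels are integers
    return filenames, labels


sample = ["train/001.jpg 1", "train/002.jpg 2"]
filenames, labels = read_label_file(sample)
print(filenames, labels)
```

In practice you would probably pass an open file object instead of `sample`. Note that with integer labels the queue's second dtype would need to be an integer type such as tf.int32 rather than tf.string.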
A toy example:
import tensorflow as tf

# Parallel lists: f[i] is a filename and l[i] is its label.
f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]
fv = tf.constant(f)
lv = tf.constant(l)

# Queue of (filename, label) string pairs: capacity 10, min_after_dequeue 0.
rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[], []])
do_enqueues = rsq.enqueue_many([fv, lv])
gotf, gotl = rsq.dequeue()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)  # re-run at the start of each epoch to refill the queue
    for i in range(2):
        # Each dequeue returns one matching filename/label pair.
        one_f, one_l = sess.run([gotf, gotl])
        print("F: ", one_f, "L: ", one_l)
The key is that you're effectively enqueueing filename/label pairs when you run the enqueue, and those same pairs are returned by the dequeue.
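To see why the pairing matters without running TensorFlow, here is a plain-Python analogue (it is not the TensorFlow queue API). Filename/label tuples are shuffled as units, so a label never drifts away from its filename. The function name `shuffled_pairs` and the use of a seed are illustrative assumptions.

```python
import random


def shuffled_pairs(filenames, labels, seed=None):
    """Return (filename, label) tuples in random order, keeping pairs intact."""
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    pairs = list(zip(filenames, labels))
    random.Random(seed).shuffle(pairs)  # shuffles whole tuples, not each list
    return pairs


f = ["f1", "f2", "f3", "f4"]
l = ["l1", "l2", "l3", "l4"]
for epoch in range(2):  # one refill per epoch, mirroring the re-run enqueue
    for one_f, one_l in shuffled_pairs(f, l, seed=epoch):
        print("F:", one_f, "L:", one_l)
```

Shuffling the two lists independently would break the correspondence; shuffling tuples (or enqueueing both tensors in a single op) preserves it.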