Accessing filename from file queue in TensorFlow

Question

I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like 'train/001.jpg' and the labeling file looks like:

train/001.jpg 1
train/002.jpg 2
...
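Parsing a labeling file in this format needs no TensorFlow at all. Below is a minimal sketch, assuming one whitespace-separated "filename label" pair per line, integer labels, and no spaces inside filenames. load_labels is a hypothetical helper name, not part of any library:

```python
from typing import Iterable, List, Tuple


def load_labels(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Parse 'filename label' lines into two lists kept in the same order.

    Hypothetical helper: assumes one whitespace-separated pair per line,
    integer labels, and no spaces in filenames. Blank lines are skipped.
    """
    filenames: List[str] = []
    labels: List[int] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name, label = parts  # raises ValueError on a malformed line
        filenames.append(name)
        labels.append(int(label))
    return filenames, labels


if __name__ == "__main__":
    names, labels = load_labels(["train/001.jpg 1", "train/002.jpg 2", ""])
    print(names, labels)
```

In practice the lines would come from iterating over the opened labeling file; keeping the two lists index-aligned is what lets them be fed to a queue later.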

I can easily load images from the image directory in TensorFlow by creating a file queue from the filenames:

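```python
import tensorflow as tf  # TF 1.x queue-based input pipeline

filequeue = tf.train.string_input_producer(filenames)  # filenames: list of image paths
reader = tf.WholeFileReader()
# read() returns a (key, value) pair; for WholeFileReader the key is the filename
key, img = reader.read(filequeue)
```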

But I'm at a loss for how to couple these files with the labels from the labeling file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have the filename, I need to be able to look up the label keyed by the filename. It seems like a standard Python dictionary wouldn't work because these computations need to happen at each step in the graph.

Answer

Given that your data is not too large to supply the list of filenames as a Python list, I'd suggest doing the preprocessing in Python. Create two lists, in the same order, of the filenames and the labels, insert them into either a tf.RandomShuffleQueue or a plain queue (e.g. tf.FIFOQueue), and dequeue from that. If you want the "loops infinitely" behavior of string_input_producer, you can re-run the enqueue op at the start of every epoch.
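The per-epoch re-enqueue can be pictured in plain Python. This is only an analogue, not TensorFlow code: paired_epochs is a hypothetical generator that reshuffles the (filename, label) pairs at the start of every epoch and, like string_input_producer with its default num_epochs=None, never stops on its own.

```python
import itertools
import random
from typing import Iterator, List, Optional, Tuple


def paired_epochs(
    filenames: List[str], labels: List[int], seed: Optional[int] = None
) -> Iterator[Tuple[str, int]]:
    """Yield (filename, label) pairs forever, reshuffling once per epoch.

    Hypothetical helper. Every pair appears exactly once per epoch, and the
    shuffle moves whole pairs, so a filename never loses its label.
    """
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    rng = random.Random(seed)
    pairs = list(zip(filenames, labels))
    if not pairs:
        return  # nothing to loop over; avoid spinning forever
    while True:  # one iteration per epoch ("re-run the enqueue")
        rng.shuffle(pairs)  # new order for this epoch
        yield from pairs


if __name__ == "__main__":
    stream = paired_epochs(["f1", "f2", "f3"], [1, 2, 3], seed=0)
    for name, label in itertools.islice(stream, 6):  # two epochs
        print(name, label)
```

The caller decides how many items to take (here with itertools.islice), much as a training loop decides how many steps to run.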

A toy example:

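```python
import tensorflow.compat.v1 as tf  # TF1-style graph/session API (TF 1.14+ and 2.x)

tf.disable_v2_behavior()

f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]

fv = tf.constant(f)
lv = tf.constant(l)

# Each queue element is a (filename, label) tuple of two scalar strings.
rsq = tf.RandomShuffleQueue(capacity=10, min_after_dequeue=0,
                            dtypes=[tf.string, tf.string], shapes=[[], []])
# enqueue_many splits both tensors along dimension 0, so f[i] and l[i]
# enter the queue together as one element.
do_enqueues = rsq.enqueue_many([fv, lv])

gotf, gotl = rsq.dequeue()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)
    for i in range(2):
        one_f, one_l = sess.run([gotf, gotl])
        print("F: ", one_f, "L: ", one_l)
```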

The key is that the enqueue inserts filename/label pairs, and each dequeue returns one such pair, so a filename is never separated from its label.
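To see why enqueueing the pairs matters, compare shuffling the pairs as units with shuffling the two lists separately. This is a plain-Python illustration on the same toy data, not TensorFlow code; the independent shuffle stands in for, say, two separately shuffled queues.

```python
import random

filenames = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
labels = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]

# Shuffling the pairs moves each filename together with its label,
# which is what enqueueing [fv, lv] as one component tuple achieves.
pairs = list(zip(filenames, labels))
random.Random(0).shuffle(pairs)
assert all(f[1:] == l[1:] for f, l in pairs)  # "fN" still pairs with "lN"

# Shuffling the two lists independently generally breaks the correspondence.
f_alone, l_alone = filenames[:], labels[:]
random.Random(1).shuffle(f_alone)
random.Random(2).shuffle(l_alone)
mismatched = sum(f[1:] != l[1:] for f, l in zip(f_alone, l_alone))
print("mismatched pairs after independent shuffles:", mismatched)
```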
