Passing a numpy array to a tensorflow Queue

Question

I have a NumPy array and would like to read it in TensorFlow code using a Queue. I would like the queue to return the whole dataset, shuffled, for a specified number of epochs and to raise an error after that. Ideally I would not need to hardcode the size of an example or the number of examples. I think shuffle_batch is meant to serve that purpose. I have tried using it as follows:

data = tf.constant(train_np) # train_np is my numpy array of shape (num_examples, example_size)
batch = tf.train.shuffle_batch([data], batch_size=5, capacity=52200, min_after_dequeue=10, num_threads=1, seed=None, enqueue_many=True)

sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
batch.eval()

The problem with this approach is that it reads the data continuously, and I cannot tell it to stop after a given number of epochs. I am aware I could use a RandomShuffleQueue and insert the data into it several times, but: a) I don't want to use num_epochs × data worth of memory, and b) it would let the queue shuffle examples across epoch boundaries.

Is there a nice way to read the shuffled data in epochs in TensorFlow without writing your own Queue?

Recommended answer

You could create another queue, enqueue your data onto it num_epochs times, close it, and then hook it up to your batch. To save memory, you can make this queue small and enqueue items onto it in parallel. There will be a bit of mixing between epochs. To fully prevent mixing, you could take the code below with num_epochs=1 and call it num_epochs times.

import threading

import numpy as np
import tensorflow as tf

tf.reset_default_graph()
data = np.array([1, 2, 3, 4])
num_epochs = 5
queue1_input = tf.placeholder(tf.int32)
queue1 = tf.FIFOQueue(capacity=10, dtypes=[tf.int32], shapes=[()])

def create_session():
    config = tf.ConfigProto()
    config.operation_timeout_in_ms=20000
    return tf.InteractiveSession(config=config)

enqueue_op = queue1.enqueue_many(queue1_input)
close_op = queue1.close()
dequeue_op = queue1.dequeue()
batch = tf.train.shuffle_batch([dequeue_op], batch_size=4, capacity=5, min_after_dequeue=4)

sess = create_session()

def fill_queue():
    for i in range(num_epochs):
        sess.run(enqueue_op, feed_dict={queue1_input: data})
    sess.run(close_op)

fill_thread = threading.Thread(target=fill_queue, args=())
fill_thread.start()

# read the data from queue shuffled
tf.train.start_queue_runners()
try:
    while True:
        print(batch.eval())
except tf.errors.OutOfRangeError:
    print("Done")
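
For the "no mixing between epochs" behaviour the question asks for, the idea can also be sketched without any TensorFlow queue. The helper below is a hypothetical illustration (the name epoch_batches and its parameters are assumptions, not part of TensorFlow): it reshuffles the example indices once per epoch, yields batches from that epoch only, and simply stops after num_epochs, which plays the role of the OutOfRangeError above.

```python
import random


def epoch_batches(data, batch_size, num_epochs, seed=None):
    """Yield shuffled batches, reshuffling once per epoch.

    Hypothetical helper for illustration; not a TensorFlow API.
    `data` can be a list or a NumPy array indexed along its first axis.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    for _ in range(num_epochs):
        rng.shuffle(indices)  # new order for this epoch only
        for start in range(0, len(indices), batch_size):
            # The last batch of an epoch may be smaller than batch_size.
            yield [data[i] for i in indices[start:start + batch_size]]


# Usage: two epochs over four examples, two examples per batch.
for batch in epoch_batches([1, 2, 3, 4], batch_size=2, num_epochs=2, seed=0):
    print(batch)
```

Because a batch never spans two epochs, every example is seen exactly once per epoch. Only the index list is shuffled, so memory use stays at one copy of the data plus the indices.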

By the way, the enqueue_many pattern above will hang when the queue is not large enough to hold the entire numpy dataset. You can give yourself the flexibility to use a smaller queue by loading the data in chunks, as below.

# Reuses the imports and create_session() from the first snippet.
tf.reset_default_graph()
data = np.array([1, 2, 3, 4])
queue1_capacity = 2
num_epochs = 2
queue1_input = tf.placeholder(tf.int32)
queue1 = tf.FIFOQueue(capacity=queue1_capacity, dtypes=[tf.int32], shapes=[()])

enqueue_op = queue1.enqueue_many(queue1_input)
close_op = queue1.close()
dequeue_op = queue1.dequeue()

def dequeue():
    try:
        while True:
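            print(sess.run(dequeue_op))
    except (tf.errors.OutOfRangeError, tf.errors.DeadlineExceededError):
        # The queue was closed and drained, or the session timeout expired.
        return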

def enqueue():
    for i in range(num_epochs):
        start_pos = 0
        while start_pos < len(data):
            end_pos = start_pos+queue1_capacity
            data_chunk = data[start_pos: end_pos]
            sess.run(enqueue_op, feed_dict={queue1_input: data_chunk})
            start_pos += queue1_capacity
    sess.run(close_op)

sess = create_session()

enqueue_thread = threading.Thread(target=enqueue, args=())
enqueue_thread.start()

dequeue_thread = threading.Thread(target=dequeue, args=())
dequeue_thread.start()
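
The producer/consumer structure of the snippet above can be mirrored with only the Python standard library, which may help when reasoning about it without a TensorFlow session. This is a rough analogue under stated assumptions, not the TensorFlow mechanism: queue.Queue has no close(), so a hypothetical sentinel object (_DONE) stands in for close_op, and put() inserts one item at a time, so the chunk loop is kept only to match the shape of enqueue() above.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel standing in for queue1.close()


def chunked_producer(q, data, chunk_size, num_epochs):
    """Feed `data` into the bounded queue `num_epochs` times, chunk by chunk."""
    for _ in range(num_epochs):
        for start in range(0, len(data), chunk_size):
            for item in data[start:start + chunk_size]:
                q.put(item)  # blocks while the queue is full
    q.put(_DONE)  # signal that no more items will arrive


def run(data, capacity, num_epochs):
    """Run one producer thread and consume in the calling thread."""
    q = queue.Queue(maxsize=capacity)
    producer = threading.Thread(
        target=chunked_producer, args=(q, data, capacity, num_epochs))
    producer.start()
    received = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        received.append(item)
    producer.join()
    return received


print(run([1, 2, 3, 4], capacity=2, num_epochs=2))
```

As in the TensorFlow version, the queue never holds more than capacity items at once, so memory use does not grow with the number of epochs.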
