feed data into a tf.contrib.data.Dataset like a queue


Question


About the tf.contrib.data.Dataset (from TensorFlow 1.2, see here and here) usage: the way it expects to get data doesn't really fit how I usually obtain mine. In my case, I have a thread that receives data; I don't know in advance when the stream will end, but I can see when it does. Then I wait until I have processed all the buffers, and at that point I have finished one epoch. How can I express this logic with the Dataset?


Note that I prefer the Dataset interface over the QueueBase interface because it gives me an iterator interface that I can reinitialize and even reset to a different Dataset. This is more powerful than queues, which currently cannot be reopened after they are closed (see here and here).


Maybe a similar question, or the same question: How can I wrap a Dataset around a queue? I have some thread that reads data from somewhere and can feed it and queue it somehow. How do I get that data into the Dataset? I could repeat some dummy tensor infinitely and then use map to just return my queue.dequeue(), but that really only brings me back to all the original problems with queues, i.e. how to reopen the queue.

Answer


The new Dataset.from_generator() method allows you to define a Dataset that is fed by a Python generator. (To use this feature at present, you must download a nightly build of TensorFlow or build it yourself from source. It will be part of TensorFlow 1.4.)


The easiest way to implement your example would be to replace your receiving thread with a generator, with pseudocode as follows:

def receiver():
  while True:
    next_element = ...  # Receive next element from external source.
                        # Note that this method may block.

    end_of_epoch = ...  # Decide whether or not to stop based on next_element.

    if not end_of_epoch:
      yield next_element  # Note: you may need to convert this to an array.
    else:
      return  # Returning will signal OutOfRangeError on downstream iterators.

dataset = tf.contrib.data.Dataset.from_generator(receiver, output_types=...)

# You can chain other `Dataset` methods after the generator. For example:
dataset = dataset.prefetch(...)  # This will start a background thread
                                 # to prefetch elements from `receiver()`.

dataset = dataset.repeat(...)  # Note that each repetition will call
                               # `receiver()` again, and start from
                               # a fresh state.

dataset = dataset.batch(...)


More complicated topologies are possible. For example, you can use Dataset.interleave() to create many receivers in parallel.
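A hedged pseudocode sketch of that interleave idea (here `make_dataset` and `num_receivers` are hypothetical names, not from the answer):

```python
# Pseudocode: `make_dataset(i)` is a hypothetical helper that builds a
# `from_generator()` dataset for receiver `i`.
dataset = tf.contrib.data.Dataset.range(num_receivers).interleave(
    lambda i: make_dataset(i),   # one Dataset per receiver; note that inside
                                 # interleave() `i` is a tensor, so selecting
                                 # a Python generator by index needs care
    cycle_length=num_receivers)  # how many receivers to consume in parallel
```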
