Creating `input_fn` from iterator
Problem description
Most tutorials focus on the case where the entire training dataset fits into memory. However, I have an iterator which acts as an infinite stream of (features, labels)-tuples (creating them cheaply on the fly).
When implementing the `input_fn` for TensorFlow's estimator, can I return an instance from the iterator as
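```python
import tensorflow as tf

def input_fn():
    # Pull the next (features, labels) batch from the existing iterator `it`
    feature_batch, label_batch = next(it)
    return tf.constant(feature_batch), tf.constant(label_batch)
```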
or does `input_fn` have to return the same (features, labels)-tuples on each call?
Moreover, is this function called multiple times during training, as I hope, like in the following pseudocode?
```
# pseudocode: hoped-for behaviour, with a fresh batch on every step
for i in range(max_iter):
    learn_op(input_fn())
```
Recommended answer
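The tensors returned by `input_fn` are used throughout training, but the function itself is called only once (per call to `train()`). Consequently, returning `tf.constant(next(it))` would freeze a single batch into the graph, and that same batch would be reused at every training step. So creating a sophisticated `input_fn` that goes beyond returning a constant array, as explained in the tutorial, is not as straightforward.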
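To see the call-once behaviour concretely, here is a minimal sketch that imitates, in simplified form, what an Estimator does in graph mode: it calls `input_fn` once while building the graph and then evaluates the returned tensors at every step. The counting iterator `it` and the `calls` counter are hypothetical stand-ins for your real iterator, and the code uses the `tf.compat.v1` graph-mode API so that it also runs under TensorFlow 2.x.

```python
import itertools

import numpy as np
import tensorflow as tf

# Graph mode, as used by Estimators in TensorFlow 1.x
tf.compat.v1.disable_eager_execution()

calls = 0  # counts how often input_fn is invoked
# Hypothetical infinite iterator: batch i is filled with the value i
it = ((np.full((2, 3), i, np.float32), np.full((2, 1), i, np.float32))
      for i in itertools.count())

def input_fn():
    global calls
    calls += 1
    feature_batch, label_batch = next(it)
    # tf.constant embeds this one batch into the graph as a fixed value
    return tf.constant(feature_batch), tf.constant(label_batch)

# Graph construction: input_fn runs exactly once here
features, labels = input_fn()

with tf.compat.v1.Session() as sess:
    # Three simulated training steps, each evaluating the same tensor
    step_values = [sess.run(features) for _ in range(3)]

print(calls)  # 1
print([float(v[0, 0]) for v in step_values])  # [0.0, 0.0, 0.0]
```

Every simulated step sees the first batch, and the iterator is never advanced after graph construction.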
TensorFlow provides two examples of such non-trivial `input_fn`s, for numpy arrays and pandas data frames, but they start from an array in memory, so they do not help with your problem.
You could also have a look at their code by following the links above, to see how they implement an efficient non-trivial `input_fn`, but you may find that it requires more code than you would like.
If you are willing to use TensorFlow's lower-level interface, things are IMHO simpler and more flexible. There is a tutorial that covers most needs, and the proposed solutions are easy(-er) to implement.
In particular, if you already have an iterator that returns data as you described in your question, using placeholders (section "Feeding" in the previous link) should be straightforward.
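A minimal sketch of the placeholder ("Feeding") approach, assuming a hypothetical generator `make_batches` in place of your iterator and a toy logistic-regression model chosen only so there is something to train. Each step pulls a fresh batch with `next(it)` in plain Python and passes it through `feed_dict`, which is essentially the loop from the question. It is written against `tf.compat.v1` so it runs in graph mode under TensorFlow 2.x; in TensorFlow 1.x the same functions are available directly under `tf.` (e.g. `tf.placeholder`, `tf.Session`).

```python
import numpy as np
import tensorflow as tf

# Placeholders and Session.run require graph mode
tf.compat.v1.disable_eager_execution()

def make_batches(batch_size=4, n_features=3):
    """Hypothetical stand-in for an infinite (features, labels) iterator."""
    rng = np.random.default_rng(0)
    while True:
        x = rng.normal(size=(batch_size, n_features)).astype(np.float32)
        y = (x.sum(axis=1, keepdims=True) > 0).astype(np.float32)
        yield x, y

# Placeholders are filled with a new batch on every sess.run call
features = tf.compat.v1.placeholder(tf.float32, shape=[None, 3], name="features")
labels = tf.compat.v1.placeholder(tf.float32, shape=[None, 1], name="labels")

# Toy logistic-regression model, only so there is something to train
w = tf.compat.v1.get_variable("w", shape=[3, 1], initializer=tf.compat.v1.zeros_initializer())
b = tf.compat.v1.get_variable("b", shape=[1], initializer=tf.compat.v1.zeros_initializer())
logits = tf.matmul(features, w) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.compat.v1.train.GradientDescentOptimizer(0.5).minimize(loss)

def train(max_iter=100):
    it = make_batches()
    losses = []
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        for _ in range(max_iter):
            x, y = next(it)  # a fresh batch from the iterator on every step
            _, step_loss = sess.run([train_op, loss],
                                    feed_dict={features: x, labels: y})
            losses.append(step_loss)
    return losses
```

With this design, data generation stays entirely in Python, at the cost of copying each batch into TensorFlow on every step.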