Does `tf.data.Dataset.repeat()` buffer the entire dataset in memory?


Question

Looking at this code example from the TF documentation:

filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_one_shot_iterator()

Does the dataset.repeat(num_epochs) require that the entire dataset be loaded into memory? Or is it re-initializing the dataset(s) that came before it when it receives an end-of-dataset exception?

The documentation is ambiguous about this point.

Answer

Based on this simple test, it appears that repeat does not buffer the dataset; instead, it re-initializes the upstream datasets.

n = tf.data.Dataset.range(5).shuffle(buffer_size=5).repeat(2).make_one_shot_iterator().get_next()
[sess.run(n) for _ in range(10)]
Out[83]: [2, 0, 3, 1, 4, 3, 1, 0, 2, 4]

Logic suggests that if repeat were buffering its input, the same random shuffle pattern would have been repeated in this simple experiment.
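
For reference, the experiment above can be written as a self-contained script. This is a minimal sketch assuming TF 1.x, where make_one_shot_iterator() and tf.Session are available; the exact printed values will differ between runs because shuffle() is random.

import tensorflow as tf  # TF 1.x

# Shuffle a small range dataset, repeat it for two epochs, and drain all
# 10 elements through a one-shot iterator.
dataset = tf.data.Dataset.range(5).shuffle(buffer_size=5).repeat(2)
next_element = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    values = [sess.run(next_element) for _ in range(10)]
    print(values)
    # If repeat() buffered its input, the last 5 values would be an exact
    # copy of the first 5; instead each epoch is shuffled independently,
    # e.g. [2, 0, 3, 1, 4, 3, 1, 0, 2, 4].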

