TensorFlow dataset.shuffle() behavior when used with repeat() and batch()


Question

What exactly will this do?

dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset.shuffle(buffer_size=5).repeat().batch(3)

I've noticed several related questions, but none of them exactly answered my concern. I'm confused about what shuffle(buffer_size) is doing. I understand it will take the first 5 examples [0, 0, 0, 1, 1] into memory, but what will it do next with this buffer? And how does this buffer interact with repeat() and batch()?

Answer

The way shuffle works is complicated, but you can pretend it works by first filling a buffer of size buffer_size and then, every time you ask for an element, sampling a uniformly random position in that buffer and replacing the element at that position with a fresh one from the input.
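
To make that model concrete, here is a minimal sketch of the buffer behavior in plain Python. This is an approximation of the idea described above, not TensorFlow's actual implementation, and simulated_shuffle is a hypothetical helper name:

import random

def simulated_shuffle(source, buffer_size, seed=None):
    # Approximate model: keep a buffer of buffer_size elements, emit a
    # uniformly random slot, then refill that slot from the input stream.
    rng = random.Random(seed)
    it = iter(source)
    buffer = []
    for x in it:  # fill the buffer with the first buffer_size elements
        buffer.append(x)
        if len(buffer) == buffer_size:
            break
    while buffer:
        i = rng.randrange(len(buffer))
        yield buffer[i]  # emit a random buffered element
        try:
            buffer[i] = next(it)  # replace it with a fresh element
        except StopIteration:
            buffer.pop(i)  # input exhausted: drain the remaining buffer

print(list(simulated_shuffle([0, 0, 0, 1, 1, 1, 2, 2, 2], buffer_size=5)))

Note that with buffer_size=5 the very first element emitted can only be one of the first five inputs, which is why a small buffer gives only a weak shuffle on sorted data.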

Batching before shuffling means you'll shuffle pre-made minibatches (so the minibatches themselves won't change, just their order), while batching after shuffling lets you change the contents of the batches themselves randomly. Similarly, repeating before shuffling means you will shuffle an infinite stream of examples (so the second epoch will have a different order than the first epoch), while repeating after shuffling means you'll always see the same examples in each epoch.
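
Applied to the snippet from the question, the difference is easy to see by printing a few batches. This is a small sketch assuming TensorFlow 2.x eager execution; the exact values printed will vary from run to run:

import tensorflow as tf

data = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# shuffle -> repeat -> batch: batch contents are random, and the
# infinite stream is reshuffled on each pass over the data.
ds = (tf.data.Dataset.from_tensor_slices(data)
      .shuffle(buffer_size=5)
      .repeat()
      .batch(3))
for batch in ds.take(3):
    print(batch.numpy())

# batch -> shuffle: the pre-made minibatches [0 0 0], [1 1 1], [2 2 2]
# are fixed; only their order changes.
ds2 = (tf.data.Dataset.from_tensor_slices(data)
       .batch(3)
       .shuffle(buffer_size=3))
for batch in ds2:
    print(batch.numpy())

One detail worth knowing: shuffle reshuffles on each iteration by default (reshuffle_each_iteration=True), so even shuffle(...).repeat() produces a different order on each pass, while still keeping every epoch a complete pass over the examples.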
