How to get batch size back from a tensorflow dataset?


Problem description


It is recommended to use a TensorFlow dataset as the input pipeline, which can be set up as follows:

import tensorflow as tf

# Specify dataset
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle (buffer_size must be an integer)
dataset = dataset.shuffle(buffer_size=100000)
# Specify batch size
dataset = dataset.batch(128)
# Create an iterator (TF1-style; in TF2 use tf.compat.v1.data.make_one_shot_iterator)
iterator = dataset.make_one_shot_iterator()
# Get next batch
next_batch = iterator.get_next()

I should be able to get the batch size (either from the dataset itself or from the iterator created from it, i.e. from iterator or next_batch). Someone might also want to know how many batches there are in the dataset or in its iterator, how many batches have already been consumed, and how many remain. It would also be useful to get particular elements, or even the entire dataset, at once.

I wasn't able to find anything in the TensorFlow documentation. Is this possible? If not, does anyone know whether this has been requested as an issue on the TensorFlow GitHub?

Solution

In TF2 at least, the type of a dataset is statically defined and accessible via tf.data.Dataset.element_spec.

This is a somewhat complex return type because it has tuple nesting that matches your Dataset.

>>> tf.data.Dataset.from_tensor_slices([[[1]],[[2]]]).element_spec.shape
TensorShape([1, 1])
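
The same idea gives back the batch size once batch() has been applied: with drop_remainder=True the batch dimension is static, while without it the dimension shows up as None. A quick sketch with toy data:

>>> tf.data.Dataset.from_tensor_slices(tf.zeros([256, 10])).batch(128, drop_remainder=True).element_spec.shape
TensorShape([128, 10])
>>> tf.data.Dataset.from_tensor_slices(tf.zeros([256, 10])).batch(128).element_spec.shape
TensorShape([None, 10])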

If your data is organized as a tuple (image, label), then you'd get a tuple of TensorSpecs. You can index into it if you are certain of the nesting of the return type. E.g.

>>> image = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> label = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> train = tf.data.Dataset.zip((image, label))
>>> train.element_spec[0].shape[0]
2
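
As for counting batches: recent TF2 releases expose Dataset.cardinality() (and len() works once the cardinality is known and finite), while tf.data.experimental.cardinality covers older 2.x versions. A quick sketch on the same toy train dataset, assuming one of those APIs is available in your install:

>>> train.cardinality().numpy()
2
>>> len(train)
2

For particular elements or the whole dataset at once, take() and as_numpy_iterator() are the usual eager-mode tools.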
