How to get the batch size back from a TensorFlow dataset?
The recommended input pipeline is a TensorFlow dataset, which can be set up as follows:
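import tensorflow as tf

# Specify the dataset (features and labels are assumed to be arrays defined elsewhere)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle (buffer_size must be an integer, not the float 1e5)
dataset = dataset.shuffle(buffer_size=100000)
# Specify the batch size
dataset = dataset.batch(128)
# Create an iterator (TF 1.x API)
iterator = dataset.make_one_shot_iterator()
# Get the next batch
next_batch = iterator.get_next()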
I would like to be able to get the batch size back, either from the dataset itself or from an iterator created from it (that is, from iterator or next_batch). One might also want to know how many batches the dataset or its iterator contains, how many batches have already been consumed and how many remain, or how to fetch particular elements or even the entire dataset at once.
I wasn't able to find anything in the TensorFlow documentation. Is this possible? If not, does anyone know whether this has been requested as an issue on the TensorFlow GitHub?
In TF2 at least, the element type of a dataset is statically defined and accessible via tf.data.Dataset.element_spec.
This is a somewhat complex return type because its nesting matches the structure of your dataset's elements. For a dataset of plain tensors it is a single TensorSpec:
>>> tf.data.Dataset.from_tensor_slices([[[1]],[[2]]]).element_spec.shape
TensorShape([1, 1])
If your data is organized as an (image, label) tuple, you get a tuple of TensorSpecs, which you can index into if you are certain of the nesting of the return type. For example:
>>> image = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> label = tf.data.Dataset.from_tensor_slices([[1],[2],[3],[4]]).batch(2, drop_remainder=True)
>>> train = tf.data.Dataset.zip((image, label))
>>> train.element_spec[0].shape[0]
2
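One caveat about the example above: the static batch dimension is only known when batch is called with drop_remainder=True. Without it the last batch may be shorter, so element_spec reports None for that dimension. The sketch below is an illustration rather than part of the original answer. It assumes TF 2.3 or later running in eager mode, and the variable names are made up for the example. It contrasts the two cases, reads the actual batch size at runtime with tf.shape, and uses Dataset.cardinality() to count batches.

```python
import tensorflow as tf

data = tf.range(10)  # ten scalar elements, purely illustrative

# Without drop_remainder the final batch may be short,
# so the static batch dimension is unknown (None).
ragged = tf.data.Dataset.from_tensor_slices(data).batch(4)
static_ragged = ragged.element_spec.shape[0]   # None

# With drop_remainder=True every batch has exactly 4 elements,
# so the batch size is part of the static shape.
fixed = tf.data.Dataset.from_tensor_slices(data).batch(4, drop_remainder=True)
static_fixed = fixed.element_spec.shape[0]     # 4

# The actual size of each batch can always be read at runtime.
runtime_sizes = [int(tf.shape(batch)[0]) for batch in ragged]  # [4, 4, 2]

# Number of batches in each pipeline, when TensorFlow can determine it.
num_ragged_batches = int(ragged.cardinality())  # 3
num_fixed_batches = int(fixed.cardinality())    # 2

print(static_ragged, static_fixed, runtime_sizes,
      num_ragged_batches, num_fixed_batches)
```

For pipelines whose length cannot be determined statically, cardinality() returns tf.data.UNKNOWN_CARDINALITY (or tf.data.INFINITE_CARDINALITY for repeated datasets), so the result is worth checking before relying on it.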