How to combine multiple datasets into one dataset?

Views: 134 · Posted: 2021/9/5 19:57:50 · Tags: python tensorflow tfrecord tf.keras eager-execution

Question

Suppose I have 3 TFRecord files: neg.tfrecord, pos1.tfrecord, and pos2.tfrecord.

I create 3 Dataset objects, one per file, with this code:

dataset = tf.data.TFRecordDataset(tfrecord_file)

My batch size is 400, and each batch should contain 200 neg examples, 100 pos1 examples, and 100 pos2 examples. How can I build such a dataset?

I will pass this dataset object to keras.fit() (with eager execution).

My TensorFlow version is 1.13.1.

Previously, I tried getting an iterator for each dataset and manually concatenating the fetched data, but this was inefficient and GPU utilization was low.
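One way to do the mixing inside the tf.data pipeline is to batch each source to its own share, zip the three batched datasets, and concatenate along the batch axis. This replaces concatenating iterator outputs by hand. This is a sketch under assumptions and not the accepted answer's method. The in-memory datasets are hypothetical stand-ins for the three TFRecord datasets, and the code assumes TensorFlow 2.x eager execution.

```python
import tensorflow as tf

# Hypothetical stand-ins for the three parsed TFRecord datasets; in a real
# pipeline each would be tf.data.TFRecordDataset(path).map(parse_fn).
neg = tf.data.Dataset.from_tensor_slices(tf.zeros([1000, 4])).repeat()
pos1 = tf.data.Dataset.from_tensor_slices(tf.ones([500, 4])).repeat()
pos2 = tf.data.Dataset.from_tensor_slices(2 * tf.ones([500, 4])).repeat()


def mixed_batches(neg, pos1, pos2, n_neg=200, n_pos1=100, n_pos2=100):
    """Build batches holding exactly n_neg + n_pos1 + n_pos2 examples."""
    # Batch each source to its share, dropping incomplete trailing batches.
    zipped = tf.data.Dataset.zip((
        neg.batch(n_neg, drop_remainder=True),
        pos1.batch(n_pos1, drop_remainder=True),
        pos2.batch(n_pos2, drop_remainder=True),
    ))
    # Stack the three sub-batches into one batch along axis 0.
    return zipped.map(lambda a, b, c: tf.concat([a, b, c], axis=0))


dataset = mixed_batches(neg, pos1, pos2).prefetch(1)

batch = next(iter(dataset))
values = batch[:, 0].numpy()
print(batch.shape, (values == 0).sum(), (values == 1).sum(), (values == 2).sum())
```

Because the stand-in datasets repeat indefinitely, Model.fit needs an explicit steps_per_epoch. If each element is a (features, label) pair rather than a single tensor, the map function must concatenate each component separately.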

Recommended answer

You can use interleave:

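import tensorflow as tf
# tfrecord_file1, tfrecord_file2 and parse_fn are defined elsewhere
filenames = [tfrecord_file1, tfrecord_file2]
# Read the files concurrently, taking one record from each file in turn
dataset = tf.data.Dataset.from_tensor_slices(filenames).interleave(
    lambda x: tf.data.TFRecordDataset(x), cycle_length=len(filenames))
dataset = dataset.map(parse_fn)
...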
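The following toy sketch shows how interleave orders its output. It assumes TensorFlow 2.x eager execution, and three in-memory datasets stand in for the files. cycle_length sets how many sources are read concurrently, and block_length sets how many consecutive elements are taken from each source before moving to the next.

```python
import tensorflow as tf

# Three toy sources standing in for TFRecord files: source i yields i four times.
sources = tf.data.Dataset.range(3)

interleaved = sources.interleave(
    lambda i: tf.data.Dataset.from_tensors(i).repeat(4),
    cycle_length=3,  # read all three sources concurrently
    block_length=2,  # take 2 consecutive elements from each source per turn
)

order = [int(x) for x in interleaved]
print(order)
# → [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

block_length applies equally to every source. As a result, plain interleave draws the files in equal shares rather than a fixed 200/100/100 split per batch.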

Or you can try parallel interleave (tf.data.experimental.parallel_interleave). See:
https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#interleave
https://www.tensorflow.org/api_docs/python/tf/data/experimental/parallel_interleave
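In TensorFlow 2.x, parallel_interleave is deprecated in favour of passing num_parallel_calls to Dataset.interleave. In 1.13, the linked experimental function is the route. Below is a self-contained sketch that assumes TF 2.x. It writes three tiny TFRecord files with a hypothetical single-field schema and reads them back in parallel. The file contents and the parse_fn schema are illustrative assumptions only.

```python
import os
import tempfile

import tensorflow as tf

# Write three tiny TFRecord files (hypothetical contents) so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
filenames = []
for name, label in [("neg", 0), ("pos1", 1), ("pos2", 2)]:
    path = os.path.join(tmp_dir, name + ".tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        for _ in range(5):
            example = tf.train.Example(features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())
    filenames.append(path)


def parse_fn(serialized):
    # Hypothetical schema: each record holds a single int64 "label".
    features = tf.io.parse_single_example(
        serialized, {"label": tf.io.FixedLenFeature([], tf.int64)})
    return features["label"]


dataset = tf.data.Dataset.from_tensor_slices(filenames).interleave(
    lambda x: tf.data.TFRecordDataset(x),
    cycle_length=len(filenames),
    num_parallel_calls=tf.data.AUTOTUNE,  # read the files in parallel
)
dataset = dataset.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)

labels = sorted(int(x) for x in dataset)
print(labels)
```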