How to expand tf.data.Dataset with additional example transformations in Tensorflow


Question


I would like to double the size of an existing dataset I'm using to train a neural network in tensorflow on the fly by adding random noise to it. So when I'm done I'll have all the existing examples and also all the examples with noise added to them. I'd also like to interleave these as I transform them, so they come out in this order: example 1 without noise, example 1 with noise, example 2 without noise, example 2 with noise, etc. I'm struggling to accomplish this using the Dataset API. I've tried to use unbatch to accomplish this like so:

def generate_permutations(features, labels):
    return [
        [features, labels],
        [add_noise(features), labels]
    ]

dataset.map(generate_permutations).apply(tf.contrib.data.unbatch())


but I get an error saying Shapes must be equal rank, but are 2 and 1. I'm guessing tensorflow is trying to make a tensor out of that batch I'm returning, but features and labels are different shapes, so that doesn't work. I could probably do this by just making two datasets and concatenating them together, but I'm worried that would result in very skewed training, where I train nicely for the first half of the epoch and then suddenly all of the data has this new transformation for the second half. How can I accomplish this on the fly, without writing these transformations to disk before feeding them into tensorflow?
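The ordering concern can be sketched in plain Python, with toy lists standing in for the two datasets (the names and values here are purely illustrative):

```python
# Toy lists standing in for the clean and noisy datasets.
originals = [("x1", "y1"), ("x2", "y2")]
noisy = [("x1*", "y1"), ("x2*", "y2")]

# Concatenating: every clean example comes before any noisy one -- the skew.
concatenated = originals + noisy

# The desired interleaving: each example immediately followed by its noisy copy.
interleaved = [ex for pair in zip(originals, noisy) for ex in pair]

print(concatenated)
print(interleaved)
```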

Answer


The Dataset.flat_map() transformation is the tool you need: it enables you to map a single input element into multiple elements, then flattens the result. Your code would look something like the following:

def generate_permutations(features, labels):
    # Build a two-element dataset per input example: the original, then its noisy copy.
    regular_ds = tf.data.Dataset.from_tensors((features, labels))
    noisy_ds = tf.data.Dataset.from_tensors((add_noise(features), labels))
    return regular_ds.concatenate(noisy_ds)

# flat_map flattens the per-example datasets back into one stream, so each
# example is immediately followed by its noisy version.
dataset = dataset.flat_map(generate_permutations)
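A quick way to sanity-check the interleaved order without TensorFlow is a plain-Python analogue of flat_map. The generator-based helpers below are stand-ins for the tf.data calls, not real tf.data APIs, and add_noise is a dummy that just offsets the value:

```python
def add_noise(features):
    # Stand-in for a real noise function: a fixed offset instead of random noise.
    return features + 0.5

def generate_permutations(features, labels):
    # Yields the original example, then its noisy copy.
    yield (features, labels)
    yield (add_noise(features), labels)

def flat_map(dataset, fn):
    # Mimics Dataset.flat_map: expand each element into several, flatten in order.
    for features, labels in dataset:
        yield from fn(features, labels)

dataset = [(1.0, "a"), (2.0, "b")]
print(list(flat_map(dataset, generate_permutations)))
# [(1.0, 'a'), (1.5, 'a'), (2.0, 'b'), (2.5, 'b')]
```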

