Tensorflow: How to prefetch data on the GPU from CPU tf.data.Dataset (from_generator)


Problem description

I am struggling with the following. I am creating a tf.data.Dataset using the from_generator method. I perform these actions on the CPU, as I don't want to overload my GPU memory.

The dataset consists of tuples, each containing a fixed-length 1-D tf.bool mask (tf.Tensor) and a variable-size 2-D tf.float matrix (tf.Tensor). The loss function is decorated with the following decorator, so I don't think the variable size is the problem.

@tf.function(experimental_relax_shapes=True)
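
For context, a minimal sketch of what such a decorated loss could look like. The actual model.loss is not shown in the question, so the body below (a tf.boolean_mask over the rows of wmat) is purely an illustrative assumption:

    import tensorflow as tf

    @tf.function(experimental_relax_shapes=True)
    def loss(mask, wmat):
        # mask: fixed-length 1-D tf.bool tensor; wmat: variable-size 2-D tf.float32 tensor.
        # experimental_relax_shapes avoids retracing the graph for every new wmat shape.
        masked = tf.boolean_mask(wmat, mask)  # hypothetical: keep only the masked rows
        mean_error = tf.reduce_mean(masked)
        return mean_error, tf.reduce_sum(tf.square(masked))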

Ideally, the dataset is kept on the CPU, but then prefetched onto the GPU.

    def gen():
        # mask_list and wmat_list are prebuilt lists of masks and matrices.
        for i, j in zip(mask_list, wmat_list):
            yield i, j

    dataset = tf.data.Dataset.from_generator(gen, output_types=(tf.bool, tf.float32))
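
As a side note, from_generator in TF 2.3 also accepts output_shapes; declaring the partially known shapes (a sketch, assuming a 1-D mask and a 2-D matrix of unknown dimensions) gives the pipeline static rank information:

    # Sketch: the same dataset, with partially known shapes declared up front.
    dataset = tf.data.Dataset.from_generator(
        gen,
        output_types=(tf.bool, tf.float32),
        output_shapes=(tf.TensorShape([None]), tf.TensorShape([None, None])),
    )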

The main training loop currently relies on tf.identity to move the data to the GPU, which is inefficient, as shown in the TensorBoard screenshot below: roughly 70% of the time is spent loading the data and moving it to the GPU.

    for b, (mask, wmat) in enumerate(dataset):
        with tf.GradientTape() as tape:

            # tf.identity forces a copy onto the default (GPU) device.
            mask = tf.identity(mask)
            wmat = tf.identity(wmat)

            mean_error, loss = self.model.loss(mask, wmat)
            epoch_loss += loss.numpy()
            epoch_mean_error += mean_error.numpy()
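
The excerpt opens a GradientTape but elides the backward pass; for completeness, a full step would presumably look roughly like the sketch below, where self.optimizer and trainable_variables are assumptions not shown in the question:

    for b, (mask, wmat) in enumerate(dataset):
        with tf.GradientTape() as tape:
            mean_error, loss = self.model.loss(mask, wmat)
        # Hypothetical backward pass; the question does not show this part.
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))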

I have tried the "prefetch_to_device" function. However, it did not move the data onto the GPU, as verified by printing e.g. mask.device in the training loop.

    gpu_transform = tf.data.experimental.prefetch_to_device('/gpu')
    # apply() returns a new dataset rather than modifying it in place,
    # so the result must be reassigned.
    dataset = dataset.apply(gpu_transform)
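
A quick way to check whether the transformation took effect is to inspect the device of the first element, mirroring the mask.device check mentioned above (a sketch):

    # When prefetching works, this should print a GPU device string,
    # e.g. '/job:localhost/replica:0/task:0/device:GPU:0'.
    for mask, wmat in dataset.take(1):
        print(mask.device, wmat.device)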

To me it resembles this bug: https://github.com/tensorflow/tensorflow/issues/30929. However, it is marked as solved and is over a year old.

Running TF 2.3 using the official Docker image.

Solution

I have found the solution to my own question.

The problem was that the tuples in the dataset did not contain tf.Tensors but NumPy arrays. Therefore, the pipeline was probably limited by the py_func() conversion.
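
A minimal sketch of the corresponding fix, assuming mask_list and wmat_list originally held NumPy arrays: convert them once, up front, so the generator yields tf.Tensors rather than arrays.

    # Convert the NumPy arrays to tf.Tensors once, before building the dataset,
    # so each element no longer pays a per-item conversion inside py_func().
    mask_list = [tf.convert_to_tensor(m, dtype=tf.bool) for m in mask_list]
    wmat_list = [tf.convert_to_tensor(w, dtype=tf.float32) for w in wmat_list]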

The screenshot below shows that the pipeline does not block on the CPU. However, there is still a considerable MemCpy. prefetch_to_device() still does not do anything, likely due to a known issue which should be fixed in TF 2.4:

https://github.com/tensorflow/tensorflow/issues/35563

The (unconfirmed) suggested workaround also did not work for me (see the edit below).

    with tf.device("/gpu:0"):
        ds = ds.prefetch(1)  # intended to place a one-element prefetch buffer on the GPU

EDIT:

I have investigated this issue further and filed a bug report. It now seems that the suggested workaround does something, but I am not sure whether it prefetches completely in time: https://github.com/tensorflow/tensorflow/issues/43905
