How does one move data to multiple GPU towers using Tensorflow's Dataset API

Problem Description

We are running multi-GPU jobs on TensorFlow and evaluating a migration from the queue-based model (using the string_input_producer interface) to the new TensorFlow Dataset API. The latter appears to offer an easier way to switch between training and validation concurrently.

A snippet of code below shows how we are doing this.

    train_dataset, train_iterator = get_dataset(train_files, batch_size, epochs)
    val_dataset, val_iterator = get_dataset(val_files, batch_size, epochs)


    is_validating = tf.placeholder(dtype=bool, shape=())
    next_batch = tf.cond(is_validating,
               lambda: val_iterator.get_next(),
               lambda: train_iterator.get_next())

    validation_tower = self.num_gpus - 1
    tower_grads = []

    for i in range(self.num_gpus):
        with tf.variable_scope(tf.get_variable_scope(),reuse=(i > 0)):
            with tf.device('/gpu:%d' % i), tf.name_scope('%s_%d' % ('gpu_', i)) as scope:
                if i == validation_tower:
                    images, labels = next_batch
                    # Loss funcs snipped out
                else:
                    images, labels = next_batch
                    # Loss funcs snipped out

The get_dataset function builds a dataset, sets a map function and a batch size. It also builds an iterator, but doesn't initialize it. Initialization of the iterator occurs before the session starts.
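
For concreteness, a minimal sketch of what such a get_dataset helper might look like; the TFRecord input format and parse_fn are assumptions, not the asker's actual code:

    def get_dataset(files, batch_size, epochs):
        # Hypothetical reconstruction: parse records, repeat for `epochs`, then batch.
        dataset = tf.data.TFRecordDataset(files)
        dataset = dataset.map(parse_fn)  # parse_fn is an assumed record-parsing function
        dataset = dataset.repeat(epochs).batch(batch_size)
        iterator = dataset.make_initializable_iterator()
        return dataset, iterator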

The is_validating boolean is supplied while the session is running, and every few steps we pass is_validating as True via a feed_dict to use the validation dataset.
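
As a rough illustration of that flow (train_op, loss, num_steps and eval_every are assumed names, not part of the original snippet; sess is a tf.Session):

    # Iterators are assumed to have been initialized already, as described above.
    for step in range(num_steps):
        # Normal training step: pull from the training iterator.
        sess.run(train_op, feed_dict={is_validating: False})
        if step % eval_every == 0:
            # Every few steps, evaluate on the validation iterator instead.
            val_loss = sess.run(loss, feed_dict={is_validating: True})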

My question is:

Let's say I have 8 GPUs, so we run training on 7 GPUs. Does the Iterator advance from the same point for each of these 7 GPUs, hence supplying all 7 GPUs with the same data?

Recommended Answer

At present there are three main options, which have different usability and performance trade-offs:

  1. In the Dataset.batch() transform, create a single large batch containing examples for all of your GPUs. Then use tf.split(..., self.num_gpus) on the output of Iterator.get_next() to create sub-batches for each GPU. This is probably the easiest approach, but it does place the splitting on the critical path (see the first sketch after this list).

  2. In the Dataset.batch() transform, create a mini-batch that is sized for a single GPU. Then call Iterator.get_next() once per GPU to get multiple different batches. (By contrast, in your current code, the same value of next_batch is sent to each GPU, which is probably not what you wanted to happen.) See the second sketch after this list.

  3. Create multiple iterators, one per GPU. Shard the data using Dataset.shard() early in the pipeline (e.g. on the list of files if your dataset is sharded). Note that this approach will consume more resources on the host, so you may need to dial down any buffer sizes and/or degrees of parallelism (see the third sketch after this list).
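
A minimal sketch of option 1; make_dataset, build_tower, batch_size and num_gpus are assumed stand-ins for the asker's own code. A single iterator yields one combined batch, which is split on the CPU before the tower loop:

    # Option 1: batch for all GPUs at once, then split into per-GPU sub-batches.
    dataset = make_dataset(train_files)             # assumed helper: parse + shuffle
    dataset = dataset.batch(batch_size * num_gpus)  # one combined batch for every tower
    iterator = dataset.make_initializable_iterator()
    images, labels = iterator.get_next()

    image_shards = tf.split(images, num_gpus)       # split along the batch dimension
    label_shards = tf.split(labels, num_gpus)
    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            tower_loss = build_tower(image_shards[i], label_shards[i])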
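
A sketch of option 2 under the same assumptions; each call to Iterator.get_next() creates a separate op, so every tower pulls a different batch from the shared iterator:

    # Option 2: batch at the single-GPU size and call get_next() once per tower.
    dataset = make_dataset(train_files)           # assumed helper
    dataset = dataset.batch(batch_size)           # per-GPU batch size
    iterator = dataset.make_initializable_iterator()

    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            images, labels = iterator.get_next()  # a distinct batch for this tower
            tower_loss = build_tower(images, labels)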
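
A sketch of option 3, assuming the input is a list of TFRecord files (parse_fn is again an assumed parsing function); each tower owns its own iterator over a disjoint shard of the files:

    # Option 3: one iterator per GPU, sharding the file list across towers.
    for i in range(num_gpus):
        files = tf.data.Dataset.from_tensor_slices(train_files)
        files = files.shard(num_gpus, i)                   # tower i reads every num_gpus-th file
        dataset = files.flat_map(tf.data.TFRecordDataset)  # read records from this shard
        dataset = dataset.map(parse_fn).batch(batch_size)
        iterator = dataset.make_initializable_iterator()
        with tf.device('/gpu:%d' % i):
            images, labels = iterator.get_next()
            tower_loss = build_tower(images, labels)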

Note that the current tf.data pipelines run on the CPU only, and an important aspect of an efficient pipeline is staging your training input to the GPU while the previous step is still running. See the TensorFlow CNN benchmarks for example code that shows how to stage data to GPUs efficiently. We are currently working on adding this support to the tf.data API directly.
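
Until then, staging can be done manually with a StagingArea, in the spirit of (but not copied from) the benchmarks code; the names below are illustrative:

    # A hedged sketch of manual GPU staging with tf.contrib.staging.StagingArea.
    images, labels = iterator.get_next()           # produced on the CPU by tf.data
    with tf.device('/gpu:0'):
        area = tf.contrib.staging.StagingArea(
            dtypes=[images.dtype, labels.dtype],
            shapes=[images.shape, labels.shape])
        stage_op = area.put([images, labels])      # copy the next batch onto the GPU
        staged_images, staged_labels = area.get()  # consume the batch staged earlier

    # Run stage_op once as a warm-up, then run it together with each training step so
    # the next batch transfers while the current one is being processed, e.g.:
    #   sess.run(stage_op)
    #   sess.run([train_op, stage_op])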
