How does one move data to multiple GPU towers using Tensorflow's Dataset API

Problem Description

We are running multi-GPU jobs on TensorFlow and evaluating a migration from the queue-based model (using the string_input_producer interface) to the new TensorFlow Dataset API. The latter appears to offer an easier way to switch between training and validation concurrently.

A snippet of code below shows how we are doing this.

    train_dataset, train_iterator = get_dataset(train_files, batch_size, epochs)
    val_dataset, val_iterator = get_dataset(val_files, batch_size, epochs)


    is_validating = tf.placeholder(dtype=bool, shape=())
    next_batch = tf.cond(is_validating,
               lambda: val_iterator.get_next(),
               lambda: train_iterator.get_next())

    validation_tower = self.num_gpus - 1
    tower_grads = []

    for i in range(self.num_gpus):
        with tf.variable_scope(tf.get_variable_scope(),reuse=(i > 0)):
            with tf.device('/gpu:%d' % i), tf.name_scope('%s_%d' % ('gpu_', i)) as scope:
                if i == validation_tower:
                    images, labels = next_batch
                    # Loss funcs snipped out
                else:
                    images, labels = next_batch
                    # Loss funcs snipped out

The get_dataset function builds a dataset, sets a map function and a batch size. It also builds an iterator, but doesn't initialize it. Initialization of the iterator occurs before the session starts.
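
For concreteness, a minimal sketch of what such a get_dataset helper might look like; the TFRecord input format and parse_fn are assumptions, not the asker's actual code:

    def get_dataset(files, batch_size, epochs):
        # Hypothetical reconstruction: parse records, repeat for `epochs`, then batch.
        dataset = tf.data.TFRecordDataset(files)
        dataset = dataset.map(parse_fn)  # parse_fn is an assumed record-parsing function
        dataset = dataset.repeat(epochs).batch(batch_size)
        iterator = dataset.make_initializable_iterator()
        return dataset, iterator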

The is_validating boolean is supplied while the session is running, and every few steps we pass is_validating as True via a feed_dict to use the validation dataset.
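
As a rough illustration of that flow (train_op, loss, num_steps and eval_every are assumed names, not part of the original snippet; sess is a tf.Session):

    # Iterators are assumed to have been initialized already, as described above.
    for step in range(num_steps):
        # Normal training step: pull from the training iterator.
        sess.run(train_op, feed_dict={is_validating: False})
        if step % eval_every == 0:
            # Every few steps, evaluate on the validation iterator instead.
            val_loss = sess.run(loss, feed_dict={is_validating: True})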

My question is:

Let's say I have 8 GPUs, so we run training on 7 GPUs. Does the Iterator advance from the same point for each of these 7 GPUs, hence supplying all 7 GPUs with the same data?

Recommended Answer

At present there are three main options, which have different usability and performance trade-offs:

  1. In the Dataset.batch() transform, create a single large batch containing examples for all of your GPUs. Then use tf.split(..., self.num_gpus) on the output of Iterator.get_next() to create sub-batches for each GPU. This is probably the easiest approach, but it does place the splitting on the critical path (see the first sketch after this list).

  2. In the Dataset.batch() transform, create a mini-batch that is sized for a single GPU. Then call Iterator.get_next() once per GPU to get multiple different batches. (By contrast, in your current code, the same value of next_batch is sent to each GPU, which is probably not what you wanted to happen.) See the second sketch after this list.

  3. Create multiple iterators, one per GPU. Shard the data using Dataset.shard() early in the pipeline (e.g. on the list of files if your dataset is sharded). Note that this approach will consume more resources on the host, so you may need to dial down any buffer sizes and/or degrees of parallelism (see the third sketch after this list).
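
A minimal sketch of option 1; make_dataset, build_tower, batch_size and num_gpus are assumed stand-ins for the asker's own code. A single iterator yields one combined batch, which is split on the CPU before the tower loop:

    # Option 1: batch for all GPUs at once, then split into per-GPU sub-batches.
    dataset = make_dataset(train_files)             # assumed helper: parse + shuffle
    dataset = dataset.batch(batch_size * num_gpus)  # one combined batch for every tower
    iterator = dataset.make_initializable_iterator()
    images, labels = iterator.get_next()

    image_shards = tf.split(images, num_gpus)       # split along the batch dimension
    label_shards = tf.split(labels, num_gpus)
    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            tower_loss = build_tower(image_shards[i], label_shards[i])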
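
A sketch of option 2 under the same assumptions; each call to Iterator.get_next() creates a separate op, so every tower pulls a different batch from the shared iterator:

    # Option 2: batch at the single-GPU size and call get_next() once per tower.
    dataset = make_dataset(train_files)           # assumed helper
    dataset = dataset.batch(batch_size)           # per-GPU batch size
    iterator = dataset.make_initializable_iterator()

    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i):
            images, labels = iterator.get_next()  # a distinct batch for this tower
            tower_loss = build_tower(images, labels)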
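
A sketch of option 3, assuming the input is a list of TFRecord files (parse_fn is again an assumed parsing function); each tower owns its own iterator over a disjoint shard of the files:

    # Option 3: one iterator per GPU, sharding the file list across towers.
    for i in range(num_gpus):
        files = tf.data.Dataset.from_tensor_slices(train_files)
        files = files.shard(num_gpus, i)                   # tower i reads every num_gpus-th file
        dataset = files.flat_map(tf.data.TFRecordDataset)  # read records from this shard
        dataset = dataset.map(parse_fn).batch(batch_size)
        iterator = dataset.make_initializable_iterator()
        with tf.device('/gpu:%d' % i):
            images, labels = iterator.get_next()
            tower_loss = build_tower(images, labels)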

Note that the current tf.data pipelines run on the CPU only, and an important aspect of an efficient pipeline is staging your training input to the GPU while the previous step is still running. See the TensorFlow CNN benchmarks for example code that shows how to stage data to GPUs efficiently. We are currently working on adding this support to the tf.data API directly.
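
Until then, staging can be done manually with a StagingArea, in the spirit of (but not copied from) the benchmarks code; the names below are illustrative:

    # A hedged sketch of manual GPU staging with tf.contrib.staging.StagingArea.
    images, labels = iterator.get_next()           # produced on the CPU by tf.data
    with tf.device('/gpu:0'):
        area = tf.contrib.staging.StagingArea(
            dtypes=[images.dtype, labels.dtype],
            shapes=[images.shape, labels.shape])
        stage_op = area.put([images, labels])      # copy the next batch onto the GPU
        staged_images, staged_labels = area.get()  # consume the batch staged earlier

    # Run stage_op once as a warm-up, then run it together with each training step so
    # the next batch transfers while the current one is being processed, e.g.:
    #   sess.run(stage_op)
    #   sess.run([train_op, stage_op])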
