Asynchronous computation in TensorFlow


Problem description

Recently I've been toying with TensorFlow and I noticed that the framework is unable to use all of my available computational resources. In the Convolutional Neural Networks tutorial they mention that

Naively employing asynchronous updates of model parameters leads to sub-optimal training performance because an individual model replica might be trained on a stale copy of the model parameters. Conversely, employing fully synchronous updates will be as slow as the slowest model replica.

Although they mention it in both the tutorial and the whitepaper, I did not really find a way to do asynchronous parallel computation on a local machine. Is it even possible? Or is it part of the distributed, to-be-released version of TensorFlow? If it is, then how?

Recommended answer

Asynchronous gradient descent is supported in the open-source release of TensorFlow, without even modifying your graph. The easiest way to do it is to execute multiple concurrent steps in parallel:

import threading

import tensorflow as tf

NUM_CONCURRENT_STEPS = 4  # Number of training threads to run in parallel.

loss = ...

# Any of the optimizer classes can be used here.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

sess = tf.Session()
# In TensorFlow 0.12 and later, use tf.global_variables_initializer() instead.
sess.run(tf.initialize_all_variables())

def train_function():
  # TODO: Better termination condition, e.g. using a `max_steps` counter.
  while True:
    sess.run(train_op)

# Create multiple threads to run `train_function()` in parallel
train_threads = []
for _ in range(NUM_CONCURRENT_STEPS):
  train_threads.append(threading.Thread(target=train_function))

# Start the threads, and block on their completion.
for t in train_threads:
  t.start()
for t in train_threads:
  t.join()

This example creates NUM_CONCURRENT_STEPS threads, each of which repeatedly calls sess.run(train_op). Since there is no coordination between these threads, they proceed asynchronously.
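
To address the TODO in the training loop above, here is a minimal sketch of one possible termination condition, assuming a hypothetical MAX_STEPS budget and a global_step variable shared across the threads:

MAX_STEPS = 10000  # Hypothetical training budget.

global_step = tf.Variable(0, trainable=False, name="global_step")
# Passing `global_step` makes the optimizer increment it on every update.
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
    loss, global_step=global_step)

def train_function():
  # Each thread polls the shared counter; the reads may race slightly,
  # which is acceptable for an approximate step budget.
  while sess.run(global_step) < MAX_STEPS:
    sess.run(train_op)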

It's actually more challenging to achieve synchronous parallel training (at present), because this requires additional coordination to ensure that all replicas read the same version of the parameters, and that all of their updates become visible at the same time. The multi-GPU example for CIFAR-10 training performs synchronous updates by making multiple copies of the "tower" in the training graph with shared parameters, and explicitly averaging the gradients across the towers before applying the update.
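
As a rough illustration of that synchronous pattern, the sketch below averages per-tower gradients before applying a single update. Here build_tower() and NUM_GPUS are hypothetical stand-ins (the real CIFAR-10 example builds each tower from input batches and shares variables between towers), and the averaging assumes dense gradients:

NUM_GPUS = 2  # Hypothetical number of available GPUs.

optimizer = tf.train.GradientDescentOptimizer(0.01)
tower_grads = []

for i in range(NUM_GPUS):
  with tf.device("/gpu:%d" % i):
    # Hypothetical function that builds one tower (sharing variables with
    # the other towers) and returns its loss.
    loss = build_tower()
    # compute_gradients() returns a list of (gradient, variable) pairs.
    tower_grads.append(optimizer.compute_gradients(loss))

# Average each variable's gradient across the towers. Because every tower
# uses the same shared variables, the k-th pair in each list refers to the
# same variable.
averaged_grads = []
for grad_and_vars in zip(*tower_grads):
  grads = [g for g, _ in grad_and_vars]
  var = grad_and_vars[0][1]
  averaged_grads.append((tf.add_n(grads) / float(len(grads)), var))

# A single training op applies the averaged update, so every tower sees the
# same parameter values on the next step.
train_op = optimizer.apply_gradients(averaged_grads)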

N.B. The code in this answer places all computation on the same device, which will not be optimal if you have multiple GPUs in your machine. If you want to use all of your GPUs, follow the example of the multi-GPU CIFAR-10 model, and create multiple "towers" with their operations pinned to each GPU. The code would look roughly as follows:

train_ops = []

for i in range(NUM_GPUS):
  with tf.device("/gpu:%d" % i):
    # Define a tower on GPU `i`.
    loss = ...

    train_ops.append(tf.train.GradientDescentOptimizer(0.01).minimize(loss))

def train_function(train_op):
  # TODO: Better termination condition, e.g. using a `max_steps` counter.
  while True:
    sess.run(train_op)


# Create multiple threads to run `train_function()` in parallel
train_threads = []
for train_op in train_ops:
  train_threads.append(threading.Thread(target=train_function, args=(train_op,)))


# Start the threads, and block on their completion.
for t in train_threads:
  t.start()
for t in train_threads:
  t.join()

Note that you might find it convenient to use a "variable scope" to facilitate variable sharing between the towers.
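
Continuing the rough sketch above, one way to set this up is to build every tower inside the same variable scope and mark it for reuse once the first tower has created the variables; inference() and input_batches are hypothetical placeholders for a model-building function that uses tf.get_variable() and for per-GPU input data:

with tf.variable_scope("model") as scope:
  for i in range(NUM_GPUS):
    with tf.device("/gpu:%d" % i):
      # Hypothetical model-building function that creates all of its
      # parameters with tf.get_variable().
      logits = inference(input_batches[i])
      # After the first tower has created the variables, reuse them so the
      # remaining towers share the same parameters.
      scope.reuse_variables()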
