Will TensorFlow matmul run in parallel on GPU? (Or any GPU ops.)


Question

Say we have code like this:

w1 = tf.get_variable(...)
w2 = tf.get_variable(...)
x = ...
y1 = tf.matmul(x, w1)
y2 = tf.matmul(x, w2)

session.run([y1, y2], ...)

TensorFlow can potentially run ops in parallel (controlled via the option inter_op_parallelism_threads).
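For reference, here is a minimal sketch of how that option is set, assuming the TF 1.x session API (the thread counts are arbitrary examples):

```python
import tensorflow as tf  # TF 1.x API assumed

config = tf.ConfigProto(
    inter_op_parallelism_threads=2,  # thread pool for running independent ops concurrently
    intra_op_parallelism_threads=4)  # thread pool used inside a single op (e.g. one matmul)
session = tf.Session(config=config)
```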

My question: Will it actually do that for this case here (matmul) (and extending on that: on all kinds of GPU ops)? I think to do that, it would need to create multiple CUDA streams, right? Does it do that automatically (and how)? Or will they be executed sequentially on the GPU?

(Note that for this simple example, you could also rewrite the code by concatenating w1 and w2, then doing a single matmul, and then splitting afterwards. But that is not my question.)
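Although the question sets this rewrite aside, the equivalence it relies on can be illustrated with plain NumPy (the shapes below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
w1 = rng.standard_normal((4, 5))
w2 = rng.standard_normal((4, 6))

# two separate matmuls, as in the question
y1 = x @ w1
y2 = x @ w2

# one fused matmul against the concatenated weights, then split the result
w = np.concatenate([w1, w2], axis=1)   # shape (4, 11)
y = x @ w                              # shape (3, 11)
y1_fused, y2_fused = y[:, :5], y[:, 5:]

assert np.allclose(y1, y1_fused) and np.allclose(y2, y2_fused)
```

The fused version launches a single larger kernel instead of two smaller ones, which is why it can be faster even without any inter-op parallelism.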

(Related is this question, which basically would answer that it will always use a single CUDA stream for all GPU ops and thus this will not run in parallel. Not sure if this is up-to-date, though.)

Accepted answer

From the official FAQ:

Does the runtime parallelize parts of graph execution?

The TensorFlow runtime parallelizes graph execution across many different dimensions:

The individual ops have parallel implementations, using multiple cores in a CPU, or multiple threads in a GPU.

Independent nodes in a TensorFlow graph can run in parallel on multiple devices, which makes it possible to speed up CIFAR-10 training using multiple GPUs.

The Session API allows multiple concurrent steps (i.e. calls to tf.Session.run in parallel). This enables the runtime to get higher throughput, if a single step does not use all of the resources in your computer.
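The concurrency pattern behind that last point can be sketched with plain Python threads; here a dummy step function stands in for a session.run call (the names and workloads are illustrative, not TensorFlow API):

```python
import threading

results = {}

def run_step(name, step_fn):
    # each thread issues its own "session.run"-style call;
    # the runtime can service these steps concurrently
    results[name] = step_fn()

# stand-ins for session.run([y1]) and session.run([y2])
t1 = threading.Thread(target=run_step, args=("step1", lambda: sum(range(100))))
t2 = threading.Thread(target=run_step, args=("step2", lambda: sum(range(200))))
t1.start(); t2.start()
t1.join(); t2.join()
```

Whether the two steps actually overlap on the GPU is exactly the question asked above; the FAQ only guarantees that the runtime accepts concurrent calls.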

