TensorFlow: critical graph operations assigned to cpu rather than gpu

Problem Description

I have implemented a TensorFlow DNN model (2 hidden layers with ELU activation functions, trained on MNIST) as a Python class, in order to wrap the TF calls within another library that has its own optimization routines and tools.

When running some tests on a Tesla K20 I noticed that the GPU was being used at 4% of its total capacity. I therefore looked a bit more closely at the device placement log and found that all critical operations like MatMul, Sum, Add, Mean, etc. were being assigned to the CPU.
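For reference, the placement information came from enabling device placement logging. A minimal sketch of that setup, using the TF 1.x-era Session API and a hypothetical stand-in for my graph (the names and shapes are illustrative, not the actual model):

    import tensorflow as tf  # TF 1.x-era API, as used throughout this question

    # Hypothetical stand-in for the 2-hidden-layer DNN described above,
    # in double precision, which is what I started with.
    x = tf.placeholder(tf.float64, shape=[None, 784], name="x")
    W = tf.Variable(tf.zeros([784, 10], dtype=tf.float64), name="W")
    logits = tf.matmul(x, W)       # MatMul
    loss = tf.reduce_mean(logits)  # Mean: one of the ops that stayed on the CPU

    # log_device_placement=True prints the device chosen for every op in the graph.
    config = tf.ConfigProto(log_device_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())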

The first thing that came to mind was that it was because I was using dtype=float64, so I switched to dtype=float32. While a lot more operations were then assigned to the GPU, a good number were still assigned to the CPU, such as Mean, gradient/Mean_grad/Prod, and gradient/Mean.
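For illustration, the switch amounts to threading a dtype argument through the graph construction. A minimal sketch with a hypothetical layer helper (the shapes and initializer are illustrative, not my actual code):

    import tensorflow as tf

    def dense_elu_layer(x, n_out, dtype=tf.float32):
        # One hidden layer of the kind described above; dtype is the only knob.
        n_in = int(x.get_shape()[-1])
        W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1, dtype=dtype))
        b = tf.Variable(tf.zeros([n_out], dtype=dtype))
        return tf.nn.elu(tf.matmul(x, W) + b)

    # dtype=tf.float64 reproduces the original placement problem;
    # dtype=tf.float32 moved most (but, at the time, not all) ops to the GPU.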

So here comes my first question (I'm linking a working code example at the end):

1) Why would that be? I have written different TF models that consist of simple tensor multiplications and reductions, and they run fully on the GPU as long as I use single precision.

And here comes the second question:

2) Why does TF assign the graph to different devices depending on the data type? I understand that not all kernels are implemented for the GPU, but I would have thought that things like MatMul could run on the GPU for both single and double precision.

3) Could the fact that the model is wrapped within a Python class have an effect? I do not think this is the case because, as I said, it did not happen for other models that were wrapped in a similar way but were simpler.

4) What sort of steps can I take to run the model fully on a GPU?

Here is a full working example of my code that I have isolated from my library:

https://gist.github.com/smcantab/8ecb679150a327738102.

If you run it and look at the output, you'll see how different parts of the graph have been assigned to different devices. To see how this changes with types and devices, change dtype and device within main() at the end of the example. Note that if I set allow_soft_placement=False, the graph fails to initialize.
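For context, here is a minimal sketch of the kind of configuration the gist toggles; device and dtype are exactly the knobs varied in main(), while the tiny graph itself is just an illustrative placeholder:

    import tensorflow as tf

    device = "/gpu:0"   # or "/cpu:0"
    dtype = tf.float32  # or tf.float64

    with tf.device(device):
        x = tf.placeholder(dtype, shape=[None, 784])
        W = tf.Variable(tf.zeros([784, 10], dtype=dtype))
        y = tf.reduce_mean(tf.matmul(x, W))

    # allow_soft_placement=True lets TF fall back to the CPU when an op pinned to
    # the GPU has no GPU kernel (e.g. some float64 reductions at the time); with
    # allow_soft_placement=False such a graph fails to initialize, as noted above.
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())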

Any word of advice would be really appreciated.

Recommended Answer

As Yaroslav noted, Mean in particular was not yet implemented for the GPU, but it is now available, so these operations should run on the GPU with the latest TensorFlow (as per the DEVICE_GPU registration at that link).

Prior to the availability of Mean on the GPU, the status was:

(a) You can implement mean by hand, because reduce_sum is available on the GPU (see the sketch after these two points).

(b) I've re-pinged someone to see if there's an easy way to add GPU support, but we'll see.
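A minimal sketch of workaround (a), dividing a GPU-supported reduce_sum by the element count (the placeholder shape here is illustrative):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 10])

    # Mean over the batch dimension, built only from ops that had GPU kernels:
    # reduce_sum for the numerator and a cast of the dynamic batch size.
    n = tf.cast(tf.shape(x)[0], tf.float32)
    mean_by_hand = tf.reduce_sum(x, reduction_indices=0) / n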

Regarding float64 on the GPU, someone opened an issue three days ago with a patch for supporting float64 reductions on the GPU; it is currently being reviewed and tested.

No, it doesn't matter that the model is wrapped in Python; it really just depends on whether a kernel has been defined for the op to execute on the GPU or not. In many cases, the answer to "why is X supported on the GPU but Y not?" comes down to whether or not there has been demand for Y to run on the GPU. The answer for float64 is simpler: float32 is a lot faster, so in most cases people work to make their models run in float32 whenever possible, because it gives all-around speed benefits.
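As an illustration of that last point, when the surrounding library hands data over in double precision, a common pattern is to cast at the graph boundary and keep the heavy compute in float32. A minimal sketch (the names and shapes here are hypothetical):

    import tensorflow as tf

    # Accept float64 data from the caller, but run the model itself in float32.
    x64 = tf.placeholder(tf.float64, shape=[None, 784])
    x32 = tf.cast(x64, tf.float32)
    W = tf.Variable(tf.zeros([784, 10], dtype=tf.float32))
    y = tf.matmul(x32, W)  # this MatMul has a float32 GPU kernel available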
