What is the right way to do model parallelism in tensorflow?


Problem description

I have multiple 4GB GPU nodes, so I want them to run a huge model in parallel. I hoped that just splitting the layers into several pieces with appropriate device scopes would enable model parallelism, but it turns out that this doesn't reduce the memory footprint of the master node (task 0). (10-node configuration - master: 20g, followers: 2g; 1-node configuration - master: 6~7g)
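To make the setup concrete, this is roughly what I mean by splitting layers with device scopes (an illustrative sketch with made-up layer sizes and task names, not the actual WaveNet code):

```python
import tensorflow as tf

# Pin each block of layers to a different task via an explicit device scope.
x = tf.placeholder(tf.float32, [None, 1024])

with tf.device("/job:worker/task:0"):
    w1 = tf.Variable(tf.truncated_normal([1024, 512], stddev=0.1))
    h1 = tf.nn.relu(tf.matmul(x, w1))

with tf.device("/job:worker/task:1"):
    w2 = tf.Variable(tf.truncated_normal([512, 512], stddev=0.1))
    h2 = tf.nn.relu(tf.matmul(h1, w2))

with tf.device("/job:worker/task:2"):
    w3 = tf.Variable(tf.truncated_normal([512, 10], stddev=0.1))
    logits = tf.matmul(h2, w3)
```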

My suspicion is that the gradients are not distributed because I didn't set up the right device scopes for them.

My model is available on GitHub: https://github.com/nakosung/tensorflow-wavenet/tree/model_parallel_2

The device placement log is here: https://gist.github.com/nakosung/a38d4610fff09992f7e5569f19eefa57
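For anyone who wants to reproduce a log like this, the standard way is the log_device_placement session option (a generic sketch; not necessarily how this particular log was generated):

```python
# Ask the session to log the device every op gets assigned to.
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)
```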

Recommended answer

So the good news is that you are using colocate_gradients_with_ops, which means that you are ensuring that the gradients are computed on the same devices that the ops are placed on. (https://github.com/nakosung/tensorflow-wavenet/blob/model_parallel_2/train.py#L242)
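For readers unfamiliar with the flag, this is the shape of that usage (a minimal, self-contained TF 1.x sketch with a hypothetical toy graph; it stands in for whatever train.py actually builds):

```python
import tensorflow as tf

# Hypothetical two-device toy graph, for illustration only.
x = tf.placeholder(tf.float32, [None, 64])
labels = tf.placeholder(tf.float32, [None, 10])

with tf.device("/job:worker/task:0"):
    w1 = tf.Variable(tf.truncated_normal([64, 32], stddev=0.1))
    h = tf.nn.relu(tf.matmul(x, w1))

with tf.device("/job:worker/task:1"):
    w2 = tf.Variable(tf.truncated_normal([32, 10], stddev=0.1))
    logits = tf.matmul(h, w2)

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# colocate_gradients_with_ops=True places each gradient op on the same
# device as the forward op it differentiates, so the backward pass for
# task 1's layers stays on task 1 instead of defaulting to one device.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
train_op = optimizer.minimize(loss, colocate_gradients_with_ops=True)
```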

Reading the device placement log is a little difficult, so I would suggest using TensorBoard to try visualizing the graph. It has options for visualizing how nodes are placed on devices.
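One way to get the graph into TensorBoard (a minimal TF 1.x sketch; the log directory is just an example path):

```python
# Write the graph definition so TensorBoard's graph view can render it;
# its device coloring mode then shows each node's placement.
writer = tf.summary.FileWriter("/tmp/wavenet_logdir",
                               graph=tf.get_default_graph())
writer.close()
```

Then run tensorboard --logdir /tmp/wavenet_logdir and open the graph view.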

Secondly, you can try to see how the sizes of your operations map down to devices -- it is possible that the largest layers (largest activations, or largest weights) are disproportionately placed on some nodes rather than others. You might try using https://github.com/tensorflow/tensorflow/blob/6b1d4fd8090d44d20fdadabf06f1a9b178c3d80c/tensorflow/python/tools/graph_metrics.py to analyze your graph and get a better picture of where resources are required in it.
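If the analysis tool consumes a serialized GraphDef (an assumption on my part), this is one way to produce one (the output path is arbitrary):

```python
# Dump the GraphDef as a text protobuf for offline analysis.
tf.train.write_graph(tf.get_default_graph().as_graph_def(),
                     "/tmp", "wavenet_graph.pbtxt")
```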

Longer term we'd like to try to solve some of these placement problems automatically, but so far model parallelism requires a bit of care to place things precisely.
