When running a TF graph, which operations should use the CPU and which should use the GPU when we have multiple GPUs?


Problem description

We can assign different devices to different operations in a TensorFlow graph with tf.device('cpu or gpu'), but it's not clear how to divide them. The other thing is: if we just use the defaults, does TF always use the GPU when a GPU is available?
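For illustration, here is a minimal sketch of such manual placement in TF 1.x style (the shapes and ops are made up for this example, not taken from the question):

import tensorflow as tf

# Ops created under a tf.device scope are pinned to that device; everything
# outside such a scope is left to TF's automatic placer.
with tf.device('/cpu:0'):
    # Input preprocessing is a typical candidate for the CPU.
    x = tf.random_uniform([128, 784])

with tf.device('/device:GPU:0'):
    w = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
    y = tf.matmul(x, w)

# No device scope here: the placer chooses a device itself (see the answer below).
z = tf.nn.softmax(y)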

Update

When there are two GPUs, how should operations be divided between them? Can allow_soft_device_placement do that automatically?

Answer

Finding a device in TF works as follows:

  1. Check if there are devices at all.
  2. Sanity-check whether nodes manually assigned to devices can really run on these devices.
  3. Prefer consumer nodes as hints for device placement.
  4. Check all constraints to use only valid devices.
  5. Use the default device if no other device is chosen.

There is an understandable test: https://github.com/tensorflow/tensorflow/blob/3bc73f5e2ac437b1d9d559751af789c8c965a7f9/tensorflow/core/grappler/costs/virtual_placer_test.cc#L26-L54 which boils down to:

TEST(VirtualPlacerTest, LocalDevices) {
  // Create a virtual cluster with a local CPU and a local GPU.
  // (The device properties and the VirtualPlacer construction are
  // abbreviated here; see the linked test for the full setup.)
  std::unordered_map<string, DeviceProperties> devices;
  devices[".../cpu:0"] = cpu_device;
  devices[".../device:GPU:0"] = gpu_device;

  NodeDef node;
  node.set_op("Conv2D");
  // node.device() is empty, but the GPU is the default device if there is one.
  EXPECT_EQ("GPU", placer.get_device(node).type());

  node.set_device("CPU");
  EXPECT_EQ("CPU", placer.get_device(node).type());

  node.set_device("GPU:0");
  EXPECT_EQ("GPU", placer.get_device(node).type());
}

Where do we get the default device? Each device is registered with a priority:

void DeviceFactory::Register(const string& device_type, DeviceFactory* factory, int priority)

The comment on that Register function is interesting, and a quick search gives:

  • "CPU",ThreadPoolDeviceFactory,60
  • "CPU",GPUCompatibleCPUDeviceFactory,70
  • "GPU",GPUDeviceFactory,210

The TF placer uses the device with the higher priority if possible. So whenever a GPU is available, a kernel of the Op is registered for the GPU, and no manual assignment was made, it uses the GPU.
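You can observe this default behaviour by logging the placement decisions; a minimal sketch in TF 1.x style:

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)

# With a visible GPU and a registered GPU kernel for MatMul, the log contains
# a line roughly like "MatMul: (MatMul): ...device:GPU:0".
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(b))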

Your second question ("How to divide them") cannot be answered that easily if you care about efficiency. In most cases, there is no need to place the operation on the CPU.

As a rule of thumb: trust the heuristics behind the scenes if you feel no need to manually assign devices.

Edit: As the question was edited, here are the additional details:

The soft_device_placement is only applied to nodes that cannot run on the intended device. Consider training on the GPU and doing inference on a laptop. Since each Op kernel is only registered per device type (CPU, GPU), it cannot distribute an Op between different GPUs directly (they are the same device type).
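As a sketch of what the flag does (and does not do), assuming TF 1.x, where the ConfigProto field is called allow_soft_placement:

import tensorflow as tf

with tf.device('/device:GPU:3'):      # a GPU that may not exist on this machine
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

# With allow_soft_placement=True the invalid assignment falls back to a valid
# device instead of raising an error; the flag does NOT spread the work over
# the GPUs that do exist.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(b))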

There are mainly two ways to do distributed training, and you should care about where to place the variables. I am not sure what you are looking for, but TF allows you to balance the placement over all GPUs.
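A minimal data-parallel sketch along those lines (TF 1.x style; the model is a single made-up layer, and keeping the variables on the CPU is just one of the possible placement choices mentioned above):

import tensorflow as tf

# Shared weights live on the CPU so both towers read the same copy.
with tf.device('/cpu:0'):
    w = tf.get_variable('w', [784, 10])
    b = tf.get_variable('b', [10])

images = tf.random_uniform([64, 784])
shards = tf.split(images, 2, axis=0)          # one shard of the batch per GPU

tower_logits = []
for i, shard in enumerate(shards):
    with tf.device('/device:GPU:%d' % i):     # tower i computes on GPU i
        tower_logits.append(tf.matmul(shard, w) + b)

logits = tf.concat(tower_logits, axis=0)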

Please allow me to add one further note: as I only use TensorPack, I know it supports distributed training in a very easy way, as illustrated in the distributed ResNet example. So to speak, it takes care of all of this behind the scenes.

