How do I select which GPU to run a job on?

Question

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?

As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples and then ran several instances of the nbody simulation, but they all ran on GPU 0; GPU 1 was completely idle (monitored using watch -n 1 nvidia-smi). Checking CUDA_VISIBLE_DEVICES using

echo $CUDA_VISIBLE_DEVICES

I found this was not set. I tried setting it using

CUDA_VISIBLE_DEVICES=1

and then running nbody again, but it also went to GPU 0.

I looked at the related question, how to choose designated GPU to run CUDA program?, but the deviceQuery command is not in the CUDA 8.0 bin directory. In addition to $CUDA_VISIBLE_DEVICES, I saw other posts refer to the environment variable $CUDA_DEVICES, but it was not set and I did not find information on how to use it.

While not directly related to my question, using nbody -device=1 I was able to get the application to run on GPU 1, but using nbody -numdevices=2 did not run on both GPUs 0 and 1.

I am testing this on a system running CentOS 6.8 with CUDA 8.0, two GTX 1080 GPUs, and NVIDIA driver 367.44, using the bash shell.

I know that when writing CUDA code I can manage and control which CUDA resources to use, but how would I manage this from the command line when running a compiled CUDA executable?

Answer

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly: a plain CUDA_VISIBLE_DEVICES=1 on its own line only creates a shell-local variable, which is not exported into the environment of child processes, so the executable never sees it.

To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using

export CUDA_VISIBLE_DEVICES=1

or

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation.
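
For instance, to reproduce the nbody scenario from the question with one instance per GPU, a minimal sketch might look like the following (the path to the nbody binary is an assumption; adjust it to wherever the samples were built):

# Launch one nbody instance per GPU in the background.
# Each process only sees the single device named in its environment;
# -benchmark runs the simulation without the graphical display.
CUDA_VISIBLE_DEVICES=0 ./nbody -benchmark &
CUDA_VISIBLE_DEVICES=1 ./nbody -benchmark &
wait    # block until both background jobs finish

Since each process can only see the device it was given, the instances can no longer pile up on GPU 0.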

If you want to specify more than one device, use

export CUDA_VISIBLE_DEVICES=0,1

or

CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
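
Note that the devices you expose are renumbered starting from zero inside the process: with CUDA_VISIBLE_DEVICES=1, the application enumerates physical GPU 1 as device 0. A quick illustration (cuda_executable is the same placeholder as above):

# Physical GPU 1 is the only visible device, so the process sees it
# as device 0; code that calls cudaSetDevice(0) will therefore
# still land on GPU 1.
CUDA_VISIBLE_DEVICES=1 ./cuda_executable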
