How do I select which GPU to run a job on?

Question

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?

As an example, when installing CUDA I opted to install the NVIDIA_CUDA-<#.#>_Samples, then ran several instances of the nbody simulation, but they all ran on GPU 0; GPU 1 was completely idle (monitored using watch -n 1 nvidia-smi). Checking CUDA_VISIBLE_DEVICES with

echo $CUDA_VISIBLE_DEVICES

I found this was not set. I tried setting it using

CUDA_VISIBLE_DEVICES=1

then running nbody again but it also went to GPU 0.

I looked at the related question, how to choose designated GPU to run CUDA program?, but the deviceQuery command is not in the CUDA 8.0 bin directory. In addition to $CUDA_VISIBLE_DEVICES, I saw other posts refer to the environment variable $CUDA_DEVICES, but these were not set and I did not find information on how to use them.

While not directly related to my question, using nbody -device=1 I was able to get the application to run on GPU 1, but nbody -numdevices=2 did not run on both GPU 0 and GPU 1.

I am testing this on a system running the bash shell, on CentOS 6.8, with CUDA 8.0, two GTX 1080 GPUs, and NVIDIA driver 367.44.

I know that when writing code with CUDA you can manage and control which CUDA resources to use, but how would I manage this from the command line when running a compiled CUDA executable?

Answer

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly.

For example, to specify CUDA device 1, you would set CUDA_VISIBLE_DEVICES using either

export CUDA_VISIBLE_DEVICES=1

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation.

If you want to specify more than one device, use

export CUDA_VISIBLE_DEVICES=0,1

CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
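One behaviour worth knowing (a property of CUDA's device enumeration, not something stated above): the mask renumbers the visible devices starting from 0, in the order they appear in the list. A sketch:

```shell
# Physical GPU 1 is the only device this process can see; the CUDA
# runtime enumerates it as device 0 inside the process.
CUDA_VISIBLE_DEVICES=1 ./cuda_executable

# The list order controls enumeration: here physical GPU 1 becomes
# device 0 and physical GPU 0 becomes device 1 inside the process.
CUDA_VISIBLE_DEVICES=1,0 ./cuda_executable
```

Note that nvidia-smi itself ignores CUDA_VISIBLE_DEVICES (it queries the driver through NVML), so it will always list both GPUs; use it to watch utilisation, not to verify the mask.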
