Add nvidia runtime to docker runtimes

Question

I'm running a virtual machine on GCP with a Tesla GPU, and I'm trying to deploy a PyTorch-based app and accelerate it with the GPU.

I want to make Docker use this GPU and have access to it from containers.

I managed to install all drivers on the host machine, and the app runs fine there, but when I try to run it in Docker (based on the nvidia/cuda container) PyTorch fails:

File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 82, 
in _check_driver http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from
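
One quick way to confirm whether PyTorch inside the container can see the driver at all is to call torch.cuda.is_available() directly. A minimal sketch, assuming the app image is called my-pytorch-app (a placeholder name, not from the question):

# "my-pytorch-app" is a placeholder for whatever the app image is actually called
docker run --rm my-pytorch-app \
    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

Without a working nvidia runtime this prints False 0, which matches the assertion above.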

To get some info about the NVIDIA drivers visible to the container, I run this:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

But it complains: docker: Error response from daemon: Unknown runtime specified nvidia.

On the host machine the nvidia-smi output looks like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    35W / 250W |    873MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

If I check my runtimes in Docker, I get only the runc runtime, no nvidia as in examples around the internet.

$ docker info|grep -i runtime
 Runtimes: runc
 Default Runtime: runc

How can I add this nvidia runtime environment to my Docker?

Most posts and questions I found so far say something like "I just forgot to restart my docker daemon, it worked", but that does not help me. What should I do?

I checked many issues on GitHub, and the #1, #2 and #3 StackOverflow questions - they didn't help.

Answer

The nvidia runtime you need is nvidia-container-runtime.

Follow the installation instructions here:
https://github.com/NVIDIA/nvidia-container-runtime#installation

Basically, you first install it with your package manager, if it's not already present:

sudo apt-get install nvidia-container-runtime
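
If apt cannot find that package, NVIDIA's repository from the linked README usually has to be registered first. A rough sketch for Ubuntu/Debian, following the repository layout described in that README (verify the exact commands there before running them):

# add NVIDIA's apt repository and signing key, then install the runtime
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install nvidia-container-runtime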

Then you add it to the docker runtimes:
https://github.com/nvidia/nvidia-container-runtime#daemon-configuration-file

This option worked for me:

$ sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
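
If the SIGHUP does not get picked up, restarting the daemon outright has the same effect. And if every container should use the GPU without passing --runtime=nvidia each time, the daemon also understands a default-runtime key in the same file. A sketch, assuming a systemd-managed host; note that the docker info output below assumes default-runtime was left at runc:

# optional: make nvidia the default runtime (otherwise keep the file as above)
sudo tee /etc/docker/daemon.json <<EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker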

Check that it has been added:

$ docker info|grep -i runtime
 Runtimes: nvidia runc
 Default Runtime: runc
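
With the runtime registered, the command from the question should now reach the GPU. On Docker 19.03 or newer the GPU can also be requested with the --gpus flag instead of the runtime option; that path relies on the NVIDIA container toolkit, which the nvidia-container-runtime package normally pulls in, but check the linked README if in doubt:

# via the registered runtime, as in the question
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

# Docker 19.03+ alternative using the --gpus flag
docker run --gpus all --rm nvidia/cuda nvidia-smi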
