Pytorch says that CUDA is not available

Problem Description

I'm trying to run Pytorch on a laptop that I have. It's an older model but it does have an Nvidia graphics card. I realize it is probably not going to be sufficient for real machine learning but I am trying to do it so I can learn the process of getting CUDA installed.

I have followed the steps on the installation guide for Ubuntu 18.04 (my specific distribution is Xubuntu).

My graphics card is a GeForce 845M, verified by lspci | grep nvidia:

01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce 845M] (rev a2)
01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)

I also have gcc 7.5 installed, verified by gcc --version:

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

And I have the correct headers installed, verified by trying to install them with sudo apt-get install linux-headers-$(uname -r):

Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.15.0-106-generic is already the newest version (4.15.0-106.107).

I then followed the installation instructions using a local .deb for CUDA version 10.1.

Now, when I run nvidia-smi, I get:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 845M        On   | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0    N/A /  N/A |     88MiB /  2004MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       982      G   /usr/lib/xorg/Xorg                            87MiB |
+-----------------------------------------------------------------------------+

And when I run nvcc -V, I get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

I then performed the post-installation instructions from section 6.1, and as a result, echo $PATH looks like this:

/home/isaek/anaconda3/envs/stylegan2_pytorch/bin:/home/isaek/anaconda3/bin:/home/isaek/anaconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

and echo $LD_LIBRARY_PATH looks like this:

/usr/local/cuda-10.1/lib64

and my /etc/udev/rules.d/40-vm-hotadd.rules file looks like this:

# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"

LABEL="vm_hotadd_apply"

# Memory hotadd request

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"

LABEL="vm_hotadd_end"

After all of this, I even compiled and ran the samples. ./deviceQuery returns:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 845M"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2004 MBytes (2101870592 bytes)
  ( 4) Multiprocessors, (128) CUDA Cores/MP:     512 CUDA Cores
  GPU Max Clock rate:                            863 MHz (0.86 GHz)
  Memory Clock rate:                             1001 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

./bandwidthTest returns:

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce 845M
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         11.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         11.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(GB/s)
   32000000         14.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

But after all of this, this Python snippet (in a conda environment with all dependencies installed):

import torch
torch.cuda.is_available()

returns False.

Does anybody have any idea how to resolve this? I've tried adding /usr/local/cuda-10.1/bin to /etc/environment like this:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
PATH=$PATH:/usr/local/cuda-10.1/bin

and restarting the terminal, but that didn't fix it. I really don't know what else to try.

Here is the output of collect_env:

Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce 845M
Nvidia driver version: 418.87.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.5
[pip] pytorch-ranger==0.1.1
[pip] stylegan2-pytorch==0.12.0
[pip] torch==1.5.0
[pip] torch-optimizer==0.0.1a12
[pip] torchvision==0.6.0
[pip] vector-quantize-pytorch==0.0.2
[conda] numpy                     1.18.5                   pypi_0    pypi
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] stylegan2-pytorch         0.12.0                   pypi_0    pypi
[conda] torch                     1.5.0                    pypi_0    pypi
[conda] torch-optimizer           0.0.1a12                 pypi_0    pypi
[conda] torchvision               0.6.0                    pypi_0    pypi
[conda] vector-quantize-pytorch   0.0.2                    pypi_0    pypi

Recommended Answer

PyTorch doesn't use the system's CUDA library. When you install PyTorch from the precompiled binaries (via either pip or conda), it ships with a copy of the specified version of the CUDA libraries, installed locally. In fact, you don't even need CUDA installed on your system to use PyTorch with CUDA support.
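
For example, a quick way to see which CUDA version (if any) your installed PyTorch binary was built against is a check along these lines (torch.__version__, torch.version.cuda, and torch.cuda.is_available are standard torch attributes; the exact output depends on the build you installed):

import torch

# Version of the PyTorch package itself, e.g. "1.5.0"
print(torch.__version__)

# CUDA version the binary was compiled against (None for CPU-only builds);
# this ships with the wheel and is independent of any system-wide CUDA install
print(torch.version.cuda)

# True only if the build has CUDA support and a compatible driver/GPU is visible
print(torch.cuda.is_available())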

There are two scenarios which could have caused your issue.

  1. You installed the CPU-only version of PyTorch. In this case PyTorch wasn't compiled with CUDA support, so it does not support CUDA.

  2. You installed the CUDA 10.2 version of PyTorch. In this case the problem is that your graphics card currently uses the 418.87 driver, which only supports up to CUDA 10.1. The two potential fixes would be to either install updated drivers (version >= 440.33 according to Table 2) or to install a version of PyTorch compiled against CUDA 10.1.
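
A minimal sketch for telling these two scenarios apart from inside Python, assuming nvidia-smi is on your PATH (the query flags used below are standard nvidia-smi options):

import subprocess
import torch

if torch.version.cuda is None:
    # Scenario 1: this wheel was built without CUDA support
    print("CPU-only PyTorch build installed")
else:
    # Scenario 2: CUDA-enabled build; compare it against the installed driver
    driver = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    ).decode().strip()
    print(f"PyTorch built against CUDA {torch.version.cuda}, driver {driver}")
    # e.g. driver 418.87 is too old for a CUDA 10.2 build (which needs >= 440.33)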

To determine the appropriate command to use when installing PyTorch you can use the handy widget in the "Install PyTorch" section at pytorch.org. Just select the appropriate operating system, package manager, and CUDA version then run the recommended command.

In your case, one solution was to use:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

which explicitly specifies to conda that you want to install the version of PyTorch compiled against CUDA 10.1.
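
After reinstalling, a quick sanity check along these lines should confirm that the CUDA 10.1 build is active and can actually use the GPU (the expected values in the comments assume the reinstall worked):

import torch

print(torch.version.cuda)         # should now print 10.1
print(torch.cuda.is_available())  # should now print True

# try actually allocating a tensor on the GPU
x = torch.zeros(4, device="cuda")
print(x.device)                   # cuda:0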

For more information about PyTorch CUDA compatibility with respect to drivers and hardware, see this answer.

Edit: After you added the output of collect_env, we can see that the problem was that you had the CUDA 10.2 version of PyTorch installed. Based on that, an alternative solution would have been to update the graphics driver, as elaborated in item 2 and the linked answer.
