TensorFlow can't find libcuda.so (CUDA 7.5)
Question
I've installed the CUDA 7.5 toolkit, and TensorFlow inside an Anaconda env. The CUDA driver is also installed. The folder containing the .so libraries is in LD_LIBRARY_PATH. When I import tensorflow I get the following error:
Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH: /usr/local/cuda-7.5/lib64
In this folder there is a file named libcudart.so (which is actually a symbolic link to libcudart.so.7.5). So, just as a guess, I created a symbolic link to libcudart.so named libcuda.so. Now TensorFlow finds the library, but as soon as I call tensorflow.Session() I get the following error:
F tensorflow/stream_executor/cuda/cuda_driver.cc:107] Check failed: f != nullptr could not find cuInitin libcuda DSO; dlerror: /usr/local/cuda-7.5/lib64/libcudart.so.7.5: undefined symbol: cuInit
Any ideas?
Recommended answer
For future reference, here is what I found out and what I did to solve this problem. The system is Ubuntu 14.04 64-bit. The NVIDIA driver version I was trying to install was 367.35. The installation ended with an error, with the message:
ERROR: Unable to load the kernel module 'nvidia-drm'
The CUDA samples nevertheless compiled and ran with no problem, so the driver was at least partially installed correctly. However, when I checked the version using:
cat /proc/driver/nvidia/version
The version I got was different (I don't remember exactly, but some 352 sub-version). So I figured I had better remove all traces of the driver and reinstall. I followed the instructions in the accepted answer at https://askubuntu.com/questions/206283/how-can-i-uninstall-a-nvidia-driver-completely (skipping only the command that makes sure the nouveau driver will be loaded at boot).
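The version check described above could also be scripted. The following is only a minimal sketch in Python: the function names parse_driver_version and read_driver_version are hypothetical, and the sample line in the comment is an illustrative assumption about what the file typically looks like, not output captured from this system.

```python
import re


def parse_driver_version(proc_text):
    """Extract the kernel module version from /proc/driver/nvidia/version text.

    Assumes the file contains a line starting with "NVRM version:", for
    example (illustrative sample only):
        NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.35  <build date>
    Returns the version string, or None if no such line is found.
    """
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            # First dotted number on the line, e.g. "367.35".
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def read_driver_version(path="/proc/driver/nvidia/version"):
    """Read and parse the version file; None if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return parse_driver_version(handle.read())
    except OSError:
        return None
```

Comparing the result of read_driver_version() with the version of the installer you just ran (367.35 here) would flag the kind of mismatch described above, where an older kernel module is still the one that is loaded.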
I finally reinstalled the most up-to-date NVIDIA driver (367.35). The installation finished with no errors and TensorFlow was able to load all libraries.
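For context, libcuda.so is the driver API library that ships with the NVIDIA driver, while libcudart.so is the runtime library from the CUDA toolkit, which is why a symlink from one to the other cannot provide cuInit. A quick way to confirm that the driver library is loadable and exports that symbol is sketched below using Python's ctypes; the helper name exports_symbol is hypothetical, and the library name libcuda.so.1 in the example is an assumption about how the driver library is named on your system.

```python
import ctypes


def exports_symbol(library_name, symbol):
    """Return True if the shared library can be loaded and exports `symbol`."""
    try:
        library = ctypes.CDLL(library_name)
    except OSError:
        # The dynamic loader could not find or open the library.
        return False
    # ctypes raises AttributeError for a missing symbol, so hasattr is enough.
    return hasattr(library, symbol)


if __name__ == "__main__":
    # "libcuda.so.1" is an assumed name; adjust it to match your installation.
    print("libcuda exports cuInit:", exports_symbol("libcuda.so.1", "cuInit"))
```

If this prints False even though the driver installation reported success, the loader is most likely not finding the driver's copy of the library.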
I think the problem began when someone who worked on this installation before me used apt-get to install the driver rather than a .run script. I'm not sure, however.
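If you suspect the same kind of mixed installation, one way to look for driver packages left behind by apt-get is to filter the output of dpkg -l. This is only a sketch: the function name installed_nvidia_packages is hypothetical, and it assumes the usual dpkg -l layout in which installed packages start with the status "ii" followed by the package name.

```python
def installed_nvidia_packages(dpkg_output):
    """Return names of installed packages whose name starts with "nvidia".

    `dpkg_output` is the text printed by `dpkg -l`. Only rows with status
    "ii" (installed) are considered; header lines and removed packages
    are skipped.
    """
    names = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ii" and fields[1].startswith("nvidia"):
            names.append(fields[1])
    return names


# Possible usage on a Debian/Ubuntu system (not executed here):
#   import subprocess
#   text = subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout
#   print(installed_nvidia_packages(text))
```

A non-empty result alongside a driver installed from a .run script would suggest that two installation methods were mixed on the same machine.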
P.S. During installation there is a warning:
The distribution-provided pre-install script failed! Are you sure you want to continue?
Looking at the logs I was able to locate this pre-install script, and its content is simply:
# Trigger an error exit status to prevent the installer from overwriting
# Ubuntu's nvidia packages.
exit 1
So it seems OK to install despite this warning.