Tensorflow can't find libcuda.so (CUDA 7.5)


Question

I've installed the CUDA 7.5 toolkit and Tensorflow inside an anaconda env. The CUDA driver is also installed. The folder containing the .so libraries is in LD_LIBRARY_PATH. When I import tensorflow I get the following error:


Couldn't open CUDA library libcuda.so. LD_LIBRARY_PATH: /usr/local/cuda-7.5/lib64

In this folder there is a file named libcudart.so (which is actually a symbolic link to libcudart.so.7.5). So, just as a guess, I created a symbolic link to libcudart.so named libcuda.so. Now the library is found by Tensorflow, but as soon as I call tensorflow.Session() I get the following error:


F tensorflow/stream_executor/cuda/cuda_driver.cc:107] Check failed: f != nullptr could not find cuInitin libcuda DSO; dlerror: /usr/local/cuda-7.5/lib64/libcudart.so.7.5: undefined symbol: cuInit

Any ideas?

Recommended answer

For future reference, here is what I found out and what I did to solve this problem. The system is Ubuntu 14.04 64-bit. The NVIDIA driver version that I was trying to install was 367.35. The installation ended with an error, with the message:


ERROR: Unable to load the kernel module 'nvidia-drm'

However, the CUDA samples compiled and ran with no problem, so the driver was at least partially installed correctly. But when I checked the version using:


cat /proc/driver/nvidia/version

The version I got was different (I don't remember exactly, but it was some 352 sub-version). So I figured I had better remove all traces of the driver and reinstall. I followed the instructions in the accepted answer here: https://askubuntu.com/questions/206283/how-can-i-uninstall-a-nvidia-driver-completely, except for the command that makes sure the nouveau driver will be loaded at boot.
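The version comparison described above can be scripted. The following is only a sketch: the function names are hypothetical, and the regular expression assumes the first line of /proc/driver/nvidia/version looks like "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.35  ...", which may differ between driver releases.

```python
import re
from pathlib import Path

# Assumed layout of /proc/driver/nvidia/version (may vary by driver release):
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.35  Mon Jul 11 ..."
_VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_driver_version(text):
    """Return the driver version string found in `text`, or None."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


def check_driver(expected, proc_file="/proc/driver/nvidia/version"):
    """Compare the loaded kernel module version with `expected`."""
    path = Path(proc_file)
    if not path.exists():
        return "no NVIDIA kernel module loaded"
    found = loaded_driver_version(path.read_text())
    if found == expected:
        return "ok: " + found
    return "mismatch: loaded %s, expected %s" % (found, expected)


if __name__ == "__main__":
    # 367.35 is the version the answer was trying to install.
    print(check_driver("367.35"))
```

A "mismatch" result would correspond to the situation in this answer, where the installer was for 367.35 but an older module was still the one loaded.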

I finally reinstalled the most up-to-date NVIDIA driver (367.35). The installation finished with no errors and Tensorflow was able to load all libraries.
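To confirm that a library really is the driver library Tensorflow wants, one can check whether it exports cuInit, the symbol that was missing when libcudart.so was symlinked as libcuda.so in the question. This is a minimal sketch using the standard ctypes module; the helper name exports_symbol is hypothetical, and the assumption that the driver installs the library under the name libcuda.so.1 should be checked on your system.

```python
import ctypes


def exports_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # Library not found on the loader path, or not loadable.
        return False
    # Attribute access on a CDLL object looks the symbol up in the library;
    # a missing symbol raises AttributeError, which hasattr reports as False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Assumed library names; the driver usually ships libcuda.so.1 with a
    # libcuda.so symlink pointing at it.
    for name in ("libcuda.so.1", "libcuda.so"):
        print(name, exports_symbol(name, "cuInit"))
```

If this prints False for both names, the driver library is either not installed or not visible to the dynamic loader, and pointing a symlink at libcudart.so will not help because that library does not define cuInit.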

I think the problem began when someone who worked on the installation before me used apt-get to install the driver instead of a run script. I'm not sure, however.

PS: during installation there is a warning:


The distribution-provided pre-install script failed! Are you sure you want to continue?

Looking at the logs, I could locate this pre-install script, and its content is simply:

# Trigger an error exit status to prevent the installer from overwriting
# Ubuntu's nvidia packages.
exit 1

So it seems OK to install despite this warning.
