nvcc is picking the wrong libcudart library

Question

This problem appears when I try to import theano in GPU mode. While importing, theano compiles some code into a shared library and then tries to load it. Here is the command that builds the .so file.

nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=mc72d035fdf91890f3b36710688069b2e,\
  -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker \
  -rpath,/home/jay/.theano/compiledir_Linux-4.8--ARCH-x86_64-with-arch-Arch-Linux--3.6.0-64/cuda_ndarray \
  -I/usr/lib/python3.6/site-packages/Theano-0.9.0b1-py3.6.egg/theano/sandbox/cuda \
  -I/usr/lib/python3.6/site-packages/numpy-1.13.0.dev0+72839c4-py3.6-linux-x86_64.egg/numpy/core/include \
  -I/usr/include/python3.6m -I/usr/lib/python3.6/site-packages/Theano-0.9.0b1-py3.6.egg/theano/gof \
  -L/usr/lib -o /home/jay/.theano/compiledir_Linux-4.8--ARCH-x86_64-with-arch-Arch-Linux--3.6.0-64/cuda_ndarray/cuda_ndarray.so \
   mod.cu -lcublas -lpython3.6m 

It compiles successfully but can't find the correct cudart library at load time. ldconfig seems to know the location of this library, and that location is correct.

$ ldconfig -p | grep libcuda
    libcudart.so.8.0 (libc6,x86-64) => /opt/cuda/lib64/libcudart.so.8.0
    libcudart.so (libc6,x86-64) => /opt/cuda/lib64/libcudart.so
    libcuda.so.1 (libc6,x86-64) => /usr/lib/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/lib/libcuda.so

However, when I inspect the compiled library, ldd reports a problem with libcudart.

$ ldd cuda_ndarray.so  | grep cuda
    libcublas.so.8.0 => /opt/cuda/lib64/libcublas.so.8.0 (0x00007f006dd1b000)
    libcudart.so.7.5 => not found

Reading the ELF headers:

$ readelf -a cuda_ndarray.so | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.8.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.6m.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.7.5]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
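
To compare the versions recorded in the NEEDED entries at a glance, the readelf output can be filtered with a few lines of Python. This is only a sketch: the sample text is a shortened copy of the output above, and the helper names `needed_libraries` and `cuda_versions` are made up for illustration.

```python
import re

# Shortened copy of the readelf output shown above.
READELF_OUTPUT = """\
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.8.0]
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.6m.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.7.5]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""

NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_text):
    """Return the sonames listed in (NEEDED) entries, in order."""
    return NEEDED_RE.findall(readelf_text)


def cuda_versions(sonames, names=("libcudart", "libcublas")):
    """Map each listed CUDA library name to the version suffix of its soname."""
    versions = {}
    for soname in sonames:
        name, sep, version = soname.partition(".so.")
        if sep and name in names:
            versions[name] = version
    return versions


needed = needed_libraries(READELF_OUTPUT)
versions = cuda_versions(needed)
print(versions)
```

A mismatch between the two version strings (here 8.0 for libcublas and 7.5 for libcudart) is the symptom described in this question.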

What went wrong so that it picks the wrong library, cudart 7.5 instead of cudart 8.0?

Here is the output of my nvcc -V:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

and my libcudart.so points to the correct library version:

$ ls -la | grep libcudart
lrwxrwxrwx  1 root root        16 Jan 10 06:10 libcudart.so -> libcudart.so.8.0
lrwxrwxrwx  1 root root        19 Jan 10 06:10 libcudart.so.8.0 -> libcudart.so.8.0.44
-rwxr-xr-x  1 root root    415432 Jan 10 06:10 libcudart.so.8.0.44
-rw-r--r--  1 root root    775162 Jan 10 06:10 libcudart_static.a

One more general question: how does the linker resolve the actual file location for an input like -lm or -lcudart, or any other shorthand notation used while compiling?
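
As a rough model of that lookup, GNU ld expands -lNAME to the file name libNAME.so (falling back to libNAME.a) and takes the first match while walking its search directories in order: the -L directories from the command line first, then the built-in defaults. It then records the soname stored inside the library it found as a NEEDED entry, which is what the runtime loader searches for later. The Python sketch below imitates only the file-name search; `resolve_library` and the `stale`/`current` directories are hypothetical, and real linkers handle more cases (linker scripts, -static, and so on).

```python
import os
import tempfile


def resolve_library(name, search_dirs):
    """Imitate the -l<name> lookup: in each directory, in order,
    try lib<name>.so and then lib<name>.a; the first hit wins."""
    for directory in search_dirs:
        for suffix in (".so", ".a"):
            candidate = os.path.join(directory, "lib" + name + suffix)
            if os.path.exists(candidate):
                return candidate
    return None


# Two made-up directories that both contain a libcudart.so.
with tempfile.TemporaryDirectory() as root:
    stale = os.path.join(root, "stale")
    current = os.path.join(root, "current")
    for directory in (stale, current):
        os.makedirs(directory)
        open(os.path.join(directory, "libcudart.so"), "w").close()

    # The directory listed first shadows the later one.
    found = resolve_library("cudart", [stale, current])
    found_rel = os.path.relpath(found, root)

print(found_rel)
```

Under this model, a leftover libcudart.so from an older toolkit that sits earlier in the search order would be picked up silently, and its soname would end up in the NEEDED list.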

I tried two of the sample programs shipped with CUDA that link against libcudart:

$ grep -rnw . -e 'lcudart'
./3_Imaging/cudaDecodeGL/Makefile:329:LIBRARIES += -lcudart -lnvcuvid
./0_Simple/simpleMPI/Makefile:284:LIBRARIES += -L$(CUDA_PATH)/lib$(LIBSIZE) -lcudart

Of the two, simpleMPI ran without errors.

$ ./simpleMPI 
Running on 1 nodes
Average of square roots is: 0.667242
PASSED

The other one failed with the same error as before:

$ ./cudaDecodeGL 
./cudaDecodeGL: error while loading shared libraries: libcudart.so.7.5: cannot open shared object file: No such file or directory


Recommended answer

I installed CUDA 8 on top of my old CUDA 7.5 installation, so the installer moved the old CUDA libraries to cuda/lib64/stubs. After removing that directory, everything worked as expected.
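
To spot leftovers like this before deleting anything, one option is to list every copy of the library under the toolkit root. The sketch below is an assumption-laden illustration: `find_library_copies` is a hypothetical helper, and the demo tree with its file names only imitates the layout described in this answer rather than reproducing the real installation.

```python
import os
import tempfile


def find_library_copies(root, stem="libcudart.so"):
    """Return sorted paths, relative to root, of files whose name starts with stem."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.startswith(stem):
                full_path = os.path.join(dirpath, filename)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)


# Throwaway tree with made-up file names (not the real installation).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib64", "stubs"))
    for parts in (("lib64", "libcudart.so.8.0.44"),
                  ("lib64", "stubs", "libcudart.so.7.5")):
        open(os.path.join(root, *parts), "w").close()
    copies = find_library_copies(root)

print(copies)
```

On a real system the function would be pointed at the toolkit directory, for example /opt/cuda as shown in the ldconfig output in the question.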
