Why do I get a segmentation fault when calling a CUDA kernel directly from a shared library?

Problem description


The way I tried it (see the question title), it compiled, but I get a segmentation fault. So is it me, CMake, or CUDA that doesn't support direct kernel calls from a shared library? The solution doesn't have to use CMake.

More details:

I have the following file structure:

testKernel.hpp

__global__ void kernelTest( float x );
void callKernel( float x );

testKernel.cu

#include "testKernel.hpp"

__global__ void kernelTest( float x ) {}
void callKernel( float x ) { kernelTest<<<1,1>>>( x ); }

useKernel.cu

#include <cstdio>
#include "testKernel.hpp"

int main( void )
{
    kernelTest<<<1,1>>>( 3.0f );
    //callKernel( 3.0f );
    printf("OK\n");
    return 0;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.3.1)
project(testKernelCall)
find_package(CUDA REQUIRED)

cuda_add_library( ${PROJECT_NAME} SHARED testKernel.cu testKernel.hpp )
target_link_libraries( ${PROJECT_NAME} ${CUDA_LIBRARIES} )

cuda_add_executable("useKernel" useKernel.cu)
target_link_libraries("useKernel" ${PROJECT_NAME})

Compiling and running it with:

cmake .; make && ./useKernel

results in a segmentation fault. The backtrace with gdb is:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff75726bd in cudart::configData::addArgument(void const*, unsigned long, unsigned long) ()
   from ./libtestKernelCall.so
(gdb) bt
#0  0x00007ffff75726bd in cudart::configData::addArgument(void const*, unsigned long, unsigned long) ()
   from ./libtestKernelCall.so
#1  0x00007ffff7562eb7 in cudart::cudaApiSetupArgument(void const*, unsigned long, unsigned long) ()
   from ./libtestKernelCall.so
#2  0x00007ffff7591ca2 in cudaSetupArgument ()
   from ./libtestKernelCall.so
#3  0x00007ffff7556125 in __device_stub__Z10kernelTestf (__par0=3)
    at /tmp/tmpxft_00003900_00000000-4_testKernel.cudafe1.stub.c:7
#4  0x00007ffff755616c in kernelTest (__cuda_0=3) at ./testKernel.cu:2
#5  0x000000000040280e in main () at ./useKernel.cu:6

Tested with (which means the segfault appears in those setups):


  • Setup 1

    • cmake 3.4.1
    • CUDA 7.0.27
    • g++ 4.9.2
    • Debian

  • Setup 2

    • cmake 3.3.1
    • CUDA 6.5.14
    • g++ 4.7.1

There are two ways to solve this error:


  • change SHARED to STATIC in CMakeLists.txt (sketched below)
  • use the wrapper function callKernel instead of calling the kernel directly (also sketched below)
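
For reference, the first workaround only touches one line of the CMakeLists.txt shown above (a minimal sketch; the rest of the file stays as posted):

cuda_add_library( ${PROJECT_NAME} STATIC testKernel.cu testKernel.hpp )

The second workaround only swaps the active line in main() of useKernel.cu for the commented-out one:

    //kernelTest<<<1,1>>>( 3.0f );  // direct kernel launch across the shared-library boundary (segfaults here)
    callKernel( 3.0f );             // wrapper defined in testKernel.cu, launches the kernel inside the library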

I don't really know how to build a CUDA shared library without CMake. I know how to build a CUDA static library, but that case seems to work with CMake, so I didn't test it without CMake.

Here are the relevant CMake commands I got with make VERBOSE=1. I changed absolute paths to relative paths where possible, but I wasn't sure about all these library paths. Putting these commands in a file and sourcing that file compiles the shared library and the program correctly and "correctly" leads to the segmentation fault. I also added command, because for me nvcc is aliased with the -ccbin option.

make.sh

command nvcc "./testKernel.cu" -c -o "./testKernel.cu.o" -ccbin /usr/bin/cc -m64 -DtestKernelCall_EXPORTS -Xcompiler ,\"-fPIC\",\"-g\" -DNVCC -I/opt/cuda-7.0/include -I/opt/cuda-7.0/include
/usr/bin/c++  -fPIC   -shared -Wl,-soname,libtestKernelCall.so -o libtestKernelCall.so ./testKernel.cu.o /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so
command nvcc "./useKernel.cu" -c -o "./useKernel.cu.o" -ccbin /usr/bin/cc -m64 -Xcompiler ,\"-g\" -DNVCC -I/opt/cuda-7.0/include -I/opt/cuda-7.0/include
/usr/bin/c++ ./useKernel.cu.o  -o useKernel -rdynamic /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so libtestKernelCall.so /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so -Wl,-rpath,"."


Recommended answer

Your code compiles and runs correctly for me using ordinary nvcc commands (not CMake) if I add the -cudart shared switch to each nvcc command. Here's a fully-worked sequence:

$ cat testKernel.hpp
__global__ void kernelTest( float x );
void callKernel( float x );
$ cat testKernel.cu
#include "testKernel.hpp"

__global__ void kernelTest( float x ) {}
void callKernel( float x ) { kernelTest<<<1,1>>>( x ); }
$ cat useKernel.cu
#include <cstdio>
#include "testKernel.hpp"

int main( void )
{
    kernelTest<<<1,1>>>( 3.0f );
    //callKernel( 3.0f );
    cudaDeviceSynchronize();
    printf("OK\n");
    return 0;
}
$ nvcc -shared -cudart shared -o test.so -Xcompiler -fPIC testKernel.cu
$ nvcc -cudart shared -o test test.so useKernel.cu
$ cuda-memcheck ./test
========= CUDA-MEMCHECK
OK
========= ERROR SUMMARY: 0 errors
$

If I omit -cudart shared on either of the above nvcc commands, then the compile will still proceed, but on execution I will witness the aforementioned seg fault. Tested with CUDA 7.5 on Fedora 20.

Regarding your CMake setup, it's necessary to link against the shared cudart, according to my testing. Therefore it's insufficient to add -cudart shared to the -c commands (which are compile-only commands). Sorry if I was unclear; my "compile" commands above are doing both compiling and linking, at each step.
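
With the FindCUDA-based CMakeLists.txt from the question, one way to get that link against the shared cudart is to disable the static runtime before find_package. This is a sketch, not part of the original answer, and it assumes your CMake's FindCUDA module provides the CUDA_USE_STATIC_CUDA_RUNTIME option:

cmake_minimum_required(VERSION 3.3.1)
project(testKernelCall)
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)  # ask FindCUDA to put libcudart.so (not libcudart_static.a) into CUDA_LIBRARIES
find_package(CUDA REQUIRED)

cuda_add_library( ${PROJECT_NAME} SHARED testKernel.cu testKernel.hpp )
target_link_libraries( ${PROJECT_NAME} ${CUDA_LIBRARIES} )

cuda_add_executable("useKernel" useKernel.cu)
target_link_libraries("useKernel" ${PROJECT_NAME})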

When linking with nvcc, the correct switch is -cudart shared. However, your make.sh indicates final link is being done by the host c++ compiler:

command nvcc "./testKernel.cu" -c -o "./testKernel.cu.o" -ccbin /usr/bin/cc -m64 -DtestKernelCall_EXPORTS -Xcompiler ,\"-fPIC\",\"-g\" -DNVCC -I/opt/cuda-7.0/include -I/opt/cuda-7.0/include
/usr/bin/c++  -fPIC   -shared -Wl,-soname,libtestKernelCall.so -o libtestKernelCall.so ./testKernel.cu.o /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so
command nvcc "./useKernel.cu" -c -o "./useKernel.cu.o" -ccbin /usr/bin/cc -m64 -Xcompiler ,\"-g\" -DNVCC -I/opt/cuda-7.0/include -I/opt/cuda-7.0/include
/usr/bin/c++ ./useKernel.cu.o  -o useKernel -rdynamic /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so libtestKernelCall.so /opt/cuda-7.0/lib64/libcudart_static.a -lpthread /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so -Wl,-rpath,"."

In that case, you don't want to link against:

/opt/cuda-7.0/lib64/libcudart_static.a

but instead against libcudart.so:

/opt/cuda-7.0/lib64/libcudart.so

If you were editing your make.sh directly, you would want to make that change in both of the /usr/bin/c++ command lines you have shown. For example, if I were to modify my compile sequence already presented to reflect your usage of the host c++ compiler to do the linking, it would look like this:

$ nvcc -c -Xcompiler -fPIC testKernel.cu                     
$ g++ -fPIC -shared -o test.so -L/usr/local/cuda/lib64 -lcudart testKernel.o
$ nvcc -c useKernel.cu
$ g++ -o test -L/usr/local/cuda/lib64 -lcudart test.so useKernel.o
$ cuda-memcheck ./test
========= CUDA-MEMCHECK
OK
========= ERROR SUMMARY: 0 errors
$
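
Applied to the make.sh from the question, the two /usr/bin/c++ link lines would then look roughly like this (a sketch, not taken from the original answer; the pthread/rt/dl entries that the static runtime pulled in are dropped for brevity):

/usr/bin/c++ -fPIC -shared -Wl,-soname,libtestKernelCall.so -o libtestKernelCall.so ./testKernel.cu.o /opt/cuda-7.0/lib64/libcudart.so
/usr/bin/c++ ./useKernel.cu.o -o useKernel -rdynamic libtestKernelCall.so /opt/cuda-7.0/lib64/libcudart.so -Wl,-rpath,"."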
