CUDA 5.0 separate compilation of library with cmake


Problem description

The build time of my CUDA library is increasing, so I thought that the separate compilation introduced in CUDA 5.0 might help. I couldn't figure out how to achieve separate compilation with cmake. I looked into the NVCC documentation and found how to compile device objects (using the -dc option) and how to link them (using -dlink). My attempts to get this running with cmake failed. I'm using cmake 2.8.10.2 and the head of the trunk of FindCUDA.cmake. However, I couldn't find out how to specify which files should be compiled and how to link them into a library. In particular, the syntax of the function CUDA_LINK_SEPARABLE_COMPILATION_OBJECTS(output_file_var cuda_target options object_files source_files) is unclear to me, because I don't know what output_file_var and cuda_target are. Here is my non-working attempt:

cuda_compile(DEVICEMANAGER_O devicemanager.cu OPTIONS -dc)
cuda_compile(BLUB_O blub.cu OPTIONS -dc)
CUDA_LINK_SEPARABLE_COMPILATION_OBJECTS(TEST_O gpuacceleration
                                          ""  DEVICEMANGER_O BLUB_O)
set(LIB_TYPE SHARED)
#cuda_add_library(gpuacceleration ${LIB_TYPE} 
  #${gpuacc_SRCS} 
  #devicemanager.cu
  # blub.cu
  #DEVICEMANAGER_O
#  TEST_O
#)

Does anyone know how to compile and link a cuda library using cmake? Thanks in advance.

EDIT: After a friend consulted the developer of the FindCUDA.cmake, a bug got fixed in the example provided with FindCUDA.cmake (https://gforge.sci.utah.edu/gf/project/findcuda/scmsvn/?action=browse&path=%2Fcheckout%2Ftrunk%2FFindCuda.html). I'm now able to build the example. In my project I can build the library as needed using the following (cmake 2.8.10 required):

set(LIB_TYPE SHARED)
set(CUDA_SEPARABLE_COMPILATION ON)
cuda_add_library(gpuacceleration ${LIB_TYPE} 
 blub.cu
 blab.cu
)

BUT: I cannot link against this library. When I built the lib without separate compilation, I was able to link against it. Now I get the following error:

 undefined reference to `__cudaRegisterLinkedBinary_53_tmpxft_00005ab4_00000000_6_blub_cpp1_ii_d07d5695'

for every file with a function used in the interface. Seems strange since it builds without any warning etc. Any ideas how to get this working?

EDIT: I finally figured out how to do this. See @PHD's and my answer for details.

Solution

EDIT (2016-03-15): Yes, it is confirmed as a bug in FindCUDA: https://cmake.org/Bug/view.php?id=15157


TL;DR: This seems to be a bug in FindCUDA, which makes objects lose info on external definitions before the final linking.

The problem is that, even if separable compilation is enabled, a linking step is still performed for all the targets individually before the final linking.

For instance, I have module.cu with:

#include "module.h"
#include <cstdio>

double arr[10] = {1,2,3,4,5,6,7,8,9,10};
__constant__ double carr[10];

void init_carr() {
  cudaMemcpyToSymbol(carr,arr,10*sizeof(double));
}

__global__ void pkernel() {
  printf("(pkernel) carr[%d]=%g\n",threadIdx.x,carr[threadIdx.x]);
}

void print_carr() {
  printf("in print_carr\n");
  pkernel<<<1,10>>>();
}

and module.h with:

extern __constant__ double carr[10];
extern double arr[10];

void print_carr();
void init_carr();

and finally main.cu with:

#include "module.h"

#include <cstdio>

__global__ void kernel() {
  printf("(kernel) carr[%d]=%g\n",threadIdx.x,carr[threadIdx.x]);
}


int main(int argc, char *argv[]) {
  printf("arr: %g %g %g ..\n",arr[0],arr[1],arr[2]);

  kernel<<<1,10>>>();
  cudaDeviceSynchronize();
  print_carr();
  cudaDeviceSynchronize();
  init_carr();
  cudaDeviceSynchronize();
  kernel<<<1,10>>>();
  cudaDeviceSynchronize();
  print_carr();
  cudaDeviceSynchronize();

  return 0;
}

This then works fine with the following Makefile:

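NVCC=nvcc
NVCCFLAGS=-arch=sm_20
LIB=libmodule.a
OBJS=module.o main.o
PROG=extern

# Final link: nvcc device-links main.o and the library objects together
$(PROG): main.o $(LIB)
	$(NVCC) $(NVCCFLAGS) -o $@ $^

# -dc compiles to an object with relocatable device code (it implies -c)
%.o: %.cu
	$(NVCC) $(NVCCFLAGS) -dc -o $@ $<

# Recipe lines must be indented with a tab, not spaces
$(LIB): module.o
	ar cr $@ $^

clean:
	$(RM) $(PROG) $(OBJS) $(LIB)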

But then I try to use the following CMakeLists.txt:

CMAKE_MINIMUM_REQUIRED(VERSION 2.8.8)

PROJECT(extern)

FIND_PACKAGE(CUDA REQUIRED)
SET(CUDA_SEPARABLE_COMPILATION ON)

SITE_NAME(HOSTNAME)

SET(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch=sm_20)

cuda_add_library(module module.cu)

CUDA_ADD_EXECUTABLE(extern main.cu)
TARGET_LINK_LIBRARIES(extern module)

When compiling, the following happens:

$ cmake ..
-- The C compiler identification is GNU 4.9.2
...
$ make VERBOSE=1
...
[ 25%] Building NVCC (Device) object CMakeFiles/module.dir//./module_generated_module.cu.o
...
-- Generating <...>/build/CMakeFiles/module.dir//./module_generated_module.cu.o
/usr/local/cuda/bin/nvcc <...>/module.cu -dc -o <...>/build/CMakeFiles/module.dir//./module_generated_module.cu.o -ccbin /usr/bin/cc -m64 -Xcompiler ,\"-g\" -arch=sm_20 -DNVCC -I/usr/local/cuda/include
[ 50%] Building NVCC intermediate link file CMakeFiles/module.dir/./module_intermediate_link.o
/usr/local/cuda/bin/nvcc -arch=sm_20 -m64 -ccbin "/usr/bin/cc" -dlink <...>/build/CMakeFiles/module.dir//./module_generated_module.cu.o -o <...>/build/CMakeFiles/module.dir/./module_intermediate_link.o
...
/usr/bin/ar cr libmodule.a  CMakeFiles/module.dir/./module_generated_module.cu.o CMakeFiles/module.dir/./module_intermediate_link.o
/usr/bin/ranlib libmodule.a
...
[ 50%] Built target module
[ 75%] Building NVCC (Device) object CMakeFiles/extern.dir//./extern_generated_main.cu.o
...
-- Generating <...>/build/CMakeFiles/extern.dir//./extern_generated_main.cu.o
/usr/local/cuda/bin/nvcc <...>/main.cu -dc -o <...>/build/CMakeFiles/extern.dir//./extern_generated_main.cu.o -ccbin /usr/bin/cc -m64 -Xcompiler ,\"-g\" -arch=sm_20 -DNVCC -I/usr/local/cuda/include -I/usr/local/cuda/include
...
[100%] Building NVCC intermediate link file CMakeFiles/extern.dir/./extern_intermediate_link.o
/usr/local/cuda/bin/nvcc -arch=sm_20 -m64 -ccbin "/usr/bin/cc" -dlink <...>/build/CMakeFiles/extern.dir//./extern_generated_main.cu.o -o <...>/build/CMakeFiles/extern.dir/./extern_intermediate_link.o
nvlink error   : Undefined reference to 'carr' in '<...>/build/CMakeFiles/extern.dir//./extern_generated_main.cu.o'

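Clearly, the problem is the nvcc -dlink obj.o -o obj_intermediate_link.o lines: each target is device-linked on its own, so the device link for extern never sees the object that defines carr. At that point, I guess, the info on external definitions is lost. So the question is: is it possible to make CMake/FindCUDA not do this extra linking step?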

Otherwise, I would argue that this is a bug. Do you agree? I can file a bug report with CMake.
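
As a possible workaround within FindCUDA, here is an untested sketch that is not part of the original answer. It assumes the build log above is representative, i.e. that FindCUDA runs one nvcc -dlink per target over that target's own objects. Listing both sources in a single target should then let the device link see the definition of carr, at the cost of giving up the separate library:

```cmake
cmake_minimum_required(VERSION 2.8.8)
project(extern)

find_package(CUDA REQUIRED)
# Compile every .cu file with -dc (relocatable device code)
set(CUDA_SEPARABLE_COMPILATION ON)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -arch=sm_20)

# Workaround sketch: put both sources in ONE target, so the per-target
# "nvcc -dlink" step receives the objects of main.cu and module.cu together
# and can resolve the reference to carr.
cuda_add_executable(extern main.cu module.cu)
```

Newer CMake releases (3.8 and later) also support CUDA as a first-class language, with a CUDA_SEPARABLE_COMPILATION target property in place of the FindCUDA variable; whether upgrading is an option depends on the project.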
