mex linking of cuda code in separate compilation mode


Problem description


I'm trying to use CUDA code inside a MATLAB mex file under Linux. With the "whole program compilation" mode, it works well for me. I take the following two steps inside Nsight:

(1) Add "-fPIC" as a compiler option to each .cpp or .cu file, then compile them separately, each producing a .o file.

(2) Set the linker command to "mex" and add "-cxx" to indicate that all the .o input files come from cpp files, and add the library path for CUDA. Also add a cpp file that contains the mexFunction entry as an additional input.

This works well and the resulting mex file runs fine under MATLAB. Later, when I needed to use dynamic parallelism, I had to switch to the "separate compilation" mode in Nsight. I tried the same steps as above, but the linker produced a lot of missing-reference errors, which I wasn't able to resolve.

Then I checked the compilation and linking steps of the "separate compilation" mode, and I'm confused by what it is doing. It seems that Nsight runs two compilation steps for each .cpp or .cu file and produces a .o file as well as a .d file, like this:

/usr/local/cuda-5.5/bin/nvcc -O3 -gencode arch=compute_35,code=sm_35 -odir "src" -M -o "src/tn_matrix.d" "../src/tn_matrix.cu"
/usr/local/cuda-5.5/bin/nvcc --device-c -O3 -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35  -x cu -o  "src/tn_matrix.o" "../src/tn_matrix.cu"

The linking command is like this:

/usr/local/cuda-5.5/bin/nvcc --cudart static --relocatable-device-code=true -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o  "test7"  ./src/cu_base.o ./src/exp_bp_wsj_dev_mex.o ./src/tn_main.o ./src/tn_matlab_helper.o ./src/tn_matrix.o ./src/tn_matrix_lib_dev.o ./src/tn_matrix_lib_host.o ./src/tn_model_wsj_dev.o ./src/tn_model_wsj_host.o ./src/tn_utility.o   -lcudadevrt -lmx -lcusparse -lcurand -lcublas

What's interesting is that the linker does not take the .d files as input. So I'm not sure how it deals with these files, or how I should handle them when linking with the "mex" command.

Another problem is that the linking stage has a lot of options I don't understand (--cudart static --relocatable-device-code=true), which I guess is the reason why I cannot make it work like in the "whole program compilation" mode. So I tried the following:

(1) Compile in the same way as in the beginning of the post.

(2) Preserve the linking command as provided by Nsight but change to use "-shared" option, so that the linker produces a lib file.

(3) Invoke mex with the lib file and another cpp file containing the mexFunction entry as inputs.

This way the mex compilation works and produces a mex executable as output. However, running the resulting mex executable under MATLAB immediately produces a segmentation fault and crashes MATLAB.

I'm not sure whether this way of linking is what causes the problem. More strangely, the mex linking step seems to finish trivially without even checking the completeness of the executable: even if I leave out a .cpp file that defines a function used by mexFunction, it still builds.

EDIT:

I figured out how to manually link into a mex executable that runs correctly under MATLAB, but I haven't figured out how to do it automatically under Nsight, as I could in the "whole program compilation" mode. Here is my approach:

(1) Exclude from build the cpp file which contains the mexFunction entry. Manually compile it with the command "mex -c".

(2) Add "-fPIC" as a compiler option to each of the remaining .cpp or .cu files, then compile them separately, each producing a .o file.

(3) Linking will fail because it cannot find the main function. We don't have it since we use mexFunction and it is excluded. This doesn't matter and I just leave it there.

(4) Follow the method in the post below to manually device-link (-dlink) the .o files into a single device object file:

cuda shared library linking: undefined reference to cudaRegisterLinkedBinary

For example, if step (2) produces a.o and b.o, here we do

nvcc -gencode arch=compute_35,code=sm_35 -Xcompiler '-fPIC' -dlink a.o b.o -o mex_dev.o -lcudadevrt

Note that here the output file mex_dev.o should not exist, otherwise the above command will fail.

(5) Use mex command to link all the .o files produced in step (2) and step (4), with all necessary libraries supplied.

This works and produces a runnable mex executable. The reason I cannot automate step (1) inside Nsight is that if I change the compilation command to "mex", Nsight will also use this command to generate the dependency file (the .d file mentioned in the question text). And the reason I cannot automate steps (4) and (5) in Nsight is that they involve two commands, and I don't know where to put them. Please let me know if you know how to do these. Thanks!

Solution

OK, I figured out the solution. Here are the complete steps for compiling mex programs with "separate compilation mode" in Nsight:

  1. Create a CUDA project.
  2. At the project level, change the following build options:

    • Switch on -fPIC in the compiler options of "NVCC Compiler".
    • Add -dlink -Xcompiler '-fPIC' to "Expert Settings" -> "Command Line Pattern" of the linker "NVCC Linker".
    • Add the letter o to "Build Artifact" -> "Artifact Extension", since with -dlink in the previous step the output becomes a .o file.
    • Add mex -cxx -o path_to_mex_bin/mex_bin_filename ./*.o ./src/*.o -lcudadevrt to "Post Build Steps" (add other necessary libs).

    UPDATE: In my actual project I moved the last step to a .m file in MATLAB, because otherwise, if I do it while my mex program is running, it could cause MATLAB to crash.

  3. For files that need to be compiled with mex, change these build options for each of them:

    • Change the compiler to GCC C++ Compiler in Tool Chain Editor.
    • Go back to the compiler settings of GCC C++ Compiler and change Command to mex.
    • Change the command line pattern to ${COMMAND} -c -outdir "src" ${INPUTS}

Several additional notes:

(1) CUDA-specific details (such as kernel functions and calls to kernel functions) must be hidden from the mex compiler, so they should be put in the .cu files rather than in header files. Here is a trick for putting templates that involve CUDA details into .cu files.

In the header file (e.g., f.h), you put only the declaration of the function, like this:

template<typename ValueType>
void func(ValueType x);

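Add a new file named f.inc, which holds the definition (ValueType here is the macro defined by the including .cu file, so each inclusion yields one explicit specialization):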

template<>
void func(ValueType x) {
  // possible kernel launches which should be hidden from mex
}

In the source code file (e.g., f.cu), you put this

#define ValueType float
#include "f.inc"
#undef ValueType

#define ValueType double
#include "f.inc"
#undef ValueType

// Add other types you want.

This trick can easily be generalized to templated classes to hide their details.
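To see the pattern compile in one place, here is a minimal single-file sketch. It is not the layout described above: a hypothetical macro DEFINE_FUNC_FOR stands in for the repeated #include "f.inc", and func returns a value (instead of void) purely so that the result can be checked. The arithmetic is a placeholder for where a kernel launch would sit in a real .cu file.

```cpp
#include <cassert>

// Would live in f.h: declaration only, so code compiled by mex never sees
// any CUDA syntax. (Returning ValueType instead of void is an assumption
// made for this sketch.)
template <typename ValueType>
ValueType func(ValueType x);

// Stand-in for f.inc: in the original trick this body is pulled in with
// #include once per type; a macro plays that role here so that the sketch
// fits in a single file. DEFINE_FUNC_FOR is a hypothetical name.
#define DEFINE_FUNC_FOR(ValueType)                              \
  template <>                                                   \
  ValueType func<ValueType>(ValueType x) {                      \
    /* in a .cu file, a kernel launch would go here */          \
    return x + x;                                               \
  }

// Would live in f.cu: one explicit specialization per supported type.
DEFINE_FUNC_FOR(float)
DEFINE_FUNC_FOR(double)
#undef DEFINE_FUNC_FOR

// Usage: callers only need the declaration from the header part.
inline void usage_example() {
  assert(func(2.0f) == 4.0f);
  assert(func(1.5) == 3.0);
}
```

Calling func with a type that has no specialization (e.g. int) would fail at link time, not at compile time, because the primary template is only declared.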

(2) mex-specific details should also be hidden from the CUDA source files, since mex.h alters the definitions of some system functions, such as printf. So "mex.h" should not be included in any header file that might be included by a CUDA source file.

(3) In the mex source code file containing the entry mexFunction, one can use the compiler macro MATLAB_MEX_FILE to selectively compile code sections. This way the source file can be compiled into either a mex executable or an ordinary executable, which allows debugging under Nsight without MATLAB. Here is a trick for building multiple targets under Nsight: Building multiple binaries within one Eclipse project
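A minimal sketch of note (3), under some assumptions: sum_scaled is a hypothetical stand-in for the real computation, and SKETCH_STANDALONE is a hypothetical macro used here only so that the standalone main is opt-in; a plain #else branch would match the description above more directly. The mex command defines MATLAB_MEX_FILE when it compiles a file, which is what selects the first branch.

```cpp
#include <cassert>
#include <cstddef>

// Core routine with no MATLAB types, so it can be called from either entry
// point. (Hypothetical placeholder for the real work.)
double sum_scaled(const double* data, std::size_t n, double factor) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += factor * data[i];
  }
  return total;
}

#ifdef MATLAB_MEX_FILE
// Compiled only when building through mex.
#include "mex.h"

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {
  (void)nlhs;  // plhs[0] may always be assigned, even when nlhs is 0
  if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0])) {
    mexErrMsgIdAndTxt("sketch:input", "Expected one real double array.");
  }
  const double* data = mxGetPr(prhs[0]);
  const std::size_t n = mxGetNumberOfElements(prhs[0]);
  plhs[0] = mxCreateDoubleScalar(sum_scaled(data, n, 2.0));
}

#elif defined(SKETCH_STANDALONE)
// Ordinary executable for debugging without MATLAB.
#include <cstdio>

int main() {
  const double data[] = {1.0, 2.0, 3.0};
  std::printf("%g\n", sum_scaled(data, 3, 2.0));
  return 0;
}
#endif
```

Keeping the computation in a function that takes plain pointers also fits note (2): only the mexFunction branch ever includes mex.h.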
