How can I compile a CUDA program for sm_1X AND sm_2X when I have a surface declaration

Views: 308 | Posted: 2017/3/4 14:50:33 | Tags: c++ cuda macros c-preprocessor nvcc


Question

I am writing a library that uses a surface (to re-sample and write to a texture) for a performance gain:
...
surface<void,  2> my_surf2D; //allows writing to a texture
...
The target platform GPU has compute capability 2.0 and I can compile my code with:
nvcc -arch=sm_20 ...
and it works just fine.

The problem is when I am trying to develop and debug the library on my laptop which has an NVIDIA ION GPU with compute capability 1.1 (I would also like my library to be backwards compatible). I know this architecture does not support surfaces so I used the nvcc macros in my device code to define an alternate code path for this older architecture:
#if (__CUDA_ARCH__ < 200)
#warning using kernel for CUDA ARCH < 2.0
...
temp_array[...] =  tex3D(my_tex,X,Y,Z+0.5f);
#else
...
surf2Dwrite( tex3D(my_tex,X,Y,Z+0.5f), my_surf2D, ix*4, iy,cudaBoundaryModeTrap);
#endif
The problem is that when I do:
nvcc -gencode arch=compute_11,code=sm_11
I get this error:
ptxas PTX/myLibrary.ptx, line 1784; fatal  : Parsing error near '.surf': syntax error
When I look at the PTX file I see what appears to be the surface declaration:
.surf .u32 _ZN16LIB_15my_surf2DE;
If I try to put a similar macro around the surface declaration in my source code:
#ifdef __CUDACC__
#if __CUDA_ARCH__ < 200
#warning skipping surface declaration for nvcc trajectory
#else
surface ...
#endif
#else
#warning keeping surface declaration by default
surface ...
#endif
I get an error saying the surface variable is undefined in the host code call that binds the CUDA surface to the array. Should I add the macro around the bind function as well?

I'm not sure if it is possible, or if I goofed somewhere, please help.
Solution
Figured this thread should show up as answered...

I got it to work (quite simple actually). You must put a macro around all three possible places where the surface reference is used, and be careful to use the macros properly (it turns out, __CUDACC__ is not necessary).
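A likely reason the first attempt failed can be sketched in plain C++. nvcc compiles the source once for the host, where `__CUDA_ARCH__` is not defined, and once per device architecture. An undefined identifier inside `#if` evaluates to 0, so a bare `__CUDA_ARCH__ < 200` test is true in the host pass and removes the surface there as well. The function names below are hypothetical and exist only to make the two guards observable; a host-only compiler stands in for nvcc's host pass.

```cpp
#include <cassert>
#include <string>

// A plain host compiler never defines __CUDA_ARCH__, which is also the
// situation during nvcc's host compilation pass.

// Guard as written in the question: the undefined macro is treated as 0,
// so the test becomes (0 < 200) and the "old GPU" branch is selected.
inline std::string unguarded_branch() {
#if (__CUDA_ARCH__ < 200)
    return "surface skipped";
#else
    return "surface declared";
#endif
}

// Guard as written in the answer: defined() is false on the host,
// so the surface code is kept.
inline std::string guarded_branch() {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
    return "surface skipped";
#else
    return "surface declared";
#endif
}
```

Calling both functions from a host-only build should show the first guard dropping the surface and the second keeping it, which matches the "undefined variable" error seen in the host code.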

The following only changes the code when compiling device code for compute capability < 2.0; the host pass and newer architectures are left untouched.

The surface declaration:
//enable backwards compatibility:
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning skipping surface declarations for compute capability < 2.0
#else
surface<void, 2> my_surf2D; //allows writing to a texture
#endif
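To see which compilation passes keep the declaration, the guard can be evaluated three times in one ordinary C++ file. This is only an illustration: `SIM_CUDA_ARCH` is a hypothetical stand-in for `__CUDA_ARCH__` (which nvcc sets to 110 for compute capability 1.1 and 200 for 2.0), and the `constexpr` flags are invented names.

```cpp
#include <cassert>

// SIM_CUDA_ARCH is a hypothetical stand-in for __CUDA_ARCH__, used so the
// same guard can be evaluated for several simulated passes.

// Simulated host pass: the macro is not defined at all.
#if defined(SIM_CUDA_ARCH) && (SIM_CUDA_ARCH < 200)
constexpr bool surface_in_host_pass = false;
#else
constexpr bool surface_in_host_pass = true;
#endif

// Simulated device pass for compute capability 1.1.
#define SIM_CUDA_ARCH 110
#if defined(SIM_CUDA_ARCH) && (SIM_CUDA_ARCH < 200)
constexpr bool surface_in_sm11_pass = false;
#else
constexpr bool surface_in_sm11_pass = true;
#endif
#undef SIM_CUDA_ARCH

// Simulated device pass for compute capability 2.0.
#define SIM_CUDA_ARCH 200
#if defined(SIM_CUDA_ARCH) && (SIM_CUDA_ARCH < 200)
constexpr bool surface_in_sm20_pass = false;
#else
constexpr bool surface_in_sm20_pass = true;
#endif
#undef SIM_CUDA_ARCH

// Only the 1.1 device pass loses the surface declaration.
static_assert(surface_in_host_pass, "host pass keeps the surface");
static_assert(!surface_in_sm11_pass, "sm_11 pass drops the surface");
static_assert(surface_in_sm20_pass, "sm_20 pass keeps the surface");
```

The host pass keeping the declaration is what lets the bind call in host code still find `my_surf2D`.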
Surface binding:
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning skipping cudaBindSurfaceToArray for compute capability < 2.0
...
#else
errorCode = cudaBindSurfaceToArray(my_surf2D, my_cudaArray2D);
#endif
And Surface writing:
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#warning using kernel for compute capability < 2.0
...
temp_array[...] =  tex3D(my_tex,X,Y,Z+0.5f);
#else
...
surf2Dwrite( tex3D(my_tex,X,Y,Z+0.5f), my_surf2D, ix*4, iy,cudaBoundaryModeTrap);
#endif
This works for both virtual and real targets (-arch=compute_XX and -arch=sm_XX respectively).
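Since the same condition appears in three places, one possible tidy-up is to evaluate it once and test a single flag afterwards. This is a sketch of that design choice, not something the original code does: `MYLIB_HAS_SURFACES` and `write_path` are hypothetical names, and the block is written so that it also compiles with a host-only compiler.

```cpp
#include <cassert>
#include <string>

// MYLIB_HAS_SURFACES is a hypothetical flag, not part of CUDA.
// It is 0 only in a device pass for compute capability < 2.0.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 200)
#define MYLIB_HAS_SURFACES 0
#else
#define MYLIB_HAS_SURFACES 1
#endif

// The declaration, the bind call and the write could then each use
//   #if MYLIB_HAS_SURFACES ... #else ... #endif
// instead of repeating the full condition.

// Reports which write path the current pass would compile.
inline std::string write_path() {
#if MYLIB_HAS_SURFACES
    return "surf2Dwrite";
#else
    return "temp_array";
#endif
}
```

In a host-only build the flag is 1, so `write_path()` reports the surface path; only a device pass below 2.0 would select the fallback.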

Thanks to talonmies and Roger Dahl for pointing me in the right direction, as well as this answer from talonmies which has a great explanation of nvcc/CUDA macros as well.
