CUDA and nvcc: using the preprocessor to choose between float or double


The problem:

I have a .h header in which I want to define real as double when compiling for C/C++ or for CUDA with compute capability >= 1.3. When compiling for CUDA with compute capability < 1.3, real should be float.

After many hours I came to this (which does not work):

#   if defined(__CUDACC__)

#       warning * making definitions for cuda

#       if defined(__CUDA_ARCH__)
#           warning __CUDA_ARCH__ is defined
#       else
#           warning __CUDA_ARCH__ is NOT defined
#       endif

#       if (__CUDA_ARCH__ >= 130)
#                       define real double
#                       warning using double in cuda
#       elif (__CUDA_ARCH__ >= 0)
#               define real float
#               warning using float in cuda
#               warning how the hell is this printed when __CUDA_ARCH__ is not defined?
#       else
#               define real 
#               error what the hell is the value of __CUDA_ARCH__ and how can I print it
#       endif

#   else
#       warning * making definitions for c/c++
#       define real double
#       warning using double for c/c++
#   endif

When I compile (note the -arch flag):

nvcc -arch compute_13  -Ilibcutil testFloatDouble.cu 

I get

* making definitions for cuda
__CUDA_ARCH__ is defined
using double in cuda

* making definitions for cuda
warning __CUDA_ARCH__ is NOT defined
warning using float in cuda
how the hell is this printed if __CUDA_ARCH__ is not defined now?

Undefined symbols for architecture i386:
  "myKernel(float*, int)", referenced from: ....

I know that files get compiled twice by nvcc. The first pass is OK (__CUDACC__ is defined and __CUDA_ARCH__ >= 130), but what happens the second time? __CUDACC__ is defined, but __CUDA_ARCH__ is undefined or has a value < 130? Why?

Thanks for your time.

Solution

It seems you might be conflating two things - how to differentiate between the host and device compilation trajectories when nvcc is processing CUDA code, and how to differentiate between CUDA and non-CUDA code. There is a subtle difference between the two. __CUDA_ARCH__ answers the first question, and __CUDACC__ answers the second.

Consider the following code snippet:

#ifdef __CUDACC__
#warning using nvcc

template <typename T>
__global__ void add(T *x, T *y, T *z)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;

    z[idx] = x[idx] + y[idx];
}

#ifdef __CUDA_ARCH__
#warning device code trajectory
#if __CUDA_ARCH__ > 120
#warning compiling with double precision
template void add<double>(double *, double *, double *);
#else
#warning compiling with single precision
template void add<float>(float *, float *, float *);
#else
#warning nvcc host code trajectory
#endif
#else
#warning non-nvcc code trajectory
#endif

Here we have a templated CUDA kernel with CUDA architecture dependent instantiation, a separate stanza for host code steered by nvcc, and a stanza for compilation of host code not steered by nvcc. This behaves as follows:

$ ln -s cudaarch.cu cudaarch.cc
$ gcc -c cudaarch.cc -o cudaarch.o
cudaarch.cc:26:2: warning: #warning non-nvcc code trajectory

$ nvcc -arch=sm_11 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:19:2: warning: #warning compiling with single precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIfEvPT_S1_S1_' for 'sm_11'
ptxas info    : Used 4 registers, 12+16 bytes smem

$ nvcc -arch=sm_20 -Xptxas="-v" -c cudaarch.cu -o cudaarch.cu.o
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:14:2: warning: #warning device code trajectory
cudaarch.cu:16:2: warning: #warning compiling with double precision
cudaarch.cu:3:2: warning: #warning using nvcc
cudaarch.cu:23:2: warning: #warning nvcc host code trajectory
ptxas info    : Compiling entry function '_Z3addIdEvPT_S1_S1_' for 'sm_20'
ptxas info    : Used 8 registers, 44 bytes cmem[0]

The takeaway points from this are:

  • __CUDACC__ defines whether nvcc is steering compilation or not
  • __CUDA_ARCH__ is always undefined when compiling host code, steered by nvcc or not
  • __CUDA_ARCH__ is only defined for the device code trajectory of compilation steered by nvcc

Those three pieces of information are always enough to have conditional compilation for device code to different CUDA architectures, host side CUDA code, and code not compiled by nvcc at all. The nvcc documentation is a bit terse at times, but all of this is covered in the discussion on compilation trajectories.
