CUDA内核中的主机浮点常量使用 [英] Host float constant usage in a kernel in CUDA

查看:35
本文介绍了CUDA内核中的主机浮点常量使用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 CUDA 5.0.我注意到编译器将允许我在内核中使用主机声明的 int 常量.但是,它拒绝编译任何使用主机声明的 float 常量的内核.有谁知道这种看似差异的原因?

I am using CUDA 5.0. I noticed that the compiler will allow me to use host-declared int constants within kernels. However, it refuses to compile any kernels that use host-declared float constants. Does anyone know the reason for this seeming discrepancy?

例如,以下代码可以正常运行,但如果内核中的最后一行未注释,则无法编译.

For example, the following code runs just fine as is, but it will not compile if the final line in the kernel is uncommented.

#include <cstdio>
#include <cuda_runtime.h>

static int   __constant__ DEV_INT_CONSTANT   = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;

static int   const        HST_INT_CONSTANT   = 3;
static float const        HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;

    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;

    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));

    uselessKernel<<<1, 1>>>(d_val);

    cudaFree(d_val);
}

推荐答案

在设备代码中添加一个 const 数字是可以的,但是在设备代码中添加一个存储在主机内存中的数字是不行的.

Adding a const number in the device code is OK, but adding a number stored on the host memory in the device code is NOT.

当从不引用该变量的 addr 时,编译器/优化器可以将代码中 static const int 的每个引用替换为值 3.在这种情况下,它就像#define HST_INT_CONSTANT 3,并且没有为这个变量分配主机内存.

Every reference of the static const int in your code can be replaced with the value 3 by the compiler/optimizer when the addr of that variable is never referenced. In this case, it is like #define HST_INT_CONSTANT 3, and no host memory is allocated for this variable.

但是对于float var,即使是static const float,也总是分配主机内存.由于内核无法直接访问主机内存,因此您的带有 static const float 的代码将不会被编译.

But for float var, the host memory is always allocated even it is of static const float. Since the kernel can not access the host memory directly, your code with static const float won't be compiled.

对于 C/C++,int 可以比 float 更积极地优化.

For C/C++, int can be optimized more aggressively than float.

我认为当注释为 ON 时运行的代码可以看作是 CUDA C 的错误.static const int 是主机端的东西,设备不能直接访问.

You code runs when the comment is ON can be seen as a bug of CUDA C I think. The static const int is a host side thing, and should not be accessible to the device directly.

这篇关于CUDA内核中的主机浮点常量使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆