Host float constant usage in a kernel in CUDA

Question

I am using CUDA 5.0. I noticed that the compiler will allow me to use host-declared int constants within kernels. However, it refuses to compile any kernels that use host-declared float constants. Does anyone know the reason for this seeming discrepancy?

For example, the following code runs just fine as is, but it will not compile if the final line in the kernel is uncommented.

#include <cstdio>
#include <cuda_runtime.h>

static int   __constant__ DEV_INT_CONSTANT   = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;

static int   const        HST_INT_CONSTANT   = 3;
static float const        HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;

    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;

    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));

    uselessKernel<<<1, 1>>>(d_val);

    cudaFree(d_val);
}

Thanks,

Aaron

Recommended answer

Adding a compile-time constant in device code is OK, but adding a value that is stored in host memory is NOT.

Every reference to the static const int in your code can be replaced with the value 3 by the compiler/optimizer, as long as the address of that variable is never taken. In that case it behaves like #define HST_INT_CONSTANT 3, and no host memory is allocated for the variable.

For a float variable, however, host memory is always allocated, even if it is declared static const float. Since a kernel cannot access host memory directly, your code that uses the static const float will not compile.

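In C/C++, int constants can be optimized more aggressively than float constants. In C++ in particular, a const int initialized with a constant expression is itself usable as a compile-time constant, which is not true of a const float.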
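If you need the float value inside the kernel anyway, here is a minimal sketch of two possible workarounds. Option A keeps a device-side copy in __constant__ memory and fills it from the host value with cudaMemcpyToSymbol. Option B declares the value constexpr, which assumes a toolkit with C++11 support (so not CUDA 5.0) and a host compiler for which nvcc accepts floating-point constants in device code. The names DEV_FLOAT_COPY, FLOAT_CONSTEXPR, addConstants and runAddConstants are hypothetical and not part of the original question.

```cuda
#include <cuda_runtime.h>

// Host-side value that the kernel cannot read directly.
static float const HST_FLOAT_CONSTANT = 4.0f;

// Option A: a device-side copy in constant memory, filled at run time.
static __constant__ float DEV_FLOAT_COPY;

// Option B (assumption: C++11-capable toolkit): a compile-time constant.
static constexpr float FLOAT_CONSTEXPR = 4.0f;

__global__ void addConstants(float * val)
{
    *val = 0.0f;
    *val += DEV_FLOAT_COPY;   // read from constant memory
    *val += FLOAT_CONSTEXPR;  // value substituted at compile time
}

// Hypothetical helper: returns the kernel result, or -1.0f on any CUDA error.
float runAddConstants()
{
    float * d_val = nullptr;
    float h_val = -1.0f;

    if (cudaMalloc((void **)&d_val, sizeof(float)) != cudaSuccess)
        return -1.0f;

    // Copy the host value into the __constant__ symbol before launching.
    if (cudaMemcpyToSymbol(DEV_FLOAT_COPY, &HST_FLOAT_CONSTANT,
                           sizeof(float)) == cudaSuccess)
    {
        addConstants<<<1, 1>>>(d_val);

        // cudaMemcpy waits for the kernel and reports launch errors too.
        if (cudaMemcpy(&h_val, d_val, sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess)
            h_val = -1.0f;
    }

    cudaFree(d_val);
    return h_val;
}
```

On a machine with a working CUDA device, calling runAddConstants() from main should return 8.0f, 4.0f from each constant. On CUDA 5.0 itself only Option A (or a plain #define) applies.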

The fact that your code runs while that line is commented out could be seen as a bug in CUDA C, I think. The static const int is a host-side object and should not be directly accessible from the device.
