Host float constant usage in a kernel in CUDA
Question
I am using CUDA 5.0. I noticed that the compiler will allow me to use host-declared int constants within kernels. However, it refuses to compile any kernels that use host-declared float constants. Does anyone know the reason for this seeming discrepancy?
For example, the following code runs just fine as is, but it will not compile if the final line in the kernel is uncommented.
#include <cstdio>
#include <cuda_runtime.h>

static int __constant__ DEV_INT_CONSTANT = 1;
static float __constant__ DEV_FLOAT_CONSTANT = 2.0f;

static int const HST_INT_CONSTANT = 3;
static float const HST_FLOAT_CONSTANT = 4.0f;

__global__ void uselessKernel(float * val)
{
    *val = 0.0f;

    // Use device int and float constants
    *val += DEV_INT_CONSTANT;
    *val += DEV_FLOAT_CONSTANT;

    // Use host int and float constants
    *val += HST_INT_CONSTANT;
    //*val += HST_FLOAT_CONSTANT; // won't compile if uncommented
}

int main(void)
{
    float * d_val;
    cudaMalloc((void **)&d_val, sizeof(float));

    uselessKernel<<<1, 1>>>(d_val);

    cudaFree(d_val);
}
Thanks,
Aaron
Recommended answer
Using a compile-time constant value in device code is OK, but reading a value stored in host memory from device code is NOT.
Every reference to the static const int in your code can be replaced with the value 3 by the compiler/optimizer when the address of that variable is never taken. In this case, it behaves like #define HST_INT_CONSTANT 3, and no host memory is allocated for this variable.
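One way to see the "like a #define" behavior is to use the host constant where the language demands a compile-time value, such as the bound of a __shared__ array. The sketch below is only an illustration; TILE, fillTile and lastTileValue are hypothetical names that do not appear in the question, and it assumes a working CUDA device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical host-declared integer constant (no __constant__ qualifier).
static const int TILE = 4;

__global__ void fillTile(int *out)
{
    // An array bound must be a compile-time constant, so TILE is
    // substituted by value here rather than read from host memory.
    __shared__ int tile[TILE];
    tile[threadIdx.x] = TILE * (int)threadIdx.x;
    out[threadIdx.x] = tile[threadIdx.x];
}

// Hypothetical helper: runs the kernel and returns the last element,
// or -1 if device memory could not be allocated.
int lastTileValue()
{
    int *d_out = NULL;
    int h_out[TILE] = {0};
    if (cudaMalloc((void **)&d_out, sizeof(h_out)) != cudaSuccess)
        return -1;
    fillTile<<<1, TILE>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out[TILE - 1];
}

int main(void)
{
    printf("last element = %d\n", lastTileValue());
    return 0;
}
```

The kernel never takes the address of TILE; it only uses the value, which is the case the answer describes.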
But for a float variable, host memory is always allocated, even if it is declared static const float. Since the kernel cannot access host memory directly, your code that uses the static const float will not compile.
In C/C++, int can be optimized more aggressively than float.
That your code runs while that line stays commented out can be seen as a bug in CUDA C, I think. The static const int is a host-side thing and should not be directly accessible to the device.
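If the goal is simply to get a float constant into the kernel, a few common workarounds avoid touching host memory from device code. The following is a minimal sketch, not the asker's code: SCALE_MACRO, DEV_SCALE, HST_SCALE, addConstants and runAddConstants are hypothetical names chosen for illustration, and it assumes a working CUDA device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Option 1: a macro is replaced textually, so the kernel sees a literal.
#define SCALE_MACRO 4.0f

// Option 2: a __constant__ variable lives in device constant memory.
__constant__ float DEV_SCALE = 4.0f;

// Option 3: keep the host constant and pass it to the kernel by value.
static const float HST_SCALE = 4.0f;

__global__ void addConstants(float *val, float scaleArg)
{
    *val = 0.0f;
    *val += SCALE_MACRO;  // literal after preprocessing
    *val += DEV_SCALE;    // read from device constant memory
    *val += scaleArg;     // kernel parameter copied at launch
}

// Hypothetical helper: runs the kernel and copies the result back.
// Returns -1.0f if device memory could not be allocated.
float runAddConstants()
{
    float *d_val = NULL;
    float h_val = -1.0f;
    if (cudaMalloc((void **)&d_val, sizeof(float)) != cudaSuccess)
        return h_val;
    addConstants<<<1, 1>>>(d_val, HST_SCALE);  // host code reads HST_SCALE
    cudaMemcpy(&h_val, d_val, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_val);
    return h_val;
}

int main(void)
{
    printf("result = %f\n", runAddConstants());
    return 0;
}
```

In all three cases the device code only ever sees a literal, a device-side variable, or a kernel argument, so HST_SCALE itself is read only by host code.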