Having problem assigning a device variable in CUDA
Question
I'm having trouble assigning a value to a device variable and then copying it to a host variable.
I start with d_test and h_test both set to 0.0. I have a simple kernel that sets the device variable, d_test, to 1.0. I then copy it to the host variable h_test and print it. The problem is that the printed value is h_test = 0.0. What am I doing wrong? Here's the code:
// -*- mode: C -*-
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// device variable and kernel
__device__ float d_test;
__global__ void kernel1(float d_test) { d_test = 1.0; }

int main() {
    // initialise variables
    float h_test = 0.0;
    cudaMemset(&d_test,0,sizeof(float));
    // invoke kernel
    kernel1 <<<1,1>>> (d_test);
    // Copy device variable to host and print
    cudaMemcpy(&h_test,&d_test,sizeof(float),cudaMemcpyDeviceToHost);
    printf("%f\n",h_test);
}
Recommended answer
- As pezcode correctly notes, kernel1's parameter d_test shadows your global variable, so when the kernel assigns to d_test, it is actually changing the value of its parameter instead of the global variable as you intend. kernel1 need not take an argument in this example.
- Instead of cudaMemcpy, use cudaMemcpyFromSymbol when copying from a global __device__ variable.
Here is the complete solution:
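// -*- mode: C -*-
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// device variable and kernel
__device__ float d_test;
// no parameter, so d_test here refers to the global __device__ variable
__global__ void kernel1() { d_test = 1.0f; }

int main() {
    // initialise variables
    float h_test = 0.0f;
    // zero the device variable through its symbol;
    // in host code, &d_test is not a valid device pointer for cudaMemset
    cudaMemcpyToSymbol(d_test, &h_test, sizeof(float), 0, cudaMemcpyHostToDevice);
    // invoke kernel
    kernel1 <<<1,1>>> ();
    // Copy device variable to host and print
    // (pass the symbol itself; string symbol names are no longer accepted since CUDA 5.0)
    cudaMemcpyFromSymbol(&h_test, d_test, sizeof(float), 0, cudaMemcpyDeviceToHost);
    printf("%f\n",h_test);
}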
Output:
$ nvcc test.cu -run
1.000000