Copying array of structs from host to device CUDA
Question
Suppose I have a struct as follows:
typedef struct values{
int one, int two, int three
} values;
Now, suppose I create an array of values on the host and populate it with random data:
values vals*;
__device__ values* d_vals;
int main(){
vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
PopulateWithDate(); //populates vals with random data
}
Now I want to be able to copy the values to the device such that I can access them in my kernel like so:
__global__ void myKernel(){
printf("%d", d_vals[0].one);//I don't really want to print, but whenever I try to access I get an error
}
Whatever I try, I get an "illegal memory access was encountered" error.
Here is my current attempt:
int main(){
vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
PopulateWithDate(); //populates vals with random data
values* d_ptr;
cudaGetSymbolAddress((void**)&d_ptr, d_vals);
cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));
cudaMemcpyToSymbol(d_ptr, &vals, sizeof(values) * A_LARGE_NUMBER);
cudaDeviceSynchronize();
dim3 blocksPerGrid(2, 2);
dim3 threadsPerBlock(16, 16);
myKernel<< <blocksPerGrid, threadsPerBlock >> >();
}
Recommended answer
For what you have shown so far, using a __device__ pointer variable just creates needless complexity. Just use an ordinary dynamic allocation with cudaMalloc for device storage, pass the resulting pointer to the kernel as an argument, and otherwise follow an approach similar to any of the CUDA sample codes such as vectorAdd. Here is an example:
$ cat t1315.cu
#include <stdio.h>
#include <stdlib.h>
#define A_LARGE_NUMBER 10
struct values{
  int one, two, three;
};
values *vals;
__global__ void myKernel(values *d_vals){
  printf("%d\n", d_vals[0].one);
}
void PopulateWithData(){
  for (int i = 0; i < A_LARGE_NUMBER; i++){
    vals[i].one = 1;
    vals[i].two = 2;
    vals[i].three = 3;
  }
}
int main(){
  vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
  PopulateWithData(); //populates vals with placeholder data (1, 2, 3)
  values* d_ptr;
  //allocate device storage, then copy the host array into it
  cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));
  cudaMemcpy(d_ptr, vals, A_LARGE_NUMBER * sizeof(values), cudaMemcpyHostToDevice);
  dim3 blocksPerGrid(1, 1);
  dim3 threadsPerBlock(1, 1);
  myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_ptr);
  cudaDeviceSynchronize();
}
$ nvcc -arch=sm_35 -o t1315 t1315.cu
$ cuda-memcheck ./t1315
========= CUDA-MEMCHECK
1
========= ERROR SUMMARY: 0 errors
$
What you had shown also contains a variety of other basic (non-CUDA) coding errors; I'm not going to try to run through them all.
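Since the question reports an illegal memory access without saying which call failed, it may help to check the status of every runtime call. The following is only a sketch: the CUDA_CHECK macro, the readFirst kernel and the firstOneOnDevice function are hypothetical names introduced for illustration and are not part of the original code. It uses the same cudaMalloc/cudaMemcpy approach as the example above and reads one field back to the host instead of printing it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not from the original post):
// print a readable message and exit on any CUDA runtime error.
#define CUDA_CHECK(call)                                             \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
              #call, cudaGetErrorString(err_));                      \
      exit(EXIT_FAILURE);                                            \
    }                                                                \
  } while (0)

struct values { int one, two, three; };

// Writes d_vals[0].one to d_out so the host can inspect it.
__global__ void readFirst(const values *d_vals, int *d_out) {
  *d_out = d_vals[0].one;
}

// Copies n structs to the device, runs the kernel, and returns the value
// the kernel read from d_vals[0].one.
int firstOneOnDevice(const values *h_vals, int n) {
  values *d_vals = nullptr;
  int *d_out = nullptr;
  int h_out = 0;
  CUDA_CHECK(cudaMalloc(&d_vals, n * sizeof(values)));
  CUDA_CHECK(cudaMalloc(&d_out, sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_vals, h_vals, n * sizeof(values),
                        cudaMemcpyHostToDevice));
  readFirst<<<1, 1>>>(d_vals, d_out);
  CUDA_CHECK(cudaGetLastError());       // reports launch errors
  CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
  CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFree(d_vals));
  return h_out;
}
```

Called from main as firstOneOnDevice(vals, A_LARGE_NUMBER) after PopulateWithData(), this should return 1 for the data used above. Wrapping the calls from the original attempt in the same way would likely point at the first failing call rather than at the later kernel launch.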
If you really want to retain your __device__ pointer variable, and use that to point to the device data (array of structs), then you will still need to use cudaMalloc, and the overall process takes additional steps. You can follow the example worked out in the answer here.
Following that example, here's a set of changes to the above code to make it work with a __device__ pointer variable instead of a pointer passed as a kernel parameter:
$ cat t1315.cu
#include <stdio.h>
#include <stdlib.h>
#define A_LARGE_NUMBER 10
struct values{
  int one, two, three;
};
values *vals;
__device__ values *d_vals;
__global__ void myKernel(){
  printf("%d\n", d_vals[0].one);
}
void PopulateWithData(){
  for (int i = 0; i < A_LARGE_NUMBER; i++){
    vals[i].one = 1;
    vals[i].two = 2;
    vals[i].three = 3;
  }
}
int main(){
  vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
  PopulateWithData(); //populates vals with placeholder data (1, 2, 3)
  values* d_ptr;
  cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));
  cudaMemcpy(d_ptr, vals, A_LARGE_NUMBER * sizeof(values), cudaMemcpyHostToDevice);
  //copy the pointer value itself (not the data) into the __device__ symbol
  cudaMemcpyToSymbol(d_vals, &d_ptr, sizeof(values*));
  dim3 blocksPerGrid(1, 1);
  dim3 threadsPerBlock(1, 1);
  myKernel<<<blocksPerGrid, threadsPerBlock>>>();
  cudaDeviceSynchronize();
}
$ nvcc -arch=sm_35 -o t1315 t1315.cu
$ cuda-memcheck ./t1315
========= CUDA-MEMCHECK
1
========= ERROR SUMMARY: 0 errors
$
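If the array size is known at compile time, another option is a statically sized __device__ array, in which case cudaMemcpyToSymbol can copy the struct data directly and no cudaMalloc is needed for the array. This is a different technique from the pointer-based version above, shown only as a sketch: N_VALUES, d_vals_static, sumOnes and sumOnesOnDevice are hypothetical names chosen for illustration, and N_VALUES stands in for A_LARGE_NUMBER on the assumption that it is a modest compile-time constant.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_VALUES 10  // assumed compile-time size, standing in for A_LARGE_NUMBER

struct values { int one, two, three; };

// Statically allocated device array; its storage exists for the life of the program.
__device__ values d_vals_static[N_VALUES];

// Sums the "one" field of every element and writes the result to d_out.
__global__ void sumOnes(int *d_out) {
  int sum = 0;
  for (int i = 0; i < N_VALUES; i++) sum += d_vals_static[i].one;
  *d_out = sum;
}

// Copies N_VALUES structs into the device symbol and returns the sum of
// their "one" fields as computed on the device, or -1 on a CUDA error.
int sumOnesOnDevice(const values *h_vals) {
  int *d_out = nullptr;
  int h_out = -1;
  // The symbol itself is the destination; the source is the host array
  // (vals, not &vals).
  if (cudaMemcpyToSymbol(d_vals_static, h_vals,
                         N_VALUES * sizeof(values)) != cudaSuccess) return -1;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
  sumOnes<<<1, 1>>>(d_out);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  if (cudaMemcpy(&h_out, d_out, sizeof(int),
                 cudaMemcpyDeviceToHost) != cudaSuccess) h_out = -1;
  cudaFree(d_out);
  return h_out;
}
```

With every "one" field set to 1, as in PopulateWithData() above, sumOnesOnDevice(vals) should return 10. The trade-off is that the size is fixed when the program is compiled, so the cudaMalloc-based versions remain the better fit when the element count is only known at run time.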