Copying an array of structs from host to device in CUDA

Question

Suppose I have a struct as follows:

typedef struct values{
int one, int two, int three
} values;

Now suppose I create an array of values on the host and populate it with random data:

values vals*;
__device__ values* d_vals;
int main(){
     vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
     PopulateWithDate(); //populates vals with random data
}

Now I want to copy the values to the device so that I can access them in my kernel like so:

__global__ void myKernel(){
     printf("%d", d_vals[0].one);//I don't really want to print, but whenever I try to access I get an error
}

Whatever I try, I get an "illegal memory access was encountered" error.

Here is my current attempt:

int main(){
     vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
     PopulateWithDate(); //populates vals with random data

     values* d_ptr;
     cudaGetSymbolAddress((void**)&d_ptr, d_vals);
     cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));

     cudaMemcpyToSymbol(d_ptr, &vals, sizeof(values) * A_LARGE_NUMBER);
     cudaDeviceSynchronize();
     dim3    blocksPerGrid(2, 2);
     dim3    threadsPerBlock(16, 16);

    myKernel<< <blocksPerGrid, threadsPerBlock >> >();
}


Recommended answer

For what you have shown so far, using a __device__ pointer variable just creates needless complexity. Use an ordinary dynamic allocation with cudaMalloc for device storage, copy the host array into it with cudaMemcpy, and pass the resulting pointer to the kernel as an argument. Otherwise, follow an approach similar to any of the CUDA sample codes, such as vectorAdd. Here is an example:

$ cat t1315.cu
#include <stdio.h>
#include <stdlib.h>
#define A_LARGE_NUMBER 10

struct values{
int one, two, three;
};

values *vals;

__global__ void myKernel(values *d_vals){
     printf("%d\n", d_vals[0].one);
}

void PopulateWithData(){
  for (int i = 0; i < A_LARGE_NUMBER; i++){
    vals[i].one = 1;
    vals[i].two = 2;
    vals[i].three = 3;
  }
}


int main(){
     vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
     PopulateWithData(); // fills vals with the fixed test values 1, 2, 3

     values* d_ptr;
     cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));
     cudaMemcpy(d_ptr, vals, A_LARGE_NUMBER *sizeof(values),cudaMemcpyHostToDevice);
     dim3    blocksPerGrid(1,1);
     dim3    threadsPerBlock(1, 1);

     myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_ptr);
    cudaDeviceSynchronize();
}
$ nvcc -arch=sm_35 -o t1315 t1315.cu
$ cuda-memcheck ./t1315
========= CUDA-MEMCHECK
1
========= ERROR SUMMARY: 0 errors
$
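
The example above ignores CUDA return codes, which makes failures such as the one in the question harder to locate. What follows is a minimal sketch of the same cudaMalloc/cudaMemcpy approach with error checking and a copy back to the host. The CHECK macro, the incrementAll kernel, and the roundTrip function are hypothetical names introduced here for illustration; they are not part of the original answer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original answer): abort on any CUDA error.
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,             \
              cudaGetErrorString(err_));                             \
      exit(EXIT_FAILURE);                                            \
    }                                                                \
  } while (0)

struct values { int one, two, three; };

// Each thread increments the fields of one struct so that the
// host-to-device-to-host round trip is observable on the host.
__global__ void incrementAll(values *d_vals, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    d_vals[i].one += 1;
    d_vals[i].two += 1;
    d_vals[i].three += 1;
  }
}

// Copies n structs to the device, runs the kernel, and copies them back.
// Returns the number of elements that do not hold the expected values,
// or -1 if the host allocation fails.
int roundTrip(int n) {
  values *vals = (values *)malloc(n * sizeof(values));
  if (vals == nullptr) return -1;
  for (int i = 0; i < n; i++) {
    vals[i].one = 1;
    vals[i].two = 2;
    vals[i].three = 3;
  }

  values *d_ptr = nullptr;
  CHECK(cudaMalloc((void **)&d_ptr, n * sizeof(values)));
  CHECK(cudaMemcpy(d_ptr, vals, n * sizeof(values), cudaMemcpyHostToDevice));

  int threads = 256;
  int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
  incrementAll<<<blocks, threads>>>(d_ptr, n);
  CHECK(cudaGetLastError());       // reports launch configuration errors
  CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

  CHECK(cudaMemcpy(vals, d_ptr, n * sizeof(values), cudaMemcpyDeviceToHost));

  int mismatches = 0;
  for (int i = 0; i < n; i++) {
    if (vals[i].one != 2 || vals[i].two != 3 || vals[i].three != 4) mismatches++;
  }

  CHECK(cudaFree(d_ptr));
  free(vals);
  return mismatches;
}

int main() {
  int mismatches = roundTrip(1000);
  printf("mismatches: %d\n", mismatches);
  return mismatches == 0 ? 0 : 1;
}
```

Checking cudaGetLastError right after the launch and cudaDeviceSynchronize afterwards separates launch failures from errors that occur during kernel execution, such as an illegal memory access.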

The code you have shown also contains a variety of other basic (non-CUDA) coding errors. I'm not going to try to run through them all.

If you really want to retain your __device__ pointer variable and use it to point to the device data (the array of structs), you will still need to use cudaMalloc, and the overall process takes additional steps. You can follow the example worked out in the answer here.

Following that example, here is a set of changes to the above code that makes it work with a __device__ pointer variable instead of a pointer passed as a kernel parameter. The key addition is the cudaMemcpyToSymbol call, which copies the pointer value returned by cudaMalloc (not the array itself) into d_vals:

$ cat t1315.cu
#include <stdio.h>
#include <stdlib.h>
#define A_LARGE_NUMBER 10

struct values{
int one, two, three;
};

values *vals;
__device__ values *d_vals;

__global__ void myKernel(){
     printf("%d\n", d_vals[0].one);
}

void PopulateWithData(){
  for (int i = 0; i < A_LARGE_NUMBER; i++){
    vals[i].one = 1;
    vals[i].two = 2;
    vals[i].three = 3;
  }
}


int main(){
     vals = (values*)malloc(sizeof(values) * A_LARGE_NUMBER);
     PopulateWithData(); // fills vals with the fixed test values 1, 2, 3

     values* d_ptr;
     cudaMalloc((void**)&d_ptr, A_LARGE_NUMBER * sizeof(values));
     cudaMemcpy(d_ptr, vals, A_LARGE_NUMBER *sizeof(values),cudaMemcpyHostToDevice);
     cudaMemcpyToSymbol(d_vals, &d_ptr, sizeof(values*)); // store the device address in the __device__ pointer
     dim3    blocksPerGrid(1,1);
     dim3    threadsPerBlock(1, 1);

     myKernel<<<blocksPerGrid, threadsPerBlock>>>();
    cudaDeviceSynchronize();
}
$ nvcc -arch=sm_35 -o t1315 t1315.cu
$ cuda-memcheck ./t1315
========= CUDA-MEMCHECK
1
========= ERROR SUMMARY: 0 errors
$
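
One way to convince yourself that the symbol really holds the allocation's address is to read it back with cudaMemcpyFromSymbol and compare it with the pointer returned by cudaMalloc. The sketch below assumes the same struct and __device__ variable as above; the function name symbolHoldsAllocation is hypothetical and chosen only for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct values { int one, two, three; };

__device__ values *d_vals;  // device-side pointer variable, as in the answer

// Hypothetical check (not from the original answer): returns true if the
// symbol d_vals holds exactly the address that cudaMalloc returned.
bool symbolHoldsAllocation(int n) {
  values *d_ptr = nullptr;
  if (cudaMalloc((void **)&d_ptr, n * sizeof(values)) != cudaSuccess) {
    return false;
  }

  // Copy the pointer value (sizeof(values *) bytes), not the array,
  // into the __device__ variable.
  if (cudaMemcpyToSymbol(d_vals, &d_ptr, sizeof(values *)) != cudaSuccess) {
    cudaFree(d_ptr);
    return false;
  }

  // Read the symbol back to the host and compare it with d_ptr.
  values *readBack = nullptr;
  cudaError_t err = cudaMemcpyFromSymbol(&readBack, d_vals, sizeof(values *));
  bool ok = (err == cudaSuccess) && (readBack == d_ptr);

  cudaFree(d_ptr);
  return ok;
}

int main() {
  bool ok = symbolHoldsAllocation(10);
  printf("symbol matches allocation: %s\n", ok ? "yes" : "no");
  return ok ? 0 : 1;
}
```

Note that the byte count passed to both symbol copies is the size of a pointer. Passing the size of the whole array, as in the question's attempt, would write far past the end of the pointer variable.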
