复制包含指向 CUDA 设备的指针的结构 [英] Copying a struct containing pointers to CUDA device

查看:32
本文介绍了复制包含指向 CUDA 设备的指针的结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在做一个项目,我需要我的 CUDA 设备在包含指针的结构上进行计算.

I'm working on a project where I need my CUDA device to make computations on a struct containing pointers.

typedef struct StructA {
    int* arr;
} StructA;

当我为结构分配内存然后将其复制到设备时,它只会复制结构而不是指针的内容.现在我正在通过首先分配指针来解决这个问题,然后设置主机结构以使用该新指针(位于 GPU 上).以下代码示例使用上面的结构描述了这种方法:

When I allocate memory for the struct and then copy it to the device, it will only copy the struct and not the content of the pointer. Right now I'm working around this by allocating the pointer first, then set the host struct to use that new pointer (which resides on the GPU). The following code sample describes this approach using the struct from above:

#define N 10

int main() {

    int h_arr[N] = {1,2,3,4,5,6,7,8,9,10};
    StructA *h_a = (StructA*)malloc(sizeof(StructA));
    StructA *d_a;
    int *d_arr;

    // 1. Allocate device struct.
    cudaMalloc((void**) &d_a, sizeof(StructA));

    // 2. Allocate device pointer.
    cudaMalloc((void**) &(d_arr), sizeof(int)*N);

    // 3. Copy pointer content from host to device.
    cudaMemcpy(d_arr, h_arr, sizeof(int)*N, cudaMemcpyHostToDevice);

    // 4. Point to device pointer in host struct.
    h_a->arr = d_arr;

    // 5. Copy struct from host to device.
    cudaMemcpy(d_a, h_a, sizeof(StructA), cudaMemcpyHostToDevice);

    // 6. Call kernel.
    kernel<<<N,1>>>(d_a);

    // 7. Copy struct from device to host.
    cudaMemcpy(h_a, d_a, sizeof(StructA), cudaMemcpyDeviceToHost);

    // 8. Copy pointer from device to host.
    cudaMemcpy(h_arr, d_arr, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // 9. Point to host pointer in host struct.
    h_a->arr = h_arr;
}

我的问题是:这是这样做的方式吗?

这看起来工作量很大,我提醒你这是一个非常简单的结构.如果我的结构体包含很多指针或者结构体本身带有指针,那么分配和复制的代码会相当庞大和混乱.

It seems like an awful lot of work, and I remind you that this is a very simple struct. If my struct contained a lot of pointers or structs with pointers themselves, the code for allocation and copy will be quite extensive and confusing.

推荐答案

CUDA 6 引入了统一内存,这使得这个深拷贝"问题变得更加容易.有关详细信息,请参阅这篇文章.

CUDA 6 introduces Unified Memory, which makes this "deep copy" problem a lot easier. See this post for more details.

不要忘记您可以将结构按值传递给内核.此代码有效:

Don't forget that you can pass structures by value to kernels. This code works:

// pass struct by value (may not be efficient for complex structures)
__global__ void kernel2(StructA in)
{
    in.arr[threadIdx.x] *= 2;
}

这样做意味着您只需要将数组复制到设备,而不是结构:

Doing so means you only have to copy the array to the device, not the structure:

int h_arr[N] = {1,2,3,4,5,6,7,8,9,10};
StructA h_a;
int *d_arr;

// 1. Allocate device array.
cudaMalloc((void**) &(d_arr), sizeof(int)*N);

// 2. Copy array contents from host to device.
cudaMemcpy(d_arr, h_arr, sizeof(int)*N, cudaMemcpyHostToDevice);

// 3. Point to device pointer in host struct.
h_a.arr = d_arr;

// 4. Call kernel with host struct as argument
kernel2<<<N,1>>>(h_a);

// 5. Copy pointer from device to host.
cudaMemcpy(h_arr, d_arr, sizeof(int)*N, cudaMemcpyDeviceToHost);

// 6. Point to host pointer in host struct 
//    (or do something else with it if this is not needed)
h_a.arr = h_arr;

这篇关于复制包含指向 CUDA 设备的指针的结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆