CUDA 分配数组数组 [英] CUDA allocating array of arrays

查看:27
本文介绍了CUDA 分配数组数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 CUDA 中分配数组时遇到了一些麻烦.

I have some trouble with allocate array of arrays in CUDA.

void ** data;
cudaMalloc(&data, sizeof(void**)*N); // allocates without problems
for(int i = 0; i < N; i++) {
    cudaMalloc(data + i, getSize(i) * sizeof(void*)); // seg fault is thrown
}

我做错了什么?

推荐答案

您必须将指针分配到主机内存,然后为每个数组分配设备内存并将其指针存储在主机内存中.然后分配用于存储指针的内存到设备中然后将主机内存复制到设备内存中.一个例子价值 1000 字:

You have to allocate the pointers to a host memory, then allocate device memory for each array and store it's pointer in the host memory. Then allocate the memory for storing the pointers into the device and then copy the host memory to the device memory. One example is worth 1000 words:

__global__ void multi_array_kernel( int N, void** arrays ){
    // stuff
}


int main(){

    const int N_ARRAYS = 20;
    void *h_array = malloc(sizeof(void*) * N_ARRAYS);
    for(int i = 0; i < N_ARRAYS; i++){
        cudaMalloc(&h_array[i], i * sizeof(void*));
        //TODO: check error
    }
    void *d_array = cudaMalloc(sizeof(void*) * N_ARRAYS);

    // Copy to device Memory
    cudaMemcpy(d_array, h_array, sizeof(void*) * N_ARRAYS, cudaMemcpyHostToDevice);

    multi_array_kernel<1,1>(N_ARRAYS, d_array);
    cudaThreadSynchronize();

    for(int i = 0; i < N_ARRAYS; i++){
        cudaFree(h_array[i]); //host not device memory
        //TODO: check error
    }
    cudaFree(d_array);
    free(h_array);
}

这篇关于CUDA 分配数组数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆