CUDA: transfer a 2D array from host to device


Question

I have a 2D matrix in main. I want to transfer it from host to device. Can you tell me how I can allocate memory for it and transfer it to the device memory?

#define N 5
__global__ void kernel(int a[N][N]){
}
int main(void){

    int a[N][N];
    cudaMalloc(?);
    cudaMemcpy(?);
    kernel<<<N,N>>>(?);

}


Recommended answer

Perhaps something like this:

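#include <cuda_runtime.h>

#define N 5

__global__ void kernel(int *a)
{
    // Global 2D thread indices. threadIdx.x varies fastest within a block.
    int tidx = threadIdx.x + blockIdx.x * blockDim.x;
    int tidy = threadIdx.y + blockIdx.y * blockDim.y;
    if (tidx >= N || tidy >= N) return;  // guard against out-of-range threads

    // a_ij = a[i][j] with i = tidx, j = tidy; a is stored in row-major order
    int a_ij = a[tidy + tidx * N];
    (void)a_ij;  // placeholder: use the value here
}

int main(void)
{
    int a[N][N] = {}, *a_device;
    const size_t a_size = sizeof(int) * size_t(N * N);
    cudaMalloc((void **)&a_device, a_size);
    cudaMemcpy(a_device, a, a_size, cudaMemcpyHostToDevice);

    // One block of N x N threads, so each thread maps to exactly one element.
    // (Launching <<<N,N>>> would give tidx in 0..N*N-1 and tidy == 0,
    // which indexes past the end of the array.)
    dim3 threads(N, N);
    kernel<<<1, threads>>>(a_device);
    cudaDeviceSynchronize();

    cudaFree(a_device);
    return 0;
}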

The point you might have missed is that a statically declared array such as a[N][N] is really just a contiguous piece of linear memory in row-major order. When the compiler emits code for a[i][j], it computes the linear offset j + i*N for you. On the GPU, where the kernel receives a plain int * pointer, you must write that second form yourself to read the memory you copied from the host.
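The row-major equivalence can be checked on the host without a GPU. The following is a minimal sketch in plain C++, not part of the original answer: the names flat_index, count_mismatches and check_layout are hypothetical helpers introduced here for illustration, and std::memcpy stands in for the byte-for-byte copy that cudaMemcpy performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr int N = 5;  // same N as in the answer

// Hypothetical helper: linear offset of element (i, j) in a row-major N x N array.
constexpr std::size_t flat_index(std::size_t i, std::size_t j) {
    return j + i * N;
}

// Hypothetical helper: fills a[i][j] with a recognizable value, copies the raw
// bytes into a flat buffer (as cudaMemcpy would copy them to the device), and
// counts the elements where flat[j + i*N] differs from a[i][j].
int count_mismatches() {
    int a[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[i][j] = 10 * i + j;

    int flat[N * N];
    static_assert(sizeof(flat) == sizeof(a), "both buffers hold N*N ints");
    std::memcpy(flat, a, sizeof(a));  // byte-for-byte copy, no reordering

    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (flat[flat_index(i, j)] != a[i][j])
                ++mismatches;
    return mismatches;
}

// Hypothetical helper: aborts if the 2D and flat views ever disagree.
void check_layout() {
    assert(count_mismatches() == 0);
}
```

Calling check_layout() from main should return without aborting, since every a[i][j] sits at offset j + i*N in the copied buffer. The same offset formula is what the kernel has to apply to its int * argument.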
