CUDA double pointer memory copy


Question

I wrote my sample code like this.

int ** d_ptr;
cudaMalloc( (void**)&d_ptr, sizeof(int*)*N );

int* tmp_ptr[N];
for(int i=0; i<N; i++)
    cudaMalloc( (void**)&tmp_ptr[i], sizeof(int)*SIZE );
cudaMemcpy(d_ptr, tmp_ptr, sizeof(tmp_ptr), cudaMemcpyHostToDevice);

This code runs fine, but after launching the kernel I can't retrieve the result.

int* Mtx_on_GPU[N];
cudaMemcpy(Mtx_on_GPU, d_ptr, sizeof(int)*N*SIZE, cudaMemcpyDeviceToHost);

At this point a segmentation fault occurs, but I don't know what I'm doing wrong.

int* Mtx_on_GPU[N];
for(int i=0; i<N; i++)
    cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

This code produces the same error.

I'm sure my code has a mistake, but I haven't been able to find it all day.

Any advice would be appreciated.

Solution

In the last line

cudaMemcpy(Mtx_on_GPU[i], d_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);

you are trying to copy data from the device to the host (NOTE: I assume that you allocated host memory for the Mtx_on_GPU pointers!)

However, the pointers in d_ptr are stored in device memory, so you can't read d_ptr[i] directly from the host side. The host still has its own copy of the same device pointers in tmp_ptr, so the line should be

cudaMemcpy(Mtx_on_GPU[i], tmp_ptr[i], sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
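The fix above relies on the host still holding the device pointers. If that host array is no longer available (for example, it went out of scope), one option is to copy the pointer table itself back from the device first and then copy each row. The following is only a sketch: `copyRowsToHost`, `devRows`, and `hostOut` are hypothetical names that do not appear in the question, and the caller is assumed to have allocated every host row already.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper, not part of the question's code.
//   d_ptr   : device array holding n device pointers (the pointer table)
//   hostOut : n host buffers, each with room for `size` ints (caller-allocated)
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t copyRowsToHost(int** d_ptr, int** hostOut, int n, int size) {
    // Step 1: bring the pointer table to the host. The values it holds
    // are still device addresses; only the table now lives in host memory.
    int** devRows = (int**)malloc(n * sizeof(int*));
    if (devRows == NULL) return cudaErrorMemoryAllocation;

    cudaError_t err = cudaMemcpy(devRows, d_ptr, n * sizeof(int*),
                                 cudaMemcpyDeviceToHost);

    // Step 2: copy each row, using the device addresses read in step 1.
    for (int i = 0; err == cudaSuccess && i < n; i++) {
        err = cudaMemcpy(hostOut[i], devRows[i], size * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }

    free(devRows);
    return err;
}
```

This costs one extra small transfer, so keeping the host copy of the pointers around, as in the corrected line, is usually simpler.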


This may become clearer when using "overly elaborate" variable names:

int ** devicePointersStoredInDeviceMemory;
cudaMalloc( (void**)&devicePointersStoredInDeviceMemory, sizeof(int*)*N);

int* devicePointersStoredInHostMemory[N];
for(int i=0; i<N; i++)
    cudaMalloc( (void**)&devicePointersStoredInHostMemory[i], sizeof(int)*SIZE );

cudaMemcpy(
    devicePointersStoredInDeviceMemory, 
    devicePointersStoredInHostMemory,
    sizeof(int*)*N, cudaMemcpyHostToDevice);

// Invoke kernel here, passing "devicePointersStoredInDeviceMemory"
// as an argument
...

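int* hostPointersStoredInHostMemory[N];
for(int i=0; i<N; i++) {
    // Allocate host memory for row i before copying into it
    hostPointersStoredInHostMemory[i] = (int*)malloc(sizeof(int)*SIZE);
    int* hostPointer = hostPointersStoredInHostMemory[i];

    // Use the host-side copy of the device pointer, not the table in device memory
    int* devicePointer = devicePointersStoredInHostMemory[i];

    cudaMemcpy(hostPointer, devicePointer, sizeof(int)*SIZE, cudaMemcpyDeviceToHost);
}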
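To see the whole round trip in one place, here is a minimal sketch that fills the rows in a kernel and reads them back. The kernel `fillRows`, the helper `check`, the function `roundTrip`, and the values chosen for `N` and `SIZE` are all assumptions made for illustration; the original post does not show its kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 4      // number of rows (assumed value)
#define SIZE 8   // ints per row (assumed value)

// Hypothetical kernel: block i writes i * 100 + j into element j of row i.
__global__ void fillRows(int** rows) {
    int i = blockIdx.x;
    int j = threadIdx.x;
    rows[i][j] = i * 100 + j;
}

// Abort with a message if a CUDA call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Returns true if every value read back matches what the kernel wrote.
bool roundTrip() {
    int** devicePointersStoredInDeviceMemory;
    check(cudaMalloc((void**)&devicePointersStoredInDeviceMemory,
                     sizeof(int*) * N), "cudaMalloc table");

    int* devicePointersStoredInHostMemory[N];
    for (int i = 0; i < N; i++)
        check(cudaMalloc((void**)&devicePointersStoredInHostMemory[i],
                         sizeof(int) * SIZE), "cudaMalloc row");

    check(cudaMemcpy(devicePointersStoredInDeviceMemory,
                     devicePointersStoredInHostMemory,
                     sizeof(int*) * N, cudaMemcpyHostToDevice), "copy table");

    // The kernel dereferences the table, so it needs the device-resident copy.
    fillRows<<<N, SIZE>>>(devicePointersStoredInDeviceMemory);
    check(cudaGetLastError(), "kernel launch");

    bool ok = true;
    int hostRow[SIZE];
    for (int i = 0; i < N; i++) {
        // The host needs the device addresses, so it uses its own copy.
        // This blocking cudaMemcpy also waits for the kernel to finish.
        check(cudaMemcpy(hostRow, devicePointersStoredInHostMemory[i],
                         sizeof(int) * SIZE, cudaMemcpyDeviceToHost), "copy row");
        for (int j = 0; j < SIZE; j++)
            ok = ok && (hostRow[j] == i * 100 + j);
        check(cudaFree(devicePointersStoredInHostMemory[i]), "cudaFree row");
    }
    check(cudaFree(devicePointersStoredInDeviceMemory), "cudaFree table");
    return ok;
}
```

Calling `roundTrip()` from `main` and printing its result is enough to try it. The point of the design is that the same set of device addresses is kept twice: once on the device for the kernel, once on the host for `cudaMemcpy`.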


EDIT in response to the comment:

The d_ptr is "an array of pointers". But the memory of this array is allocated with cudaMalloc. That means that it is located on the device. In contrast to that, with int* Mtx_on_GPU[N]; you are "allocating" N pointers in host memory. Instead of specifying the array size, you could also have used malloc. It may become clearer when you compare the following allocations:

int** pointersStoredInDeviceMemory;
cudaMalloc((void**)&pointersStoredInDeviceMemory, sizeof(int*)*N);

int** pointersStoredInHostMemory;
pointersStoredInHostMemory = (int**)malloc(N * sizeof(int*));

// This is not possible in host code, because the array was allocated with cudaMalloc:
int *pointerA = pointersStoredInDeviceMemory[0];

// This is possible because the array was allocated with malloc:    
int *pointerB = pointersStoredInHostMemory[0];

It may be a little bit brain-twisting to keep track of

  • the type of the memory where the pointers are stored
  • the type of the memory that the pointers are pointing to

but fortunately, it rarely involves more than two levels of indirection.
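When it is unclear which kind of memory a given address refers to, the runtime can be asked. The sketch below uses `cudaPointerGetAttributes` and assumes a reasonably recent toolkit (CUDA 11 or newer, where the `type` field is available and plain host pointers are reported as unregistered rather than as an error). The helper name `whereIs` is hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: describes the memory that address `p` refers to.
// Assumes CUDA 11 or newer; older toolkits may return an error for
// ordinary host pointers, which is handled as "unknown to CUDA".
const char* whereIs(const void* p) {
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, p) != cudaSuccess) {
        cudaGetLastError();  // clear the error left behind by the failed query
        return "unknown to CUDA";
    }
    switch (attr.type) {
        case cudaMemoryTypeDevice:  return "device";
        case cudaMemoryTypeHost:    return "pinned host";
        case cudaMemoryTypeManaged: return "managed";
        default:                    return "unregistered host";
    }
}
```

Passing `pointersStoredInDeviceMemory` should report device memory, which is why indexing it on the host fails, while passing `pointersStoredInHostMemory` should not. Note that this describes the memory an address refers to, not where the pointer variable itself is stored.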
