CUDA allocation and returning an array from GPU to CPU


Problem description

I have the following code in CUDA (it's not the full code). I'm trying to check whether it copies the arrays properly from host to device and from device back to host.

flVector is initialized with a few numbers, and so is indeces.

The pass function needs to copy flVector and indeces to device memory. In main, after calling pass, I try to copy the arrays again, this time from device to host, and then print the values to check that they are correct.

flat_h comes back properly and its values are correct, but indeces comes back with garbage values, and I don't know what is wrong with the code.

To return two variables from the pass function, I use the return statement for flOnDevice, and I also pass in a pointer, inOnDevice, so the function can hand back that array. Both variables are on the device side, and I then try to copy them back to the host. This is just a check that everything is working properly, but when I print the values copied back from inOnDevice I get garbage. Why?

int* pass(vector<int>& flVector, int* indeces, int inSize, int* inOnDevice)
{
    int* flOnDevice;

    cudaMalloc((void**) &(flOnDevice), sizeof(int) * flVector.size());

    cudaMemcpy(flOnDevice, &flVector[0], flVector.size()*sizeof(int), cudaMemcpyHostToDevice);

    cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);

    cudaMemcpy(inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);
    return flOnDevice;
}

int main()
{
    int* inOnDevice = NULL;
    int* flOnDevice;

    flOnDevice = pass(flVector, indeces, indSize, inOnDevice);

    int* flat_h = (int*)malloc(flVector.size()*sizeof(int));
    int* inde_h = (int*)malloc(inSize*sizeof(int));

    cudaMemcpy(flat_h, flOnDevice, flVector.size()*sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(inde_h, inOnDevice, inSize*sizeof(int), cudaMemcpyDeviceToHost);

    printf("flat_h: \n\n");
    for (int i = 0; i < flVector.size(); i++)
        printf("%d, ", flat_h[i]);
    printf("\n\ninde_h: \n\n");
    for (int i = 0; i < inSize; i++)
        printf("%d, ", inde_h[i]);
    printf("\n\n");
}

Recommended answer

This is not doing what you think it is:

int* pass(vector<int>& flVector, int* indeces, int inSize, int* inOnDevice)
{
...
  cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);

When you pass a pointer to a function this way, you are passing the pointer by value.

If you then take the address of that pointer-passed-by-value inside the function, that address has no connection to anything in the calling context. Inside the pass function there is a local copy of inOnDevice, and the subsequent cudaMalloc operation modifies only that local copy.
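The same behavior can be reproduced without any CUDA calls. The following stand-alone sketch (hypothetical names, not from the original post) shows that assigning to a by-value pointer parameter never reaches the caller, while writing through a pointer-to-pointer does:

#include <cstdio>
#include <cstdlib>

// alloc_byvalue receives a copy of the caller's pointer; the assignment below
// changes only that copy, exactly like cudaMalloc on inOnDevice inside pass().
void alloc_byvalue(int* p, int n)
{
    p = (int*)malloc(n * sizeof(int));   // result is lost when the function returns
}

// alloc_byptr receives the address of the caller's pointer, so writing
// through it updates the pointer the caller actually holds.
void alloc_byptr(int** p, int n)
{
    *p = (int*)malloc(n * sizeof(int));
}

int main()
{
    int* a = NULL;
    int* b = NULL;
    alloc_byvalue(a, 4);               // a is still NULL afterwards
    alloc_byptr(&b, 4);                // b now points to valid storage
    printf("a=%p  b=%p\n", (void*)a, (void*)b);
    free(b);
    return 0;
}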

Instead, you need to pass a pointer-to-a-pointer in this situation (simulated pass-by-reference) or else pass the pointer by reference. For the pointer-to-a-pointer approach, it would look something like this:

int* pass(vector<int>& flVector, int* indeces, int inSize, int** inOnDevice)
{
...
  cudaMalloc((void**) inOnDevice, sizeof(int) * inSize);

  cudaMemcpy(*inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);

and in main:

flOnDevice = pass(flVector, indeces, indSize, &inOnDevice);
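
For the pass-by-reference alternative mentioned above, a minimal sketch (my own rewrite of pass, not code from the original answer) could use an int*& parameter so that pass writes directly into the caller's pointer:

#include <vector>
#include <cuda_runtime.h>

using std::vector;

// inOnDevice is now a reference to the caller's pointer, so the result of
// cudaMalloc is visible back in main without changing the call site.
int* pass(vector<int>& flVector, int* indeces, int inSize, int*& inOnDevice)
{
    int* flOnDevice;

    cudaMalloc((void**) &flOnDevice, sizeof(int) * flVector.size());
    cudaMemcpy(flOnDevice, &flVector[0], flVector.size()*sizeof(int), cudaMemcpyHostToDevice);

    cudaMalloc((void**) &inOnDevice, sizeof(int) * inSize);
    cudaMemcpy(inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);

    return flOnDevice;
}

// The call in main stays exactly as in the original code:
// flOnDevice = pass(flVector, indeces, indSize, inOnDevice);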

And I think if you had used proper CUDA error checking, as I suggested to you before, you would have seen an error returned from this line of code:

cudaMemcpy(inde_h, inOnDevice, inSize*sizeof(int), cudaMemcpyDeviceToHost);
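
The "proper cuda error checking" the answer refers to is described in a separate post; a commonly used wrapper of that kind (my sketch, not the exact code being referenced) looks roughly like this:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call; on failure, print where it happened and abort.
#define cudaCheck(call)                                                     \
    do {                                                                    \
        cudaError_t err__ = (call);                                         \
        if (err__ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",                   \
                    cudaGetErrorString(err__), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

// With the buggy pass(), wrapping the copy would report the failure
// (e.g. an invalid-argument error) instead of silently printing garbage:
// cudaCheck(cudaMemcpy(inde_h, inOnDevice, inSize*sizeof(int), cudaMemcpyDeviceToHost));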

