CUDA allocation and returning an array from GPU to CPU

Question

I have the following CUDA code (it is not the full code). I'm trying to check whether it correctly copies the arrays from host to device and from device back to host.

flVector is initialized with a few numbers, and so is indeces.

The pass function needs to copy flVector and indeces to device memory. In main, after calling pass, I copy the arrays again, this time from device to host, and print the values to check that they are correct.

flat_h comes back with the correct values, but indeces comes back (in inde_h) with garbage values, and I don't know what is wrong with the code.

To return two variables from pass, I return flOnDevice with the return statement, and I also pass in a pointer, inOnDevice, to hold the second array. Both variables are on the device side, and I then try to copy them back to the host. This is just a check that everything is working. But when I print the data copied back from inOnDevice, I get garbage values. Why?

 int* pass(vector<int>& flVector, int* indeces, int inSize, int*   inOnDevice)
 {
   int* flOnDevice;

   cudaMalloc((void**) &(flOnDevice), sizeof(int) * flVector.size());

   cudaMemcpy(flOnDevice, &flVector[0], flVector.size()*sizeof(int),cudaMemcpyHostToDevice);

   cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);

   cudaMemcpy(inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);
   return flOnDevice;
}

void main()
{
    int* inOnDevice = NULL;
    int* flOnDevice;

    flOnDevice = pass(flVector, indeces, indSize, inOnDevice);

    int* flat_h = (int*)malloc(flVector.size()*sizeof(int));
    int* inde_h = (int*)malloc(inSize*sizeof(int));


    cudaMemcpy(flat_h,flOnDevice,flVector.size()*sizeof(int),cudaMemcpyDeviceToHost);
    cudaMemcpy(inde_h,inOnDevice,inSize*sizeof(int),cudaMemcpyDeviceToHost);

    printf("flat_h: \n\n");
    for (int i =0; i < flVector.size(); i++)
        printf("%d, " , flat_h[i]);
    printf("\n\ninde_h: \n\n");
    for (int i =0; i < inSize; i++)
        printf("%d, " , inde_h[i]);
    printf("\n\n");
}

Recommended answer

This is not doing what you think:

int* pass(vector<int>& flVector, int* indeces, int inSize, int*   inOnDevice)
{
...
  cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);

When you pass a pointer to a function this way, you are passing the pointer by value.

If you then take the address of that pointer-passed-by-value inside the function, that address has no connection to anything in the calling context. Inside pass, inOnDevice is a local copy of the caller's pointer, and the subsequent cudaMalloc operation modifies only that local copy. The inOnDevice in main is never updated.

Instead, you need to pass a pointer-to-a-pointer in this situation (simulated pass-by-reference), or else pass by reference. For the pointer-to-a-pointer approach, it would look something like this:

int* pass(vector<int>& flVector, int* indeces, int inSize, int**   inOnDevice)
{
...
  cudaMalloc((void**) inOnDevice, sizeof(int) * inSize);

  cudaMemcpy(*inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);

And in main:

flOnDevice = pass(flVector, indeces, indSize, &inOnDevice);

And I think that if you had used proper CUDA error checking (testing the cudaError_t value returned by each runtime API call), as I suggested to you before, you would have seen an error returned from this line of code:

cudaMemcpy(inde_h,inOnDevice,inSize*sizeof(int),cudaMemcpyDeviceToHost);
