Copying a multi-branch tree to GPU memory


Question

I have a tree of nodes and I am trying to copy it to GPU memory. A node looks like this:

struct Node
{
   char *Key;
   int ChildCount;
   Node *Children;
};

And my copy function looks like this:

void CopyTreeToDevice(Node* node_s, Node* node_d)
{


     //allocate node on device and copy host node
     cudaMalloc( (void**)&node_d, sizeof(Node));
     cudaMemcpy(node_d, node_s, sizeof(Node), cudaMemcpyHostToDevice);

     //test
     printf("ChildCount of node_s looks to be : %d\n", node_s->ChildCount);
     printf("Key of node_s looks to be : %s\n", node_s->Key);

     Node *temp;
     temp =(Node *) malloc(sizeof(Node));
     cudaMemcpy(temp, node_d, sizeof(Node), cudaMemcpyDeviceToHost);
     printf("ChildCount of node_d on device is actually : %d\n", temp->ChildCount);
     printf("Key of node_d on device is actually : %s\n", temp->Key);
     free(temp);



     //       continue with child nodes
     if(node_s->ChildCount > 0)
     {
         //problem here
         cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));

         cudaMemcpy(node_d->Children, node_s->Children, 
                    sizeof(Node)*node_s->ChildCount, cudaMemcpyHostToDevice);

         for(int i=0;i<node_s->ChildCount;i++)
         {
                 CopyTreeToDevice(&(node_s->Children[i]), &(node_d->Children[i]));
         }
     }

}

But I have a problem with this line:

cudaMalloc( (void**)&(node_d->Children), sizeof(Node)*(node_s->ChildCount));

It gives me an access violation exception. The test section works without problems, so the fields are initialized correctly.

Here is the output of the test section:

ChildCount of node_s looks to be : 35
Key of node_s looks to be : root
ChildCount of node_d on device is actually : 35
Key of node_d on device is actually : root

What is the reason for this?

Thanks.

Recommended answer

node_d->Children is a variable that resides in device memory. You cannot access it directly from host code, which is what your second cudaMalloc does: it writes the newly allocated pointer through &(node_d->Children), and that is a device address, so the host faults with an access violation. Moreover, copying host pointers to the device makes little sense, because you cannot dereference them in device code.

A nicer and much faster approach would be to:


  • Preallocate one big array for your whole tree.
  • Use array indices instead of pointers. Indices stay valid when the array is transferred to and from the device.
  • Allocate the whole array once on the device. Many separate cudaMalloc calls may be inefficient (especially on Windows systems when a monitor is connected to that GPU). Also, since cudaMalloc returns addresses aligned to at least 256 bytes (512 bytes on the systems I have seen), you practically cannot allocate smaller chunks of memory. So, with your current code, every children array consumes at least that much, even if it holds only 2 children.
  • Copy the whole array once from host to device. This is much faster than issuing many cudaMemcpy calls, even if you end up copying some extra, unused memory.
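
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not a definitive implementation: FlatNode, FlatTree, Flatten and Upload are hypothetical names that do not appear in the question, and packing all keys into a second char array (so that Key also stops being a host pointer) is an extra choice made here, which means the sketch uses two allocations and two copies instead of one. The Upload part is compiled only under nvcc.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Pointer-based host node, as in the question.
struct Node {
    char* Key;
    int ChildCount;
    Node* Children;
};

// Hypothetical flat layout: no pointers, only offsets into two arrays.
struct FlatNode {
    int KeyOffset;   // start of the NUL-terminated key in FlatTree::keys
    int ChildCount;
    int FirstChild;  // index of the first child in FlatTree::nodes, -1 if none
};

struct FlatTree {
    std::vector<FlatNode> nodes;  // nodes[0] is the root
    std::vector<char> keys;       // all keys packed back to back
};

// Fills nodes[slot] from src, then reserves one contiguous run of slots
// for its children and recurses into them.
static void FlattenInto(const Node& src, int slot, FlatTree& out) {
    assert(src.Key != nullptr && src.ChildCount >= 0);
    int keyOffset = static_cast<int>(out.keys.size());
    out.keys.insert(out.keys.end(), src.Key, src.Key + std::strlen(src.Key) + 1);

    int first = src.ChildCount > 0 ? static_cast<int>(out.nodes.size()) : -1;
    out.nodes[slot] = FlatNode{keyOffset, src.ChildCount, first};
    // resize may reallocate, so nodes are always addressed by index.
    out.nodes.resize(out.nodes.size() + src.ChildCount);
    for (int i = 0; i < src.ChildCount; ++i)
        FlattenInto(src.Children[i], first + i, out);
}

FlatTree Flatten(const Node& root) {
    FlatTree out;
    out.nodes.resize(1);  // slot 0 is reserved for the root
    FlattenInto(root, 0, out);
    return out;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>
// One allocation and one copy per array. Returns false if any CUDA call fails.
bool Upload(const FlatTree& t, FlatNode** d_nodes, char** d_keys) {
    size_t nodeBytes = t.nodes.size() * sizeof(FlatNode);
    if (cudaMalloc((void**)d_nodes, nodeBytes) != cudaSuccess) return false;
    if (cudaMalloc((void**)d_keys, t.keys.size()) != cudaSuccess) {
        cudaFree(*d_nodes);
        return false;
    }
    return cudaMemcpy(*d_nodes, t.nodes.data(), nodeBytes, cudaMemcpyHostToDevice) == cudaSuccess
        && cudaMemcpy(*d_keys, t.keys.data(), t.keys.size(), cudaMemcpyHostToDevice) == cudaSuccess;
}
#endif
```

In device code, child i of node n would then be nodes[nodes[n].FirstChild + i], and its key would start at keys + nodes[n].KeyOffset. Neither depends on where the arrays happen to live, which is why the same indices work on both host and device.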
