cudaMalloc of a structure and an element of the same structure


Problem Description

I would like to know what happens on the device (memory-wise) when I allocate a structure and then allocate(?) and copy a pointer element of the same structure.

Do I need to cudaMalloc the element *a again?

Example code:

typedef struct {
  int *a;
  ...
} StructA;

int main() 
{
  int row, col, numS = 10; // defined at runtime

  StructA *d_A = (StructA*)malloc(numS * sizeof(StructA));
  int *h_A = d_a->a;

  cudaMalloc( (void**)&(d_A), numS * sizeof(StructA) );

  cudaMalloc( &(d_A->a), row*col*sizeof(int) ); // no (void**) needed?
  cudaMemcpy( d_A->a, h_A, row*col*sizeof(int), cudaMemcpyHostToDevice );

  kernel<<<grid, block>>>(d_A); // Passing pointer to StructA in device
  ...
}

Kernel definition:

__global__ void kernel(StructA *d_A)
{
  d_A->a = ...;
  ...
}

This question is an extension of this question (http://stackoverflow.com/questions/19404965/how-to-use-cudamalloc-cudamemcpy-for-a-pointer-to-a-structure-containing-point) and is related to this question (http://stackoverflow.com/questions/22078399/cudamalloc-cast-void-and-struct-member-allocation-copy-and-sizeof).

Recommended Answer

I would suggest that you put some effort into compiling and running your code with proper CUDA error checking. Learning to interpret the compiler output and runtime output will make you a better, smarter, more efficient coder. I also suggest reviewing the writeup I previously pointed you at here; it deals with this exact topic and includes linked worked examples. This question is a duplicate of that one.
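As a minimal sketch of what such error checking can look like (the macro name CUDA_CHECK is my own choice here, not something defined by the CUDA runtime or the linked writeup), each runtime API call can be wrapped like this:

#include <cstdio>
#include <cstdlib>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",                 \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// usage, for example:
// CUDA_CHECK(cudaMalloc((void**)&d_A, numS * sizeof(StructA)));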

There are various errors:

StructA *d_A = (StructA*)malloc(numS * sizeof(StructA));

The above line of code creates an allocation in host memory large enough for numS structures of type StructA, and sets the pointer d_A to the start of that allocation. Nothing wrong at the moment.

cudaMalloc( (void**)&(d_A), numS * sizeof(StructA) );

The above line of code creates an allocation in device memory of the same size (numS * sizeof(StructA)) and sets the pointer d_A to the start of that allocation. This has effectively wiped out the previous pointer and allocation. (The previous host allocation is still somewhere, but you can't access it; it is basically lost.) Surely that was not your intent.

int *h_A = d_a->a;

Now that d_A (I assume you meant d_A, not d_a) has been assigned as a device memory pointer, the -> operator will dereference that pointer to locate the element a. Dereferencing a device pointer is illegal in host code and will throw an error (seg fault).

cudaMalloc( &(d_A->a), row*col*sizeof(int) );

This line of code has a similar issue. We cannot cudaMalloc into a pointer that lives in device memory. cudaMalloc creates pointers that live in host memory but reference a location in device memory. The expression &(d_A->a) dereferences a device pointer, which is illegal in host code.

A proper code would be something like this:

$ cat t363.cu
#include <stdio.h>

typedef struct {
  int *a;
  int foo;
} StructA;

__global__ void kernel(StructA *data){

  printf("The value is %d\n", *(data->a + 2));
}

int main()
{
  int  numS = 1; // defined at runtime

  //allocate host memory for the structure storage
  StructA *h_A = (StructA*)malloc(numS * sizeof(StructA));
  //allocate host memory for the storage pointed to by the embedded pointer
  h_A->a = (int *)malloc(10*sizeof(int));
  // initialize data pointed to by the embedded pointer
  for (int i = 0; i <10; i++) *(h_A->a+i) = i;
  StructA *d_A;  // pointer for device structure storage
  //allocate device memory for the structure storage
  cudaMalloc( (void**)&(d_A), numS * sizeof(StructA) );
  // create a pointer for cudaMalloc to use for embedded pointer device storage
  int *temp;
  //allocate device storage for the embedded pointer storage
  cudaMalloc((void **)&temp, 10*sizeof(int));
  //copy this newly created *pointer* to its proper location in the device copy of the structure
  cudaMemcpy(&(d_A->a), &temp, sizeof(int *), cudaMemcpyHostToDevice);
  //copy the data pointed to by the embedded pointer from the host to the device
  cudaMemcpy(temp, h_A->a, 10*sizeof(int), cudaMemcpyHostToDevice);

  kernel<<<1, 1>>>(d_A); // Passing pointer to StructA in device
  cudaDeviceSynchronize();
}
$ nvcc -arch=sm_20 -o t363 t363.cu
$ cuda-memcheck ./t363
========= CUDA-MEMCHECK
The value is 2
========= ERROR SUMMARY: 0 errors
$

You'll note that I haven't worked out the case where you are dealing with an array of StructA (i.e. numS > 1); that will require a loop. I'll leave it to you to work through the logic presented here and in my previous linked answer (http://stackoverflow.com/questions/15431365/cudamemcpy-segmentation-fault/15435592#15435592) to see if you can work out the details of that loop. Furthermore, for the sake of clarity/brevity I've dispensed with the usual CUDA error checking (http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api), but please use it in your codes. Finally, this process (sometimes called a "deep copy" operation) is somewhat tedious in ordinary CUDA, if you haven't concluded that yet. Previous recommendations along these lines have been to "flatten" such structures (so that they don't contain pointers), but you can also explore cudaMallocManaged, i.e. Unified Memory, in CUDA 6.
