How to dynamically allocate arrays inside a kernel?

Question

I need to dynamically allocate some arrays inside a kernel function. How can I do that?

My code is something like this:

__global__ void func(float *grid_d, int n, int nn){
    int i, j;
    // Does not compile: array sizes in device code must be compile-time constants.
    float x[n], y[nn];
    // Do some really cool and heavy computations here that take hours.
}

But that will not work. If this were host code, I could use malloc. cudaMalloc needs one pointer on the host and another on the device, and inside the kernel function I don't have the host pointer.

So, what should I do?

If it takes a long time (a few seconds) to allocate all the arrays (I need about 4 of size n and 5 of size nn), that won't be a problem, since the kernel will probably run for at least 20 minutes.

Thanks for your attention.

Recommended answer

Dynamic memory allocation inside a kernel is only supported on hardware of compute capability 2.x and newer. You can use either the C++ new keyword or malloc in the kernel, so your example could become:

__global__ void func(float *grid_d, int n, int nn){
    int i, j;
    // Each thread that runs this line gets its own arrays from the device heap.
    float *x = new float[n], *y = new float[nn];
}
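As a fuller illustration of the snippet above, here is a minimal sketch rather than the answerer's own code. The kernel name `scratch_kernel`, the launch configuration, and the sizes are assumptions made for the example. It uses device-side `malloc` instead of `new` so that a failed allocation can be detected through a null pointer, and it releases the arrays in the kernel once the computation is done.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread allocates private scratch arrays
// on the device heap, uses them, and frees them before returning.
__global__ void scratch_kernel(float *grid_d, int n, int nn)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float *x = static_cast<float *>(malloc(n * sizeof(float)));
    float *y = static_cast<float *>(malloc(nn * sizeof(float)));
    if (x == nullptr || y == nullptr) {
        // Device heap exhausted; free(NULL) is ignored on the device.
        free(x);
        free(y);
        grid_d[tid] = -1.0f;   // assumed error marker for this sketch
        return;
    }

    // Stand-in for the real computation.
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)  { x[i] = 1.0f; sum += x[i]; }
    for (int j = 0; j < nn; ++j) { y[j] = 2.0f; sum += y[j]; }
    grid_d[tid] = sum;

    // Release the heap memory from device code.
    free(x);
    free(y);
}

int main()
{
    const int threads = 32;          // assumed launch size
    const int n = 4, nn = 5;         // assumed array sizes
    float grid_h[threads];
    float *grid_d = nullptr;

    if (cudaMalloc(&grid_d, threads * sizeof(float)) != cudaSuccess) return 1;

    scratch_kernel<<<1, threads>>>(grid_d, n, nn);
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    cudaMemcpy(grid_h, grid_d, threads * sizeof(float), cudaMemcpyDeviceToHost);
    printf("grid[0] = %f\n", grid_h[0]);

    cudaFree(grid_d);
    return 0;
}
```

Note that the allocation is per thread: with many threads and large `n`, the total request can easily exceed the device heap, which is why the null check matters.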

This allocates memory from the device runtime heap, which resides in global memory and has the lifetime of the context, so make sure you free the memory once the kernel has finished with it if you do not intend to use it again. Memory allocated this way has to be released from device code with delete[] or free; it cannot be passed to cudaFree. You should also note that runtime heap memory cannot be accessed directly from the host APIs, so you cannot pass a pointer allocated inside a kernel as an argument to cudaMemcpy, for example.
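Two practical consequences of these restrictions can be sketched as follows. This is an illustrative example under stated assumptions, not part of the original answer: the kernel name `heap_to_global`, the 64 MB heap size, and the array length are all hypothetical. The device heap has a fixed size (8 MB by default), which the host can raise with `cudaDeviceSetLimit` before launching any kernel that allocates, and results held in heap memory have to be copied by the kernel into a buffer obtained from `cudaMalloc` before the host can read them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: computes into a heap-allocated buffer, then copies
// the result into out_d, which the host allocated with cudaMalloc.
__global__ void heap_to_global(float *out_d, int n)
{
    float *tmp = static_cast<float *>(malloc(n * sizeof(float)));
    if (tmp == nullptr) return;      // device heap exhausted

    for (int i = 0; i < n; ++i) tmp[i] = 0.5f * i;

    // tmp cannot be handed to cudaMemcpy, so stage the data in out_d.
    for (int i = 0; i < n; ++i) out_d[i] = tmp[i];

    free(tmp);                       // must be freed in device code
}

int main()
{
    const int n = 8;                 // assumed array length
    float out_h[n];
    float *out_d = nullptr;

    // Enlarge the device heap (assumed 64 MB) before the first launch
    // of a kernel that uses malloc or new.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u * 1024u * 1024u) != cudaSuccess)
        return 1;

    if (cudaMalloc(&out_d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(out_d, 0, n * sizeof(float));

    heap_to_global<<<1, 1>>>(out_d, n);

    // out_d is an ordinary device allocation, so cudaMemcpy works here.
    if (cudaMemcpy(out_h, out_d, n * sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        return 1;

    for (int i = 0; i < n; ++i) printf("%f\n", out_h[i]);

    cudaFree(out_d);
    return 0;
}
```

The heap size should be chosen from the total demand of all threads that allocate at the same time, not from the size of a single thread's arrays.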
