How to dynamically allocate arrays inside a kernel?


Problem description

I need to dynamically allocate some arrays inside the kernel function. How can I do that?

My code looks like this:

__global__ void func(float *grid_d, int n, int nn){
    int i, j;
    float x[n], y[nn];
    // Do some really cool and heavy computations here that take hours.
}

But that will not work. If this were inside host code I could use malloc. cudaMalloc needs a pointer on the host and another one on the device, but inside the kernel function I don't have the host pointer.

So, what should I do?

If it takes too long (a few seconds) to allocate all the arrays (I need about 4 of size n and 5 of size nn), that won't be a problem, since the kernel will probably run for at least 20 minutes.

Recommended answer

Dynamic memory allocation is only supported on compute capability 2.x and newer hardware. You can use either the C++ new keyword or malloc in the kernel, so your example could become:

__global__ void func(float *grid_d, int n, int nn){
    int i, j;
    float *x = new float[n], *y = new float[nn];
    // ... use x and y ...
    delete [] x;   // free the runtime-heap allocations once they are no longer needed
    delete [] y;
}

This allocates memory on a local memory runtime heap which has the lifetime of the context, so make sure you free the memory after the kernel finishes running if you do not intend to use it again. You should also note that runtime heap memory cannot be accessed directly from the host APIs, so you cannot pass a pointer allocated inside a kernel as an argument to cudaMemcpy, for example.
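For reference, here is a minimal, self-contained sketch of the whole pattern (assuming a compute capability 2.x or newer device; the kernel name scratch_kernel, the array sizes, and the 64 MB heap limit are illustrative, not from the original post). Each thread allocates scratch arrays with new on the device runtime heap, frees them with delete [], and writes its result back through grid_d, which was allocated with cudaMalloc and can therefore be copied back with cudaMemcpy. The host enlarges the runtime heap with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) beforehand, since the default heap is only a few megabytes.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void scratch_kernel(float *grid_d, int n, int nn){
    // Per-thread allocations on the device runtime heap.
    float *x = new float[n];
    float *y = new float[nn];

    // Device-side new returns NULL when the heap is exhausted, so check before use.
    if (x != NULL && y != NULL) {
        for (int i = 0; i < n; ++i)  x[i] = grid_d[threadIdx.x] + i;
        for (int j = 0; j < nn; ++j) y[j] = x[j % n] * 0.5f;

        // Results must be written to memory the host can reach (grid_d here),
        // because x and y live on the runtime heap and are invisible to the host.
        grid_d[threadIdx.x] = x[n - 1] + y[nn - 1];
    }

    // Free the allocations; otherwise they persist for the lifetime of the context.
    delete [] x;
    delete [] y;
}

int main(){
    const int threads = 32, n = 16, nn = 64;

    // Enlarge the runtime heap before the first kernel launch if every thread
    // allocates its own arrays (the 64 MB figure is just an example).
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

    float *grid_d;
    cudaMalloc(&grid_d, threads * sizeof(float));
    cudaMemset(grid_d, 0, threads * sizeof(float));

    scratch_kernel<<<1, threads>>>(grid_d, n, nn);
    cudaDeviceSynchronize();

    // grid_d was allocated with cudaMalloc, so it CAN be used with cudaMemcpy,
    // unlike the pointers created inside the kernel.
    float host[threads];
    cudaMemcpy(host, grid_d, sizeof(host), cudaMemcpyDeviceToHost);
    printf("host[0] = %f\n", host[0]);

    cudaFree(grid_d);
    return 0;
}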
