Dynamic global memory allocation in an OpenCL kernel

Question

Is it possible to dynamically allocate global memory from the kernel? In CUDA it is possible but I would like to know if this is also possible in OpenCL on Intel GPUs.

For example:

__kernel void foo()
{
  ...
  // call malloc or clCreateBuffer here
}

Is it possible? If so, how exactly?

Recommended answer

No, this is not currently allowed in OpenCL.

You could implement your own heap by creating one very large buffer up front, and then 'allocate' regions of the buffer by handing out offsets (using atomic_add to avoid synchronisation issues). However, in most cases I suspect it would be better just to rethink your algorithm and come up with an approach that doesn't require dynamic memory allocation in the first place.

Here's an example that uses a preallocated buffer to emulate dynamic heap allocation inside kernels. The heap and the index of the next free element are passed into the kernel as arguments, and need to be passed on to our malloc function. In OpenCL 2.0, we could use program-scope global variables to avoid the need to do this.

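// Bump allocator: reserves `size` bytes by atomically advancing *next
// and returns a pointer to the start of the reserved region.
// Note: no bounds or alignment checks are performed.
global void* malloc(size_t size, global uchar *heap, global uint *next)
{
  // atomic_add returns the old value of *next, so each caller gets a unique offset
  uint index = atomic_add(next, (uint)size);
  return heap+index;
}

kernel void foo(global uchar *heap, global uint *next)
{
  // Allocate some memory from heap
  global void *data = malloc(4, heap, next);
  ...
}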
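
The malloc above never checks whether the heap is exhausted and does not align the offsets it hands out. As a rough sketch of how those two checks might look, here is the same bump-allocation logic in plain C11, so it can be compiled and tried on a CPU. The names arena_t, arena_init, arena_alloc and ARENA_ALIGN are hypothetical, and C11's atomic_fetch_add stands in for OpenCL's atomic_add; this is an illustration under those assumptions, not code from the original answer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment: every reservation is rounded up to this many bytes. */
#define ARENA_ALIGN 16u

/* Hypothetical arena: one preallocated buffer plus an atomic "next free byte" offset. */
typedef struct {
    unsigned char *heap;     /* the large buffer created up front */
    uint32_t capacity;       /* size of heap in bytes */
    _Atomic uint32_t next;   /* offset of the next free byte */
} arena_t;

static void arena_init(arena_t *a, unsigned char *heap, uint32_t capacity)
{
    assert(a != NULL && heap != NULL);
    a->heap = heap;
    a->capacity = capacity;
    atomic_init(&a->next, 0u);
}

/* Returns a pointer to `size` bytes inside the arena, or NULL if the request
 * is empty or does not fit. Memory is never freed individually.
 * Limitation: failed requests still advance `next`, so the counter can
 * eventually wrap around after roughly 4 GiB of requested bytes. */
static void *arena_alloc(arena_t *a, uint32_t size)
{
    if (size == 0u || size > a->capacity ||
        size > UINT32_MAX - (ARENA_ALIGN - 1u)) {
        return NULL;
    }

    /* Round up so that every offset handed out is a multiple of ARENA_ALIGN. */
    uint32_t rounded = (size + ARENA_ALIGN - 1u) & ~(ARENA_ALIGN - 1u);

    /* atomic_fetch_add returns the previous value, so concurrent callers
     * each receive a distinct, non-overlapping offset. */
    uint32_t offset = atomic_fetch_add(&a->next, rounded);

    if (offset > a->capacity || size > a->capacity - offset) {
        return NULL; /* heap exhausted */
    }
    return a->heap + offset;
}
```

In a kernel, the capacity would have to be passed in as an extra argument alongside heap and next, and callers would need to handle a null result.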
