Dynamically creating a local array inside an OpenCL kernel


Question

I have an OpenCL kernel that needs to process an array as multiple sub-arrays, where the sum of each sub-array is saved in a local cache array.

For example, imagine the following array:

[[1, 2, 3, 4], [10, 30, 1, 23]]

  • Each work-group gets one sub-array (in this example there are 2 work-groups);
  • Each work-item processes two array indexes (for example, multiplying the value at each index by its local_id), and the work-item's result is saved in an array shared by the work-group.

      __kernel void test(__global int **values, __global int *result, const int array_size){
          __local int cache[array_size];
      
          // initialise
          if (get_local_id(0) == 0){
              for (int i = 0; i < array_size; i++)
                  cache[i] = 0;
          }
      
          barrier (CLK_LOCAL_MEM_FENCE);
      
          if(get_global_id(0) < 4){
              for (int i = 0; i<2; i++)
                  cache[get_local_id(0)] += values[get_group_id(0)][i] * 
                                                               get_local_id(0);
          }
      
          barrier (CLK_LOCAL_MEM_FENCE);
      
          if(get_local_id(0) == 0){
              for (int i = 0; i<array_size; i++)
                  result[get_group_id(0)] += cache[i];
          }
      }
      

      The problem is that I cannot define the size of the cache array from a kernel parameter, but I need to in order to keep the kernel dynamic.

      How can I create it dynamically, like the malloc function in C?

      Or is the only available solution to pass a temporary array to my kernel function?

      Recommended answer

      This can be achieved by adding a __local array as a kernel parameter:

      __kernel void test(__global int **values, __global int *result, 
          const int array_size, __local int * cache)
      

      and passing the desired size for that argument from the host:

      clSetKernelArg(kernel, 3, array_size*sizeof(int), NULL);
      

      Passing NULL as the argument value together with a non-zero size tells the runtime to reserve that many bytes of local memory for the work-group; the memory is allocated when the kernel is invoked. Note that extra checks may be necessary to ensure that the required local memory size does not exceed the device limit.
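      The size check mentioned above could be sketched on the host as follows. This is a minimal sketch, not part of the original answer: `local_cache_bytes` is a hypothetical helper name, and `local_mem_limit` is assumed to be the value the host read with `clGetDeviceInfo` and `CL_DEVICE_LOCAL_MEM_SIZE`. The helper itself makes no OpenCL calls, so it can be compiled and tested without an OpenCL runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the OpenCL API).
 * Returns the number of bytes to pass as the size argument of
 * clSetKernelArg for the __local cache, or 0 if the request is empty,
 * would overflow size_t, or exceeds the given limit.
 * Assumption: local_mem_limit holds the value the host queried with
 * clGetDeviceInfo(..., CL_DEVICE_LOCAL_MEM_SIZE, ...). */
static size_t local_cache_bytes(size_t array_size, size_t elem_size,
                                uint64_t local_mem_limit)
{
    if (array_size == 0 || elem_size == 0)
        return 0;
    if (array_size > SIZE_MAX / elem_size)
        return 0; /* array_size * elem_size would overflow */

    size_t bytes = array_size * elem_size;
    if ((uint64_t)bytes > local_mem_limit)
        return 0; /* does not fit in the device's local memory */

    return bytes;
}
```

      If the helper returns a non-zero value, that value can be passed as the third argument of the clSetKernelArg call shown earlier; a return of 0 signals that the host should fall back to a smaller array_size or a different strategy. The device limit is shared with any local memory the kernel declares itself, so a stricter limit may be appropriate.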
