Nested loops in an OpenCL kernel


Question

I recently started studying OpenCL and am trying to convert the following code into an efficient OpenCL kernel:

for(int i = 0; i < VECTOR_SIZE; i++)
{
    for(int j = 0; j < 100; j++)
    {
        C[i] = sqrt(A[i] + sqrt(A[i] * B[i])) * sqrt(A[i] + sqrt(A[i] * B[i]));
    }
}

This is what I have come up with so far, using different tutorials. My question is: can I somehow get rid of the outer loop in my kernel? Would you say this is an okay implementation of the above C++ code, or can anything further be done to make it more efficient or closer to how an OpenCL program is supposed to look?

Also, all the tutorials I have read so far have the kernels written in a const char *. What is the reason behind this? Is this the only way OpenCL kernels are written, or do we usually write them in a separate file and then include it in our regular code?
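
On the const char * point: OpenCL programs are commonly built at run time from source text, and clCreateProgramWithSource takes that text as an array of C strings, which is why tutorials embed the kernel as a string literal. The same text can be kept in a separate file and read into memory at startup. A minimal sketch, assuming a hypothetical helper named read_kernel_source and a hypothetical file such as RandomComputation.cl (neither appears in the question):

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the OpenCL API): reads a whole text file,
// e.g. a kernel kept in its own .cl file, into a std::string.
std::string read_kernel_source(const std::string &path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open kernel file: " + path);

    std::ostringstream contents;
    contents << file.rdbuf();  // copy the entire file into the string stream
    return contents.str();
}
```

The returned string's c_str() pointer and size() can then be passed to clCreateProgramWithSource in place of the string literal. Keeping the kernel in its own file avoids the per-line quotes and "\n" escapes, at the cost of shipping an extra file with the program.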

Thanks.

const char *RandomComputation =
"__kernel                                   \n"
"void RandomComputation(                    \n"
"                  __global float *A,       \n"
"                  __global float *B,       \n"
"                  __global float *C)       \n"
"{                                          \n"
"    // Get the index of the work-item      \n"
"    int index = get_global_id(0);          \n"
"    for (int j = 0; j < 100; j++)          \n"
"    {                                      \n"
"        C[index] = sqrt(A[index] + sqrt(A[index] * B[index])) * sqrt(A[index] + sqrt(A[index] * B[index])); \n"
"    }                                      \n"
"}                                          \n";
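
In the kernel above, the loop over i is already gone: one work-item runs per element, and get_global_id(0) plays the role of i. The loop that remains is the inner j loop, and since nothing in its body depends on j, every pass stores the same value into C[index]. The sketch below emulates this in plain C++ on the host. The names work_item, run_range and reference are hypothetical and used only for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel body: the work done by one work-item.
// `gid` plays the role of get_global_id(0). The j loop is dropped because
// nothing in its body depends on j, so every pass stores the same value.
void work_item(std::size_t gid, const std::vector<float> &A,
               const std::vector<float> &B, std::vector<float> &C)
{
    const float t = std::sqrt(A[gid] + std::sqrt(A[gid] * B[gid]));
    C[gid] = t * t;
}

// Hypothetical stand-in for a 1D launch with a global size of A.size():
// the runtime, not the kernel, walks the index space (in parallel on a device).
std::vector<float> run_range(const std::vector<float> &A, const std::vector<float> &B)
{
    std::vector<float> C(A.size());
    for (std::size_t gid = 0; gid < A.size(); ++gid)
        work_item(gid, A, B, C);
    return C;
}

// The original nested loops from the question, kept for comparison.
std::vector<float> reference(const std::vector<float> &A, const std::vector<float> &B)
{
    std::vector<float> C(A.size());
    for (std::size_t i = 0; i < A.size(); i++)
        for (int j = 0; j < 100; j++)
            C[i] = std::sqrt(A[i] + std::sqrt(A[i] * B[i])) * std::sqrt(A[i] + std::sqrt(A[i] * B[i]));
    return C;
}
```

For non-negative inputs the expression is sqrt(x) * sqrt(x) with x = A[i] + sqrt(A[i] * B[i]), so up to rounding it equals x itself. If the 100 repetitions are only there to generate load for a benchmark, that is worth keeping in mind, since a compiler may remove the redundant passes.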

Recommended answer

When you want a nested loop in an OpenCL kernel, map the loops onto a two-dimensional index space, as in this matrix multiplication example.

// A is hA x wA, B is wA x wB, C is hA x wB, all stored row-major.
__kernel void matrixMul(__global float* C,
      __global const float* A,
      __global const float* B,
      int wA, int wB)
{
   int tx = get_global_id(0);   // column of C
   int ty = get_global_id(1);   // row of C
   float value = 0;
   for (int k = 0; k < wA; ++k)
   {
     float elementA = A[ty * wA + k];
     float elementB = B[k * wB + tx];
     value += elementA * elementB;
   }
   // C has wB columns, so its row stride is wB, not wA
   C[ty * wB + tx] = value;
}
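
To make the mapping concrete, the two outer loops of a matrix multiplication can be emulated on the host in plain C++: each (tx, ty) pair corresponds to one work-item of a two-dimensional launch. This is a sketch under assumptions, not the answer's own code. The names matrix_mul_item and run_matrix_mul and the height parameter hA are hypothetical, and C is stored row-major with wB columns.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one work-item of a 2D launch: (tx, ty) play the
// roles of get_global_id(0) and get_global_id(1). A is hA x wA, B is wA x wB,
// and C is hA x wB, all stored row-major.
void matrix_mul_item(std::size_t tx, std::size_t ty,
                     std::vector<float> &C, const std::vector<float> &A,
                     const std::vector<float> &B, std::size_t wA, std::size_t wB)
{
    float value = 0.0f;
    for (std::size_t k = 0; k < wA; ++k)
        value += A[ty * wA + k] * B[k * wB + tx];
    C[ty * wB + tx] = value;
}

// Hypothetical stand-in for a 2D launch with global size (wB, hA). On a
// device these two loops do not exist in the kernel; the runtime supplies
// every (tx, ty) pair, potentially in parallel.
std::vector<float> run_matrix_mul(const std::vector<float> &A, const std::vector<float> &B,
                                  std::size_t hA, std::size_t wA, std::size_t wB)
{
    std::vector<float> C(hA * wB);
    for (std::size_t ty = 0; ty < hA; ++ty)
        for (std::size_t tx = 0; tx < wB; ++tx)
            matrix_mul_item(tx, ty, C, A, B, wA, wB);
    return C;
}
```

On a real device the equivalent launch would pass work_dim = 2 and a global work size of {wB, hA} to clEnqueueNDRangeKernel. The k loop stays inside the kernel because each pass accumulates into the same value, so it is a reduction rather than a set of independent iterations.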

Do you need ... here
