Is there any way to make a particular thread wait for other threads, upon some condition, in an OpenCL kernel?


Question



    __kernel
    void example(__global int *a, __global int *dependency, uint cols)
    {
        int j = get_global_id(0);
        int i = get_global_id(1);
        int test = 0;   // declared here so the snippet compiles
        if (i > 0 && j > 0)
        {
            while (1)
            {
                test = 1;
            }
            // Wait for the dependents

            // ... (remainder of the kernel elided in the original) ...
        }
    }

In the kernel code above, why is the while loop simply skipped in every thread instead of looping forever? Any ideas? I am working on a problem that requires a thread to wait for certain other threads to finish, based on some criteria, but whenever the kernel runs on the GPU, both the while(1) loop above and a while(wait_condition) loop are skipped.

Is there any other way to make a particular thread wait for other threads in an OpenCL kernel on a GPU?

Thanks in advance!

Recommended answer

At a high level, GPUs are data-parallel computing devices. They are built to run the same task on different data, and they perform poorly when their tasks do different things.

Your code is illustrative of a task-parallel problem, so my high-level question is: what type of problem are you solving? If it is a task-parallel problem, then perhaps a GPU isn't the best solution. Would a multi-core CPU be an alternative?

Your code is a typical 'spinlock', where the code loops until a value changes. It is often used for short-term, lightweight locking in databases. This is dangerous code even on a CPU, because a mistake or error can lock up the CPU or GPU. For CPU code, a spinlock is usually protected with an interrupt timer. The usage is:

1) set a timer
2) spin until a value changes
3) continue or time out

So after the requisite number of milliseconds the code is interrupted and an error is thrown. If you use the spinlock pattern, then for safety add a loop exit to the while statement that triggers after a suitable number of iterations.
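The timer, spin, time-out steps above can be sketched in plain C as a bounded spin-wait. This is a minimal sketch, not the answerer's code: the helper name `spin_wait_bounded`, the `max_spins` budget (an iteration count standing in for the timer), and the use of C11 atomics are all assumptions made for illustration. The same loop shape, a poll plus an iteration cap, is what adding a loop exit to a kernel's while statement amounts to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the original answer): poll *flag until it
 * equals `expected`, giving up after `max_spins` polls so that a bug in the
 * producer cannot hang the caller forever. */
static bool spin_wait_bounded(atomic_int *flag, int expected,
                              unsigned long max_spins)
{
    assert(flag != NULL);
    for (unsigned long spins = 0; spins < max_spins; ++spins) {
        /* Acquire load: writes made before the flag was set become visible. */
        if (atomic_load_explicit(flag, memory_order_acquire) == expected)
            return true;    /* the value changed: continue */
    }
    return false;           /* spin budget exhausted: time out */
}
```

A caller would treat a `false` return as the time-out case and report an error instead of spinning indefinitely.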

In OpenCL reduction algorithms, it is typical for the zero thread (get_global_id(0) == 0) to return the final singleton result. Prior to this, the threads are synchronized using a barrier call (which synchronizes the work-items of one work-group, not the whole global range):

    __kernel
    void mytask( ... , global float *result )
    {
        int thread = get_global_id(0);

        ...  your code

        // Wait for the other work-items in the work-group and make local and
        // global memory writes visible; see the OpenCL spec for details.
        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

        if (thread == 0)          // zero thread
            result[0] = value;    // set the singleton result as the zeroth array element
    }
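To show how a barrier replaces per-thread waiting, here is a sequential plain-C model of a tree reduction inside one work-group. It is only a sketch under stated assumptions: the function name `reduce_in_phases` is hypothetical, `n` is assumed to be a power of two (at least 1), and the inner loop stands in for work-items that would run concurrently in a real kernel. Each pass of the outer loop is one phase, and the end of a pass marks where a kernel would call barrier().

```c
#include <stddef.h>

/* Hypothetical sequential model of a work-group tree reduction.
 * Assumes n is a power of two and n >= 1. The array is overwritten and the
 * sum of all elements ends up in data[0]. */
static float reduce_in_phases(float *data, size_t n)
{
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* In a kernel, each value of `lid` would be a separate work-item. */
        for (size_t lid = 0; lid < stride; ++lid)
            data[lid] += data[lid + stride];
        /* A kernel would call barrier(CLK_LOCAL_MEM_FENCE) here, so that no
         * work-item starts the next phase before this one is complete. */
    }
    return data[0];
}
```

No work-item ever spins on another one's result: the dependency is expressed by ordering the phases, and the barrier enforces that order.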
