OpenCL and GPU global synchronization

Question

Has anyone tried the gpu_sync functions described in the article "Inter-Block GPU Communication via Fast Barrier Synchronization"? All the code described seems pretty simple and easy to implement, but it keeps freezing up my GPU. I'm sure I'm doing something stupid, but I can't see what. Can anyone help me?

The strategy I'm using is the one described in the section "GPU Lock-Free Synchronization" and here is the OpenCL source code I've implemented:

static void globalSync(uint iGoalValue,
                   volatile __global int *globalSyncFlagsIN,
                   volatile __global int *globalSyncFlagsOUT)
{
 const size_t iLocalThreadID  = get_local_id(0);
 const size_t iWorkGroupID    = get_group_id(0);
 const size_t iWorkGroupCount = get_num_groups(0);

 //Only the first thread on each SM is used for synchronization
 if (iLocalThreadID == 0)
 { globalSyncFlagsIN[iWorkGroupID] = iGoalValue; }

 if (iWorkGroupID == 0)
 {
  if (iLocalThreadID < iWorkGroupCount)
  {
   while (globalSyncFlagsIN[iLocalThreadID] != iGoalValue) {
    // Nothing to do here
   }
  }

  barrier(CLK_GLOBAL_MEM_FENCE);

  if (iLocalThreadID < iWorkGroupCount)
  { globalSyncFlagsOUT[iLocalThreadID] = iGoalValue; }
 }

 if (iLocalThreadID == 0)
 {
  while (globalSyncFlagsOUT[iWorkGroupID] != iGoalValue) {
   // Nothing to do here 
  }
 }

 barrier(CLK_GLOBAL_MEM_FENCE);
} 

Thanks.

Recommended answer

I haven't tried running the code, but the direct translation from CUDA to OpenCL of the code from the article mentioned above would be:

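// Signature assumed from the identifiers used in the body and from the
// argument types in the question's code.
void gpu_sync(int goalVal,
              volatile __global int *Arrayin,
              volatile __global int *Arrayout)
{
    // Flattened 2D work-item index, total group count, flattened group index
    int tid_in_blk = get_local_id(0) * get_local_size(1)
        + get_local_id(1);
    int nBlockNum = get_num_groups(0) * get_num_groups(1);
    int bid = get_group_id(0) * get_num_groups(1) + get_group_id(1);

    // The first work-item of each group signals that its group has arrived
    if (tid_in_blk == 0) {
        Arrayin[bid] = goalVal;
    }

    // Group 1 coordinates: work-item i waits for group i to arrive
    if (bid == 1) {
        if (tid_in_blk < nBlockNum) {
            while (Arrayin[tid_in_blk] != goalVal) {
                // spin
            }
        }
        barrier(CLK_LOCAL_MEM_FENCE);

        // Every group has arrived, so release each of them
        if (tid_in_blk < nBlockNum) {
            Arrayout[tid_in_blk] = goalVal;
        }
    }

    // The first work-item of each group waits for its release flag
    if (tid_in_blk == 0) {
        while (Arrayout[bid] != goalVal) {
            // spin
        }
    }

    // Keep the rest of the group waiting until work-item 0 has been released
    barrier(CLK_LOCAL_MEM_FENCE);
}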

Please note the differences in how the thread and group IDs are computed, and the use of a local memory barrier instead of a global one.
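
One thing worth checking when a barrier like this hangs is whether the launch configuration can ever satisfy the spin loops. The following is a minimal host-side sketch in C, not something taken from the article or from the OpenCL API: the helper name global_sync_launch_ok and all of its parameters are hypothetical. It encodes three conditions that follow from the code: the coordinating group must exist, the coordinating group needs one work-item per group (work-item i handles flag i), and every group must be running at the same time, because a group that spins never makes room for a group that has not started yet. The resident_groups value is an assumption the caller has to supply; with one work-group per compute unit it could be the CL_DEVICE_MAX_COMPUTE_UNITS value reported by clGetDeviceInfo.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL or of the article): returns true
 * when a flag-based inter-group barrier has a chance of completing. */
bool global_sync_launch_ok(size_t num_groups,        /* total work-groups launched      */
                           size_t local_size,        /* work-items per work-group       */
                           size_t resident_groups,   /* groups that can run at once     */
                           size_t coordinator_group) /* group id that polls the flags   */
{
    /* The coordinating group must exist, otherwise nobody writes the
     * release flags and every group spins forever. */
    if (coordinator_group >= num_groups)
        return false;

    /* Work-item i of the coordinating group handles the flags of group i,
     * so there must be at least one work-item per group. */
    if (num_groups > local_size)
        return false;

    /* A spinning group does not yield, so a group that is still waiting
     * to be scheduled would never arrive at the barrier. */
    if (num_groups > resident_groups)
        return false;

    return true;
}
```

For example, with the answer's code the coordinator is group 1, so a launch with a single work-group fails the first check, while the question's code (coordinator 0) passes it. The flag arrays should also be zeroed before the launch, and the goal value has to change on every call so that stale flags from the previous barrier are not mistaken for arrivals.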
