CUDA block synchronization

Problem description

I have b blocks, and each block has t threads. I can use

 __syncthreads()

to synchronize the threads within a particular block. For example:

__global__ void aFunction()
{
    for (int i = 0; i < 10; i++)   // loop counter must be declared
    {
        // execute something
        __syncthreads();           // barrier for the threads of THIS block only
    }
}
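
To make concrete what __syncthreads() does and does not guarantee, here is a minimal sketch in which each block reverses its own tile of an array through shared memory. The barrier is needed because a thread reads a slot written by another thread of the same block; it orders nothing across blocks, which is exactly the gap the question is about. The kernel reverseTiles, the helper run_block_reverse_demo and the fixed block size kBlock are hypothetical names chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // threads per block; also the shared tile size (assumption)

// Each block reverses its own kBlock-sized tile of `in` into `out` via shared memory.
__global__ void reverseTiles(const int* in, int* out)
{
    __shared__ int tile[kBlock];
    const int base = blockIdx.x * kBlock;

    tile[threadIdx.x] = in[base + threadIdx.x];

    // Thread t is about to read the slot written by thread (kBlock - 1 - t) of the
    // SAME block, so every thread of this block must finish its write first.
    // __syncthreads() says nothing about threads in other blocks.
    __syncthreads();

    out[base + threadIdx.x] = tile[kBlock - 1 - threadIdx.x];
}

// Hypothetical host helper: runs `tiles` blocks and checks that every tile was reversed.
bool run_block_reverse_demo(int tiles)
{
    const int n = tiles * kBlock;
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reverseTiles<<<tiles, kBlock>>>(d_in, d_out);
    bool ok = cudaGetLastError() == cudaSuccess;   // catches launch-configuration errors

    // cudaMemcpy waits for the kernel to finish before copying the result back.
    ok = ok && cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; i < n && ok; ++i) {
        const int t = i / kBlock, j = i % kBlock;
        ok = (h_out[i] == t * kBlock + (kBlock - 1 - j));
    }
    return ok;
}
```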

But I need to synchronize all the threads across all the blocks. How can I do this?

Solution

There is no native way to synchronise all threads from all blocks. In fact, the CUDA block model allows some blocks to be launched only after other blocks have already finished their work, for example if the GPU it is running on does not have enough resources to run them all in parallel. A block that waits for a block which has not been launched yet would therefore wait forever.

If you ensure that you don't launch more blocks than the GPU can keep resident at the same time, you can try to synchronise all blocks yourself, e.g. by busy-waiting on a counter updated with atomic operations. However, this is slow, floods the GPU's memory controller with polling traffic, is considered "a hack", and should be avoided.
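
For illustration only, a minimal sketch of that busy-waiting hack. It assumes that every block of the grid is resident on the GPU at the same time; the host code tries to ensure this by launching one block per multiprocessor after checking occupancy with cudaOccupancyMaxActiveBlocksPerMultiprocessor, but the programming model does not promise co-residency, and if any block is never scheduled the kernel hangs. The barrier is one-shot (its counter must be reset before every launch), it assumes device 0, and the names g_arrived, naive_grid_barrier, neighbourExchange and run_naive_barrier_demo are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical one-shot arrival counter; must be reset to 0 before every launch.
__device__ unsigned int g_arrived = 0;

// Busy-waiting grid barrier (the "hack"). Deadlocks unless ALL blocks are resident at once.
__device__ void naive_grid_barrier()
{
    __threadfence();   // each thread: make its earlier global writes visible device-wide
    __syncthreads();   // every thread of this block has reached the barrier
    if (threadIdx.x == 0) {
        atomicAdd(&g_arrived, 1u);   // announce this block's arrival
        while (*(volatile unsigned int*)&g_arrived < gridDim.x) {
            // spin: the volatile read reloads the counter from memory each iteration
        }
        __threadfence();   // keep post-barrier reads ordered after the observed count
    }
    __syncthreads();   // release the rest of the block
}

// Phase 1: block b stores b * 10. Phase 2: block b reads the value stored by block b + 1.
__global__ void neighbourExchange(int* data, int* out)
{
    if (threadIdx.x == 0) data[blockIdx.x] = blockIdx.x * 10;
    naive_grid_barrier();   // every thread of every block must reach this call
    if (threadIdx.x == 0) {
        const unsigned int next = (blockIdx.x + 1) % gridDim.x;
        out[blockIdx.x] = ((volatile int*)data)[next];   // volatile: avoid a stale cached copy
    }
}

// Hypothetical host helper: launches one block per SM on device 0 and checks the exchange.
bool run_naive_barrier_demo()
{
    const int threads = 128;
    int numSMs = 0, blocksPerSM = 0;
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, 0);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, neighbourExchange, threads, 0);
    if (numSMs < 1 || blocksPerSM < 1) return false;
    const int blocks = numSMs;   // conservative: one block per SM, within the occupancy limit

    const unsigned int zero = 0;
    cudaMemcpyToSymbol(g_arrived, &zero, sizeof(zero));   // reset the one-shot counter

    int *data = nullptr, *out = nullptr;
    cudaMalloc(&data, blocks * sizeof(int));
    cudaMalloc(&out, blocks * sizeof(int));
    neighbourExchange<<<blocks, threads>>>(data, out);
    bool ok = cudaGetLastError() == cudaSuccess;

    std::vector<int> h(blocks);
    ok = ok && cudaMemcpy(h.data(), out, blocks * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(data);
    cudaFree(out);

    for (int b = 0; b < blocks && ok; ++b) ok = (h[b] == ((b + 1) % blocks) * 10);
    return ok;
}
```

Even in this small form the sketch needs fences, volatile accesses, a manual reset and a co-residency assumption, which is why it is best treated as a last resort.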

The best way I can suggest is to simply end your kernel at the synchronisation point and then launch a new kernel that continues the job. In most circumstances this will actually be faster than (or at least about as fast as) the hack mentioned above.
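
A minimal sketch of the kernel-split approach, assuming a simple two-phase job in which phase 2 reads values written by other blocks in phase 1. Kernels launched into the same stream (here the default stream) execute in order, so the second launch does not start until every block of the first has finished; the kernel boundary acts as the grid-wide barrier. The kernels phase1 and phase2 and the helper run_two_kernel_demo are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Phase 1 (first kernel): every thread writes one element.
__global__ void phase1(int* data, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Phase 2 (second kernel): each thread reads an element that, in general, was written
// by a different block in phase 1. Safe only because phase1 has completely finished.
__global__ void phase2(const int* data, int* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[n - 1 - i];
}

// Hypothetical host helper: runs both phases and checks the result (assumes n > 0).
bool run_two_kernel_demo(int n)
{
    int *data = nullptr, *out = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Same stream, so phase2 begins only after all blocks of phase1 are done.
    phase1<<<blocks, threads>>>(data, n);
    phase2<<<blocks, threads>>>(data, out, n);
    bool ok = cudaGetLastError() == cudaSuccess;

    std::vector<int> h(n);
    // cudaMemcpy waits for phase2 to finish before copying.
    ok = ok && cudaMemcpy(h.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(data);
    cudaFree(out);

    for (int i = 0; i < n && ok; ++i) ok = (h[i] == n - 1 - i);
    return ok;
}
```

Everything the first kernel leaves in global memory is visible to the second, so no extra fences or atomics are needed; the cost is one additional kernel launch per synchronisation point.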
