Does __syncthreads() synchronize all threads in the grid?


Question

...or just the threads in the current warp or block?

Also, when the threads in a particular block encounter (in the kernel) the following line

__shared__  float srdMem[128];

will they just declare this space once (per block)?

The threads obviously all operate asynchronously, so if Thread 23 in Block 22 is the first thread to reach this line and Thread 69 in Block 22 is the last one to reach it, will Thread 69 know that the array has already been declared?

Recommended answer

__syncthreads() is a block-level synchronization barrier: each thread that reaches it waits until every thread in the same block has reached it too. It does not synchronize threads in other blocks, and it is wider than a single warp. __syncthreads() can also be used in conditional code, but only when all threads in the block evaluate the condition identically; otherwise the execution is likely to hang or produce unintended side effects [4].

Example of using __syncthreads():

#define THREADS_PER_BLOCK 128  // assumed value; must equal blockDim.x at launch

__global__ void globFunction(int *arr, int N)
{
    __shared__ int local_array[THREADS_PER_BLOCK];  // per-block shared memory cache
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    //...calculate results
    int results = idx;  // placeholder value (assumption) so the snippet compiles
    local_array[threadIdx.x] = results;

    // wait until every thread in the block has written its slot in shared memory
    __syncthreads();

    // read the result written by another thread of the same block
    int val = local_array[(threadIdx.x + 1) % THREADS_PER_BLOCK];

    // write the value back to global memory, guarding against a partial last block
    if (idx < N)
        arr[idx] = val;
}
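
The rule about conditional code can be illustrated with a block-level sum. This is a minimal sketch, not part of the original answer: the kernel name `blockSum`, the host helper `sumOnes`, and the `BLOCK_SIZE` value are all hypothetical, and the reduction assumes a power-of-two block size. The barrier sits outside the divergent `if`, and the loop condition depends only on `blockDim.x`, so every thread in the block reaches every barrier.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 128  // assumption: power of two, equal to blockDim.x at launch

// Each block sums its slice of `in` and writes one partial sum.
__global__ void blockSum(const int *in, int *blockSums, int n)
{
    __shared__ int tile[BLOCK_SIZE];
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads still participate; they just contribute 0.
    tile[threadIdx.x] = (idx < n) ? in[idx] : 0;
    __syncthreads();

    // The loop bound is identical for all threads of the block,
    // so all of them execute the same number of barriers.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)            // divergent branch, no barrier inside
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();                     // reached by every thread in the block
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];
}

// Hypothetical host helper: sums n ones on the GPU, returns -1 on any CUDA error.
long long sumOnes(int n)
{
    const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    std::vector<int> hostIn(n, 1), hostSums(blocks, 0);
    int *in = nullptr, *sums = nullptr;
    if (cudaMalloc(&in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&sums, blocks * sizeof(int)) != cudaSuccess) { cudaFree(in); return -1; }

    cudaMemcpy(in, hostIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<blocks, BLOCK_SIZE>>>(in, sums, n);
    cudaError_t err = cudaMemcpy(hostSums.data(), sums, blocks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(sums);
    if (err != cudaSuccess) return -1;

    long long total = 0;
    for (int s : hostSums) total += s;   // partial sums are combined on the host
    return total;
}
```

Moving the `__syncthreads()` inside the `if (threadIdx.x < stride)` branch would be the unsafe variant: threads that skip the branch would never arrive at the barrier.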

When this answer was written, CUDA had no native API call for synchronizing all threads in a grid (later releases added grid-wide synchronization through Cooperative Groups). One way to synchronize threads at grid level is to use consecutive kernel calls: all threads of one kernel finish before the next kernel in the same stream starts, so every thread resumes from the same point. This is commonly called CPU synchronization or implicit synchronization.

Example of using this technique (source):
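
The linked example did not survive on this page. As a stand-in, here is a minimal sketch of the consecutive-kernel approach; the kernel names `phaseOne` and `phaseTwo` and the host helper `runTwoPhases` are hypothetical, and both kernels are assumed to run in the default stream.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Phase 1: every thread writes one element of tmp.
__global__ void phaseOne(int *tmp, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) tmp[idx] = idx;
}

// Phase 2: every thread reads an element that may have been
// written by a thread of a different block during phase 1.
__global__ void phaseTwo(const int *tmp, int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = tmp[(idx + 1) % n];
}

// Hypothetical host helper: true if every out[i] equals (i + 1) % n.
bool runTwoPhases(int n)
{
    int *tmp = nullptr, *out = nullptr;
    if (cudaMalloc(&tmp, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) { cudaFree(tmp); return false; }

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    // Kernels in the same stream execute in launch order, so the boundary
    // between the two launches acts as a grid-wide barrier.
    phaseOne<<<blocks, threads>>>(tmp, n);
    phaseTwo<<<blocks, threads>>>(tmp, out, n);

    std::vector<int> host(n);
    // This blocking copy also waits for both kernels to finish.
    cudaError_t err = cudaMemcpy(host.data(), out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(tmp);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != (i + 1) % n) return false;
    return true;
}
```

Merging the two phases into one kernel with a `__syncthreads()` between them would not be equivalent, because the barrier would only order threads within each block.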

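Regarding the second question: yes, the declaration reserves the specified amount of shared memory once per block, and all threads of that block share the same array. The declaration is resolved at compile time and is not executed by individual threads, so it does not matter which thread reaches the line first. Keep in mind that the amount of available shared memory is limited per SM, so be careful how shared memory use combines with the launch configuration.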
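
To make the per-SM limit concrete, the sketch below estimates how many blocks could be resident on one SM if shared memory were the only constraint. The helper name `blocksPerSmBySharedMem` is hypothetical, and the estimate deliberately ignores other limits (registers, the maximum number of blocks and threads per SM, and any shared memory the runtime reserves), so treat the result as an upper bound.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: upper bound on resident blocks per SM when each block
// uses bytesPerBlock bytes of shared memory. Returns -1 on error or bad input,
// and 0 if a single block already exceeds the per-block limit.
int blocksPerSmBySharedMem(size_t bytesPerBlock)
{
    if (bytesPerBlock == 0) return -1;

    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;

    // A block asking for more than the per-block limit cannot launch at all.
    if (bytesPerBlock > prop.sharedMemPerBlock) return 0;

    // Shared memory is a per-SM resource divided among resident blocks.
    return static_cast<int>(prop.sharedMemPerMultiprocessor / bytesPerBlock);
}
```

For the array in the question, `blocksPerSmBySharedMem(128 * sizeof(float))` asks about 512 bytes per block, which is small enough that shared memory is unlikely to be the limiting factor.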
