Can I use __syncthreads() after having dropped threads?

Question

Is it safe to use __syncthreads() in a block where I have purposefully dropped threads using return?

The documentation states that __syncthreads() must be called by every thread in the block or else it will lead to a deadlock, but in practice I have never experienced such behavior.

Example code:

__global__ void kernel(float* data, size_t size) {
    // Drop excess threads if user put too many in kernel call.
    // After the return, there are `size` active threads.
    if (threadIdx.x >= size) {
        return;
    }

    // ... do some work ...

    __syncthreads(); // Is this safe?

    // For the rest of the kernel, we need to drop one excess thread
    // After the return, there are `size - 1` active threads
    if (threadIdx.x + 1 == size) {
        return;
    }

    // ... do more work ...

    __syncthreads(); // Is this safe?
}

Recommended answer

The answer to the short question is "No". Warp level branch divergence around a __syncthreads() instruction will cause a deadlock and result in a kernel hang. Your code example is not guaranteed to be safe or correct. The correct way to implement the code would be like this:

__global__ void kernel(...)
{
    if (tidx < N) {
        // Code stanza #1
    }

    // Every thread in the block reaches this barrier, in range or not.
    __syncthreads();

    if (tidx < N) {
        // Code stanza #2
    }

    // etc
}

so that the __syncthreads() instructions are executed unconditionally.
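As a concrete illustration of that pattern, here is a minimal sketch of a complete kernel written with guards instead of early returns. The kernel name `adjacent_diff`, the block size `kBlockSize`, and the host helper `run_adjacent_diff` are assumptions made up for this example and do not come from the question. Error checking on the CUDA runtime calls is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 128;  // assumed block size for this sketch

// Each in-range thread stages one element in shared memory. After the
// barrier, every in-range thread except the last reads its right neighbor.
__global__ void adjacent_diff(const float* in, float* out, unsigned int n) {
    __shared__ float tile[kBlockSize];
    const unsigned int tidx = threadIdx.x;

    if (tidx < n) {        // stanza #1: guarded work, no early return
        tile[tidx] = in[tidx];
    }

    __syncthreads();       // executed unconditionally by every thread

    if (tidx + 1 < n) {    // stanza #2: one fewer thread does work
        out[tidx] = tile[tidx + 1] - tile[tidx];
    }
}

// Hypothetical host helper: launches a single block.
// Requires 2 <= in.size() <= kBlockSize.
std::vector<float> run_adjacent_diff(const std::vector<float>& in) {
    const unsigned int n = static_cast<unsigned int>(in.size());
    std::vector<float> out(n - 1);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    adjacent_diff<<<1, kBlockSize>>>(d_in, d_out, n);
    cudaMemcpy(out.data(), d_out, (n - 1) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

The two guards play the role of the two `return` statements in the question: the excess threads simply skip the work, but they stay alive and still arrive at the barrier.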

Just to add a bit of additional information which confirms this assertion, __syncthreads() calls get compiled into the PTX bar.sync instruction on all architectures. The PTX 2.0 guide (p. 133) documents bar.sync and includes the following warning:

Barriers are executed on a per-warp basis as if all the threads in a warp are active. Thus, if any thread in a warp executes a bar instruction, it is as if all the threads in the warp have executed the bar instruction. All threads in the warp are stalled until the barrier completes, and the arrival count for the barrier is incremented by the warp size (not the number of active threads in the warp). In conditionally executed code, a bar instruction should only be used if it is known that all threads evaluate the condition identically (the warp does not diverge). Since barriers are executed on a per-warp basis, the optional thread count must be a multiple of the warp size.

So despite any assertions to the contrary, it is not safe to have conditional branching around a __syncthreads() call unless you can be 100% certain that every thread in any given warp follows the same code path and no warp divergence can occur.
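One case where a barrier inside a branch is safe is when the condition cannot differ between threads at all, for example when it depends only on a kernel argument. The sketch below assumes that situation; `maybe_reverse`, `kThreads`, and `run_maybe_reverse` are hypothetical names invented for the example. Note that the CUDA C Programming Guide words the requirement more strictly than per warp: the condition must evaluate identically across the entire thread block, and a kernel argument satisfies that.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 64;  // assumed block size for this sketch

// Reverses the first n elements in place when `reverse` is true.
// `reverse` is a kernel argument, so every thread in the block sees the
// same value and the whole block takes the same side of the branch.
__global__ void maybe_reverse(float* data, unsigned int n, bool reverse) {
    __shared__ float tile[kThreads];
    const unsigned int tidx = threadIdx.x;

    if (reverse) {                 // uniform condition: no divergence
        if (tidx < n) {
            tile[tidx] = data[tidx];
        }
        __syncthreads();           // reached by all threads or by none
        if (tidx < n) {
            data[tidx] = tile[n - 1 - tidx];
        }
    }
}

// Hypothetical host helper: launches a single block.
// Requires 1 <= v.size() <= kThreads.
std::vector<float> run_maybe_reverse(std::vector<float> v, bool reverse) {
    const unsigned int n = static_cast<unsigned int>(v.size());
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, v.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    maybe_reverse<<<1, kThreads>>>(d_data, n, reverse);
    cudaMemcpy(v.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return v;
}
```

The per-thread range check still lives inside guards around the work, never around the barrier itself.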