Branch and predicated instructions

Question

Section 5.4.2 of the CUDA C Programming Guide states that branch divergence is handled either by "branch instructions" or, under certain conditions, by "predicated instructions". I don't understand the difference between the two, or why one leads to better performance than the other.

This comment suggests that branch instructions lead to a greater number of executed instructions, to stalls due to "branch address resolution and fetch", and to overhead due to "the branch itself" and "book keeping for divergence", while predicated instructions incur only the "instruction execution latency to do the condition test and set the predicate". Why?

Recommended answer

Instruction predication means that an instruction is conditionally executed by a thread depending on a predicate. Threads for which the predicate is true execute the instruction; the rest do nothing.

For example:

var = 0;

// Not taken by all threads
if (condition) {
    var = 1;
} else {
    var = 2;
}

output = var;

This would result in something like the following (not actual compiler output):

       mov.s32 var, 0;       // Executed by all threads.
       setp pred, condition; // Executed by all threads, sets predicate.

@pred  mov.s32 var, 1;       // Executed only by threads where pred is true.
@!pred mov.s32 var, 2;       // Executed only by threads where pred is false.
       mov.s32 output, var;  // Executed by all threads.

All in all, that is 3 instructions for the if (the setp and the two predicated moves) and no branching. Very efficient.
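To make the cost model concrete, here is a minimal host-side sketch in C++ that mimics the predicated sequence for one warp. It is a toy model, not how the hardware is implemented: `kWarpSize`, `WarpResult` and `run_predicated` are hypothetical names introduced for illustration, and "issued" simply counts one per instruction for the whole warp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model (assumption): a warp is 32 lanes that always step through the
// same instruction together.
constexpr std::size_t kWarpSize = 32;
using Lanes = std::array<int, kWarpSize>;
using Mask = std::array<bool, kWarpSize>;

struct WarpResult {
    Lanes output{};  // per-lane value of `output`
    int issued = 0;  // instructions issued once for the whole warp
};

// Hypothetical helper mirroring the predicated listing.
WarpResult run_predicated(const Mask& condition) {
    WarpResult r;
    Lanes var{};

    var.fill(0);            // mov.s32 var, 0
    ++r.issued;

    Mask pred = condition;  // setp pred, condition
    ++r.issued;

    // @pred mov.s32 var, 1 : issued once, written only where pred is true.
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        if (pred[i]) var[i] = 1;
    }
    ++r.issued;

    // @!pred mov.s32 var, 2 : issued once, written only where pred is false.
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        if (!pred[i]) var[i] = 2;
    }
    ++r.issued;

    r.output = var;         // mov.s32 output, var
    ++r.issued;

    // In this model the cost never depends on the condition values.
    assert(r.issued == 5);
    return r;
}
```

Calling `run_predicated` with any mask yields 5 issued instructions in this model: the 3 that implement the if, plus the initial and final moves.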

The equivalent code with branches would look like this:

       mov.s32 var, 0;       // Executed by all threads.
       setp pred, condition; // Executed by all threads, sets predicate.

@!pred bra IF_FALSE;         // Conditional branches are predicated instructions.
IF_TRUE:                    // Label for clarity, not actually used.
       mov.s32 var, 1;
       bra IF_END;
IF_FALSE:
       mov.s32 var, 2;
IF_END:
       mov.s32 output, var;

Notice how much longer it is (5 instructions for the if). When the warp diverges, the conditional branch requires disabling part of the warp, executing the first path, then going back to the point where the warp diverged and executing the second path until both reconverge. That takes longer, requires extra bookkeeping, and loads more code (particularly when there are many instructions to execute), and hence issues more memory requests. All of that makes branching slower than simple predication.
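The divergence handling described here can be sketched with the same kind of host-side toy model in C++. This is an illustration under stated assumptions, not the actual hardware mechanism: `run_branching` and `any_lane` are hypothetical helpers, each path runs with only its own lanes enabled, and a path is skipped when no lane needs it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // assumed warp width
using Lanes = std::array<int, kWarpSize>;
using Mask = std::array<bool, kWarpSize>;

struct WarpResult {
    Lanes output{};  // per-lane value of `output`
    int issued = 0;  // instructions issued once for the whole warp
};

// True if at least one lane in the mask is active.
static bool any_lane(const Mask& m) {
    for (bool b : m) {
        if (b) return true;
    }
    return false;
}

// Hypothetical helper mirroring the branching listing.
WarpResult run_branching(const Mask& condition) {
    WarpResult r;
    Lanes var{};

    var.fill(0);            // mov.s32 var, 0
    ++r.issued;

    Mask pred = condition;  // setp pred, condition
    ++r.issued;

    // @!pred bra IF_FALSE : lanes split into "jump" and "fall through".
    Mask jump{}, fall{};
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        jump[i] = !pred[i];
        fall[i] = pred[i];
    }
    ++r.issued;

    if (any_lane(fall)) {   // IF_TRUE path, jumping lanes disabled
        for (std::size_t i = 0; i < kWarpSize; ++i) {
            if (fall[i]) var[i] = 1;  // mov.s32 var, 1
        }
        ++r.issued;
        ++r.issued;         // bra IF_END
    }
    if (any_lane(jump)) {   // IF_FALSE path, fall-through lanes disabled
        for (std::size_t i = 0; i < kWarpSize; ++i) {
            if (jump[i]) var[i] = 2;  // mov.s32 var, 2
        }
        ++r.issued;
    }

    r.output = var;         // IF_END: lanes reconverge; mov.s32 output, var
    ++r.issued;

    assert(r.issued >= 5 && r.issued <= 7);
    return r;
}
```

In this model a diverged warp issues 7 instructions in total (5 of them for the if), while a warp whose lanes all agree skips the path nobody takes and issues 5 or 6. The model counts only issued instructions; it does not capture the stalls, bookkeeping, or instruction fetch costs mentioned above.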

And actually, in the case of this very simple conditional assignment, the compiler can do even better, with only 2 instructions for the if:

mov.s32 var, 0;       // Executed by all threads.
setp pred, condition; // Executed by all threads, sets predicate.
selp var, 1, 2, pred; // Sets var depending on predicate (true: 1, false: 2).
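For comparison with the two-instruction form, here is a small C++ sketch of the select approach for one warp. It is a toy model with hypothetical names (`run_select`, `WarpResult`); at the source level the same idea is a conditional expression such as `var = condition ? 1 : 2;`, which a compiler may lower to a select instruction, although that is not guaranteed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // assumed warp width
using Lanes = std::array<int, kWarpSize>;
using Mask = std::array<bool, kWarpSize>;

struct WarpResult {
    Lanes output{};  // per-lane value of `output`
    int issued = 0;  // instructions issued once for the whole warp
};

// Hypothetical helper mirroring the selp listing.
WarpResult run_select(const Mask& condition) {
    WarpResult r;
    Lanes var{};

    var.fill(0);            // mov.s32 var, 0
    ++r.issued;

    Mask pred = condition;  // setp pred, condition
    ++r.issued;

    // selp var, 1, 2, pred : every lane writes, each picks its own value.
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        var[i] = pred[i] ? 1 : 2;
    }
    ++r.issued;

    r.output = var;         // mov.s32 output, var
    ++r.issued;

    assert(r.issued == 4);  // independent of the condition values
    return r;
}
```

Every lane stays active for every instruction here, so there is neither a disabled lane nor a second path to execute.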
