The impact of the goto instruction on intra-warp divergence in CUDA code


Question

For simple intra-warp thread divergence in CUDA, my understanding is that the SM selects a re-convergence point (a PC address) and executes the instructions on both (or all) paths, while masking off the effects of execution for the threads that have not taken the current path.
For example, in the piece of code below:

if( threadIdx.x < 16 ) {
    A:
    // do something.
} else {
    B:
    // do something else.
}
C:
// rest of code.

C is the re-convergence point. The warp scheduler issues instructions at both A and B, disabling the instructions at A for the upper half-warp and the instructions at B for the lower half-warp. When execution reaches C, instructions are enabled again for all threads in the warp.
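The masking described above can be observed with a minimal sketch like the following. It is an illustration, not part of the original question: the kernel name `divergence_demo` and the `masks` buffer are hypothetical, and it assumes CUDA 9 or later for `__activemask()` and `__syncwarp()`. One would typically expect the lower 16 bits to be set on path A and the upper 16 bits on path B, but the mask `__activemask()` returns depends on how the compiler and hardware schedule the warp, so treat the printed values as observations rather than guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Records the active-lane mask seen on each path of a divergent branch.
// masks[0]: path A, masks[1]: path B, masks[2]: after the branch (point C).
__global__ void divergence_demo(unsigned int *masks)
{
    if (threadIdx.x < 16) {
        // A: taken by the lower half-warp.
        unsigned int m = __activemask();
        if (threadIdx.x == 0) masks[0] = m;
    } else {
        // B: taken by the upper half-warp.
        unsigned int m = __activemask();
        if (threadIdx.x == 16) masks[1] = m;
    }
    // C: __syncwarp() makes the reconvergence explicit; on architectures
    // with independent thread scheduling it is not otherwise guaranteed here.
    __syncwarp();
    unsigned int m = __activemask();
    if (threadIdx.x == 0) masks[2] = m;
}

int main()
{
    unsigned int h_masks[3] = {0u, 0u, 0u};
    unsigned int *d_masks = nullptr;

    if (cudaMalloc(&d_masks, sizeof(h_masks)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_masks, 0, sizeof(h_masks));

    // One block of exactly one warp (32 threads).
    divergence_demo<<<1, 32>>>(d_masks);

    // cudaMemcpy waits for the kernel and reports any launch/execution error.
    cudaError_t err = cudaMemcpy(h_masks, d_masks, sizeof(h_masks),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_masks);
        return 1;
    }

    printf("active mask at A: 0x%08x\n", h_masks[0]);
    printf("active mask at B: 0x%08x\n", h_masks[1]);
    printf("active mask at C: 0x%08x\n", h_masks[2]);

    cudaFree(d_masks);
    return 0;
}
```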

My question is: will the SM handle code that includes the goto instruction as well as it handles the case above? Or is there no guarantee that the chosen re-convergence point is optimal?
For instance, if I have the control flow below in my CUDA code, implemented using goto:

A:
// some code here.
B:
// some code here too.
if( threadIdx.x < 16 ) {
    C:
    // do something.
    goto A;
}
// do something else.
goto B;



Will the SM be smart enough to choose B as the re-convergence point for the intra-warp divergence caused by the if statement?

Recommended answer

In general, goto is unstructured control flow that interferes with many compiler optimizations, regardless of platform. The CUDA C compiler should handle code with goto in a functionally correct way, but performance may be suboptimal.

Part of that suboptimal performance may come from the compiler's placement of convergence points. You can examine the convergence points in the generated machine code (SASS) with cuobjdump --dump-sass. An SSY instruction records a convergence point, and a .S suffix on an instruction indicates that control is transferred to the last recorded convergence point.
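To try this on the control flow from the question, a small test kernel along the following lines could be compiled and inspected. This is only a sketch under stated assumptions: the kernel name `goto_demo`, the `steps` bound, the `done` label, and the arithmetic are all hypothetical additions so that the loop terminates and the compiler cannot discard the code as dead. The SSY and .S markers described above appear in SASS for older GPU architectures; newer architectures may use different mnemonics for convergence, so what you see depends on the target you compile for.

```cuda
// Build and inspect (file name goto_demo.cu is an assumption):
//   nvcc -o goto_demo goto_demo.cu
//   cuobjdump --dump-sass goto_demo
#include <cstdio>
#include <cuda_runtime.h>

// Mirrors the A/B/C goto structure from the question.
// `steps` bounds the number of passes through B (added for termination).
__global__ void goto_demo(int *out, int steps)
{
    int acc = 0;
    int i = 0;
A:
    acc += 1;                      // "some code here"
B:
    acc += 2;                      // "some code here too"
    if (++i >= steps) goto done;   // exit condition, not in the original
    if (threadIdx.x < 16) {
        // C: lower half-warp only
        acc += 3;                  // "do something"
        goto A;
    }
    acc += 5;                      // "do something else"
    goto B;
done:
    out[threadIdx.x] = acc;        // keep the result observable
}

int main()
{
    const int n = 32;              // one warp
    int h_out[n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    goto_demo<<<1, n>>>(d_out, 8);

    // cudaMemcpy waits for the kernel and reports any launch/execution error.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    printf("lane 0: %d, lane 31: %d\n", h_out[0], h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

Comparing this SASS with that of an equivalent kernel written with structured loops is one way to check whether the goto version ends up with a less favorable convergence point.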
