Branch predication on GPU
Question
I have a question about branch predication in GPUs. As far as I know, GPUs handle branches using predication.
For example, I have code like this:
if (C)
    A
else
    B
Suppose A takes 40 cycles and B takes 50 cycles to execute. If, for a single warp, both A and B are executed, does the branch take 90 cycles in total? Or are A and B overlapped, i.e., some instructions of A execute, then the warp waits on a memory request, then some instructions of B execute, then it waits on memory again, and so on? Thanks.
Recommended answer
All of the CUDA-capable architectures released so far operate like a SIMD machine. When there is branch divergence within a warp, both code paths are executed by all the threads in the warp, with the threads that are not on the active path executing the functional equivalent of a NOP (I think I recall that there is a conditional execution flag attached to each thread in a warp, which allows non-executing threads to be masked off).
So in your example, the 90-cycle answer is probably a better approximation of what really happens than the alternative.
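To make the 90-cycle estimate concrete, here is a minimal sketch of a toy cost model written in plain C++. It is not a hardware measurement. It assumes a 32-lane warp, assumes that divergent paths run one after the other with inactive lanes masked off, and ignores latency hiding by other warps as well as any reconvergence overhead. The names `kFullWarp` and `branch_cycles` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Toy model only: bit i of `takes_a` is set when lane i of the warp
// evaluates the condition C as true (assumed warp width: 32 lanes).
constexpr std::uint32_t kFullWarp = 0xFFFFFFFFu;

// Cycles one warp spends on `if (C) A else B` under the assumption that
// divergent paths are serialised: a path costs its full latency as soon
// as at least one lane takes it, and costs nothing if no lane takes it.
constexpr unsigned branch_cycles(std::uint32_t takes_a,
                                 unsigned cycles_a,
                                 unsigned cycles_b) {
    return (takes_a != 0u ? cycles_a : 0u)          // some lane runs A
         + (takes_a != kFullWarp ? cycles_b : 0u);  // some lane runs B
}
```

Under these assumptions, `branch_cycles(0x0000FFFFu, 40, 50)` gives 90, because half of the lanes take each path. A uniform warp pays for only one side: `branch_cycles(kFullWarp, 40, 50)` gives 40 and `branch_cycles(0u, 40, 50)` gives 50. A single divergent lane is enough to pay for both paths.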