Branch predication on GPU


Problem Description

I have a question about branch predication in GPUs. As far as I know, GPUs handle branches with predication.

For example, I have code like this:

if (C)
 A
else
 B

So if A takes 40 cycles and B takes 50 cycles to finish execution, and assuming that for one warp both A and B are executed, does it take 90 cycles in total to finish this branch? Or are A and B overlapped, i.e., some instructions of A are executed, then the warp waits on a memory request, then some instructions of B are executed, then another memory wait, and so on?

Thanks

Recommended Answer

All of the CUDA-capable architectures released so far operate like a SIMD machine. When there is branch divergence within a warp, both code paths are executed by all the threads in the warp, with the threads that are not following the active path executing the functional equivalent of a NOP (I think I recall that there is a conditional execution flag attached to each thread in a warp which allows non-executing threads to be masked off).

So in your example, the 90-cycle answer is probably a better approximation of what really happens than the alternative.
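
As an illustration (not part of the original answer), here is a minimal CUDA sketch of the if (C) A else B pattern from the question, written so that threads within the same warp take different paths. The kernel name, the odd/even condition, and the arithmetic done on each path are all made up for illustration, and the example assumes a device that supports cudaMallocManaged.

// divergent_kernel.cu -- hypothetical sketch of intra-warp branch divergence
#include <cstdio>

// Threads in the same warp take different paths, so the hardware issues
// both paths one after the other, masking off the lanes that are not on
// the currently active path.
__global__ void divergent_kernel(const int *c, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (c[tid]) {
        // Path A: runs with the "else" lanes masked off (effectively NOPs for them).
        out[tid] = out[tid] * 2.0f + 1.0f;
    } else {
        // Path B: runs afterwards with the "if" lanes masked off.
        out[tid] = out[tid] * 0.5f - 1.0f;
    }
    // After the branch the warp reconverges and all lanes are active again.
}

int main()
{
    const int N = 64;                       // two warps of 32 threads
    int *c; float *out;
    cudaMallocManaged(&c,   N * sizeof(int));
    cudaMallocManaged(&out, N * sizeof(float));
    for (int i = 0; i < N; ++i) { c[i] = i % 2; out[i] = 1.0f; }  // odd/even lanes diverge

    divergent_kernel<<<1, N>>>(c, out);
    cudaDeviceSynchronize();

    printf("out[0]=%f out[1]=%f\n", out[0], out[1]);
    cudaFree(c); cudaFree(out);
    return 0;
}

With the odd/even split above, every warp contains lanes on both paths, so each warp pays roughly the cost of path A plus path B, which is the 90-cycle case in the question. If c[] were instead uniform across a whole warp, only the taken path would be issued for that warp.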
