CUDA: Why are bitwise operators sometimes faster than logical operators?

Question

When I am down to squeezing the last bit of performance out of a kernel, I usually find that replacing the logical operators (&& and ||) with bitwise operators (& and |) makes the kernel a little bit faster. This was observed by looking at the kernel time summary in CUDA Visual Profiler.

So, why are bitwise operators faster than logical operators in CUDA? I must admit that they are not always faster, but a lot of times they are. I wonder what magic can give this speedup.

Disclaimer: I am aware that logical operators short-circuit and bitwise operators do not. I am well aware of how these operators can be misused resulting in wrong code. I use this replacement with care only when the resulting logic remains the same, there is a speedup and the speedup thus obtained matters to me :-)

Recommended answer

Logical operators will often result in branches, particularly when the rules of short circuit evaluation need to be observed. For normal CPUs this can mean branch misprediction and for CUDA it can mean warp divergence. Bitwise operations do not require short circuit evaluation so the code flow is linear (i.e. branchless).
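The difference can be sketched with a pair of kernels that evaluate the same predicate both ways. This is only an illustration: the function and kernel names (`both_positive_logical`, `both_positive_bitwise`, `flag_logical`, `flag_bitwise`) are hypothetical and not from the original post. With `&&`, the load of `b[i]` may only happen when `a[i] > 0`, so the compiler may have to guard it with a branch or predication. With `&`, both loads are unconditional and the code can be straight-line. Whether a branch is actually emitted depends on the compiler version, target architecture and flags, so it is worth inspecting the generated code (for example with `cuobjdump -sass`) and profiling before relying on it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Short-circuit form: b[i] must not be read unless a[i] > 0,
// so the compiler may need a branch (or predication) around the second load.
__host__ __device__ inline int both_positive_logical(const float* a, const float* b, int i) {
    return (a[i] > 0.0f && b[i] > 0.0f) ? 1 : 0;
}

// Bitwise form: both comparisons are always evaluated, so no control flow
// is required. Each comparison yields 0 or 1, so & gives the same result.
// Assumption: reading b[i] is always safe (both arrays have at least n elements).
__host__ __device__ inline int both_positive_bitwise(const float* a, const float* b, int i) {
    return static_cast<int>(a[i] > 0.0f) & static_cast<int>(b[i] > 0.0f);
}

// One thread per element; out[i] is 1 when both inputs are positive.
__global__ void flag_logical(const float* a, const float* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = both_positive_logical(a, b, i);
}

__global__ void flag_bitwise(const float* a, const float* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = both_positive_bitwise(a, b, i);
}

// Host-side sanity check: the two forms must agree on every element
// before the replacement can be considered safe.
inline void check_agreement(const float* a, const float* b, int n) {
    for (int i = 0; i < n; ++i)
        assert(both_positive_logical(a, b, i) == both_positive_bitwise(a, b, i));
}
```

The replacement is only valid because neither operand has side effects and the second operand is safe to evaluate unconditionally. If the right-hand side guards an out-of-bounds access or a null pointer, `&&` has to stay.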
