AND 比整数模运算更快? [英] AND faster than integer modulo operation?

查看:24
本文介绍了AND 比整数模运算更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

可以重新表达:

  • 我 % 米

如:

  • 我&(m-1)

哪里,

  • i 是一个无符号整数
  • m 是 2 的幂

我的问题是:AND 操作会更快吗?现代 CPU 在单个指令中不支持硬件中的整数模吗?我对ARM感兴趣,但在其指令集中没有看到模运算.

My question is: is the AND operation any faster? Don't modern CPUs support integer modulo in hardware in a single instruction? I'm interested in ARM, but don't see the modulo operation in its instruction set.

推荐答案

比单指令"更复杂这些日子.现代 CPU 是复杂的野兽,需要将它们的指令分解为问题/执行/延迟.它还通常取决于除法/模的宽度 - 涉及多少位.

It's more complicated than "single instruction" these days. Modern CPUs are complex beasts and need their instructions broken down into issue/execute/latency. It also usually depends on the width of the divide/modulo - how many bits are involved.

无论如何,我不知道 32 位除法是任何内核(无论是否为 ARM)上的单周期延迟.关于现代"ARM 有整数除法指令,但仅限于某些实现,最值得注意的是最常见的 - Cortex A8 和 A9.

In any case, I'm not aware of 32 bit division being single cycle latency on any core, ARM or not. On "modern" ARM there are integer divide instructions, but only on some implementations, and most notably not on the most common ones - Cortex A8 and A9.

在某些情况下,编译器可以省去将除法/模数转换为位移位/掩码运算的麻烦.但是,这只有在编译时已知值时才有可能.在您的情况下,如果编译器可以确定看到m"始终是 2 的幂,那么它将优化它以进行位操作,但是如果它是传递给函数的变量(或否则计算),那么它不能,并且将诉诸完整的除法/模数.这种代码构造通常有效(但并非总是如此 - 取决于您的优化器的智能程度):

In some cases, the compiler can save you the trouble of converting a divide/modulo into bit shift/mask operations. However, this is only possible if the value is known at compile time. In your case, if the compiler can see for sure that 'm' is always a power a two, then it'll optimize it to bit ops, but if it's a variable passed into a function (or otherwise computed), then it can't, and will resort to a full divide/modulo. This kind of code construction often works (but not always - depends how smart your optimizer is):

unsigned page_size_bits = 12;     // optimization works even without const here

unsigned foo(unsigned address) {
  unsigned page_size = 1U << page_size_bits;
  return address / page_size;
}

诀窍是让编译器知道page_size"是二的幂.我知道 gcc 和变体会对此进行特殊处理,但我不确定其他编译器.

The trick is to let the compiler know that the "page_size" is a power of two. I know that gcc and variants will special-case this, but I'm not sure about other compilers.

作为任何内核 - ARM 与否(甚至 x86)的经验法则,更喜欢位移/掩码来除法/取模,尤其是对于任何不是编译时常量的东西.即使您的内核有硬件鸿沟,手动完成也会更快.

As a rule of thumb for any core - ARM or not (even x86), prefer bit shift/mask to divide/modulo, especially for anything that isn't a compile-time constant. Even if your core has hardware divide, it'll be faster to do it manually.

(另外,有符号除法必须向 0 截断,div/余数才能产生负数,所以即使 x % 4 也比 x & 3 用于已签名的 int x.)

(Also, signed division has to truncate towards 0, and div / remainder have be able to produce negative numbers, so even x % 4 is more expensive than x & 3 for signed int x.)

这篇关于AND 比整数模运算更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆