GCC/ARM 上的快速除法 [英] Fast Division on GCC/ARM
问题描述
据我所知,大多数编译器会通过乘法然后向右移位来进行快速除法.例如,如果您检查 this SO thread 它说当您询问Microsoft 编译器除以 10 时,会将被除数乘以 0x1999999A(即 2^32/10),然后将结果除以 2^32(使用向右移动 32 次).
As far as I know most compilers will do fast division by multiplying and then bit shifting to the right. For instance, if you check this SO thread it says that when you ask the Microsoft compiler to do division by 10 it will multiply the dividend by 0x1999999A (which is 2^32/10) and then divide the result by 2^32 (using 32 shifts to the right).
(编者注:链接的答案直到错误为止.编译器不会这样做,因为它对所有输入都不准确.编译器确实会进行乘法和移位,但使用更复杂的方法来确定魔术常数和移位计数:为什么 GCC 使用乘法实现整数除法时的奇怪数字?)
(Editor's note: that linked answer was wrong until. Compilers don't do that because it's not exact for all inputs. Compilers do multiply and shift, but with a more involved way of determining the magic constant and shift counts: Why does GCC use multiplication by a strange number in implementing integer division?)
到目前为止一切顺利.
不过,一旦我使用 GCC 在 ARM 上测试了相同的除法 10,编译器做了一些稍微不同的事情.首先它将被除数乘以 0x66666667 (2^34/10),然后将结果除以 2^34.到目前为止,它与 Microsoft 相同,只是使用了更高的乘数.然而,在那之后,它从结果中减去 (dividend/2^31).
Once I tested the same division by 10 on ARM using GCC, though, the compiler did something slightly different. First it multiplied the dividend by 0x66666667 (2^34/10), then it divided the result by 2^34. Thus far it's the same as Microsoft, except using a higher multiplier. After that, however, it subtracted (dividend/2^31) from the result.
我的问题:为什么在 ARM 版本上有额外的减法?你能给我一个数字例子,如果没有那个减法,结果会是错误的吗?
My question: why on the ARM version there's that extra subtraction? Can you give me an numeric example where without that subtraction the result will be wrong?
如果你想检查生成的代码,它在下面(有我的评论):
If you want to check the generated code, it's below (with my comments):
ldr r2, [r7, #4] @--this loads the dividend from memory into r2
movw r3, #:lower16:1717986919 @--moves the lower 16 bits of the constant
movt r3, #:upper16:1717986919 @--moves the upper 16 bits of the constant
smull r1, r3, r3, r2 @--multiply long, put lower 32 bits in r1, higher 32 in r3
asr r1, r3, #2 @--r3>>2, then store in r1 (effectively >>34, since r3 was higher 32 bits of multiplication)
asr r3, r2, #31 @--dividend>>31, then store in r3
rsb r3, r3, r1 @--r1 - r3, store in r3
str r3, [r7, #0] @--this stores the result in memory (from r3)
推荐答案
然而,在那之后,它从结果中减去了 (dividend/2^31).
After that, however, it subtracted (dividend/2^31) from the result.
其实就是减去dividend>>31
,对于负的dividend
为-1
,对于非负的被除数为0,当负整数右移是算术右移时(和int
是 32 位宽).
Actually, it subtracts dividend >> 31
, which is -1
for negative dividend
, and 0 for non-negative dividend, when right-shifting negative integers is arithmetic right-shift (and int
is 32 bits wide).
0x6666667 = (2^34 + 6)/10
所以对于 x <0
,我们有,用 -10 < 写
,x = 10*k + r
r <= 0
So for x < 0
, we have, writing x = 10*k + r
with -10 < r <= 0
,
0x66666667 * (10*k+r) = (2^34+6)*k + (2^34 + 6)*r/10 = 2^34*k + 6*k + (2^34+6)*r/10
现在,负整数的算术右移产生v/2^n
的下限,所以
Now, arithmetic right shift of negative integers yields the floor of v / 2^n
, so
(0x66666667 * x) >> 34
结果
k + floor((6*k + (2^34+6)*r/10) / 2^34)
所以我们需要看到
-2^34 < 6*k + (2^34+6)*r/10 < 0
正确的不等式很容易,k
和 r
都是非正的,而且不是都为 0.
The right inequality is easy, both k
and r
are non-positive, and not both are 0.
对于左不等式,还需要更多的分析.
For the left inequality, a bit more analysis is needed.
r >= -9
所以(2^34+6)*r/10
的绝对值最多为2^34+6 - (2^34+6)/10
.
|k| <= 2^31/10,
所以 <代码>|6*k|<= 3*2^31/5.
还有待验证
6 + 3*2^31/5 < (2^34+6)/10
1288490194 < 1717986919
是的,没错.
这篇关于GCC/ARM 上的快速除法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!