快速司GCC / ARM [英] Fast Division on GCC/ARM

查看:292
本文介绍了快速司GCC / ARM的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据我所知大多数编译器将乘以然后执行快速除法位右移。举例来说,如果你检查这个SO线程它说,当你问微软编译器执行除以10将通过0x1999999A乘被除数(即2 ^ 32/10),然后通过2 ^ 32(使用32向右移动)划分的结果。

到目前为止好。

在我使用GCC ARM的测试相同除以10,不过,编译器做了一些略有不同。首先,它通过0x66666667(2 ^ 34/10)乘以股息,然后除以2 ^ 34分的结果。迄今为止,这是一样的微软,除了使用较高的乘数。在这之后,但是,它减去(股息/ 2 ^ 31)从结果。

我的提问:为什么在ARM版有额外的减法?你能给我一个数字例子,如果没有相减的结果将是错误的?

如果您想检查生成的code,它的下面(有我的评语):

  LDR R2,[R7,#4] @  - 这个加载股息从内存到R2
        MOVW R3,#:lower16:1717986919 @ - 移动恒定的低16位
        MOVT R3,#:upper16:1717986919 @ - 移动恒定的高16位
        SMULL R1,R3,R3,R2 @ - 乘法长,把低32位中,R1,R3中高32
        ASR R1,R3,#2 @ - R 3 GT;大于2,然后将其存储在R1(有效>> 34,由于R3较高乘法的32位)
        ASR R3,R2,#31 @ - 红利>> 31,然后存储在R3
        RSB R3,R3,R1 @ - R1 - R3,店R3
        STR R3,[R7,#0] // @ - 这个结果存储在内存中(从R3)


解决方案

  

此后,然而,它从结果中减去(股息/ 2 ^ 31)。


其实,它减去红利>> 31 ,这是 1 股息 0的非负股息当右移负整数算术是右移位(和 INT 在32比特宽)。

  0x6666667 =(2 ^ 34 + 6)/ 10

因此​​,对于 X< 0 ,我们有,写 X = 10 * K + R -10℃;为r = 0

  0x66666667 *(10 * K + R)=(2 ^ 34 + 6)* K +(2 ^ 34 + 6)* R / 10 = 2 ^ 34 * K + 6 * K +(2 ^ 34 + 6)* R / 10

现在,负整数算术右移产生的地板 V / 2 ^ N ,所以

 (0x66666667 * X)GT;> 34

结果

  K +楼((6 * K +(2 ^ 34 + 6)* R / 10)/ 2 ^ 34)

因此​​,我们需要看到的是

  -2 ^ 34  - ; 6 * K +(2 ^ 34 + 6)* R / 10下; 0

正确的不平等很容易,无论是 K 研究都是非正的,不能同时为0。

有关左侧的不平等,就需要更多的分析。

  R> = -9

这样的绝对值(2 ^ 34 + 6)* R / 10 最多为 2 ^ 34 + 6 - (2 ^ 34 + 6)/ 10

  | K | < = 2 ^ 31/10,

所以 | 6 * K | < = 3 * 2 ^ 31/5

和它仍然以验证

  6 + 3 * 2 ^ 31/5< (2 ^ 34 + 6)/ 10
1288490194< 1717986919

是的,真实的。

As far as I know most compilers will do fast division by multiplying and then bit shifting to the right. For instance, if you check this SO thread it says that when you ask the Microsoft compiler to do division by 10 it will multiply the dividend by 0x1999999A (which is 2^32/10) and then divide the result by 2^32 (using 32 shifts to the right).

So far so good.

Once I tested the same division by 10 on ARM using GCC, though, the compiler did something slightly different. First it multiplied the dividend by 0x66666667 (2^34/10), then it divided the result by 2^34. Thus far it's the same as Microsoft, except using a higher multiplier. After that, however, it subtracted (dividend/2^31) from the result.

My question: why on the ARM version there's that extra subtraction? Can you give me an numeric example where without that subtraction the result will be wrong?

If you want to check the generated code, it's below (with my comments):

        ldr     r2, [r7, #4] @--this loads the dividend from memory into r2
        movw    r3, #:lower16:1717986919 @--moves the lower 16 bits of the constant 
        movt    r3, #:upper16:1717986919 @--moves the upper 16 bits of the constant
        smull   r1, r3, r3, r2 @--multiply long, put lower 32 bits in r1, higher 32 in r3
        asr     r1, r3, #2 @--r3>>2, then store in r1 (effectively >>34, since r3 was higher 32 bits of multiplication)
        asr     r3, r2, #31 @--dividend>>31, then store in r3
        rsb     r3, r3, r1 @--r1 - r3, store in r3
        str     r3, [r7, #0] @--this stores the result in memory (from r3) 

解决方案

After that, however, it subtracted (dividend/2^31) from the result.

Actually, it subtracts dividend >> 31, which is -1 for negative dividend, and 0 for non-negative dividend, when right-shifting negative integers is arithmetic right-shift (and int is 32 bits wide).

0x6666667 = (2^34 + 6)/10

So for x < 0, we have, writing x = 10*k + r with -10 < r <= 0,

0x66666667 * (10*k+r) = (2^34+6)*k + (2^34 + 6)*r/10 = 2^34*k + 6*k + (2^34+6)*r/10

Now, arithmetic right shift of negative integers yields the floor of v / 2^n, so

(0x66666667 * x) >> 34

results in

k + floor((6*k + (2^34+6)*r/10) / 2^34)

So we need to see that

-2^34 < 6*k + (2^34+6)*r/10 < 0

The right inequality is easy, both k and r are non-positive, and not both are 0.

For the left inequality, a bit more analysis is needed.

r >= -9

so the absolute value of (2^34+6)*r/10 is at most 2^34+6 - (2^34+6)/10.

|k| <= 2^31/10,

so |6*k| <= 3*2^31/5.

And it remains to verify that

6 + 3*2^31/5 < (2^34+6)/10
1288490194   < 1717986919

Yup, true.

这篇关于快速司GCC / ARM的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆