如何模块(%)工作的GCC实现,为什么它不使用DIV指令? [英] How does the GCC implementation of module (%) work, and why does it not use the div instruction?
问题描述
我试图找出如何在装配计算模10,所以我编译GCC下面的C code,看看它想出了。
I was trying to work out how to calculate modulo 10 in assembly so i compiled the following c code in gcc to see what it came up with.
unsigned int i=999;
unsigned int j=i%10;
要我惊讶的是我得到了
movl -4(%ebp), %ecx
movl $-858993459, %edx
movl %ecx, %eax
mull %edx
shrl $3, %edx
movl %edx, %eax
sall $2, %eax
addl %edx, %eax
addl %eax, %eax
movl %ecx, %edx
subl %eax, %edx
movl %edx, %eax
movl %eax, -12(%ebp)
其中-4(%ebp的)或i是输入和-12(%ebp的)或J是答案。我测试过这一点,它并不管你做什么号-4(%EBP)工作。
Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).
我的问题是如何做到这一点code的工作,它是如何比使用div操作更好。
My question is how does this code work and how is it better than using the div operand.
推荐答案
第二个问题,第一: DIV
是一个非常缓慢的指令(超过20个时钟周期)。上面的序列由更多的指令,但他们都比较快,所以它在速度方面的净赢。
Second question first: div
is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.
前五指令(直至并包括 SHRL
)计算的i / 10(我将解释如何在一分钟内)。
The first five instructions (up to and including the shrl
) compute i/10 (I'll explain how in a minute).
接下来的几个指令再乘以10的结果,但避免了 MUL
/ IMUL
的说明(这是否是一个双赢与否取决于你的目标的确切处理器 - 新x86s有非常快的乘数,但旧的没有)
The next few instructions multiply the result by 10 again, but avoiding the mul
/imul
instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).
movl %edx, %eax ; eax=i/10
sall $2, %eax ; eax=(i/10)*4
addl %edx, %eax ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl %eax, %eax ; eax=(i/10)*5*2 = (i/10)*10
这是再从减去我
再次获得 I - (I / 10)* 10
是 I%10
(对于无符号数)。
This is then subtracted from i
again to obtain i - (i/10)*10
which is i % 10
(for unsigned numbers).
最后,关于第i / 10的计算:基本的想法是通过10与乘以1/10取代除法。该编译器的这个定点近似用乘以(2 **一十分之三十五+ 1) - 这是加载到 EDX
魔法值,虽然它的输出作为签署价值,即使它真的无符号 - 和35右移结果这原来是给所有的32位整数正确的结果。
Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx
, though it's output as a signed value even though it's really unsigned - and right-shifting the result by 35. This turns out to give the right result for all 32-bit integers.
有算法来确定这种近似这保证误差小于1(整数意味着它是正确的值),GCC明显使用一个:)
There's algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
最后再说一句:如果你想真正看到GCC计算模,使除数变量(如函数参数),所以它不能做这样的优化。无论如何,在x86,您可以使用计算模 DIV
。 DIV
预计 64位股息EDX:EAX
(EDX中的高32位,EAX低32位 - 明确EDX到零,如果你和一个32位数字的工作),并把通过任何操作数指定(如 DIV EBX
分歧 EDX: EAX
按 EBX
)。它返回 EAX
商和在 EDX
剩余部分。 IDIV
确实为签署价值是一样的。
Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div
. div
expects the 64-bit dividend in edx:eax
(high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx
divides edx:eax
by ebx
). It returns the quotient in eax
and the remainder in edx
. idiv
does the same for signed values.
这篇关于如何模块(%)工作的GCC实现,为什么它不使用DIV指令?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!