How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?
Problem description
I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.
unsigned int i=999;
unsigned int j=i%10;
To my surprise, I got:
movl -4(%ebp), %ecx
movl $-858993459, %edx
movl %ecx, %eax
mull %edx
shrl $3, %edx
movl %edx, %eax
sall $2, %eax
addl %edx, %eax
addl %eax, %eax
movl %ecx, %edx
subl %eax, %edx
movl %edx, %eax
movl %eax, -12(%ebp)
Here -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it works no matter what value is stored at -4(%ebp).
My question is: how does this code work, and how is it better than using the div instruction?
Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.
The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).
The next few instructions multiply the result by 10 again, but avoid the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting: newer x86s have very fast multipliers, but older ones don't).
movl %edx, %eax # eax=i/10
sall $2, %eax # eax=(i/10)*4
addl %edx, %eax # eax=(i/10)*4 + (i/10) = (i/10)*5
addl %eax, %eax # eax=(i/10)*5*2 = (i/10)*10
This is then subtracted from i again to obtain i - (i/10)*10, which is i % 10 (for unsigned numbers).
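The multiply-by-10 and subtract steps can be sketched in C as follows. This is only an illustration of the arithmetic: the function names times10 and mod10_from_quotient are made up here, and the quotient is obtained with a plain division as a stand-in for the multiply-and-shift step.

```c
#include <stdint.h>

/* Multiply q by 10 using one shift and two adds, mirroring the
   sall/addl/addl sequence. The name is illustrative only. */
uint32_t times10(uint32_t q)
{
    uint32_t t = q << 2;   /* sall $2, %eax  : q * 4  */
    t += q;                /* addl %edx, %eax: q * 5  */
    t += t;                /* addl %eax, %eax: q * 10 */
    return t;
}

/* Recover the remainder from the quotient: i - (i / 10) * 10. */
uint32_t mod10_from_quotient(uint32_t i)
{
    uint32_t q = i / 10;   /* stand-in for the multiply-and-shift step */
    return i - times10(q); /* subl %eax, %edx */
}
```

For the value in the question, mod10_from_quotient(999) computes 999 - 990.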
Finally, on the computation of i/10: the basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1), which is 3435973837 or 0xCCCCCCCD. That's the magic value loaded into edx, though it's output as the signed value -858993459 even though it's really unsigned. The result is then right-shifted by 35: mull leaves the high 32 bits of the 64-bit product in edx, which accounts for a shift of 32, and shrl $3 supplies the remaining 3. This turns out to give the right result for all 32-bit unsigned integers.
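As a rough C equivalent of the multiply-and-shift step, assuming a 64-bit intermediate is available (div10 is a name chosen here for illustration, not something GCC emits):

```c
#include <stdint.h>

/* Sketch of i / 10 for unsigned 32-bit i using the magic constant. */
uint32_t div10(uint32_t i)
{
    const uint64_t magic = 0xCCCCCCCDu;         /* 2**35 / 10 + 1 = 3435973837 */
    uint64_t product = (uint64_t)i * magic;     /* mull: 64-bit product in edx:eax */
    uint32_t high = (uint32_t)(product >> 32);  /* keep edx only: a shift by 32 */
    return high >> 3;                           /* shrl $3: 35 bits in total */
}
```

The product cannot overflow, since both factors are below 2**32.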
There are algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value), and GCC obviously uses one :)
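One way such a search could look is sketched below. It is an assumption-laden illustration, not GCC's actual routine: find_magic is a hypothetical helper that tries increasing shifts s and accepts m = 2**(32+s) / d + 1 when m*d exceeds 2**(32+s) by at most 2**s, a sufficient condition for (n * m) >> (32 + s) to equal n / d for every 32-bit unsigned n. It gives up when m no longer fits in 32 bits, so it does not cover every divisor.

```c
#include <stdint.h>

/* Hypothetical helper: look for a 32-bit multiplier m and extra shift s
   such that (n * m) >> (32 + s) == n / d for all 32-bit unsigned n.
   Returns 1 and fills *m_out / *s_out on success, 0 otherwise. */
int find_magic(uint32_t d, uint32_t *m_out, unsigned *s_out)
{
    if (d < 2)
        return 0;                                 /* nothing useful to search for */
    for (unsigned s = 0; s < 32; s++) {
        uint64_t pow = (uint64_t)1 << (32 + s);   /* 2**(32+s) */
        uint64_t m = pow / d + 1;                 /* candidate multiplier */
        if (m > UINT32_MAX)
            return 0;                             /* would need more than 32 bits */
        uint64_t excess = m * d - pow;            /* how far m*d overshoots 2**(32+s) */
        if (excess <= ((uint64_t)1 << s)) {       /* error small enough to be exact */
            *m_out = (uint32_t)m;
            *s_out = s;
            return 1;
        }
    }
    return 0;
}
```

For d = 10 this search settles on m = 3435973837 and s = 3, matching the constant and the shrl $3 in the listing. For d = 7 it reports failure, because the multiplier it would need does not fit in 32 bits.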
Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax; clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.
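To make the register convention concrete, here is a small C model of the 32-bit div. The names div32_model and mod_variable are invented for this sketch; the real instruction raises a divide error when the divisor is zero or the quotient does not fit in 32 bits, and the model simply assumes the caller avoids both cases.

```c
#include <stdint.h>

/* Illustrative model of 32-bit div: divides the 64-bit value edx:eax by
   divisor, leaving the quotient in *eax and the remainder in *edx.
   Assumes divisor != 0 and a quotient that fits in 32 bits. */
void div32_model(uint32_t *edx, uint32_t *eax, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    *eax = (uint32_t)(dividend / divisor);   /* quotient  */
    *edx = (uint32_t)(dividend % divisor);   /* remainder */
}

/* Modulo with a divisor known only at run time, so no magic constant
   can be precomputed for it. */
uint32_t mod_variable(uint32_t i, uint32_t d)
{
    uint32_t edx = 0;   /* clear the high half for a 32-bit dividend */
    uint32_t eax = i;
    div32_model(&edx, &eax, d);
    return edx;
}
```

Compiling a function shaped like mod_variable, but written directly as i % d, is one way to get GCC to emit an actual div.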