How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?


Problem description

 


I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.

unsigned int i=999;
unsigned int j=i%10;

To my surprise I got

movl    -4(%ebp), %ecx
movl    $-858993459, %edx
movl    %ecx, %eax
mull    %edx
shrl    $3, %edx
movl    %edx, %eax
sall    $2, %eax
addl    %edx, %eax
addl    %eax, %eax
movl    %ecx, %edx
subl    %eax, %edx
movl    %edx, %eax
movl    %eax, -12(%ebp)

Here -4(%ebp), or "i", is the input and -12(%ebp), or "j", is the answer. I've tested this and it works no matter what value you put in -4(%ebp).

My question is: how does this code work, and how is it better than using the div instruction?

Solution

Second question first: div is a very slow instruction (more than 20 clock cycles on many x86 processors). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).

movl    %edx, %eax   # eax = i/10
sall    $2, %eax     # eax = (i/10)*4
addl    %edx, %eax   # eax = (i/10)*4 + (i/10) = (i/10)*5
addl    %eax, %eax   # eax = (i/10)*5*2 = (i/10)*10

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).
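To make the whole sequence easier to follow, here is a minimal C sketch that mirrors the generated instructions one by one. The function name mod10_by_multiply is made up for illustration, and the constant 0xCCCCCCCD is simply the unsigned reading of the -858993459 that appears in the listing.

```c
#include <stdint.h>

/* Mirrors the GCC output step by step.
   mod10_by_multiply is a hypothetical name, not something GCC emits. */
uint32_t mod10_by_multiply(uint32_t i)
{
    /* mull: 32x32 -> 64-bit product; the high half lands in edx.
       0xCCCCCCCD is -858993459 reinterpreted as unsigned. */
    uint64_t product = (uint64_t)i * 0xCCCCCCCDu;
    uint32_t q = (uint32_t)(product >> 32);   /* edx */
    q >>= 3;                                  /* shrl $3: q = i / 10 */

    uint32_t t = q << 2;                      /* sall $2: t = q * 4  */
    t += q;                                   /* addl:    t = q * 5  */
    t += t;                                   /* addl:    t = q * 10 */

    return i - t;                             /* subl: i - (i / 10) * 10 */
}
```

Calling mod10_by_multiply(999) should give the same result as 999 % 10.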

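Finally, on the computation of i/10: the basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying by (2**35 / 10 + 1) and right-shifting the result by 35. That multiplier is the magic value loaded into edx: 3435973837, or 0xCCCCCCCD, which the listing prints as the signed value -858993459 even though it is really used as an unsigned one. The shift by 35 happens in two steps: mull leaves the 64-bit product in edx:eax, so reading only edx already discards the low 32 bits, and shrl $3 shifts by the remaining 3. This turns out to give the right result for all 32-bit unsigned integers.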
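The fixed-point division can also be written directly with a 64-bit intermediate. The sketch below is only an illustration: div10_fixed_point is a hypothetical name, and the constant is computed in the code rather than hard-coded so that its relation to 2**35 / 10 + 1 is visible.

```c
#include <stdint.h>

/* div10_fixed_point is a hypothetical name used for illustration. */
uint32_t div10_fixed_point(uint32_t i)
{
    /* Integer arithmetic: 2^35 / 10 = 3435973836, plus 1 = 3435973837,
       which is 0xCCCCCCCD, i.e. -858993459 read as a 32-bit unsigned value. */
    const uint64_t magic = (1ULL << 35) / 10 + 1;

    /* Both factors are below 2^32, so the product fits in 64 bits. */
    return (uint32_t)(((uint64_t)i * magic) >> 35);
}
```

Converting -858993459 to uint32_t yields the same value as magic, which is why the assembly listing and the formula agree.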

There are algorithms for determining this kind of approximation that guarantee the error is less than 1 (which for integers means the result is exact), and GCC obviously uses one :)
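As a rough illustration of such an algorithm, the sketch below searches for the smallest extra shift s for which m = ceil(2^(32+s) / d) satisfies m*d - 2^(32+s) <= 2^s, a sufficient condition described in Granlund and Montgomery's work on division by invariant integers. Whether GCC follows exactly this search is an assumption here, and magic_t and find_magic are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result type: multiplier m and extra shift s such that,
   as exact integers, n / d == (n * m) >> (32 + s) for all 32-bit unsigned n. */
typedef struct {
    uint64_t m;   /* can need 33 bits for some divisors (e.g. 7) */
    unsigned s;
} magic_t;

/* Assumes 1 <= d <= 2^31, so the search always succeeds with s <= 31. */
magic_t find_magic(uint32_t d)
{
    magic_t r = {0, 0};
    for (unsigned s = 0; s < 32; s++) {
        uint64_t pow2 = 1ULL << (32 + s);      /* 2^(32+s) */
        uint64_t m = (pow2 + d - 1) / d;       /* ceil(2^(32+s) / d) */
        /* Accept when m*d overshoots 2^(32+s) by at most 2^s. */
        if (m * d - pow2 <= (1ULL << s)) {
            r.m = m;
            r.s = s;
            break;
        }
    }
    return r;
}
```

For d = 10 this search arrives at the multiplier 3435973837 with an extra shift of 3, matching the constant and the shrl $3 in the listing.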

Final remark: if you want to actually see GCC compute a modulo with a division instruction, make the divisor a variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. in Intel syntax, div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values, except that you sign-extend eax into edx (with cdq) instead of clearing edx.
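A minimal way to try this is a function whose divisor is only known at run time. The name mod_variable is made up for illustration, and the exact assembly will depend on the GCC version, target, and flags; compiling with gcc -S writes the generated assembly to a .s file for inspection.

```c
/* With a divisor unknown at compile time there is no constant to build a
   reciprocal from, so on x86 GCC is expected to emit div for this function.
   mod_variable is a hypothetical name. */
unsigned int mod_variable(unsigned int i, unsigned int divisor)
{
    return i % divisor;   /* caller must pass a nonzero divisor */
}
```

If the function is inlined into a call site that passes a constant divisor, the compiler may still apply the multiply-and-shift trick there, so it helps to inspect the standalone function.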
