How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?

Views: 141 Published: 2018/4/20 15:57:27 Tags: gcc assembly optimization x86


Question

 

I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.
    unsigned int i = 999;
    unsigned int j = i % 10;
To my surprise I got
    movl -4(%ebp), %ecx
    movl $-858993459, %edx
    movl %ecx, %eax
    mull %edx
    shrl $3, %edx
    movl %edx, %eax
    sall $2, %eax
    addl %edx, %eax
    addl %eax, %eax
    movl %ecx, %edx
    subl %eax, %edx
    movl %edx, %eax
    movl %eax, -12(%ebp)
Here -4(%ebp), or "i", is the input and -12(%ebp), or "j", is the answer. I've tested this and it works no matter what value you store in -4(%ebp).

My question is: how does this code work, and how is it better than using the div instruction?

Answer

Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).
    movl %edx, %eax   # eax = i/10
    sall $2, %eax     # eax = (i/10)*4
    addl %edx, %eax   # eax = (i/10)*4 + (i/10) = (i/10)*5
    addl %eax, %eax   # eax = (i/10)*5*2 = (i/10)*10
This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).
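
As a rough illustration, the multiply-back and subtract steps can be mirrored in C. This is only a sketch: the function name mod10_from_quotient is made up here, and the quotient q is passed in separately so that the shift-and-add part can be looked at in isolation.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): rebuilds i % 10 from
   q = i / 10 using the same steps as the sall/addl/addl/subl sequence. */
uint32_t mod10_from_quotient(uint32_t i, uint32_t q)
{
    uint32_t t = q << 2;   /* sall $2:  t = q * 4          */
    t += q;                /* addl:     t = q * 5          */
    t += t;                /* addl:     t = q * 10         */
    return i - t;          /* subl:     i - (i / 10) * 10  */
}
```

Calling mod10_from_quotient(999, 999 / 10) follows the same path as the compiled code for the value in the question.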

Finally, on the computation of i/10: the basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying by (2**35 / 10 + 1), which is 3435973837 or 0xCCCCCCCD. That is the magic value loaded into edx; the listing prints it as the signed value -858993459 even though it is really unsigned. The product is then right-shifted by 35: mull leaves the 64-bit product in edx:eax, so using only edx already discards the low 32 bits, and shrl $3 shifts by the remaining 3. This turns out to give the right result for all 32-bit unsigned integers.
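
The same fixed-point trick can be written in C to make the shift by 35 explicit. This is a minimal sketch for the divisor 10 only; the names MAGIC_DIV10 and div10_by_multiply are invented for illustration, and a 64-bit intermediate stands in for the edx:eax register pair.

```c
#include <stdint.h>

/* 0xCCCCCCCD == 3435973837 == 2**35 / 10 + 1 (integer division).
   The same 32 bits read as a signed value give -858993459. */
#define MAGIC_DIV10 0xCCCCCCCDu

/* Hypothetical helper: i / 10 via multiply and shift.
   The 64-bit product plays the role of edx:eax after mull;
   shifting by 35 equals "take edx" (32 bits) plus shrl $3. */
uint32_t div10_by_multiply(uint32_t i)
{
    uint64_t product = (uint64_t)i * MAGIC_DIV10;
    return (uint32_t)(product >> 35);
}
```

Comparing div10_by_multiply(i) against i / 10 over a range of inputs is a quick way to convince yourself that the two agree.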

There are algorithms to determine this kind of approximation that guarantee the error is less than 1 (which for integers means it's the right value), and GCC obviously uses one :)
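
For this particular constant the error bound can be checked by hand. The following is a short derivation for the divisor 10 only, not the general algorithm GCC uses; the symbols m, q, r and \varepsilon are introduced here just for the argument.

```latex
\begin{aligned}
m &= \left\lfloor \tfrac{2^{35}}{10} \right\rfloor + 1 = 3435973837,
  \qquad 10\,m = 2^{35} + 2, \\
\frac{i \cdot m}{2^{35}} &= \frac{i}{10} + \frac{i}{5 \cdot 2^{35}}
  = \frac{i}{10} + \varepsilon,
  \qquad 0 \le \varepsilon < \frac{2^{32}}{5 \cdot 2^{35}} = \frac{1}{40}
  \quad \text{for } 0 \le i < 2^{32}, \\
i &= 10q + r,\ 0 \le r \le 9
  \;\Longrightarrow\;
  \frac{i \cdot m}{2^{35}} = q + \frac{r}{10} + \varepsilon,
  \qquad \frac{r}{10} + \varepsilon < \frac{9}{10} + \frac{1}{40} < 1, \\
\left\lfloor \frac{i \cdot m}{2^{35}} \right\rfloor &= q
  = \left\lfloor \frac{i}{10} \right\rfloor .
\end{aligned}
```

In other words, the approximation always overshoots i/10 slightly, but never by enough to reach the next integer.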

Final remark: if you want to actually see GCC compute a modulo with a division instruction, make the divisor a variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax; clear edx to zero if you're working with a 32-bit unsigned number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values, except that edx has to be the sign extension of eax (e.g. via cdq) rather than zero.
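
A sketch of both points, under stated assumptions: mod_variable is a made-up function whose divisor is only known at run time, and model_div is a hypothetical C model of what the 32-bit div instruction computes, not actual compiler output. The model assumes the operand is nonzero and the quotient fits in 32 bits; the real instruction raises a divide error otherwise.

```c
#include <stdint.h>

/* The divisor is a parameter, so the compiler cannot precompute a
   reciprocal for it. The caller must pass a nonzero divisor. */
unsigned int mod_variable(unsigned int i, unsigned int divisor)
{
    return i % divisor;
}

/* Hypothetical model of "div operand" with a 32-bit operand. */
struct div_result {
    uint32_t quotient;   /* what div leaves in eax */
    uint32_t remainder;  /* what div leaves in edx */
};

struct div_result model_div(uint32_t edx, uint32_t eax, uint32_t operand)
{
    /* The dividend is the 64-bit value edx:eax. */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    struct div_result r;
    r.quotient  = (uint32_t)(dividend / operand);
    r.remainder = (uint32_t)(dividend % operand);
    return r;
}
```

Compiling a file that contains mod_variable with gcc -S should let you inspect the generated assembly; the exact instructions depend on the GCC version, target, and optimization flags.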
