如何模块(%)工作的GCC实现,为什么它不使用DIV指令? [英] How does the GCC implementation of module (%) work, and why does it not use the div instruction?

查看:204
本文介绍了如何模块(%)工作的GCC实现,为什么它不使用DIV指令?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图找出如何在装配计算模10,所以我编译GCC下面的C code,看看它想出了。

I was trying to work out how to calculate modulo 10 in assembly so i compiled the following c code in gcc to see what it came up with.

unsigned int i=999;
unsigned int j=i%10;

要我惊讶的是我得到了

movl    -4(%ebp), %ecx
movl    $-858993459, %edx
movl    %ecx, %eax
mull    %edx
shrl    $3, %edx
movl    %edx, %eax
sall    $2, %eax
addl    %edx, %eax
addl    %eax, %eax
movl    %ecx, %edx
subl    %eax, %edx
movl    %edx, %eax
movl    %eax, -12(%ebp)

其中-4(%ebp的)或i是输入和-12(%ebp的)或J是答案。我测试过这一点,它并不管你做什么号-4(%EBP)工作。

Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).

我的问题是如何做到这一点code的工作,它是如何比使用div操作更好。

My question is how does this code work and how is it better than using the div operand.

推荐答案

第二个问题,第一: DIV 是一个非常缓慢的指令(超过20个时钟周期)。上面的序列由更多的指令,但他们都比较快,所以它在速度方面的净赢。

Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

前五指令(直至并包括 SHRL )计算的i / 10(我将解释如何在一分钟内)。

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

接下来的几个指令再乘以10的结果,但避免了 MUL / IMUL 的说明(这是否是一个双赢与否取决于你的目标的确切处理器 - 新x86s有非常快的乘数,但旧的没有)

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).

movl    %edx, %eax   ; eax=i/10
sall    $2, %eax     ; eax=(i/10)*4
addl    %edx, %eax   ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl    %eax, %eax   ; eax=(i/10)*5*2 = (i/10)*10

这是再从减去我再次获得 I - (I / 10)* 10 I%10 (对于无符号数)。

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).

最后,关于第i / 10的计算:基本的想法是通过10与乘以1/10取代除法。该编译器的这个定点近似用乘以(2 **一十分之三十五+ 1) - 这是加载到 EDX 魔法值,虽然它的输出作为签署价值,即使它真的无符号 - 和35右移结果这原来是给所有的32位整数正确的结果。

Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx, though it's output as a signed value even though it's really unsigned - and right-shifting the result by 35. This turns out to give the right result for all 32-bit integers.

有算法来确定这种近似这保证误差小于1(整数意味着它是正确的值),GCC明显使用一个:)

There's algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)

最后再说一句:如果你想真正看到GCC计算模,使除数变量(如函数参数),所以它不能做这样的优化。无论如何,在x86,您可以使用计算模 DIV DIV 预计 64位股息EDX:EAX (EDX中的高32位,EAX低32位 - 明确EDX到零,如果你和一个32位数字的工作),并把通过任何操作数指定(如 DIV EBX 分歧 EDX: EAX EBX )。它返回 EAX 商和在 EDX 剩余部分。 IDIV 确实为签署价值是一样的。

Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.

这篇关于如何模块(%)工作的GCC实现,为什么它不使用DIV指令?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆