模(%)的GCC实现是如何工作的,为什么不使用div指令? [英] How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?

查看:36
本文介绍了模(%)的GCC实现是如何工作的,为什么不使用div指令?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图找出如何在汇编中计算模 10,所以我在 gcc 中编译了以下 c 代码以查看它的结果.

I was trying to work out how to calculate modulo 10 in assembly so i compiled the following c code in gcc to see what it came up with.

unsigned int i=999;
unsigned int j=i%10;

出乎我的意料

movl    -4(%ebp), %ecx
movl    $-858993459, %edx
movl    %ecx, %eax
mull    %edx
shrl    $3, %edx
movl    %edx, %eax
sall    $2, %eax
addl    %edx, %eax
addl    %eax, %eax
movl    %ecx, %edx
subl    %eax, %edx
movl    %edx, %eax
movl    %eax, -12(%ebp)

其中 -4(%ebp) 或 "i" 是输入,-12(%ebp) 或 "j" 是答案.我已经对此进行了测试,无论您制作什么数字 -4(%ebp),它都可以工作.

Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).

我的问题是这段代码是如何工作的,它比使用 div 操作数有什么好处.

My question is how does this code work and how is it better than using the div operand.

推荐答案

第一个问题:div 是一条非常慢的指令(超过 20 个时钟周期).上面的序列包含更多的指令,但它们都相对较快,因此在速度方面是绝对胜利.

Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

前五个指令(直到并包括 shr​​l)计算 i/10(我将在一分钟内解释如何).

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

接下来的几条指令再次将结果乘以 10,但避免使用 mul/imul 指令(这是否成功取决于您的确切处理器)重新定位 - 较新的 x86 具有非常快的乘法器,但较旧的则没有).

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).

movl    %edx, %eax   ; eax=i/10
sall    $2, %eax     ; eax=(i/10)*4
addl    %edx, %eax   ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl    %eax, %eax   ; eax=(i/10)*5*2 = (i/10)*10

然后再次从 i 中减去,得到 i - (i/10)*10i % 10 (对于无符号数字).

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).

最后,关于i/10的计算:基本思想是用乘以1/10代替除以10.编译器通过乘以 (2**35/10 + 1) 来对此进行定点近似 - 这是加载到 edx 中的魔法值,尽管它输出为有符号值,即使它确实是无符号 - 并将结果右移 35.结果证明所有 32 位整数都能得到正确的结果.

Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx, though it's output as a signed value even though it's really unsigned - and right-shifting the result by 35. This turns out to give the right result for all 32-bit integers.

有算法可以确定这种近似值,保证误差小于 1(对于整数意味着它是正确的值),而 GCC 显然使用了一个 :)

There's algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)

最后一句话:如果你想真正看到 GCC 计算模数,请制作除数变量(例如函数参数),这样它就不能做这种优化.无论如何,在 x86 上,您使用 div 计算模.div 期望 edx:eax 中的 64 位红利(edx 中的高 32 位,eax 中的低 32 位 - 如果您正在使用 32-bit 数)并除以您指定的任何操作数(例如 div ebxedx:eax 除以 ebx).它返回 eax 中的商和 edx 中的余数.idiv 对有符号的值做同样的事情.

Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.

这篇关于模(%)的GCC实现是如何工作的,为什么不使用div指令?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆