How do I implement an efficient 32 bit DivMod in 64 bit code


Question

I want to use a DivMod function that operates exclusively on 32 bit operands. The implementation in the RTL returns values in 16 bit variables. Its declaration is:

procedure DivMod(Dividend: Cardinal; Divisor: Word; var Result, Remainder: Word);

So, I cannot use that since my inputs may overflow the return values.

The naive Pascal implementation looks like this:

procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
begin
  Quotient := Dividend div Divisor;
  Remainder := Dividend mod Divisor;
end;

This works splendidly but performs the division twice. Since the function is called by part of my code that is a performance bottleneck, I would like to perform the division only once. To that end I am using Serg's 32 bit DivMod from this question: Is there a DivMod that is *not* Limited to Words (<=65535)? (http://stackoverflow.com/questions/2397656/is-there-a-divmod-that-is-not-limited-to-words-65535)

procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
asm
        PUSH EBX
        MOV  EBX,EDX
        XOR  EDX,EDX
        DIV  EBX
        MOV  [ECX],EAX
        MOV  EBX,Remainder
        MOV  [EBX],EDX
        POP  EBX
end;

This works perfectly.

But now I would like a version of the function for 64 bit code. Note that I still want to operate on 32 bit operands, and return 32 bit values.

Should I re-write the function using 64 bit assembler, or is it sufficient to use the DivMod overload from the RTL that operates on, and returns, 64 bit values?

Specifically I would like to know if there is a performance benefit in writing 64 bit code that does 32 bit operations. Is that even possible? Or would I simply end up re-implementing the DivMod overload with UInt64 parameters? If it is worth implementing a bespoke 64 bit asm version, how would I go about doing it, given that the operands and operations are 32 bit?

I think that it would look like this, but I am no expert and likely have got something wrong:

procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
asm
        MOV   EAX,ECX   // move Dividend to EAX
        MOV   ECX,EDX   // move Divisor to ECX
        XOR   EDX,EDX   // zeroise EDX
        DIV   ECX       // divide EDX:EAX by ECX
        MOV   [R8],EAX  // save quotient
        MOV   [R9],EDX  // save remainder
end;

Solution

I dug a bit deeper. I think it would be perfectly reasonable to implement this on top of the UInt64 version. That would look like this:

procedure DivMod(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
var
  Quotient64, Remainder64: UInt64;
begin
  DivMod(Dividend, Divisor, Quotient64, Remainder64);
  Quotient := Quotient64;
  Remainder := Remainder64;
end;

I don't think the performance would be very significantly affected in comparison to the most optimal asm version.

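However, I believe that the x64 asm code in the question is correct. The MOV instructions are all fine with 32 bit operands. And the DIV also behaves as described in the comment in the asm code. The Intel documentation (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) for DIV r/m32 says: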

Unsigned divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder.
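
To make the quoted semantics concrete, here is a small Pascal model of what DIV r/m32 computes. ModelDiv32, RegEAX and RegEDX are hypothetical names used only for illustration. This is a sketch of the documented behaviour, not of how the CPU or the RTL implements it.

```pascal
program DivModel;

{$APPTYPE CONSOLE}

// Hypothetical helper: models "DIV r/m32" on a pair of 32 bit values
// that stand in for the EAX and EDX registers.
procedure ModelDiv32(var RegEAX, RegEDX: Cardinal; Operand: Cardinal);
var
  Wide, Q: UInt64;
begin
  // The dividend is the 64 bit value EDX:EAX.
  Wide := (UInt64(RegEDX) shl 32) or RegEAX;
  Q := Wide div Operand;
  // The real instruction faults if Q does not fit in 32 bits;
  // this model simply truncates.
  RegEDX := Cardinal(Wide mod Operand); // remainder goes to EDX
  RegEAX := Cardinal(Q);                // quotient goes to EAX
end;

var
  A, D: Cardinal;
begin
  A := 666;
  D := 0;   // corresponds to XOR EDX,EDX before the DIV
  ModelDiv32(A, D, 42);
  Writeln(A, ' ', D); // prints 15 36
end.
```

With EDX zeroed the dividend is below 2^32, so the quotient always fits in EAX and the only way the instruction can fault is a zero divisor.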

And let's take a look at what the Delphi compiler does with this code:

var
  a, b, c, d: Cardinal;
....
a := 666;
b := 42;
c := a div b;
d := a mod b;

The code that is produced is:

    
Project39.dpr.14: a := 666;
0000000000423A68 C7450C9A020000   mov [rbp+$0c],$0000029a
Project39.dpr.15: b := 42;
0000000000423A6F C745082A000000   mov [rbp+$08],$0000002a
Project39.dpr.16: c := a div b;
0000000000423A76 8B450C           mov eax,[rbp+$0c]
0000000000423A79 33D2             xor edx,edx
0000000000423A7B F77508           div dword ptr [rbp+$08]
0000000000423A7E 894504           mov [rbp+$04],eax
Project39.dpr.17: d := a mod b;
0000000000423A81 8B450C           mov eax,[rbp+$0c]
0000000000423A84 33D2             xor edx,edx
0000000000423A86 F77508           div dword ptr [rbp+$08]
0000000000423A89 895500           mov [rbp+$00],edx

I don't have any expectation that the 32 bit divide will be more efficient than a 64 bit divide, but that doesn't really matter. It seems more natural to perform the 32 bit operation with 32 bit operands.
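
If you would rather avoid assembler altogether, one option that is not from the answer above is to divide once and derive the remainder with a multiplication and a subtraction. The following is a minimal sketch under that assumption. DivMod32 is a hypothetical name, not an RTL routine, and I have not measured whether it is faster than the asm version. The small harness checks it against div and mod for a few sample inputs.

```pascal
program DivModCheck;

{$APPTYPE CONSOLE}

// DivMod32 is a hypothetical name, not an RTL routine.
// One division, then the remainder by multiply-and-subtract.
procedure DivMod32(Dividend, Divisor: Cardinal; out Quotient, Remainder: Cardinal);
begin
  Quotient := Dividend div Divisor;
  // Quotient * Divisor <= Dividend, so neither operation can overflow.
  Remainder := Dividend - Quotient * Divisor;
end;

const
  // Sample inputs, including values that do not fit in a Word.
  Dividends: array[0..4] of Cardinal = (0, 666, 65536, 1000000007, $FFFFFFFF);
  Divisors: array[0..3] of Cardinal = (1, 42, 65537, $FFFFFFFF);
var
  i, j: Integer;
  Q, R: Cardinal;
begin
  for i := Low(Dividends) to High(Dividends) do
    for j := Low(Divisors) to High(Divisors) do
    begin
      DivMod32(Dividends[i], Divisors[j], Q, R);
      // Compare against the two-division reference.
      if (Q <> Dividends[i] div Divisors[j]) or
         (R <> Dividends[i] mod Divisors[j]) then
      begin
        Writeln('Mismatch for ', Dividends[i], ' / ', Divisors[j]);
        Halt(1);
      end;
    end;
  Writeln('All cases match');
end.
```

The same harness can be pointed at the asm version or the UInt64 wrapper by swapping the call, which makes it easy to confirm that all variants agree before timing them.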
