Best assembly or compilation for minimum of three values

Question

I'm looking at code generated by GCC 4.8 for x86_64 and wondering whether there is a better (faster) way to compute the minimum of three values.

Here's an excerpt from Python's collections module that computes m, the minimum of n, rightindex+1, and leftindex:

    ssize_t m = n;
    if (m > rightindex + 1)
        m = rightindex + 1;
    if (m > leftindex)
        m = leftindex;

GCC generates serially dependent code with CMOVs:

leaq    1(%rbp), %rdx
cmpq    %rsi, %rdx
cmovg   %rsi, %rdx
cmpq    %rbx, %rdx
cmovg   %rbx, %rdx

Is there faster code that can take advantage of the processor's out-of-order parallel execution by removing the data dependencies? I'm wondering whether there are known tricks for computing the minimum of multiple values without using conditionals or predicated instructions. I'm also wondering whether there are saturating arithmetic intrinsics that would help in this situation.

Edits:

  • As shown, the code uses signed arithmetic, but an unsigned arithmetic answer would help as well.
  • I asked about a minimum-of-three but am also interested in minimum-of-n where n is small.
  • Linus's admonitions on CMOV: http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

Recommended answer

The minimum of two unsigned numbers has a classical solution:

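; eax = min(eax, ebx), unsigned. ecx is a scratch register; ebx is overwritten.
.min2:
    sub     ebx, eax        ; ebx = ebx - eax; CF = 1 if the original ebx < eax
    sbb     ecx, ecx        ; ecx = -CF: all ones if ebx < eax, else zero
    and     ecx, ebx        ; ecx = the difference if ebx < eax, else zero
    add     eax, ecx        ; eax = the smaller of the two values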

This approach is probably faster than the cmov solution. Note, however, that the four instructions still form a serial dependency chain, so for higher speed they have to be interleaved with other, independent instructions that the processor can execute in parallel.
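
For readers working in C rather than assembly, the same idea can be sketched as below. This is a minimal illustration, not code from the answer: the function name `min2_u32` is hypothetical, and a compiler is free to lower it to CMOV, a branch, or something else, so it is worth inspecting the generated assembly before relying on it.

```c
#include <stdint.h>

/* Branch-free minimum of two unsigned 32-bit values.
 * Mirrors the sub/sbb/and/add sequence: `mask` plays the role of the
 * register filled by `sbb ecx, ecx`. The name min2_u32 is illustrative. */
static inline uint32_t min2_u32(uint32_t a, uint32_t b)
{
    uint32_t diff = b - a;                            /* sub: wraps modulo 2^32 when b < a */
    uint32_t mask = (uint32_t)0 - (uint32_t)(b < a);  /* sbb: all ones if b < a, else zero */
    return a + (diff & mask);                         /* and + add: yields b when b < a, else a */
}
```

Because unsigned arithmetic in C wraps modulo 2^32, `a + (b - a)` is exactly `b` even when the subtraction borrows, which is the property the assembly relies on.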

The same method can be applied to three numbers:

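; eax = min(eax, ebx, edx), unsigned. ecx is a scratch register; ebx and edx are overwritten.
.min3:
    sub     ebx, eax        ; first step: eax = min(eax, ebx)
    sbb     ecx, ecx
    and     ecx, ebx
    add     eax, ecx

    sub     edx, eax        ; second step: eax = min(eax, edx)
    sbb     ecx, ecx
    and     ecx, edx
    add     eax, ecx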
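
The question also asks about minimum-of-n for small n and about shortening the dependency chain. With three values the chain is two steps deep however it is arranged. From four values on, a balanced tree of pairwise minima is shallower than a linear chain. The C sketch below illustrates this under stated assumptions: the helper names are hypothetical, and whether the optimizer keeps this shape has to be checked in the generated code and measured.

```c
#include <stdint.h>

/* Hypothetical helper: branch-free unsigned minimum, same idea as .min2. */
static inline uint32_t min2_u32(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(b < a);  /* all ones if b < a */
    return a + ((b - a) & mask);
}

/* Three values: two dependent steps, like .min3. */
static inline uint32_t min3_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return min2_u32(min2_u32(a, b), c);
}

/* Four values as a balanced tree: `lo` and `hi` do not depend on each
 * other, so an out-of-order core may compute them in parallel. The
 * dependency chain is two steps deep instead of three. */
static inline uint32_t min4_u32(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t lo = min2_u32(a, b);
    uint32_t hi = min2_u32(c, d);
    return min2_u32(lo, hi);
}
```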

Another option is to test the variant with conditional jumps. On modern processors, it might be even faster, especially if the jumps are highly predictable:

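; eax = min(eax, ebx, edx), signed.
.min3:
    cmp     eax, ebx
    jle     @f              ; signed compare; use jbe instead for unsigned values
    mov     eax, ebx
@@:
    cmp     eax, edx
    jle     @f              ; signed compare; use jbe instead for unsigned values
    mov     eax, edx
@@: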
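
The jump variant compares signed values (`jle`), as the code in the question does, whereas the sub/sbb variant is unsigned. A signed, branch-free form can be sketched in C with the well-known XOR-select idiom. This is an illustration rather than part of the original answer: the function names are hypothetical, and a compiler may still turn it into CMOV or a branch.

```c
#include <stdint.h>

/* Branch-free minimum of two signed 32-bit values (XOR-select idiom).
 * mask is -1 (all bits set) when a < b and 0 otherwise; int32_t is
 * required by the C standard to be two's complement. */
static inline int32_t min2_i32(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);   /* signed comparison */
    return b ^ ((a ^ b) & mask);        /* a if a < b, else b */
}

/* Minimum of three signed values, e.g. n, rightindex + 1 and leftindex. */
static inline int32_t min3_i32(int32_t a, int32_t b, int32_t c)
{
    return min2_i32(min2_i32(a, b), c);
}
```

Unlike the subtract-based form, the XOR form never computes `b - a`, so it cannot overflow for signed operands.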
