Best assembly or compilation for minimum of three values
Question
I'm looking at code generated by GCC-4.8 for x86_64 and wondering if there is a better (faster) way to compute the minimum of three values.
Here's an excerpt from Python's collections module that computes the minimum of m, rightindex+1, and leftindex:
ssize_t m = n;
if (m > rightindex + 1)
    m = rightindex + 1;
if (m > leftindex)
    m = leftindex;
GCC generates serially dependent code with CMOVs:
leaq 1(%rbp), %rdx
cmpq %rsi, %rdx
cmovg %rsi, %rdx
cmpq %rbx, %rdx
cmovg %rbx, %rdx
Is there faster code that can take advantage of the processor's out-of-order parallel execution by removing the data dependencies? I'm wondering if there are known tricks for computing the minimum of multiple values without using conditionals or predicated instructions. I'm also wondering if there are saturating arithmetic intrinsics that would help in this situation.
Edits:
- As shown the code uses signed arithmetic, but an unsigned arithmetic answer would help as well.
- I asked about a minimum-of-three but also am interested in minimum-of-n where n is small.
- Linus's admonitions on CMOV: http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus
Recommended answer
The minimum of two unsigned numbers has a classical branchless solution:
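; eax = min(eax, ebx), unsigned. ecx is a scratch register; ebx is clobbered.
.min2:
sub ebx, eax ; ebx = ebx - eax, CF = 1 if ebx < eax (unsigned)
sbb ecx, ecx ; ecx = -CF: all ones if ebx was smaller, else 0
and ecx, ebx ; ecx = difference if ebx was smaller, else 0
add eax, ecx ; eax = eax + difference = smaller value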
This approach may be faster than the cmov solution, but its four instructions depend on one another, so to gain speed they have to be interleaved with other, independent instructions that can execute in parallel.
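For readers who prefer C, the same subtract-and-mask idea can be sketched as below. This is only an illustration: the name min2_u32 is hypothetical, the code assumes 32-bit unsigned operands, and a compiler is free to turn it into cmov or a branch rather than the sub/sbb sequence shown above.

```c
#include <stdint.h>

/* Hypothetical helper: branchless minimum of two unsigned 32-bit values.
 * Mirrors the sub / sbb / and / add sequence from the answer. */
static inline uint32_t min2_u32(uint32_t a, uint32_t b)
{
    uint32_t diff = b - a;                           /* sub: wraps around when b < a */
    uint32_t mask = (uint32_t)0 - (uint32_t)(b < a); /* sbb: all ones if b < a, else 0 */
    return a + (diff & mask);                        /* and + add: a + (b - a) == b when b < a */
}
```

Unsigned arithmetic wraps modulo 2^32 in C, so the subtraction is well defined even when b is smaller than a.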
The same method extends to three numbers:
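; eax = min(eax, ebx, edx), unsigned. ecx is a scratch register; ebx and edx are clobbered.
.min3:
sub ebx, eax ; first pass: eax = min(eax, ebx)
sbb ecx, ecx
and ecx, ebx
add eax, ecx
sub edx, eax ; second pass: eax = min(eax, edx)
sbb ecx, ecx
and ecx, edx
add eax, ecx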
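Since the question also asks about minimum-of-n for small n, here is a rough C sketch that repeats the same masked step over an array. The function name min_n_u32 and its array interface are assumptions made for illustration. Each iteration still depends on the previous result, so this removes branches but not the serial data dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: branchless minimum of n unsigned 32-bit values.
 * Requires n >= 1. Applies the subtract-and-mask step once per element. */
static uint32_t min_n_u32(const uint32_t *v, size_t n)
{
    assert(n > 0);
    uint32_t m = v[0];
    for (size_t i = 1; i < n; i++) {
        uint32_t diff = v[i] - m;                           /* wraps when v[i] < m */
        uint32_t mask = (uint32_t)0 - (uint32_t)(v[i] < m); /* all ones if v[i] < m */
        m += diff & mask;                                   /* m becomes v[i] only if smaller */
    }
    return m;
}
```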
Another option is to test the variant with conditional jumps. On modern processors it might be even faster, especially if the jumps are highly predictable:
.min3:
cmp eax, ebx
jle @f ; signed comparison; use jbe for unsigned values
mov eax, ebx
@@:
cmp eax, edx
jle @f
mov eax, edx
@@: