高效的整数比较功能 [英] Efficient integer compare function
问题描述
的比较
函数是一个有两个参数 A
和 B
键,返回描述它们的顺序整数。如果 A
比小b
,结果是一些负整数。如果 A
比大b
,结果是正整数。否则, A
和 B
都是平等的,并且结果是零。
The compare
function is a function that takes two arguments a
and b
and returns an integer describing their order. If a
is smaller than b
, the result is some negative integer. If a
is bigger than b
, the result is some positive integer. Otherwise, a
and b
are equal, and the result is zero.
此功能常被用来参数化排序和从标准库搜索算法
This function is often used to parametrize sorting and searching algorithms from standard libraries.
实施比较
的字符功能相当容易;您只需减去参数:
Implementing the compare
function for characters is quite easy; you simply subtract the arguments:
int compare_char(char a, char b)
{
return a - b;
}
此工作,因为两个字符之间的差,一般假定为适合的整数。 (注意,这个假设不成立的系统,其中的sizeof(字符)==的sizeof(INT)
)
This works because the difference between two characters is generally assumed to fit into an integer. (Note that this assumption does not hold for systems where sizeof(char) == sizeof(int)
.)
这招不能工作比较整数,因为两个整数之间的差异一般不适合到一个整数。例如, INT_MAX - (-1)= INT_MIN
提示 INT_MAX
比小 - 1
(技术上,溢出导致不确定的行为,但让我们假设模运算)。
This trick cannot work to compare integers, because the difference between two integers generally does not fit into an integer. For example, INT_MAX - (-1) = INT_MIN
suggests that INT_MAX
is smaller than -1
(technically, the overflow leads to undefined behavior, but let's assume modulo arithmetic).
那么,如何才能实现有效的整数比较功能?这是我第一次尝试:
So how can we implement the compare function efficiently for integers? Here is my first attempt:
int compare_int(int a, int b)
{
int temp;
int result;
__asm__ __volatile__ (
"cmp %3, %2 \n\t"
"mov $0, %1 \n\t"
"mov $1, %0 \n\t"
"cmovg %0, %1 \n\t"
"mov $-1, %0 \n\t"
"cmovl %0, %1 \n\t"
: "=r"(temp), "=r"(result)
: "r"(a), "r"(b)
: "cc");
return result;
}
是否可以在不到6指令做了什么?是否有更有效不太直接的方式?
Can it be done in less than 6 instructions? Is there a less straightforward way that is more efficient?
推荐答案
下面一直被证明是对我来说是非常有效的:
The following has always proven to be fairly efficient for me:
return (a < b) ? -1 : (a > b);
使用的gcc -O2 -S
,这编译为以下五个指令:
With gcc -O2 -S
, this compiles down to the following five instructions:
xorl %edx, %edx
cmpl %esi, %edi
movl $-1, %eax
setg %dl
cmovge %edx, %eax
作为后续 Ambroz Bizjak的好伴侣回答的,我不相信,他的测试程序相同的程序集code什么张贴以上。而且,当我更仔细地研究了编译器的输出,我注意到,作为被张贴在任何我们的答案编译器不产生相同的指令。所以,我把他的测试程序,修改手工装配输出以匹配我们发布,并比较所产生的时间。 看来两个版本比较大致相同。
As a follow-up to Ambroz Bizjak's excellent companion answer, I was not convinced that his program tested the same assembly code what was posted above. And, when I was studying the compiler output more closely, I noticed that the compiler was not generating the same instructions as was posted in either of our answers. So, I took his test program, hand modified the assembly output to match what we posted, and compared the resulting times. It seems the two versions compare roughly identically.
./opt_cmp_branchless: 0m1.070s
./opt_cmp_branch: 0m1.037s
我张贴的每个程序的装配全让别人也可以尝试同样的实验,并确认或反驳我的观察。
I am posting the assembly of each program in full so that others may attempt the same experiment, and confirm or contradict my observation.
以下是用 cmovge
指令((版本A&LT; B)-1:(A&GT; B)
)
.file "cmp.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d=0\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB20:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
pushq %rbx
.cfi_def_cfa_offset 24
.cfi_offset 3, -24
movl $arr.2789, %ebx
subq $8, %rsp
.cfi_def_cfa_offset 32
.L9:
leaq 4(%rbx), %rbp
.L10:
call rand
movb %al, (%rbx)
addq $1, %rbx
cmpq %rbx, %rbp
jne .L10
cmpq $arr.2789+4096, %rbp
jne .L9
xorl %r8d, %r8d
xorl %esi, %esi
orl $-1, %edi
.L12:
xorl %ebp, %ebp
.p2align 4,,10
.p2align 3
.L18:
movl arr.2789(%rbp), %ecx
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L15:
movl arr.2789(%rax), %edx
xorl %ebx, %ebx
cmpl %ecx, %edx
movl $-1, %edx
setg %bl
cmovge %ebx, %edx
addq $4, %rax
addl %edx, %esi
cmpq $4096, %rax
jne .L15
addq $4, %rbp
cmpq $4096, %rbp
jne .L18
addl $1, %r8d
cmpl $500, %r8d
jne .L12
movl $.LC0, %edi
xorl %eax, %eax
call printf
addq $8, %rsp
.cfi_def_cfa_offset 24
xorl %eax, %eax
popq %rbx
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE20:
.size main, .-main
.local arr.2789
.comm arr.2789,4096,32
.section .note.GNU-stack,"",@progbits
版本低于使用方法网点((A&GT; B) - (A&LT; B)
)
.file "cmp.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d=0\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB20:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
pushq %rbx
.cfi_def_cfa_offset 24
.cfi_offset 3, -24
movl $arr.2789, %ebx
subq $8, %rsp
.cfi_def_cfa_offset 32
.L9:
leaq 4(%rbx), %rbp
.L10:
call rand
movb %al, (%rbx)
addq $1, %rbx
cmpq %rbx, %rbp
jne .L10
cmpq $arr.2789+4096, %rbp
jne .L9
xorl %r8d, %r8d
xorl %esi, %esi
.L19:
movl %ebp, %ebx
xorl %edi, %edi
.p2align 4,,10
.p2align 3
.L24:
movl %ebp, %ecx
xorl %eax, %eax
jmp .L22
.p2align 4,,10
.p2align 3
.L20:
movl arr.2789(%rax), %ecx
.L22:
xorl %edx, %edx
cmpl %ebx, %ecx
setg %cl
setl %dl
movzbl %cl, %ecx
subl %ecx, %edx
addl %edx, %esi
addq $4, %rax
cmpq $4096, %rax
jne .L20
addq $4, %rdi
cmpq $4096, %rdi
je .L21
movl arr.2789(%rdi), %ebx
jmp .L24
.L21:
addl $1, %r8d
cmpl $500, %r8d
jne .L19
movl $.LC0, %edi
xorl %eax, %eax
call printf
addq $8, %rsp
.cfi_def_cfa_offset 24
xorl %eax, %eax
popq %rbx
.cfi_def_cfa_offset 16
popq %rbp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE20:
.size main, .-main
.local arr.2789
.comm arr.2789,4096,32
.section .note.GNU-stack,"",@progbits
这篇关于高效的整数比较功能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!