高效的整数比较功能 [英] Efficient integer compare function

查看：122 发布时间：2016/7/18 19:47:17 c assembly x86 inline-assembly

本文介绍了高效的整数比较功能的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

的比较函数是一个有两个参数 A 和 B 键，返回描述它们的顺序整数。如果 A 比小b ，结果是一些负整数。如果 A 比大b ，结果是正整数。否则， A 和 B 都是平等的，并且结果是零。

The compare function is a function that takes two arguments a and b and returns an integer describing their order. If a is smaller than b, the result is some negative integer. If a is bigger than b, the result is some positive integer. Otherwise, a and b are equal, and the result is zero.

此功能常被用来参数化排序和从标准库搜索算法

This function is often used to parametrize sorting and searching algorithms from standard libraries.

实施比较的字符功能相当容易;您只需减去参数：

Implementing the compare function for characters is quite easy; you simply subtract the arguments:

int compare_char(char a, char b)
{
    return a - b;
}

此工作，因为两个字符之间的差，一般假定为适合的整数。（注意，这个假设不成立的系统，其中的sizeof（字符）==的sizeof（INT））

This works because the difference between two characters is generally assumed to fit into an integer. (Note that this assumption does not hold for systems where sizeof(char) == sizeof(int).)

这招不能工作比较整数，因为两个整数之间的差异一般不适合到一个整数。例如， INT_MAX - （-1）= INT_MIN 提示 INT_MAX 比小 - 1 （技术上，溢出导致不确定的行为，但让我们假设模运算）。

This trick cannot work to compare integers, because the difference between two integers generally does not fit into an integer. For example, INT_MAX - (-1) = INT_MIN suggests that INT_MAX is smaller than -1 (technically, the overflow leads to undefined behavior, but let's assume modulo arithmetic).

那么，如何才能实现有效的整数比较功能？这是我第一次尝试：

So how can we implement the compare function efficiently for integers? Here is my first attempt:

int compare_int(int a, int b)
{
    int temp;
    int result;
    __asm__ __volatile__ (
        "cmp %3, %2 \n\t"
        "mov $0, %1 \n\t"

        "mov $1, %0 \n\t"
        "cmovg %0, %1 \n\t"

        "mov $-1, %0 \n\t"
        "cmovl %0, %1 \n\t"
    : "=r"(temp), "=r"(result)
    : "r"(a), "r"(b)
    : "cc");
    return result;
}

是否可以在不到6指令做了什么？是否有更有效不太直接的方式？

Can it be done in less than 6 instructions? Is there a less straightforward way that is more efficient?

推荐答案

下面一直被证明是对我来说是非常有效的：

The following has always proven to be fairly efficient for me:

return (a < b) ? -1 : (a > b);

使用的gcc -O2 -S ，这编译为以下五个指令：

With gcc -O2 -S, this compiles down to the following five instructions:

xorl    %edx, %edx
cmpl    %esi, %edi
movl    $-1, %eax
setg    %dl
cmovge  %edx, %eax

作为后续 Ambroz Bizjak的好伴侣回答的，我不相信，他的测试程序相同的程序集code什么张贴以上。而且，当我更仔细地研究了编译器的输出，我注意到，作为被张贴在任何我们的答案编译器不产生相同的指令。所以，我把他的测试程序，修改手工装配输出以匹配我们发布，并比较所产生的时间。 看来两个版本比较大致相同。

As a follow-up to Ambroz Bizjak's excellent companion answer, I was not convinced that his program tested the same assembly code what was posted above. And, when I was studying the compiler output more closely, I noticed that the compiler was not generating the same instructions as was posted in either of our answers. So, I took his test program, hand modified the assembly output to match what we posted, and compared the resulting times. It seems the two versions compare roughly identically.

./opt_cmp_branchless: 0m1.070s
./opt_cmp_branch:     0m1.037s

我张贴的每个程序的装配全让别人也可以尝试同样的实验，并确认或反驳我的观察。

I am posting the assembly of each program in full so that others may attempt the same experiment, and confirm or contradict my observation.

以下是用 cmovge 指令（（版本A＆LT; B）-1：（A＆GT; B））

        .file   "cmp.c"
        .text
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d=0\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
.LFB20:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        pushq   %rbx
        .cfi_def_cfa_offset 24
        .cfi_offset 3, -24
        movl    $arr.2789, %ebx
        subq    $8, %rsp
        .cfi_def_cfa_offset 32
.L9:
        leaq    4(%rbx), %rbp
.L10:
        call    rand
        movb    %al, (%rbx)
        addq    $1, %rbx
        cmpq    %rbx, %rbp
        jne     .L10
        cmpq    $arr.2789+4096, %rbp
        jne     .L9
        xorl    %r8d, %r8d
        xorl    %esi, %esi
        orl     $-1, %edi
.L12:
        xorl    %ebp, %ebp
        .p2align 4,,10
        .p2align 3
.L18:
        movl    arr.2789(%rbp), %ecx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L15:
        movl    arr.2789(%rax), %edx
        xorl    %ebx, %ebx
        cmpl    %ecx, %edx
        movl    $-1, %edx
        setg    %bl
        cmovge  %ebx, %edx
        addq    $4, %rax
        addl    %edx, %esi
        cmpq    $4096, %rax
        jne     .L15
        addq    $4, %rbp
        cmpq    $4096, %rbp
        jne     .L18
        addl    $1, %r8d
        cmpl    $500, %r8d
        jne     .L12
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        addq    $8, %rsp
        .cfi_def_cfa_offset 24
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 16
        popq    %rbp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE20:
        .size   main, .-main
        .local  arr.2789
        .comm   arr.2789,4096,32
        .section        .note.GNU-stack,"",@progbits

版本低于使用方法网点（（A＆GT; B） - （A＆LT; B））

        .file   "cmp.c"
        .text
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d=0\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
.LFB20:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        pushq   %rbx
        .cfi_def_cfa_offset 24
        .cfi_offset 3, -24
        movl    $arr.2789, %ebx
        subq    $8, %rsp
        .cfi_def_cfa_offset 32
.L9:
        leaq    4(%rbx), %rbp
.L10:
        call    rand
        movb    %al, (%rbx)
        addq    $1, %rbx
        cmpq    %rbx, %rbp
        jne     .L10
        cmpq    $arr.2789+4096, %rbp
        jne     .L9
        xorl    %r8d, %r8d
        xorl    %esi, %esi
.L19:
        movl    %ebp, %ebx
        xorl    %edi, %edi
        .p2align 4,,10
        .p2align 3
.L24:
        movl    %ebp, %ecx
        xorl    %eax, %eax
        jmp     .L22
        .p2align 4,,10
        .p2align 3
.L20:
        movl    arr.2789(%rax), %ecx
.L22:
        xorl    %edx, %edx
        cmpl    %ebx, %ecx
        setg    %cl
        setl    %dl
        movzbl  %cl, %ecx
        subl    %ecx, %edx
        addl    %edx, %esi
        addq    $4, %rax
        cmpq    $4096, %rax
        jne     .L20
        addq    $4, %rdi
        cmpq    $4096, %rdi
        je      .L21
        movl    arr.2789(%rdi), %ebx
        jmp     .L24
.L21:
        addl    $1, %r8d
        cmpl    $500, %r8d
        jne     .L19
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        addq    $8, %rsp
        .cfi_def_cfa_offset 24
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 16
        popq    %rbp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE20:
        .size   main, .-main
        .local  arr.2789
        .comm   arr.2789,4096,32
        .section        .note.GNU-stack,"",@progbits

这篇关于高效的整数比较功能的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

高效的整数比较功能 [英] Efficient integer compare function

问题描述

推荐答案

相关文章

.NET Framework最新文章

热门教程

热门工具

登录关闭

高效的整数比较功能 [英] Efficient integer compare function

问题描述

推荐答案

相关文章

.NET Framework最新文章

热门教程

热门工具

登录 关闭

登录关闭