在C中,为什么用"signed int"表示?比"unsigned int"更快? [英] In C, why is "signed int" faster than "unsigned int"?

查看:134
本文介绍了在C中,为什么用"signed int"表示?比"unsigned int"更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在C语言中,为什么signed intunsigned int快?是的,我知道在此网站上已经多次询问并回答了此问题(下面的链接).但是,大多数人说没有区别.我编写了代码,无意间发现了明显的性能差异.

In C, why is signed int faster than unsigned int? True, I know that this has been asked and answered multiple times on this website (links below). However, most people said that there is no difference. I have written code and accidentally found a significant performance difference.

为什么我的代码的未签名"版本比签名"版本慢(即使测试相同的数字)? (我有一个x86-64 Intel处理器.)

Why would the "unsigned" version of my code be slower than the "signed" version (even when testing the same number)? (I have a x86-64 Intel processor).

相似链接

  • Faster comparing signed than unsigned ints
  • performance of unsigned vs signed integers

编译命令: gcc -Wall -Wextra -pedantic -O3 -Wl,-O3 -g0 -ggdb0 -s -fwhole-program -funroll-loops -pthread -pipe -ffunction-sections -fdata-sections -std=c11 -o ./test ./test.c && strip --strip-all --strip-unneeded --remove-section=.note --remove-section=.comment ./test

注意:如果我在所有数字上明确声明signed int,没有什么区别.

NOTE: There is no difference if I explicitly declare signed int on all numbers.

int isprime(int num) {
    // Test if a signed int is prime
    int i;
    if (num % 2 == 0 || num % 3 == 0)
        return 0;
    else if (num % 5 == 0 || num % 7 == 0)
        return 0;
    else {
        for (i = 11; i < num; i += 2) {
            if (num % i == 0) {
                if (i != num)
                    return 0;
                else
                    return 1;
            }
        }
    }
    return 1;
}

unsigned int版本

unsigned int version

int isunsignedprime(unsigned int num) {
    // Test if an unsigned int is prime
    unsigned int i;
    if (num % (unsigned int)2 == (unsigned int)0 || num % (unsigned int)3 == (unsigned int)0)
        return 0;
    else if (num % (unsigned int)5 == (unsigned int)0 || num % (unsigned int)7 == (unsigned int)0)
        return 0;
    else {
        for (i = (unsigned int)11; i < num; i += (unsigned int)2) {
            if (num % i == (unsigned int)0) {
                if (i != num)
                    return 0;
                else
                    return 1;
            }
        }
    }
    return 1;
}

使用以下代码在文件中对此进行测试:

Test this in a file with the below code:

int main(void) {
    printf("%d\n", isprime(294967291));
    printf("%d\n", isprime(294367293));
    printf("%d\n", isprime(294967293));
    printf("%d\n", isprime(294967241)); // slow
    printf("%d\n", isprime(294967251));
    printf("%d\n", isprime(294965291));
    printf("%d\n", isprime(294966291));
    printf("%d\n", isprime(294963293));
    printf("%d\n", isprime(294927293));
    printf("%d\n", isprime(294961293));
    printf("%d\n", isprime(294917293));
    printf("%d\n", isprime(294167293));
    printf("%d\n", isprime(294267293));
    printf("%d\n", isprime(294367293)); // slow
    printf("%d\n", isprime(294467293));
    return 0;
}

结果(time ./test):

Signed - real 0m0.949s
Unsigned - real 0m1.174s

推荐答案

您的问题确实令人着迷,因为未签名版本始终会产生慢10%至20%的代码.但是代码中存在多个问题:

Your question is genuinely intriguing as the unsigned version consistently produces code that is 10 to 20% slower. Yet there are multiple problems in the code:

  • 这两个函数都为2357返回0,这是不正确的.
  • 测试if (i != num) return 0; else return 1;完全没有用,因为循环主体仅针对i < num运行.这样的测试对于小型主要测试很有用,但使用特殊大小写并没有真正的用途.
  • 未签名版本中的强制类型转换是多余的.
  • 向终端生成文本输出的基准测试代码不可靠,您应该使用clock()函数对CPU密集型函数进行计时,而无需进行任何干预I/O.
  • 由于循环运行num / 2次而不是sqrt(num)次,因此用于素数测试的算法完全无效.
  • Both functions return 0 for 2, 3, 5 and 7, which is incorrect.
  • The test if (i != num) return 0; else return 1; is completely useless as the loop body is only run for i < num. Such a test would be useful for the small prime tests but special casing them is not really useful.
  • the casts in the unsigned version are redundant.
  • benchmarking code that produces textual output to the terminal is unreliable, you should use the clock() function to time CPU intensive functions without any intervening I/O.
  • the algorithm for prime testing is utterly inefficient as the loop runs num / 2 times instead of sqrt(num).

让我们简化代码并运行一些精确的基准测试

Let's simplify the code and run some precise benchmarks:

#include <stdio.h>
#include <time.h>

int isprime_slow(int num) {
    if (num % 2 == 0)
        return num == 2;
    for (int i = 3; i < num; i += 2) {
        if (num % i == 0)
            return 0;
    }
    return 1;
}

int unsigned_isprime_slow(unsigned int num) {
    if (num % 2 == 0)
        return num == 2;
    for (unsigned int i = 3; i < num; i += 2) {
        if (num % i == 0)
            return 0;
    }
    return 1;
}

int isprime_fast(int num) {
    if (num % 2 == 0)
        return num == 2;
    for (int i = 3; i * i <= num; i += 2) {
        if (num % i == 0)
            return 0;
    }
    return 1;
}

int unsigned_isprime_fast(unsigned int num) {
    if (num % 2 == 0)
        return num == 2;
    for (unsigned int i = 3; i * i <= num; i += 2) {
        if (num % i == 0)
            return 0;
    }
    return 1;
}

int main(void) {
    int a[] = {
        294967291, 0, 294367293, 0, 294967293, 0, 294967241, 1, 294967251, 0,
        294965291, 0, 294966291, 0, 294963293, 0, 294927293, 1, 294961293, 0,
        294917293, 0, 294167293, 0, 294267293, 0, 294367293, 0, 294467293, 0,
    };
    struct testcase { int (*fun)(); const char *name; int t; } test[] = {
        { isprime_slow, "isprime_slow", 0 },
        { unsigned_isprime_slow, "unsigned_isprime_slow", 0 },
        { isprime_fast, "isprime_fast", 0 },
        { unsigned_isprime_fast, "unsigned_isprime_fast", 0 },
    };

    for (int n = 0; n < 4; n++) {
        clock_t t = clock();
        for (int i = 0; i < 30; i += 2) {
            if (test[n].fun(a[i]) != a[i + 1]) {
                printf("%s(%d) != %d\n", test[n].name, a[i], a[i + 1]);
            }
        }
        test[n].t = clock() - t;
    }
    for (int n = 0; n < 4; n++) {
        printf("%21s: %4d.%03dms\n", test[n].name, test[n].t / 1000), test[n].t % 1000);
    }
    return 0;
}

在OS/X上用clang -O2编译的代码产生以下输出:

The code compiled with clang -O2 on OS/X produces this output:

         isprime_slow:  788.004ms
unsigned_isprime_slow:  965.381ms
         isprime_fast:    0.065ms
unsigned_isprime_fast:    0.089ms

这些计时与OP在不同系统上观察到的行为一致,但显示出由更有效的迭代测试引起的显着改进: 10000倍

These timings are consistent with the OP's observed behavior on a different system, but show the dramatic improvement caused by the more efficient iteration test: 10000 times faster!

关于问题为什么带unsigned的函数会更慢?,让我们看一下生成的代码(

Regarding the question Why is the function slower with unsigned?, let's look at the generated code (gcc 7.2 -O2):

isprime_slow(int):
        ...
.L5:
        movl    %edi, %eax
        cltd
        idivl   %ecx
        testl   %edx, %edx
        je      .L1
.L4:
        addl    $2, %ecx
        cmpl    %esi, %ecx
        jne     .L5
.L6:
        movl    $1, %edx
.L1:
        movl    %edx, %eax
        ret

unsigned_isprime_slow(unsigned int):
        ...
.L19:
        xorl    %edx, %edx
        movl    %edi, %eax
        divl    %ecx
        testl   %edx, %edx
        je      .L22
.L18:
        addl    $2, %ecx
        cmpl    %esi, %ecx
        jne     .L19
.L20:
        movl    $1, %eax
        ret
       ...
.L22:
        xorl    %eax, %eax
        ret

内部循环非常相似,相同数量的指令,相似的指令.但是,这里有一些可能的解释:

The inner loops are very similar, same number of instructions, similar instructions. Here are however some potential explanations:

  • cltdeax寄存器的符号扩展到edx寄存器,这可能会导致指令延迟,因为eax被紧随其后的指令movl %edi, %eax修改了.但这会使签名版本比未签名版本慢,而不是更快.
  • 对于无符号版本,循环的初始指令可能未对齐,但是不太可能,因为更改源代码中的顺序不会影响时序.
  • 尽管有符号和无符号除法操作码的寄存器内容相同,但idivl指令所花费的周期可能少于divl指令.确实,有符号除法的运算精度要比无符号除法的精度低一点,但是对于这种小的更改,差异似乎很大.
  • 我怀疑在idivl的芯片实现上要付出更多的努力,因为有符号除法比无符号除法更常见(根据英特尔多年的编码统计数据来衡量).
  • 正如rcgldr所评论的那样,在查看Intel进程的指令表时,对于Ivy Bridge,DIV 32位需要10个微操作(19到27个周期),IDIV 9个微操作(19到26个周期).基准时间与这些时间一致.额外的微操作可能是由于DIV(64/32位)相对于IDIV(63/31位)的操作数更长.
  • cltd extends the sign of the eax register into the edx register, which may be causing an instruction delay because eax is modified by the immediately preceeding instruction movl %edi, %eax. Yet this would make the signed version slower than the unsigned one, not faster.
  • the loops' initial instructions might be misaligned for the unsigned version, but it is unlikely as changing the order in the source code has no effect on the timings.
  • Although the register contents are identical for the signed and unsigned division opcodes, it is possible that the idivl instruction take fewer cycles than the divl instruction. Indeed the signed division operates on one less bit of precision than the unsigned division, but the difference seems quite large for this small change.
  • I suspect more effort was put into the silicon implementation of idivl because signed divisions are more common that unsigned divisions (as measured by years of coding statistics at Intel).
  • as commented by rcgldr, looking at instruction tables for Intel process, for Ivy Bridge, DIV 32 bit takes 10 micro ops, 19 to 27 cycles, IDIV 9 micro ops, 19 to 26 cycles. The benchmark times are consistent with these timings. The extra micro-op may be due to the longer operands in DIV (64/32 bits) as opposed to IDIV (63/31 bits).

这个令人惊讶的结果应该教给我们一些教训:

This surprising result should teach us a few lessons:

  • 优化是一件困难的事,要谦虚而拖延.
  • 正确性经常被优化所破坏.
  • 选择一个更好的算法远胜于优化.
  • 始终使用基准代码,不要相信您的直觉.

这篇关于在C中,为什么用"signed int"表示?比"unsigned int"更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆