64-bit executable runs slower than 32-bit version

Question

I have a 64-bit Ubuntu 13.04 system. I was curious how 32-bit applications perform against 64-bit applications on a 64-bit system, so I compiled the following C program as 32-bit and 64-bit executables and recorded how long each took to run. I used gcc flags to compile for three different architectures:

  • -m32: Intel 80386 architecture (int, long and pointers are all 32 bits: ILP32)
  • -m64: AMD's x86-64 architecture (int is 32 bits; long and pointers are 64 bits: LP64)
  • -mx32: AMD's x86-64 architecture (int, long and pointers are all 32 bits: ILP32), but the CPU runs in long mode with sixteen 64-bit general-purpose registers and the register-based calling convention
// This program solves
// Project Euler problem 16.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>
#include <sys/time.h>

int sumdigit(int a, int b);

int main(void) {
    int a = 2;
    int b = 10000;
    struct timeval start, finish;
    unsigned int i;
    gettimeofday(&start, NULL);
    for(i = 0; i < 1000; i++)
        (void)sumdigit(a, b);
    gettimeofday(&finish, NULL);
    printf("Did %u calls in %.4g seconds\n",
            i, 
            finish.tv_sec - start.tv_sec + 1E-6 * (finish.tv_usec - start.tv_usec));
    return 0;
}

int sumdigit(int a, int b) {
    // numlen = number of digits in a^b
    // pcount = power of 'a' after the ith iteration
    // dcount = number of digits in a^pcount

    int numlen = (int) (b * log10(a)) + 1;
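    char *arr = calloc(numlen, sizeof *arr);
    if (arr == NULL) {  /* calloc can fail; stop instead of dereferencing NULL */
        perror("calloc");
        exit(EXIT_FAILURE);
    }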
    int pcount = 0;
    int dcount = 1;
    arr[numlen - 1] = 1;
    int i, sum, carry;

    while(pcount < b) {
        pcount += 1;

        sum = 0; 
        carry = 0;

        for(i = numlen - 1; i >= numlen - dcount; --i) {
            sum = arr[i] * a + carry;
            carry = sum / 10;
            arr[i] = sum % 10;
        }

        while(carry > 0) {
            dcount += 1;
            sum = arr[numlen - dcount] + carry;
            carry = sum / 10;
            arr[numlen - dcount] = sum % 10;
        } 
    }

    int result = 0;
    for(i = numlen - dcount; i < numlen; ++i)
        result += arr[i];

    free(arr);
    return result;
}

The commands I used to build the different executables:

gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_x32 -lm -mx32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_32 -lm -m32
gcc -std=c99 -Wall -Wextra -Werror -pedantic -pedantic-errors pe16.c -o pe16_64 -lm

Here are the results I got:

ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 89.19 seconds

ajay@ajay:c$ ./pe16_32
Did 1000 calls in 88.82 seconds

ajay@ajay:c$ ./pe16_64
Did 1000 calls in 92.05 seconds

Why does the 64-bit version run slower than the 32-bit one? I read that the 64-bit architecture has an improved instruction set and twice as many general-purpose registers as the 32-bit architecture, which allows for more optimizations. When can I expect better performance on a 64-bit system?

Edit: I turned on optimization with the -O3 flag, and the results are now:

ajay@ajay:c$ ./pe16_x32
Did 1000 calls in 38.07 seconds

ajay@ajay:c$ ./pe16_32
Did 1000 calls in 38.32 seconds

ajay@ajay:c$ ./pe16_64
Did 1000 calls in 38.27 seconds

Recommended answer

Comparing performance of code without optimisations is rather pointless. If you care about performance, you'll only ever use optimised code.

And when you enable optimisations, you find that the performance differences are negligible. That is to be expected. The operations you perform are all integer-based, using data of the same size in all cases. Since the 32-bit and 64-bit code runs on the same integer hardware units, you should expect the same performance.
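Using the single-run timings from the question, "negligible" can be put in numbers. This is the spread between the slowest and fastest build, relative to the fastest:

```latex
\text{no optimisation: } \frac{92.05 - 88.82}{88.82} \approx 3.6\%
\qquad
\text{-O3: } \frac{38.32 - 38.07}{38.07} \approx 0.66\%
```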

Apart from a single log10 call per sumdigit invocation, you are not using floating-point operations. Floating point is one area where 32-bit and 64-bit code sometimes differ, because they use different floating-point hardware units (x64 uses SSE; x86 may use x87).

In short, the results are exactly as expected.
