Huge performance difference of a C++ program (compiled with GCC) under Mac and Linux

Problem description

Recently I wrote a small program in C++ (well, to be really honest it's more C plus classes) and tested the performance on both a Mac and Linux machine.

Even though the hardware is comparable, the performance is so different that I really think there is something strange going on.

First, some details:

Input: about 200MB compressed data

Operations of the program: it decompresses the data, then loads it into memory, and performs many data accesses to compute joins between the data. The program is sequential (no additional threads or processes).

Output: some strings to be displayed on the screen

The code is compiled with GCC 4.8.1 on the Linux machine and GCC 4.8.2 on the Mac machine. In both cases the compiler is invoked with the arguments:

gcc -c -O3 -fPIC -MD -MF $(patsubst %.o,%.d,$@) //The last three arguments are to create the dependencies between the files

The Mac machine (OS X Mavericks 10.9) is a MacBook Pro equipped with a 2.3 GHz Intel Core i7 (it's a quad-core), 256KB of L2 cache, 6MB of L3 cache, 8GB of DDR3 1600MHz, and a 256GB SSD disk.

The Linux machine (kernel 2.6.32-358) has an Intel E5-2620 at 2.0 GHz (it's a six-core), 16MB of cache, 64GB of DDR3 1600MHz, and a 256GB SSD disk. Both machines should use the Sandy Bridge architecture (maybe the Mac is Ivy Bridge, but anyway this shouldn't make a big difference).

Now, if I launch the program on the Linux machine it takes 217ms to finish, while if I launch it on the Mac machine it takes 132ms: this makes the Linux code 1.6 times slower!!

Now, I understand that the two machines have different OSes and hardware, but I find such a slowdown too large to be justified by these factors, and I feel that there must be some other reason behind it.

Notice that these timings were taken after all the data was loaded into memory, and I'm sure the program does not swap to disk during this time. Therefore, I can exclude the SSD disk as the problem.

Now, I really don't know what could have caused such a slowdown. The memory is basically equivalent, while the CPU is only a bit slower.

Could it be that GCC produced noticeably worse code on Linux than on the Mac?

Could it be that the Linux OS is noticeably worse than the Mac's?

I find both things hard to believe. Any help?

I realized that I didn't mention how I do the timings: well, I use the Boost.Chrono library, and I measure only the time necessary to invoke the main function. Something like:

time = now();
function();
duration = now() - time;
print(duration);
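
For reference, a minimal compilable sketch of that kind of measurement with Boost.Chrono could look like the following (function() is just a placeholder for the real work being timed, and the program is typically linked with -lboost_chrono -lboost_system):

#include <boost/chrono.hpp>
#include <iostream>

void function() { /* placeholder for the work being timed */ }

int main() {
    boost::chrono::steady_clock::time_point start = boost::chrono::steady_clock::now();
    function();
    boost::chrono::milliseconds duration =
        boost::chrono::duration_cast<boost::chrono::milliseconds>(
            boost::chrono::steady_clock::now() - start);
    std::cout << "Duration: " << duration.count() << " ms" << std::endl;
    return 0;
}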

After some tests, we managed to reproduce the performance difference with a much simpler (and silly) program:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

char in1[10000000];
char in2[10000000];

static inline uint64_t rdtscp (void) {
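    /* The ".byte 0x0f,0x01,0xf9" sequence below is the raw encoding of the RDTSCP
       instruction: it reads the time-stamp counter into EDX:EAX and the processor
       ID into ECX (captured in aux, which is unused here). */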
    uint64_t low, high;
    uint64_t aux;

    __asm__ __volatile__ (
                    ".byte 0x0f,0x01,0xf9"
                    : "=a" (low), "=d" (high), "=c" (aux)
                    );

    return low | (high << 32);
}

int main(int argc, char** argv) {

    uint64_t counter = rdtscp();

    for(int i = 0; i < 10000000; ++i) {
            in1[i] = (char)i * 200;
            in2[i] = (char)i * 100;
    }

    int joins = 0;
    for(int j = 0; j < 10000000; ++j) {
            int el = in1[j];
            for(int m = 0; m < 10000000; m++) {
                    if (in2[m] == el) {
                            joins++;
                            break;
                    }
            }
    }
    printf("Joins %d Cycles total %ld\n", joins, (rdtscp() - counter));

    return 0;
}

Please don't look at the operations of the program; they make little sense. What we tried to reproduce is a sequence of memory accesses and simple operations on them.

We launched this program on the Mac and the output was:

Joins 10000000 Cycles total 589015641

While on the linux machine it was:

Joins 10000000 Cycles total 838198832

Clearly the Linux version requires many more CPU cycles, which are probably needed to access the memory. Now the question is: why is the memory access slower?

One reason could be that in1 and in2 don't fit in the CPU caches (together they occupy about 20MB, more than either machine's last-level cache), and this requires some RAM accesses. As pointed out by Roy Longbottom, the memory in the Linux machine is indeed ECC, and this could be the reason behind the lower performance. If we combine this with the slightly lower CPU speed and the difference between Sandy Bridge and Ivy Bridge, then we probably have a good explanation for the difference.

Anyway, thanks all for the tips!

Answer

Both systems follow the System V AMD64 ABI, so gcc shouldn't make a difference there. Unfortunately, random effects in system performance are rather prevalent nowadays, so you can sometimes get significant performance differences through things as silly as re-ordering link order (cf. Mytkowicz et al., ``Producing wrong data without doing anything obviously wrong'' , http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.8395)

Here are some suggestions for how to analyse this that come to mind:

  1. Do more than one run. Personally I take at least 11 runs and compare the median (as well as the various quartiles, but that's probably more than you may care about there). This avoids some of the random effects. (See the sketch after this list.)
  2. Pipe all output into a file to minimise UI effects.
  3. Check your performance counters. On Linux you can use the `perf' tool. Check for `major-faults', which suggest that you have page faults that need to go to disk (unlikely on multiple runs, of course). Only then can you exclude that the disk makes a difference there. Unfortunately OS X doesn't (to the best of my knowledge) have as easy a way to collect performance counters.
  4. You can experiment with `-mcpu' to force the same target instruction set.
  5. Compare actual cache sizes. `dmidecode -t cache' does that on the Linux side, but you must be root. Your machines may have relevant differences there.
  6. If your program runs through multiple phases, try benchmarking them individually.
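
To make suggestion 1 concrete, here is a minimal sketch (not the answerer's code) of a median-of-runs measurement, again using Boost.Chrono since that is what the question already uses; function() stands in for the real work:

#include <boost/chrono.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

void function() { /* placeholder for the work being measured */ }

int main() {
    using namespace boost::chrono;
    const int runs = 11;   // at least 11 runs, as suggested above
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        steady_clock::time_point start = steady_clock::now();
        function();
        nanoseconds elapsed = duration_cast<nanoseconds>(steady_clock::now() - start);
        samples.push_back(elapsed.count() / 1e6);   // nanoseconds -> milliseconds
    }
    std::sort(samples.begin(), samples.end());
    std::cout << "Median over " << runs << " runs: " << samples[runs / 2] << " ms\n";
    return 0;
}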

Good luck!
