Is it realistic to use -O3 or -Ofast to compile your benchmark code or will it remove code?


Question

When I compiled the benchmark code below with -O3, I was impressed by how much it reduced latency, so I began to wonder whether the compiler is "cheating" by removing code somehow. Is there a way to check for that? Is it safe to benchmark with -O3? Is it realistic to expect a 15x gain in speed?

Results without -O3: Average: 239 nanos, Min: 230 nanos (9 million iterations)
Results with -O3: Average: 14 nanos, Min: 12 nanos (9 million iterations)

int iterations = stoi(argv[1]);
int load = stoi(argv[2]);

long long x = 0;

for(int i = 0; i < iterations; i++) {

    long start = get_nano_ts(); // START clock

    for(int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }

    long end = get_nano_ts(); // STOP clock

    // (omitted for clarity)
}

cout << "My result: " << x << endl;

Note: I am using clock_gettime to measure:

long get_nano_ts() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000 + ts.tv_nsec;
}


Recommended answer

It can be very difficult to benchmark what you think you are measuring. In the case of the inner loop:

for (int j = 0;  j < load;  ++j)
        if (i % 4 == 0)
                x += (i % 4) * (i % 8);
        else    x -= (i % 16) * (i % 32);

A shrewd compiler might be able to see through that and change the code to something like:

 x = load * 174;   // example only

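I know that isn't equivalent, but there is some fairly simple expression which can replace that loop: nothing in the loop body depends on j, so every iteration adds the same value to x, and the whole loop reduces to a single multiplication by load.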
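To make that concrete, here is a minimal sketch of what such a replacement could look like. The function names (`inner_loop_naive`, `inner_loop_closed_form`, `closed_form_matches`) are hypothetical and only for illustration. This is not a claim about what any particular compiler actually emits, only a demonstration that a loop-free equivalent exists.

```cpp
// The inner loop from the question, isolated as a function.
long long inner_loop_naive(long long x, int i, int load) {
    for (int j = 0; j < load; ++j) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    return x;
}

// Hypothetical loop-free equivalent: the per-iteration change depends only
// on i, so it can be computed once and multiplied by the iteration count.
long long inner_loop_closed_form(long long x, int i, int load) {
    if (load <= 0) {
        return x;  // the original loop body never runs in this case
    }
    long long delta = (i % 4 == 0) ? (i % 4) * (i % 8)
                                   : -((i % 16) * (i % 32));
    return x + delta * load;
}

// Compares both versions for every i in [0, max_i).
bool closed_form_matches(int max_i, int load) {
    for (int i = 0; i < max_i; ++i) {
        if (inner_loop_naive(0, i, load) != inner_loop_closed_form(0, i, load)) {
            return false;
        }
    }
    return true;
}
```

If an optimizer performs a transformation along these lines, the timed region no longer scales with load, which would be consistent with a large drop in measured latency.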

The way to be sure is to use the gcc -S compiler option and look at the assembly code it generates.
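One way to make the generated assembly easier to read is to move the loop into its own small function, as in the sketch below. The file name `bench_kernel.cpp` and the function name `inner_kernel` are assumptions for illustration. `__attribute__((noinline))` is a GCC/Clang extension, not standard C++.

```cpp
// bench_kernel.cpp (hypothetical file name)
// Generate assembly with, for example:
//   g++ -O3 -S bench_kernel.cpp -o bench_kernel.s

// extern "C" keeps the symbol name unmangled, so it is easy to find in the
// .s file. noinline (GCC/Clang extension) asks the compiler to keep this as
// a separate function rather than folding it into its callers.
extern "C" __attribute__((noinline))
long long inner_kernel(long long x, int i, int load) {
    for (int j = 0; j < load; ++j) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    return x;
}
```

In the resulting `bench_kernel.s`, look for the `inner_kernel` label: if its body contains no backward jump, the loop has been replaced by straight-line arithmetic. Comparing against the output of the same command without -O3 shows what the optimizer changed.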
