OpenMP vs gcc compiler optimizations


Question

I'm learning OpenMP using the example of computing the value of pi via quadrature. In serial, I run the following C code:

double serial() {
    double step;
    double x,pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    for (int i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step; // midpoint of interval i
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

I'm comparing this to an OpenMP implementation using a parallel for with a reduction:

double SPMD_for_reduction() {
    double step;
    double pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    #pragma omp parallel for reduction (+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

For num_steps = 1,000,000,000, and 6 threads in the OpenMP case, I compile and time:

    double start_time = omp_get_wtime();
    serial();
    double end_time = omp_get_wtime();

    start_time = omp_get_wtime();
    SPMD_for_reduction();
    end_time = omp_get_wtime();

Using no cc compiler optimizations, the runtimes are around 4 s (serial) and 0.66 s (OpenMP). With the -O3 flag, the serial runtime drops to "0.000001 s" and the OpenMP runtime is mostly unchanged. What's going on here? Are vector instructions being used, or is it poor code or a poor timing method? If it's vectorization, why isn't the OpenMP function benefiting?

It may be of interest that the machine I am using has a modern 6-core Xeon processor.

Thanks!

Recommended answer

The compiler outsmarts you. For the serial version, it can detect that the result of your computation is never used, so it throws out the computation completely.

double start_time = omp_get_wtime();
serial(); //<-- Computations not used.
double end_time = omp_get_wtime();

In the OpenMP case, the compiler cannot tell whether everything inside the function body is really without effect (the parallel region is lowered to calls into the OpenMP runtime library, which it cannot see through), so to stay on the safe side it keeps the function call.

You can of course write something like double serial_pi = serial(); and, outside of the time measurement, do something with the variable serial_pi, such as printing it. This way the compiler keeps the computation and applies the optimizations you are actually interested in.
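As a rough sketch of that suggestion, the two results can be stored and printed after the timed regions so that the optimizer cannot discard either loop. The names wall_time, serial_pi, omp_pi and compare_pi are hypothetical, and passing num_steps as a parameter is an assumption; the question does not show how num_steps is defined. The timer falls back to clock() when the file is built without OpenMP support.

```c
#include <stdio.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Seconds from the OpenMP wall clock; without OpenMP, fall back to
   CPU time from clock(), which is adequate for a single thread. */
double wall_time(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / (double)CLOCKS_PER_SEC;
#endif
}

/* Serial midpoint-rule estimate of pi, as in the question.
   num_steps is a parameter here (an assumption). */
double serial_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Same loop with an OpenMP reduction; the pragma is ignored
   when the compiler is not given an OpenMP flag. */
double omp_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Times both versions, then uses both results outside the timed
   regions. Returns the absolute difference of the two estimates. */
double compare_pi(long num_steps) {
    double t0 = wall_time();
    double pi_serial = serial_pi(num_steps);  /* result is kept */
    double t1 = wall_time();
    double pi_omp = omp_pi(num_steps);        /* result is kept */
    double t2 = wall_time();

    /* Printing the values is the "dummy" use that keeps the work alive. */
    printf("serial: pi = %.12f in %.6f s\n", pi_serial, t1 - t0);
    printf("omp:    pi = %.12f in %.6f s\n", pi_omp, t2 - t1);

    double diff = pi_serial - pi_omp;
    return diff < 0.0 ? -diff : diff;
}
```

Calling compare_pi(1000000000L) from main and building with gcc -O3 -fopenmp should then report a serial time that reflects the optimized loop rather than an eliminated one.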
