OpenMP benchmark: Am I doing it right?

Question

I made a program which calculates the Fibonacci sequence. I executed it with different numbers of threads (e.g. 1, 2, 10), but the execution time remained almost the same (about 0.500 seconds).

I'm using CodeBlocks on Ubuntu with the GNU GCC compiler. In CodeBlocks I linked the library gomp and set the compiler flag -fopenmp.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

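/* Declared before main() so the call below matches the definition. */
void fibo(int sizeN, int n[]);

int main(void)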
{
    int i, n=1000, a[n];
    omp_set_num_threads(4);

    for(i=0; i<n; i++)
    {
        a[i] = 1 + (rand() % ( 50 - 1 + 1 ) );
    }

    fibo(n, a);

    return 0;
}

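/* For each n[i], print the Fibonacci numbers F(0) .. F(n[i]). */
void fibo(int sizeN, int n[])
{
    int i;

    #pragma omp parallel for
    for(i=0; i<sizeN; i++)
    {
        /* long long: F(47) and above do not fit in a 32-bit int */
        long long a = 0, b = 1, next;
        int c;
        printf("n = %i\n", n[i]);
        for (c=0; c<=n[i]; c++)
        {
            if (c <= 1)
            {
                next = c;
            }
            else
            {
                next = a + b;
                a = b;
                b = next;
            }
            printf("%lld\n",next);
        }
    }
}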

Does anybody have an idea?
How can I make sure that OpenMP really works (is installed)?

Recommended answer

Remove both printf statements. Your program is spending more time sending text to the standard output than computing the numbers. Since the standard output is basically serial, your program serialises in the printf statements. Not to mention the overhead of printf itself - it has to parse the format string, convert the integer value to a string and then send that to the stdout stream.

Observe these timings, measured with both printf statements in place (n = 10000):

OMP_NUM_THREADS=1 ./fibo.exe  0.10s user 0.42s system 40% cpu 1.305 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.24s user 1.01s system 95% cpu 1.303 total
                                         ^^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.36s user 1.87s system 163% cpu 1.360 total
                                         ^^^^^^^^^^^^

I've removed the call to omp_set_num_threads() and use the OMP_NUM_THREADS environment variable instead, which allows running the program with a varying number of threads without recompiling the source. Note that the program consistently spends about 4x more time in system mode than in user mode. This is the overhead of the text output.

Now compare the same with both printf statements commented out (note that I had to increase n to 1000000 in order to get meaningful results from the time command):

OMP_NUM_THREADS=1 ./fibo.exe  0.20s user 0.00s system 99% cpu 0.208 total
                                                              ^^^^^^^^^^^
OMP_NUM_THREADS=2 ./fibo.exe  0.21s user 0.00s system 179% cpu 0.119 total
                                                               ^^^^^^^^^^^
OMP_NUM_THREADS=4 ./fibo.exe  0.20s user 0.01s system 295% cpu 0.071 total
                                                               ^^^^^^^^^^^

Now the system time stays almost zero, and the program is 1.75x faster with 2 threads and 2.93x faster with 4 threads. The speed-up is not linear since there is a slight imbalance in the work distribution among the threads. If the array is filled with constant values, then the speed-up is almost linear.
