C++ OpenMP Fibonacci: 1 thread performs much faster than 4 threads

Question

I'm trying to understand why the following runs much faster on 1 thread than on 4 threads with OpenMP. The code is based on a similar question, OpenMP recursive tasks, but when I implemented one of the suggested answers I didn't get the intended speedup, which suggests I've done something wrong (I'm not sure what). Do people get better speed when running the code below on 4 threads than on 1 thread? I'm getting a 10x slowdown on 4 cores, when I should be getting a moderate speedup rather than a significant slowdown.

#include <iostream>
#include <omp.h>

int fib(int n)
{
  if (n == 0 || n == 1)
    return n;
  if (n < 20)  // cutoff (added in an edit): recurse serially for small n
    return fib(n-1) + fib(n-2);
  int res, a, b;
  #pragma omp task shared(a)
  a = fib(n-1);
  #pragma omp task shared(b)
  b = fib(n-2);
  #pragma omp taskwait  // wait for both child tasks before using a and b
  res = a + b;
  return res;
}

int main(){
  omp_set_nested(1);  // not required for tasks inside a single parallel region
  omp_set_num_threads(4);
  double start_time = omp_get_wtime();
  #pragma omp parallel
  {
    #pragma omp single
    {
      std::cout << fib(25) << std::endl;
    }
  }
  double time = omp_get_wtime() - start_time;
  std::cout << "Time(ms): " << time*1000 << std::endl;
  return 0;
}
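To reproduce the 1-thread vs 4-thread comparison, here is a sketch of a small timing harness. The names fib_task and timed_fib are hypothetical: fib_task is the question's fib with the cutoff turned into a parameter, and the harness times with std::chrono::steady_clock instead of omp_get_wtime, so the file also compiles (and simply runs serially) without -fopenmp.

```cpp
#include <cassert>
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// The question's task-based fib, with the serial cutoff as a parameter.
int fib_task(int n, int cutoff) {
    if (n < 2) return n;
    if (n < cutoff)  // small subproblem: plain recursion, no tasks
        return fib_task(n - 1, cutoff) + fib_task(n - 2, cutoff);
    int a, b;
    #pragma omp task shared(a)
    a = fib_task(n - 1, cutoff);
    #pragma omp task shared(b)
    b = fib_task(n - 2, cutoff);
    #pragma omp taskwait  // both children must finish before a + b
    return a + b;
}

// Runs fib_task inside a parallel/single region with `threads` threads,
// stores the elapsed wall time in milliseconds in *ms and returns the result.
int timed_fib(int n, int cutoff, int threads, double* ms) {
    assert(threads >= 1);
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // without OpenMP the pragmas are ignored: everything is serial
#endif
    int result = 0;
    auto t0 = std::chrono::steady_clock::now();
    #pragma omp parallel
    {
        #pragma omp single  // one thread starts the recursion, the others run tasks
        result = fib_task(n, cutoff);
    }
    auto t1 = std::chrono::steady_clock::now();
    *ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return result;
}
```

Compile with -fopenmp (GCC/Clang) and compare timed_fib(25, 20, 1, &ms) with timed_fib(25, 20, 4, &ms): both return 75025, only the measured times differ. The first parallel region in a process typically also pays the one-time cost of starting the worker threads, so a warm-up call before timing gives fairer numbers.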

Recommended answer

Have you tried it with much larger inputs?

With multithreading, it takes some time to get work started on the CPU cores. For small jobs that finish very quickly on a single core, this start-up cost makes the threaded version slower.

Multithreading shows a speedup when the job normally takes seconds rather than milliseconds.
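To put "small job" into numbers: the plain recursive fib makes calls(n) = calls(n-1) + calls(n-2) + 1 calls, with calls(0) = calls(1) = 1, which works out to 2F(n+1) - 1. The sketch below counts them directly; fib_calls is a hypothetical helper, not part of the question or the answer.

```cpp
#include <cassert>

// Counts how many times a plain recursive fib(n) would be called.
long long fib_calls(int n) {
    assert(n >= 0);
    if (n < 2) return 1;  // the call itself, no further recursion
    return 1 + fib_calls(n - 1) + fib_calls(n - 2);
}
```

For the question's fib(25) this gives 242,785 trivial calls, so the whole job is likely a matter of milliseconds. For the 45th term the same formula gives 2 × 1,836,311,903 - 1 = 3,672,623,805 calls, roughly 15,000 times more work.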

There is also another bottleneck. If your code creates too many threads or tasks, which typically happens with recursive methods, the cost of creating and scheduling them can delay all running threads and cause a massive slowdown.
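How many tasks the question's fib actually creates depends on its cutoff. The hypothetical helper tasks_created below mirrors the question's control flow (n < 2 returns, n < cutoff recurses serially, otherwise two tasks are spawned) and counts the tasks instead of computing Fibonacci numbers.

```cpp
#include <cassert>

// Counts the OpenMP tasks the question's fib(n) would create
// for a given serial cutoff (the question uses 20).
long long tasks_created(int n, int cutoff) {
    assert(n >= 0);
    if (n < 2 || n < cutoff) return 0;  // serial from here down: no tasks
    return 2 + tasks_created(n - 1, cutoff) + tasks_created(n - 2, cutoff);
}
```

With the cutoff of 20, fib(25) creates only 40 tasks, which suggests the start-up cost is what dominates for that input. Without any cutoff (tasks_created(25, 0)) every call except the root becomes a task: 242,784 tasks for the same tiny amount of arithmetic.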

This OpenMP/Tasks wiki page mentions the problem and suggests a manual cutoff: there need to be two versions of the function, and when the recursion goes too deep, it continues with the single-threaded version.

The cutoff counter (co in the code below) must be incremented before the OpenMP task region is entered, so that both child tasks receive the increased depth.

The following code is for the OP to test:

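#include <iostream>
#include <omp.h>

#define CUTOFF 5

// Serial version: plain recursion, no tasks.
int fib_s(int n)
{
    if (n == 0 || n == 1)
        return n;
    int res, a, b;
    a = fib_s(n - 1);
    b = fib_s(n - 2);
    res = a + b;
    return res;
}

// Parallel version: co is the current recursion depth. Once it reaches
// CUTOFF, the rest of the subtree is computed serially by fib_s.
int fib_m(int n,int co)
{
    if (co >= CUTOFF) return fib_s(n);
    if (n == 0 || n == 1)
        return n;
    int res, a, b;
    co++;  // increment before creating the tasks so both children see the new depth
#pragma omp task shared(a)
    a = fib_m(n - 1,co);
#pragma omp task shared(b)
    b = fib_m(n - 2,co);
#pragma omp taskwait
    res = a + b;
    return res;
}

int main()
{
    omp_set_nested(1);  // not required for tasks inside a single parallel region
    omp_set_num_threads(4);
    double start_time = omp_get_wtime();
#pragma omp parallel
    {
#pragma omp single
        {
            std::cout << fib_m(25,1) << std::endl;
        }
    }
    double time = omp_get_wtime() - start_time;
    std::cout << "Time(ms): " << time * 1000 << std::endl;
    return 0;
}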

RESULT: With CUTOFF set to 10, calculating the 45th term took under 8 seconds.

CUTOFF   Time (45th term)
1          14.5 s
2           9.5 s
3           6.4 s
10          7.5 s
15          7.0 s
20          8.5 s
21       > 18.0 s
22       > 40.0 s
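The jump at the bottom of the table is consistent with the number of tasks fib_m spawns. The sketch below is a hypothetical helper, fib_m_tasks, that follows fib_m's control flow exactly but counts tasks instead of computing Fibonacci numbers. For n = 45 every node above the cutoff depth still has n ≥ 2, so the count is exactly 2^CUTOFF - 2, and CUTOFF = 1 creates no tasks at all (that row is effectively the serial time).

```cpp
#include <cassert>

// Mirrors fib_m(n, co) from the answer and returns how many tasks it spawns.
long long fib_m_tasks(int n, int co, int cutoff) {
    assert(n >= 0);
    if (co >= cutoff) return 0;      // fib_m hands over to fib_s: no more tasks
    if (n == 0 || n == 1) return 0;  // base case: no tasks
    co++;                            // same depth increment as fib_m
    return 2 + fib_m_tasks(n - 1, co, cutoff) + fib_m_tasks(n - 2, co, cutoff);
}
```

For the 45th term this gives 1,022 tasks at CUTOFF 10, 2,097,150 at 21 and 4,194,302 at 22: each extra level doubles the task count, which plausibly explains why the time grows so sharply past CUTOFF 20.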
