No speedup with OpenMP

Views: 690 · Posted: 2020/5/21 1:28:17 · Tags: c++ multithreading parallel-processing openmp


Problem description

I am using OpenMP to parallelize an algorithm, and I was hoping for a near-linear speedup. Unfortunately, I could not get the speedup I wanted.

So, to understand what is wrong with my code, I wrote a second, much simpler program, just to double-check that the speedup is obtainable in principle on my hardware.

This is the toy example I wrote:

#include <omp.h>
#include <algorithm>
#include <iostream>

int main () {
      int number_of_threads = 1;            // change this value to test scaling
      int n = 600;
      int m = 50;
      int N = n/number_of_threads;          // outer-loop length shrinks as threads are added
      int time_limit = 600;                 // wall-clock budget in seconds
      double total_clock = omp_get_wtime();
      int time_flag = 0;                    // shared: set to 1 once the time limit is reached

      #pragma omp parallel num_threads(number_of_threads)
       {
          int thread_id = omp_get_thread_num();
          int iteration_number_local = 0;
          // Each thread allocates and works on its own private arrays.
          double *C = new double[n]; std::fill(C, C+n, 3.0);
          double *D = new double[n]; std::fill(D, D+n, 3.0);
          double *CD = new double[n]; std::fill(CD, CD+n, 0.0);

          int stop = 0;                     // thread-local copy of time_flag
          while (stop == 0){
                for (int i = 0; i < N; i++)
                    for(int z = 0; z < m; z++)
                        for(int x = 0; x < n; x++)
                            for(int c = 0; c < n; c++){
                                CD[c] = C[z]*D[x];
                                C[z] = CD[c] + D[x];
                            }
                iteration_number_local++;
                if ((omp_get_wtime() - total_clock) >= time_limit) {
                    // Atomic write avoids a data race on the shared flag.
                    #pragma omp atomic write
                    time_flag = 1;
                }
                #pragma omp atomic read
                stop = time_flag;
           }
       #pragma omp critical
       std::cout<<"I am "<<thread_id<<" and I got "<<iteration_number_local<<" iterations."<<std::endl;

          delete[] C;
          delete[] D;
          delete[] CD;
       }
    }

I want to stress again that this code is only a toy example for observing the speedup: the first for loop gets shorter as the number of parallel threads increases (because N decreases).

When I go from 1 thread to 2 or 4 threads, the number of iterations doubles each time, as expected. But this is not the case when I use 8, 10, or 20 threads: the number of iterations no longer increases linearly with the number of threads.

Could you please help me with this? Is the code correct? Should I expect a near-linear speedup?

Results

Running the code above, I got the following results.

1 thread: 23 iterations.

20 threads: 397-401 iterations per thread (instead of 420-460).
