Why can't I get any performance improvements by running multiple threads in C++11?


Question

I have the following test program with a simple function that finds primes which I am trying to run in multiple threads (just as an example).

#include <cstdio>
#include <iostream>
#include <ctime>
#include <thread>

void primefinder(void)
{
   int n = 300000;

   int i, j;
   int lastprime = 0;
   for(i = 2; i <= n; i++) {
      for(j = 2; j <= i; j++) {
           if((i % j) == 0) {
               if(i == j)
                   lastprime = i;
               else {
                   break;
               }
           }
      }
   }

   std::cout << "Prime: " << lastprime << std::endl;
}

int main(void)
{
   std::clock_t start;
   start = std::clock();

   std::thread t1(primefinder);
   t1.join();

   std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

   start = std::clock();

   std::thread t2(primefinder);
   std::thread t3(primefinder);
   t2.join();
   t3.join();

   std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
   return 0;
}

As shown, I run the function once in 1 thread and then once in 2 different threads. I compile it with g++ using -O3 and -pthread. I am running it on Linux Mint 18 on a Core i5-4670. I know it comes down to the OS, but I would very much expect these threads to run at least somewhat in parallel. When I run the program, top shows 100% CPU when using 1 thread and 200% CPU when using 2 threads. Despite this, the second run takes almost exactly twice as long.

The CPU is doing nothing else while running the program. Why doesn't this get executed in parallel?

I know both threads are doing the exact same thing. I chose the primefinder function simply as an example of something embarrassingly parallel, so when I run it in multiple threads it should take just as long in real time.

Recommended answer

Using std::clock to time a parallel program in C++ is very deceptive. There are two types of time that we care about when timing a program: wall time and CPU time. Wall time is what we are all used to (think of a clock on a wall). CPU time is essentially how much processor time your program used, summed over all of its threads. std::clock returns CPU time in clock ticks (this is why you are dividing by CLOCKS_PER_SEC), and CPU time is only equal to wall time when there is one thread of execution that is kept busy the whole time. If a program can be run 100% in parallel (like yours), then CPU time = (number of threads) * (wall time). So with two threads, seeing almost exactly twice as long is exactly what you would expect.

For measuring wall time (which is what you want to do), I don't know of a way to do that using the STL. You can measure it using OpenMP or Boost.

omp_get_wtime()

Boost timer

Since you are on Linux, the version of g++ that you are using more than likely has OpenMP support built in.

#include <omp.h>

const double startTime = omp_get_wtime();  // wall-clock time in seconds
// ... work goes here ...

const double elapsedSeconds = omp_get_wtime() - startTime;

You will have to compile with -fopenmp.

As johnbakers pointed out, the chrono library does have a wall clock:

#include <chrono>
#include <iostream>

auto start = std::chrono::system_clock::now();
// ... do some work ...

auto end = std::chrono::system_clock::now();
std::chrono::duration<double> diff = end - start;  // elapsed wall time in seconds
std::cout << "Time: " << diff.count() << "(s)" << std::endl;

Output alongside the Boost timer:

Boost: 121.685972s wall, 724.940000s user + 67.660000s system = 792.600000s CPU  (651.3%)
Chrono: 121.683(s)
