Multithreaded program in C++ shows the same performance as a serial one

Question

I just want to write a simple C++ program that creates two threads, each of which fills a vector with the squares of integers (0, 1, 4, 9, ...). Here is my code:

#include <iostream>
#include <vector>
#include <functional>
#include <thread>
#include <time.h>

#define MULTI 1
#define SIZE 10000000

void fill(std::vector<unsigned long long int> &v, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        v.push_back(i * i);
    }
}

int main()
{
    std::vector<unsigned long long int> v1, v2;
    v1.reserve(SIZE);
    v2.reserve(SIZE);
    #if !MULTI
    clock_t t = clock();
    fill(v1, SIZE);
    fill(v2, SIZE);
    t = clock() - t;
    #else
    clock_t t = clock();
    std::thread first(fill, std::ref(v1), SIZE);
    fill(v2, SIZE);
    first.join();
    t = clock() - t;
    #endif
    std::cout << (float)t / CLOCKS_PER_SEC << std::endl;
    return 0;
}

But when I run my program, I see no significant difference between the serial version and the parallel one (sometimes the parallel version is even slower). Any idea what is happening?

Recommended answer

When I execute your code with MSVC2015 on an i7, I observe:

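  • In debug mode, the multithreaded version takes 14 s versus 26 s for the single-threaded one, so it is almost twice as fast. This is the expected result.
  • In release mode, the multithreaded version takes 0.3 s versus 0.2 s for the single-threaded one, so it runs slower, as you reported.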

This suggests that your issue is related to the fact that the optimized fill() runs for too short a time compared to the overhead of creating a thread.
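To check this on your own machine, here is a minimal sketch that times the serial and two-thread variants and, separately, the cost of creating and joining an empty thread. The names wall_seconds, thread_overhead_seconds, time_fill and fill_squares are hypothetical helpers introduced for illustration, not part of the original code. It uses std::chrono::steady_clock instead of clock(). This is worth ruling out when comparing platforms: the C standard defines clock() as processor time, and on POSIX systems that time is summed over all threads of the process, so a parallel run can report the same value as a serial one, whereas MSVC's clock() returns elapsed wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Same work as fill() in the question (renamed to avoid clashing with std::fill).
void fill_squares(std::vector<unsigned long long>& v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(i * i);
    }
}

// Hypothetical helper: elapsed wall-clock seconds taken by `work`.
template <typename F>
double wall_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Rough cost of creating and joining one thread that does no work.
double thread_overhead_seconds()
{
    return wall_seconds([] {
        std::thread t([] {});
        t.join();
    });
}

// Fills two vectors of n elements, either serially or with one extra thread,
// and returns the elapsed wall-clock seconds.
double time_fill(std::size_t n, bool parallel)
{
    assert(n > 0);
    std::vector<unsigned long long> v1, v2;
    v1.reserve(n);
    v2.reserve(n);
    return wall_seconds([&] {
        if (parallel) {
            std::thread first(fill_squares, std::ref(v1), n);
            fill_squares(v2, n);
            first.join();
        } else {
            fill_squares(v1, n);
            fill_squares(v2, n);
        }
    });
}
```

Calling time_fill(10000000, false), time_fill(10000000, true) and thread_overhead_seconds() from main() and printing the three values gives a comparison that does not depend on how the platform implements clock().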

Note also that even when there is enough work to do in fill() (e.g. the unoptimized version), multithreading will not cut the time exactly in half. Multithreading increases overall throughput on a multicore processor, but each thread taken separately might run a little slower than it would alone.

Additional information

Multithreading performance depends on many factors, for example the number of cores on your processor, the cores used by other processes running during the test and, as doug remarked in his comment, the profile of the multithreaded task (i.e. memory-bound vs. compute-bound).
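The first of these factors can be queried at run time. A small sketch, assuming a hypothetical helper usable_threads (not from the original answer): std::thread::hardware_concurrency() returns a hint for the number of concurrent threads the hardware supports, and is allowed to return 0 when the value cannot be determined, so the helper falls back to 1.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: number of worker threads worth trying, never less than 1.
unsigned usable_threads()
{
    const unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    const unsigned n = (hw == 0) ? 1u : hw;
    assert(n >= 1);
    return n;
}
```

The value is only a hint: it says nothing about how many of those cores are busy with other processes during the test.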

To illustrate this, here are the results of an informal benchmark. They show that individual thread throughput decreases much faster for memory-intensive than for floating-point-intensive computations, and that global throughput grows much more slowly (if at all):

Using the following functions for each thread:

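#include <cmath>
#include <vector>

// Computation intensive: floating-point work only.
void mytask(unsigned long long loops)
{
    volatile double x;  // volatile keeps the compiler from removing the loop
    for (unsigned long long i = 0; i < loops; i++) {
        x = std::sin(std::sqrt(i) / i * 3.14159);  // i == 0 gives NaN (0/0), harmless here
    }
}

// Memory intensive: every iteration appends to the vector.
void mytask2(std::vector<unsigned long long>& v, unsigned long long loops)
{
    for (unsigned long long i = 0; i < loops; i++) {
        v.push_back(i * 3 + 10);
    }
}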
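The harness used for the benchmark is not shown, so the following is only a sketch of how such a measurement could be set up. BenchResult, bench_compute and bench_memory are hypothetical names, and the two task functions are repeated so that the block compiles on its own. Each function starts the requested number of threads, waits for all of them, and reports the wall-clock time together with the total number of loop iterations per second across all threads.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <thread>
#include <vector>

// Compute-bound task from the answer.
void mytask(unsigned long long loops)
{
    volatile double x;
    for (unsigned long long i = 0; i < loops; i++) {
        x = std::sin(std::sqrt(i) / i * 3.14159);
    }
}

// Memory-bound task from the answer.
void mytask2(std::vector<unsigned long long>& v, unsigned long long loops)
{
    for (unsigned long long i = 0; i < loops; i++) {
        v.push_back(i * 3 + 10);
    }
}

// Hypothetical result type: elapsed seconds and iterations per second
// summed over all threads.
struct BenchResult {
    double seconds;
    double total_throughput;
};

// Seconds elapsed since `start` on the monotonic clock.
double seconds_since(std::chrono::steady_clock::time_point start)
{
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

// Runs `threads` copies of the compute-bound task concurrently.
BenchResult bench_compute(unsigned threads, unsigned long long loops)
{
    assert(threads > 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back(mytask, loops);
    }
    for (auto& th : pool) {
        th.join();
    }
    const double s = seconds_since(start);
    return {s, static_cast<double>(threads) * loops / s};
}

// Runs `threads` copies of the memory-bound task, each with its own vector.
BenchResult bench_memory(unsigned threads, unsigned long long loops)
{
    assert(threads > 0);
    std::vector<std::vector<unsigned long long>> data(threads); // no sharing between threads
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back(mytask2, std::ref(data[t]), loops);
    }
    for (auto& th : pool) {
        th.join();
    }
    const double s = seconds_since(start);
    return {s, static_cast<double>(threads) * loops / s};
}
```

Calling both functions for 1, 2, 4, ... threads with the same loop count and comparing total_throughput between runs shows how each kind of task scales on a given machine; dividing by the thread count gives the per-thread figure.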
