Why is this C++11 code containing rand() slower with multiple threads than with one?

Question

I'm experimenting with the new C++11 threads, but my simple test has abysmal multicore performance. As a simple example, this program averages the square roots of some random numbers.

#include <iostream>
#include <thread>
#include <vector>
#include <functional> // std::ref
#include <cstdlib>
#include <ctime>      // time
#include <chrono>
#include <cmath>

double add_single(int N) {
    double sum=0;
    for (int i = 0; i < N; ++i){
        sum+= sqrt(1.0*rand()/RAND_MAX);
    }
    return sum/N;
}

void add_multi(int N, double& result) {
    double sum=0;
    for (int i = 0; i < N; ++i){
        sum+= sqrt(1.0*rand()/RAND_MAX);
    }
    result = sum/N;
}

int main() {
    srand (time(NULL));
    int N = 1000000;

    // single-threaded
    auto t1 = std::chrono::high_resolution_clock::now();
    double result1 = add_single(N);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout << "time single: " << time_elapsed << std::endl;

    // multi-threaded
    std::vector<std::thread> th;
    int nr_threads = 3;
    double partial_results[] = {0,0,0};
    t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < nr_threads; ++i)
        th.push_back(std::thread(add_multi, N/nr_threads, std::ref(partial_results[i]) ));
    for(auto &a : th)
        a.join();
    double result_multicore = 0;
    for(double result:partial_results)
        result_multicore += result;
    result_multicore /= nr_threads;
    t2 = std::chrono::high_resolution_clock::now();
    time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout << "time multi: " << time_elapsed << std::endl;

    return 0;
}

Compiled with 'g++ -std=c++11 -pthread test.cpp' on Linux on a 3-core machine, a typical result (in milliseconds) is

time single: 33
time multi: 565

So the multi-threaded version is more than an order of magnitude slower. I've used random numbers and a sqrt to make the example less trivial and less prone to compiler optimizations, so I'm out of ideas.

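Edit:

  1. The problem scales with N, so it is not caused by the short run time.
  2. The time needed to create the threads is not the problem. Excluding it does not change the result significantly.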
Wow, I found the problem. It was indeed rand(). I replaced it with a C++11 equivalent and now the runtime scales perfectly. Thanks everyone!
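The question does not show which C++11 replacement was used. As a minimal sketch of one possibility, each thread could own a `std::mt19937` engine from `<random>`, so that no generator state is shared between threads. The names `add_multi_random` and `mean_sqrt_parallel` are made up for this illustration, and seeding from `std::random_device` is an assumption, not something taken from the original post.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Each thread constructs its own engine, so no generator state is shared.
void add_multi_random(int N, unsigned int seed, double& result) {
    std::mt19937 engine(seed);                               // per-thread generator
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // values in [0, 1)
    double sum = 0;
    for (int i = 0; i < N; ++i) {
        sum += std::sqrt(dist(engine));
    }
    result = sum / N;
}

// Hypothetical driver: splits N samples over nr_threads threads and
// averages the per-thread means.
double mean_sqrt_parallel(int N, int nr_threads) {
    assert(nr_threads > 0);
    std::random_device rd;                         // only used to pick one seed per thread
    std::vector<double> partial(nr_threads, 0.0);  // sized up front so references stay valid
    std::vector<std::thread> th;
    for (int i = 0; i < nr_threads; ++i) {
        th.emplace_back(add_multi_random, N / nr_threads, rd(), std::ref(partial[i]));
    }
    for (auto& t : th) {
        t.join();
    }
    double total = 0;
    for (double r : partial) {
        total += r;
    }
    return total / nr_threads;
}
```

Calling `mean_sqrt_parallel(1000000, 3)` should return a value close to 2/3, the expected value of the square root of a uniform number on [0, 1). The functions can be timed with the same `std::chrono` calls as in the question's `main`.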

Recommended answer

On my system the behavior is the same, but as Maxim mentioned, rand is not thread safe. One likely reason for the slowdown: glibc's rand() protects its shared internal state with a lock, so the threads spend most of their time contending for it. When I change rand to rand_r, the multi-threaded code is faster, as expected.

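void add_multi(int N, double& result) {
    double sum=0;
    // rand_r keeps its state in this local variable, so nothing is shared between threads.
    // Note: threads started within the same second get the same seed, and therefore the same sequence.
    unsigned int seed = time(NULL);
    for (int i = 0; i < N; ++i){
        sum+= sqrt(1.0*rand_r(&seed)/RAND_MAX);
    }
    result = sum/N;
}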
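Because every thread above seeds from `time(NULL)`, threads started in the same second draw identical sequences. A minimal sketch of one way around this, assuming a POSIX system that provides `rand_r`, is to pass each thread its own seed. The helper names `add_multi_seeded` and `mean_sqrt_rand_r` and the `base + i` seeding scheme are illustrative assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <cmath>
#include <ctime>
#include <functional>
#include <stdlib.h>   // rand_r (POSIX), RAND_MAX
#include <thread>
#include <vector>

// seed is passed by value, so each thread advances its own private copy.
void add_multi_seeded(int N, unsigned int seed, double& result) {
    double sum = 0;
    for (int i = 0; i < N; ++i) {
        sum += std::sqrt(1.0 * rand_r(&seed) / RAND_MAX);
    }
    result = sum / N;
}

// Hypothetical driver: thread i is seeded with base + i, so the seeds differ.
double mean_sqrt_rand_r(int N, int nr_threads) {
    assert(nr_threads > 0);
    const unsigned int base = static_cast<unsigned int>(std::time(nullptr));
    std::vector<double> partial(nr_threads, 0.0);  // sized up front so references stay valid
    std::vector<std::thread> th;
    for (int i = 0; i < nr_threads; ++i) {
        th.emplace_back(add_multi_seeded, N / nr_threads, base + i, std::ref(partial[i]));
    }
    for (auto& t : th) {
        t.join();
    }
    double total = 0;
    for (double r : partial) {
        total += r;
    }
    return total / nr_threads;
}
```

Distinct seeds only guarantee distinct starting states. `rand_r` has a very small state, so where statistical quality matters the `<random>` engines are probably the better choice.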
