Parallel reads from STL containers

Problem description

It is safe to read an STL container from multiple parallel threads. However, the performance is terrible. Why?

I create a small object that stores some data in a multiset. This makes its constructors fairly expensive (about 5 microseconds on my machine). I store hundreds of thousands of these small objects in a large multiset. Processing each object is independent of the others, so I split the work between threads running on a multi-core machine. Each thread reads the objects it needs from the large multiset and processes them.

The problem is that the reading from the big multiset does not proceed in parallel. It looks like the reads in one thread block the reads in the other.

The code below is the simplest version I could write that still shows the problem. First it creates a large multiset containing 100,000 small objects, each containing its own empty multiset. Then it calls the multiset copy constructor twice in series, and then twice more in parallel.

A profiling tool shows that each serial copy takes about 0.23 seconds, whereas each parallel copy takes about twice as long. Somehow the parallel copies are interfering with each other.

#include <set>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <boost/ref.hpp>
// cRavenProfile is the poster's own profiling class; its definition is not shown.

using namespace std;

// a trivial class with a significant ctor and ability to populate an associative container
class cTest
{
    multiset<int> mine;   // makes constructing and copying a cTest non-trivial
    int id;
public:
    cTest( int i ) : id( i ) {}
    bool operator<(const cTest& o) const { return  id < o.id;  }
};
// add 100,000 objects to multiset
void Populate( multiset<cTest>& m )
{
    for( int k = 0; k < 100000; k++ )
    {
        m.insert(cTest(k));
    }
}
// copy construct multiset, called from mainline
void Copy( const multiset<cTest>& m )
{
    cRavenProfile profile("copy_main");
    multiset<cTest> copy( m );
}
// copy construct multiset, called from thread
void Copy2( const multiset<cTest>& m )
{
    cRavenProfile profile("copy_thread");
    multiset<cTest> copy( m );
}
int main()
{
    cRavenProfile profile("test");
    profile.Start();

    multiset<cTest> master;

    Populate( master );

    // two calls to copy ctor from mainline
    Copy( master );
    Copy( master );

    // call copy ctor in parallel; boost::cref passes master by reference
    // (a plain boost::bind( Copy2, master ) would first copy all of master
    // into the bind object on the main thread)
    boost::thread t1( boost::bind( Copy2, boost::cref( master )));
    boost::thread t2( boost::bind( Copy2, boost::cref( master )));

    t1.join();
    t2.join();

    // display profiler results
    cRavenProfile print_profile;

    return 0;
}

Here is the output:

            Scope   Calls       Mean (secs)     Total
      copy_thread        2      0.472498        0.944997
        copy_main        2      0.233529        0.467058
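A side note on the threaded part of the code: boost::bind, like std::bind, stores its own copy of every argument that is not wrapped in boost::ref or boost::cref. So boost::bind( Copy2, master ) copies the entire 100,000-element master multiset on the main thread before each thread starts, and the second of those copies can overlap the first thread's timed copy, adding more allocator traffic. The sketch below uses C++11 std::bind and std::thread instead of Boost, with a hypothetical CopyCounter type standing in for the big container, to show the extra copy and how std::cref avoids it.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for a large container: counts how often it is copied.
struct CopyCounter {
    static std::atomic<int> copies;
    CopyCounter() = default;
    CopyCounter(const CopyCounter&) { ++copies; }   // copies are counted
    CopyCounter(CopyCounter&&) noexcept {}          // moves are not counted
};
std::atomic<int> CopyCounter::copies{0};

// Plays the role of Copy2: only needs read access to its argument.
void Work(const CopyCounter&) { /* read or copy the data here */ }

// Starts a thread with the argument bound by value, as in the question,
// and returns how many copies of `master` that made.
int copies_when_bound_by_value(const CopyCounter& master) {
    CopyCounter::copies = 0;
    std::thread t(std::bind(Work, master));             // bind stores a copy of master
    t.join();
    return CopyCounter::copies;
}

// Same, but the bind object stores only a reference_wrapper to `master`.
int copies_when_bound_by_cref(const CopyCounter& master) {
    CopyCounter::copies = 0;
    std::thread t(std::bind(Work, std::cref(master)));  // no copy of master
    t.join();
    return CopyCounter::copies;
}
```

The same applies to arguments handed straight to the std::thread constructor, as in std::thread(Work, master): wrap them in std::cref (or boost::cref with Boost) when the thread only needs to read them.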


Recommended answer

You mentioned copy constructors. I assume that these also allocate memory from the heap?
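One way to check that assumption is to count calls to the global operator new while a std::multiset is copy-constructed. std::multiset is node-based, so each element lives in its own heap-allocated node, and copying n elements costs at least n allocations. This is a minimal sketch: g_allocations and allocations_for_copy are illustrative names, and multiset<int> stands in for the question's multiset<cTest>. Replacing the global operator new counts every allocation in the program, which is blunt but standard-conforming.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <set>

static std::atomic<long> g_allocations{0};

// Replacement global allocation functions: count, then defer to malloc/free.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many times operator new ran while copy-constructing a
// multiset<int> that holds n elements.
long allocations_for_copy(int n) {
    std::multiset<int> master;
    for (int k = 0; k < n; ++k) master.insert(k);

    long before = g_allocations.load();
    std::multiset<int> copy(master);        // copies node by node
    long after = g_allocations.load();

    assert(copy.size() == master.size());
    return after - before;
}
```

For n = 100,000, as in the question, that is at least 100,000 allocator calls per copy, and in the parallel case two threads make them at the same time.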

Allocating heap memory from multiple threads at the same time is a big mistake if you care about performance.

The standard allocator is probably implemented as a single memory pool guarded by one lock, so allocations from different threads are serialized. You need to either avoid heap memory (allocate on the stack) or use a heap allocator optimized for multithreaded use.
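In C++17 (which postdates this question), one way to approximate both suggestions is to give each thread its own std::pmr memory resource: the first chunk comes from a buffer on that thread's stack, and later chunks come from the shared heap in progressively larger pieces, so the threads rarely touch the global allocator. This is a sketch under those assumptions, not the answer's own code; copy_with_local_arena and parallel_copies are hypothetical helpers, and multiset<int> again stands in for multiset<cTest>.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <set>
#include <thread>
#include <vector>

// Copies `master` into a std::pmr::multiset whose nodes come from an arena
// owned by the calling thread. The first 16 KiB come from a stack buffer;
// after that the arena asks its upstream resource (new/delete by default)
// for larger and larger chunks, so n nodes need far fewer than n heap calls.
std::size_t copy_with_local_arena(const std::multiset<int>& master) {
    alignas(std::max_align_t) unsigned char stack_buf[16 * 1024];
    std::pmr::monotonic_buffer_resource arena(stack_buf, sizeof stack_buf);

    // monotonic_buffer_resource is not thread-safe; that is fine here
    // because this arena is only used by the thread running this function.
    std::pmr::multiset<int> copy(master.begin(), master.end(), &arena);
    assert(copy.size() == master.size());
    return copy.size();
}   // `copy` is destroyed first, then `arena` releases all its chunks at once

// Runs `threads` copies concurrently, each with its own arena, and returns
// the total number of elements copied. Threads only read `master` and each
// writes a different slot of `sizes`, so no locking is needed.
std::size_t parallel_copies(const std::multiset<int>& master, unsigned threads) {
    std::vector<std::size_t> sizes(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&master, &sizes, t] { sizes[t] = copy_with_local_arena(master); });
    for (std::thread& th : pool) th.join();

    std::size_t total = 0;
    for (std::size_t s : sizes) total += s;
    return total;
}
```

Whether this is actually faster depends on the platform's default allocator, so it is worth profiling the same way as in the question.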