Using C++11 thread_local with other parallel libraries

Question

I have a simple question: can C++11 thread_local be used with other parallel models?

For example, can I use it within a function while using OpenMP or Intel TBB to parallelize the tasks?

Most such parallel programming models hide hardware threads behind a higher-level API. My instinct is that they all have to map their task schedulers onto hardware threads. Can I expect that C++11 thread_local will have the expected effect?

A simple example is

void func ()
{
    static thread_local int some_var = init_val;  // int used here as a placeholder type
#pragma omp parallel for [... clauses ...]
    for (int i = 0; i < N; ++i) {
        // access some_var somewhere within the loop
    }
}

Can I expect that each OpenMP thread will access its own copy of some_var?

I know that most parallel programming models have their own constructs for thread-local storage. However, having the ability to use C++11 thread_local (or a compiler-specific keyword) is nice. For example, consider the following situation:

// may actually be implemented as a class with operator()
void func ()
{
     static thread_local int some_var;  // int used here as a placeholder type
     // a quite complex function
}

void func_omp (int N)
{
#pragma omp parallel for [... clauses ...]
    for (int i = 0; i < N; ++i)
        func();
}

void func_tbb (int N)
{
      // the TBB body receives a sub-range, so wrap func in a lambda
      tbb::parallel_for(tbb::blocked_range<int>(0, N),
                        [](const tbb::blocked_range<int> &r) {
                            for (int i = r.begin(); i != r.end(); ++i) func();
                        });
}

void func_select (int N)
{
     // At runtime or at compile time, based which programming model is available,
     // select to run func_omp or func_tbb
}

The basic idea here is that func may be quite complex, and I want to support multiple parallel programming models. If I use the thread-local constructs specific to each parallel programming model, then I have to implement different versions of func, or at least parts of it. However, if I can freely use C++11 thread_local, then in addition to func I only need to implement a few very simple functions. For a larger project, things can be simplified further by using templates to write more generic versions of func_omp and func_tbb. However, I am not quite sure it is safe to do so.
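
A rough sketch of what such template-based wrappers might look like (run_omp and run_tbb are hypothetical helper names, not part of OpenMP or TBB): any callable taking an int index can be driven by either backend, so func and its thread_local state stay untouched.

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

// Generic drivers: apply any callable f(int) over [0, N) with either backend.
template <typename Func>
void run_omp (int N, Func f)
{
#pragma omp parallel for
    for (int i = 0; i < N; ++i)
        f(i);
}

template <typename Func>
void run_tbb (int N, Func f)
{
    tbb::parallel_for(tbb::blocked_range<int>(0, N),
                      [&](const tbb::blocked_range<int> &r) {
                          for (int i = r.begin(); i != r.end(); ++i)
                              f(i);
                      });
}

func_select would then simply call run_omp(N, [](int) { func(); }) or run_tbb(N, [](int) { func(); }), and whether the thread_local inside func behaves correctly is exactly the question asked above.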

Answer

On the one hand, the OpenMP specification intentionally omits any specification concerning interoperability with other programming paradigms, and any mixing of C++11 threading with OpenMP is non-standard and vendor-specific. On the other hand, compilers (at least GCC) tend to use the same underlying TLS mechanism to implement OpenMP's #pragma omp threadprivate, C++11's thread_local, and the various compiler-specific storage classes such as __thread.

For example, GCC implements its OpenMP runtime (libgomp) entirely on top of the POSIX threads API and implements OpenMP threadprivate by placing the variables in ELF TLS storage. This interoperates with GNU's C++11 implementation, which also uses POSIX threads and places thread_local variables in ELF TLS storage. Ultimately, this interoperates with code that uses the __thread keyword to specify the thread-local storage class and with explicit POSIX threads API calls. For example, the following code:

int foo;
#pragma omp threadprivate(foo)

__thread int bar;

thread_local int baz;

int func(void)
{
   return foo + bar + baz;
}

compiles to the following (relevant excerpts):

    .globl  foo
    .section        .tbss,"awT",@nobits
    .align 4
    .type   foo, @object
    .size   foo, 4
foo:
    .zero   4
    .globl  bar
    .align 4
    .type   bar, @object
    .size   bar, 4
bar:
    .zero   4
    .globl  baz
    .align 4
    .type   baz, @object
    .size   baz, 4
baz:
    .zero   4

    movl    %fs:foo@tpoff, %edx
    movl    %fs:bar@tpoff, %eax
    addl    %eax, %edx
    movl    %fs:baz@tpoff, %eax
    addl    %edx, %eax

Here the .tbss ELF section is the thread-local BSS (uninitialised data). All three variables are created and accessed in the same way.
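
A minimal way to check the per-thread behaviour empirically (assuming GCC with -fopenmp; the variable and program below are illustrative, not from the original question) is to record the address of a thread_local variable from each OpenMP thread; each thread should print a distinct address and the value 1:

#include <cstdio>
#include <omp.h>

static thread_local int tls_counter = 0;

int main ()
{
#pragma omp parallel num_threads(4)
    {
        ++tls_counter;  // each OpenMP thread increments its own copy
        std::printf("thread %d: &tls_counter = %p, value = %d\n",
                    omp_get_thread_num(), (void *)&tls_counter, tls_counter);
    }
    return 0;
}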

Interoperability with other compilers is of less concern right now: Intel's compiler does not implement thread_local, while Clang still lacks OpenMP support.
