Using the OpenMP threadprivate directive on static instances of C++ STL types

Question

Consider the following code snippet:

#include <map>

class A {
    static std::map<int,int> theMap;
#pragma omp threadprivate(theMap)
};

std::map<int,int> A::theMap;

Compilation with OpenMP fails with the following error message:

$ g++ -fopenmp -c main.cpp 
main.cpp:5:34: error: ‘threadprivate’ ‘A::theMap’ has incomplete type

I don't understand this. I can compile without the #pragma directive, which should mean that std::map is not an incomplete type. I can also compile if theMap has a primitive type (double, int...).

How can I make a global static std::map threadprivate?

Recommended answer

This is a compiler restriction. The Intel C/C++ compiler supports C++ class types in threadprivate, while gcc and MSVC currently do not.

For example, in MSVC (VS 2010), you get this error (I removed the class):

static std::map<int,int> theMap;
#pragma omp threadprivate(theMap)

error C3057: 'theMap' : dynamic initialization of 'threadprivate' symbols is not currently supported

So the workaround is fairly obvious, if a bit dirty: build a very simple thread-local storage yourself. A simple approach would be:

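#include <map>

// Upper bound on the number of OpenMP threads this table can serve.
const static int MAX_THREAD = 64;

struct MY_TLS_ITEM
{
  std::map<int,int> theMap;
  // Pad each item to 64 bytes; assumes sizeof(std::map<int,int>) < 64.
  char padding[64 - sizeof(std::map<int,int>)];
};

// MSVC-specific: place the array on a 64-byte boundary.
__declspec(align(64)) MY_TLS_ITEM tls[MAX_THREAD];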

Note that the padding is there to avoid false sharing. I am assuming a 64-byte cache line, as on modern Intel x86 processors. __declspec(align(64)) is an MSVC extension that places the structure on a 64-byte boundary. As a result, every element of tls sits on a different cache line, so there is no false sharing. GCC has the equivalent __attribute__ ((aligned(64))).
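Where a C++11 compiler is available, the same layout can probably be expressed without compiler-specific extensions. The following is only a minimal sketch, still assuming a 64-byte cache line; the names kCacheLine, kMaxThreads, PaddedItem, slots and map_for are hypothetical and not part of the original answer. It swaps the manual padding array for alignas, which makes the compiler round the size of each item up to a multiple of the alignment:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Assumed cache-line size; 64 bytes is typical for x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;
// Hypothetical upper bound on the number of threads.
constexpr int kMaxThreads = 64;

// alignas sets the alignment of the struct, and the compiler then pads
// sizeof(PaddedItem) up to a multiple of kCacheLine.
struct alignas(kCacheLine) PaddedItem {
    std::map<int, int> theMap;
};

static_assert(sizeof(PaddedItem) % kCacheLine == 0,
              "each item must fill whole cache lines");
static_assert(alignof(PaddedItem) == kCacheLine,
              "each item must start on a cache-line boundary");

PaddedItem slots[kMaxThreads];

// Returns the map reserved for the given thread index.
std::map<int, int>& map_for(int tid) {
    assert(tid >= 0 && tid < kMaxThreads);
    return slots[tid].theMap;
}
```

Unlike a hand-written padding array, this form keeps compiling if std::map<int,int> happens to be larger than 64 bytes on some standard library; each item then simply spans more than one cache line.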

To access this simple TLS, you can do this:

tls[omp_get_thread_num()].theMap;

Of course, you should evaluate this inside an OpenMP parallel construct. The nice thing is that OpenMP provides an abstract thread ID in [0, N), where N is the number of threads in the team (it must not exceed MAX_THREAD here). This enables a fast and simple TLS implementation. In general, a native TID from the operating system is an arbitrary integer, so you would typically need a hash table, which is slower to access than a plain array.
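To make the access pattern concrete, here is a hedged end-to-end sketch of the array-based TLS used inside a parallel loop. The names TlsItem, current_thread_index and count_keys are hypothetical, the 64-byte cache line is an assumption, and alignas is used in place of the compiler-specific alignment attributes. Each thread updates only its own slot, and the per-thread maps are merged serially afterwards:

```cpp
#include <cassert>
#include <map>
#ifdef _OPENMP
#include <omp.h>
#endif

const int MAX_THREAD = 64;  // upper bound on team size, as in the answer

// Hypothetical item type; a 64-byte cache line is assumed.
struct alignas(64) TlsItem {
    std::map<int, int> theMap;
};

TlsItem tls[MAX_THREAD];

// Index of the calling thread within the current team; falls back to 0
// when the program is compiled without OpenMP support.
int current_thread_index() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Counts how often each key (i % 10) occurs for i in [0, n).
std::map<int, int> count_keys(int n) {
    for (int t = 0; t < MAX_THREAD; ++t) tls[t].theMap.clear();

    // num_threads(4) keeps the team size well below MAX_THREAD.
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; ++i) {
        int tid = current_thread_index();
        assert(tid >= 0 && tid < MAX_THREAD);
        ++tls[tid].theMap[i % 10];  // no lock: this slot belongs to one thread
    }

    // Serial merge of the per-thread results after the parallel region.
    std::map<int, int> total;
    for (int t = 0; t < MAX_THREAD; ++t)
        for (const auto& kv : tls[t].theMap)
            total[kv.first] += kv.second;
    return total;
}
```

With g++ this would be built with -fopenmp, as in the question; without that flag the pragma is ignored and everything runs in slot 0. Calling count_keys(1000) should return a map with ten keys, each counted 100 times, regardless of how the iterations were split across threads.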
