The Cost of thread_local
Now that C++ is adding thread_local storage as a language feature, I'm wondering a few things:
- What is the cost of thread_local likely to be?
  - In memory?
  - For read and write operations?
- Associated with that: how do operating systems usually implement this? It would seem that anything declared thread_local has to be given thread-specific storage space in each thread that is created.
Storage space: size of the variable * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads.
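To make the "one copy per thread" point concrete, here is a minimal sketch. The names `tls_counter` and `count_per_thread` are made up for illustration. Each worker increments what looks like a single global, yet every worker ends up with its own independent total, because each thread has its own instance of the variable.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// One instance of this counter exists per thread, not one per process.
thread_local int tls_counter = 0;

// Hypothetical helper: starts `threads` workers, lets each one increment
// its own copy of tls_counter `iterations` times, and returns the final
// value each worker observed.
std::vector<int> count_per_thread(std::size_t threads, int iterations) {
    std::vector<int> results(threads, -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < threads; ++i) {
        workers.emplace_back([&results, i, iterations] {
            for (int n = 0; n < iterations; ++n) {
                ++tls_counter;  // no lock needed: only this thread's copy
            }
            results[i] = tls_counter;  // each worker writes its own slot
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Calling `count_per_thread(4, 1000)` should give four entries of 1000 rather than one shared total, and the calling thread's own `tls_counter` stays at 0.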
There are two basic ways of implementing thread-local storage:
1. Using some sort of system call that gets information about the current kernel thread. Sloooow.
2. Using some pointer, probably in a processor register, that the kernel sets properly at every thread context switch, at the same time as all the other registers. Cheap.
On Intel platforms, variant 2 is usually implemented via a segment register (FS or GS, I don't remember which). Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.
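A small sketch for comparing the two kinds of access yourself. The function and variable names are made up, and the comments describe what GCC typically emits on x86-64 Linux for a non-PIC executable with optimization on. That is an assumption about one toolchain, not a guarantee: other compilers, shared libraries, or TLS models can produce something different (for example a call into the runtime).

```cpp
// Ordinary global: one instance for the whole process.
int global_value = 1;

// Thread-local: one instance per thread, located via the thread pointer.
thread_local int tls_value = 2;

// Typically a single load relative to the instruction pointer.
int read_global() {
    return global_value;
}

// Typically a single load relative to the FS segment base on x86-64 Linux
// (assumption: local-exec TLS model in a non-PIC executable).
int read_tls() {
    return tls_value;
}

// Writes go through the same addressing scheme as reads.
void write_tls(int v) {
    tls_value = v;
}
```

Compiling this with something like `g++ -O2 -S` and reading the output for `read_global` and `read_tls` is an easy way to check what your own platform does.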
It is also possible, though I haven't seen it in practice, for this to be implemented via existing library functions like pthread_getspecific. Performance would then be like variant 1 or 2, plus library call overhead. Keep in mind that variant 2 plus library call overhead is still a lot faster than a kernel call.
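For comparison, this is roughly what library-based thread-local storage looks like when written by hand on top of the POSIX calls. It is a sketch, not how any particular compiler implements thread_local; `thread_counter` and the helpers around it are made-up names. Every access pays for a `pthread_once` check and a `pthread_getspecific` call, which is the library call overhead mentioned above.

```cpp
#include <pthread.h>

namespace {

pthread_key_t counter_key;
pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

extern "C" {
// Runs at thread exit for every thread whose slot is non-null.
static void destroy_counter(void* p) {
    delete static_cast<int*>(p);
}

// Creates the process-wide key exactly once.
static void make_counter_key() {
    pthread_key_create(&counter_key, destroy_counter);
}
}  // extern "C"

}  // namespace

// Hypothetical accessor: returns the calling thread's own counter,
// allocating and registering it on first use in that thread.
int& thread_counter() {
    pthread_once(&counter_key_once, make_counter_key);
    void* slot = pthread_getspecific(counter_key);
    if (slot == nullptr) {
        slot = new int(0);
        pthread_setspecific(counter_key, slot);
    }
    return *static_cast<int*>(slot);
}
```

Usage is `++thread_counter();` from any thread. Each thread sees a counter that starts at 0 regardless of what other threads have stored in theirs.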