Why is thread local storage not implemented with page table mappings?


Question

I was hoping to use the C++11 thread_local keyword for a per-thread boolean flag that is going to be accessed very frequently.

However, most compilers seem to implement thread local storage with a table that maps integer IDs (slots) to the variable's address on the current thread. This lookup would happen inside a performance-critical code path, so I have some concerns about its cost.

I would have expected thread local storage to be implemented by allocating virtual memory ranges that are backed by different physical pages depending on the thread. That way, accessing the flag would cost the same as any other memory access, since the MMU takes care of the mapping.

Why do none of the mainstream compilers take advantage of page table mappings in this way?

I suppose I can implement my own "thread-specific page" with mmap on Linux and VirtualAlloc on Win32, but this seems like a pretty common use case. If anyone knows of existing or better solutions, please point me to them.

I've also considered storing a std::atomic<std::thread::id> inside each object to represent the active thread, but profiling shows that the check std::this_thread::get_id() == active_thread is quite expensive.

Recommended answer

On Linux/x86-64, thread local storage is implemented through a special segment register, %fs (per x86-64 ABI page 21...)

So the following code (I'm using the GCC extension __thread syntax in C, but it behaves the same as C++11 thread_local)

__thread int x;
int f(void) { return x; }

is compiled (with gcc -O -fverbose-asm -S) into:

         .text
 .Ltext0:
         .globl  f
         .type   f, @function
 f:
 .LFB0:
         .file 1 "tl.c"
         .loc 1 3 0
         .cfi_startproc
         .loc 1 3 0
         movl    %fs:x@tpoff, %eax       # x,
         ret
         .cfi_endproc
 .LFE0:
         .size   f, .-f
         .globl  x
         .section        .tbss,"awT",@nobits
         .align 4
         .type   x, @object
         .size   x, 4
 x:
         .zero   4

Therefore, contrary to your fears, access to TLS is really quick on Linux/x86-64. It is not exactly implemented as a table: instead, the kernel and runtime manage the %fs segment register so that it points to a thread-specific memory zone, and the compiler and linker manage the offsets within that zone. The older pthread_getspecific did indeed go through a table, but it is nearly useless once you have TLS.
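As a rough C++11 counterpart to the __thread snippet, here is a minimal sketch of the per-thread boolean flag from the question. The names in_critical_path and set_flag_on_worker are hypothetical, chosen only for illustration. It shows only that each thread reads and writes its own copy. It says nothing about how a particular compiler lowers the access, which depends on the platform and on the TLS access model chosen at build time.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread flag: every thread gets its own copy,
// initialized to false when that thread first uses it.
thread_local bool in_critical_path = false;

// Starts a worker thread, sets the flag there, and returns the value
// the worker itself observed. The caller's copy is never touched.
bool set_flag_on_worker() {
    bool seen_by_worker = false;
    std::thread worker([&seen_by_worker] {
        assert(!in_critical_path);       // a fresh thread starts with false
        in_critical_path = true;         // writes only the worker's copy
        seen_by_worker = in_critical_path;
    });
    worker.join();                       // join orders the write before the read below
    return seen_by_worker;
}
```

Calling set_flag_on_worker() from main returns true, while in_critical_path read from main afterwards is still false, because the two threads never shared the variable.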

BTW, by definition, all threads in the same process share the same address space in virtual memory, since a process has its own single address space. (See /proc/self/maps, proc(5) for more about /proc/, and also mmap(2); the C++11 thread library is based on pthreads, which are implemented using clone(2).) So "thread-specific memory mapping" is a contradiction: once a task (the thing that is run by the kernel scheduler) has its own address space, it is called a process, not a thread. The defining characteristic of threads in the same process is that they share a common address space (and some other entities, like file descriptors).
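The shared address space can be observed directly. In the sketch below (per_thread_value and read_other_threads_copy are hypothetical names), a worker thread publishes the address of its own thread_local copy, and the calling thread then reads through that pointer. This works because thread_local gives each thread distinct storage at a distinct address inside one common address space, rather than the same address mapped to different pages. The worker is kept alive while the pointer is in use, since its TLS is destroyed when it exits.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical thread-local variable; each thread has its own copy.
thread_local int per_thread_value = 0;

// Returns what the calling thread reads through a pointer to the
// worker thread's copy of per_thread_value.
int read_other_threads_copy() {
    std::promise<int*> address_ready;
    std::promise<void> reader_done;
    // Obtain both futures before the worker starts, so no promise is
    // accessed concurrently except through set_value.
    std::future<int*> address = address_ready.get_future();
    std::future<void> release = reader_done.get_future();

    std::thread worker([&address_ready, &release] {
        per_thread_value = 42;                       // worker's copy only
        address_ready.set_value(&per_thread_value);  // publish its address
        release.wait();                              // stay alive while it is read
    });

    int* worker_copy = address.get();
    assert(worker_copy != &per_thread_value);  // different storage per thread
    int seen = *worker_copy;                   // valid: same address space
    reader_done.set_value();
    worker.join();
    return seen;
}
```

With per-thread page mappings at a fixed address, such a pointer could not mean the same thing in both threads, which is one reason the scheme conflicts with the usual definition of a thread.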
