How does the gcc `__thread` work?

Question

How is __thread in gcc implemented? Is it simply a wrapper over pthread_getspecific and pthread_setspecific?

With my program, which uses the POSIX API for TLS, I'm kind of disappointed to see that 30% of the runtime is spent in pthread_getspecific. I call it on entry to each function that needs the resource. The compiler doesn't seem to optimize out the repeated pthread_getspecific calls after inlining, so once the functions are inlined the code basically searches for the correct TLS pointer again and again, only to get the same pointer back.

Will __thread help me in this situation? I know that there is thread_local in C11, but the gcc I have doesn't support it yet. (But now I see that my gcc does support _Thread_local just not the macro.)
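To make the two approaches in the question concrete, here is a minimal sketch. The names (struct resource, res_key, res_tls, use_by_key, use_by_tls) are hypothetical and only stand in for the asker's actual resource; the real program is not shown in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread resource, standing in for the asker's. */
struct resource { int uses; };

/* Variant A: POSIX key; every access goes through a library call. */
static pthread_key_t res_key;
static pthread_once_t res_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    /* free() releases the per-thread block when a thread exits. */
    pthread_key_create(&res_key, free);
}

static struct resource *res_by_key(void) {
    pthread_once(&res_once, make_key);
    struct resource *r = pthread_getspecific(res_key);
    if (r == NULL) {                 /* first use in this thread */
        r = calloc(1, sizeof *r);
        assert(r != NULL);
        pthread_setspecific(res_key, r);
    }
    return r;
}

/* Variant B: __thread variable; the compiler addresses it directly. */
static __thread struct resource res_tls;

/* Each function bumps and returns the calling thread's use count. */
int use_by_key(void) { return ++res_by_key()->uses; }
int use_by_tls(void) { return ++res_tls.uses; }
```

Both functions behave the same from the caller's point of view; the difference is that use_by_key performs a pthread_getspecific call on every entry, while use_by_tls reads the variable in place.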

I know I can simply test it and see. But I have to go somewhere else now, and I'd like to know better on a feature before I attempt a quite big rewrite.

Recommended answer

Recent GCC versions (e.g. GCC 5) support C11 and its thread_local (when compiling with e.g. gcc -std=c11). As FUZxxl commented, instead of C11 thread_local you could use the __thread qualifier, which older GCC versions also support. Read about Thread Local Storage.

pthread_getspecific is indeed quite slow, since it involves a function call. (It belongs to the POSIX threads library, so it is provided not by GCC but by the C library, e.g. GNU glibc or musl libc.) Using thread_local variables will very probably be faster.

Look into the source code of musl's thread/pthread_getspecific.c file for an example implementation. See also this answer to a related question.

__thread and thread_local are (often) not magically translated to calls to pthread_getspecific. They usually involve a specific addressing mode and/or register, with help from the compiler, the linker and the runtime system. The details are implementation specific and tied to the ABI; on Linux, I guess that since x86-64 has more registers and addressing modes, its implementation of TLS is faster than on i386. Conversely, some implementations of pthread_getspecific may themselves use internal thread_local variables (inside your implementation of POSIX threads).

As an example, compiling the following code

#include <pthread.h>

extern const pthread_key_t key;

__thread int data;

int
get_data (void) {
  return data;
}

int
get_by_key (void) {
  return *(int*) (pthread_getspecific (key));
}

using GCC 5.2 (on Debian/Sid) with gcc -m32 -S -O2 -fverbose-asm gives the following code for get_data using TLS:

  .type get_data, @function
get_data:
.LFB3:
  .cfi_startproc
  movl  %gs:data@ntpoff, %eax   # data,
  ret
  .cfi_endproc

and the following code of get_by_key with an explicit call to pthread_getspecific:

get_by_key:
.LFB4:
  .cfi_startproc
  subl  $24, %esp   #,
  .cfi_def_cfa_offset 28
  pushl key # key
  .cfi_def_cfa_offset 32
  call  pthread_getspecific #
  movl  (%eax), %eax    # MEM[(int *)_4], MEM[(int *)_4]
  addl  $28, %esp   #,
  .cfi_def_cfa_offset 4
  ret
  .cfi_endproc

Hence using TLS with __thread (or thread_local in C11) should probably be faster than using pthread_getspecific (avoiding the overhead of a call).
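As a small illustration of what "thread-local" means in practice, the following sketch starts two threads that each increment a __thread counter. The names tls_counter, worker, run_two_workers and main_thread_counter are hypothetical, and the program assumes a GCC-compatible compiler with POSIX threads available.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical counter: every thread gets its own zero-initialized copy. */
static __thread int tls_counter;

/* Increments this thread's copy n times and returns its final value. */
static void *worker(void *arg) {
    int n = (int)(intptr_t)arg;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        tls_counter++;               /* touches only this thread's copy */
    return (void *)(intptr_t)tls_counter;
}

/* Runs two workers and returns the sum of their per-thread counters,
   or -1 if a thread could not be created. */
int run_two_workers(int n1, int n2) {
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    if (pthread_create(&t1, NULL, worker, (void *)(intptr_t)n1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, (void *)(intptr_t)n2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return (int)(intptr_t)r1 + (int)(intptr_t)r2;
}

/* The calling thread's copy is untouched by the workers. */
int main_thread_counter(void) { return tls_counter; }
```

No locking is needed around tls_counter because no two threads ever share the same copy; the increments compile to plain memory accesses like the one in get_data above.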

Notice that thread_local is a convenience macro defined in <threads.h> (a C11 standard header); it expands to the _Thread_local keyword.
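For a toolchain like the asker's, where _Thread_local is accepted but the thread_local macro is missing, one possible workaround is to pick the spelling with the preprocessor. This is only a sketch: MY_THREAD_LOCAL, call_depth, depth_enter and depth_leave are made-up names, and the fallback branch assumes a compiler that understands the GNU __thread extension.

```c
#include <assert.h>

/* Use the C11 keyword when the compiler advertises C11 or later,
   otherwise fall back to the GNU __thread extension. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define MY_THREAD_LOCAL _Thread_local
#else
#define MY_THREAD_LOCAL __thread
#endif

/* Hypothetical per-thread nesting depth. */
static MY_THREAD_LOCAL int call_depth;

/* Returns the depth after entering one more level. */
int depth_enter(void) { return ++call_depth; }

/* Returns the depth after leaving one level. */
int depth_leave(void) {
    assert(call_depth > 0);
    return --call_depth;
}
```

Hiding the qualifier behind one macro keeps the rest of the code unchanged if the project later switches to including <threads.h> and using thread_local directly.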
