Thread-local storage in assembly
Question
I want to increment a TLS variable in assembly, but the assembly code below gives a segmentation fault. I don't want to let the compiler change any other register or memory. Is there a way to do this without using GCC's input and output operand syntax?
__thread unsigned val;
int main() {
    val = 0;
    asm("incl %gs:val");
    return 0;
}
Recommended answer
If you really, really need to be able to do this for some reason, you should access a thread-local variable from assembly language by preloading its address in C, like this:
__thread unsigned val;
void incval(void)
{
    unsigned *vp = &val;
    asm ("incl\t%0" : "+m" (*vp));
}
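To see that this pattern really updates a per-thread copy, here is a minimal sketch that runs incval() from two threads. It assumes x86 or x86-64, a GCC-compatible compiler, and POSIX threads; struct job, worker, and run_demo are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread-local counter, as in the answer. */
static __thread unsigned val;

/* Same pattern as incval() in the answer: the compiler works out the TLS
   address, and the asm only ever sees a memory operand.
   x86/x86-64 only, GCC-style inline asm. */
static void incval(void)
{
    unsigned *vp = &val;
    __asm__ ("incl\t%0" : "+m" (*vp));
}

/* Hypothetical job description: how many increments to do, and the
   value of this thread's copy of val afterwards. */
struct job { unsigned n; unsigned result; };

static void *worker(void *arg)
{
    struct job *j = arg;
    for (unsigned i = 0; i < j->n; i++)
        incval();
    j->result = val;   /* reads this thread's copy only */
    return NULL;
}

/* Runs two workers with different increment counts. Returns 0 on
   success and reports each worker's final count, plus the calling
   thread's own (untouched) copy of val. */
int run_demo(unsigned *a, unsigned *b, unsigned *mine)
{
    struct job ja = { 3, 0 }, jb = { 5, 0 };
    pthread_t ta, tb;

    assert(a && b && mine);
    if (pthread_create(&ta, NULL, worker, &ja) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker, &jb) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    *a = ja.result;
    *b = jb.result;
    *mine = val;
    return 0;
}
```

Each worker should see only its own increments, and the calling thread's copy should stay at zero. On older toolchains this may need -pthread when compiling and linking.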
This is because the code sequence required to access a thread-local variable is different for just about every OS and CPU combination supported by GCC, and it also varies if you're compiling for a shared library rather than an executable (i.e., with -fPIC). The construct in incval() allows the compiler to emit the correct code sequence for you. In cases where it is possible to access the thread-local variable without any extra instructions, the address generation will be folded into the assembly operation. By way of illustration, here is how GCC 4.7 for x86/Linux compiles incval() in several different modes (I've stripped out a bunch of assembler directives in all cases, for clarity):
# -S -O2 -m32 -fomit-frame-pointer
incval:
incl %gs:val@ntpoff
ret
# -S -O2 -m64
incval:
incl %fs:val@tpoff
ret
# -S -O2 -m32 -fomit-frame-pointer -fpic
incval:
pushl %ebx
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
leal val@tlsgd(,%ebx,1), %eax
call ___tls_get_addr@PLT
incl (%eax)
popl %ebx
ret
# -S -O2 -m64 -fpic
incval:
.byte 0x66
leaq val@tlsgd(%rip), %rdi
.value 0x6666
rex64
call __tls_get_addr@PLT
incl (%rax)
ret
Do realize that all four examples would be different if I'd compiled for x86/OSX, and different yet again for x86/Windows.
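Since the TLS access sequence is left to the compiler, the only platform-specific part left is the incl instruction itself. The following is a rough sketch of how one might guard it, assuming a GNU-compatible compiler for the asm path and falling back to plain C elsewhere. The TLS macro, check_incval, and current_val are hypothetical helpers added for illustration, not part of the original answer.

```c
#include <assert.h>

/* Hypothetical portability macro: MSVC spells thread-local storage
   differently from GCC and Clang. */
#if defined(_MSC_VER)
#define TLS __declspec(thread)
#else
#define TLS __thread
#endif

static TLS unsigned val;

/* Increment the calling thread's copy of val. The compiler computes
   the TLS address in every case; only the increment is hand-written,
   and only where the x86 incl instruction is known to be available. */
static void incval(void)
{
    unsigned *vp = &val;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("incl\t%0" : "+m" (*vp));
#else
    ++*vp;   /* plain C fallback for other CPUs and compilers */
#endif
}

/* Returns the calling thread's current value of val. */
unsigned current_val(void)
{
    return val;
}

/* Sanity check: one call raises this thread's counter by exactly one. */
void check_incval(void)
{
    unsigned before = val;
    incval();
    assert(val == before + 1);
}
```

Both branches have the same effect on val, so callers do not need to know which one was compiled in.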