Run-time overhead of a compiler barrier in gcc for x86 processors


Question

I am looking into the side effects and run-time overhead of using a compiler barrier (in gcc) on x86.

Compiler barrier: asm volatile("" ::: "memory")

The GCC documentation says something interesting ( https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html ).

Excerpt:

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

Questions:

1) Which register values are flushed?

2) Why do they need to be flushed?

3) Examples?

4) Is there any other overhead apart from register flushing?

Recommended answer

Every memory location that another thread might have a pointer to must be up to date before the barrier and reloaded after it. So any such value that is live in a register needs to be stored (if dirty), or simply "forgotten about" if the register holds just a copy of what is still in memory.
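
The rule above can be sketched in a few lines of C. This is a minimal illustration, assuming GCC or Clang; the names COMPILER_BARRIER, shared_flag and flush_and_reload are made up for this example and do not come from the original answer.

```c
/* Compiler-only barrier: expands to an empty asm statement, so it emits no
   machine instruction. The "memory" clobber makes the compiler assume that
   any indirectly reachable memory may be read or written at this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int shared_flag;   /* global: other code may hold a pointer to it */

/* Hypothetical function used only to illustrate the rule. */
int flush_and_reload(int x)
{
    int local = x * 2;   /* address never taken: may stay in a register */

    shared_flag = 1;     /* dirty value: must reach memory before the barrier */
    COMPILER_BARRIER();
    /* The compiler may not assume shared_flag is still 1 here, so it has to
       load it from memory again instead of folding in the constant. */
    return local + shared_flag;
}
```

Compiling this with something like gcc -O2 -S and then again with the barrier removed is one way to see the extra store and load around the barrier, while local is unaffected.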

See this gcc non-bug report for this quote from a gcc dev: a "memory" clobber only includes memory that can be indirectly accessed (thus may be address-taken in this or another compilation unit).

Is there any other overhead apart from register flushing?

A barrier can prevent optimizations such as sinking a store out of a loop, but preventing that is usually why you used the barrier. Make sure your loop counters and loop variables are locals whose address has not been passed to functions the compiler can't see, or else they will have to be spilled and reloaded inside the loop. Letting references escape your function is always a potential problem for optimization, but with barriers it is a near-guarantee of worse code.
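
The difference between a purely local loop counter and one whose address has escaped can be sketched as follows. This is only an illustration, assuming GCC or Clang; progress, published_counter and both function names are hypothetical, and storing the address in a global stands in for "passing it to a function the compiler can't see".

```c
#include <stddef.h>

#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int progress;             /* global, reachable by other code */
int *published_counter;   /* a local whose address is stored here has escaped */

/* i and sum never have their address taken, so they can stay in registers
   across the barrier. Only the store to progress has to happen on every
   iteration instead of being sunk out of the loop. */
long sum_with_barrier(const int *data, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
        progress = i + 1;
        COMPILER_BARRIER();
    }
    return sum;
}

/* Here the address of i escapes before the loop, so the compiler must keep
   i up to date in memory before each barrier and reload it afterwards. */
long sum_with_escaped_counter(const int *data, int n)
{
    long sum = 0;
    int i;
    published_counter = &i;
    for (i = 0; i < n; i++) {
        sum += data[i];
        COMPILER_BARRIER();
    }
    published_counter = NULL;   /* do not leave a dangling pointer behind */
    return sum;
}
```

Both functions return the same result; the expected difference is only in the generated code inside the loop, which can be compared with gcc -O2 -S.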

Why?

This is the whole point of a barrier: values are synced to memory, which prevents compile-time reordering.

asm volatile("" ::: "memory") is (exactly?) equivalent to atomic_signal_fence(memory_order_seq_cst) (not atomic_thread_fence, whose seq_cst form needs an mfence or a locked instruction on x86).
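
The two C11 fences can be put side by side in a short sketch. It assumes a C11 compiler with <stdatomic.h>; payload, ready and the two function names are invented for illustration. The variables are plain int only to keep the generated code easy to read; code that really shares data between threads would use _Atomic objects.

```c
#include <stdatomic.h>

int payload;
int ready;

/* Compiler-only fence: the two stores may not be reordered at compile time,
   but no fence instruction is emitted. */
void publish_compiler_only(int v)
{
    payload = v;
    atomic_signal_fence(memory_order_seq_cst);
    ready = 1;
}

/* Full fence: in addition to the compile-time ordering, the compiler must
   emit a run-time barrier (mfence or a locked instruction on x86). */
void publish_with_hw_fence(int v)
{
    payload = v;
    atomic_thread_fence(memory_order_seq_cst);
    ready = 1;
}
```

Only the second function should contain a barrier instruction in the generated asm, which is the run-time cost the compiler-only barrier avoids.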

Examples:

See Jeff Preshing's "Memory Ordering at Compile Time" article for more about why, and for examples with actual x86 asm.
