Can instruction reordering happen across a function call?


Problem description

Suppose I have pseudo C code like below:

int x = 0;
int y = 0;

int __attribute__ ((noinline)) func1(void)
{
    int prev = x;   /* (1) */

    x |= FLAG;      /* (2) */

    return prev;    /* (3) */
}

int main(void)
{
    int tmp;

    ...
    y = 5;          /* (4) */
    compiler_mem_barrier();
    func1();
    compiler_mem_barrier();
    tmp = y;        /* (5) */
    ...
}

Suppose this is a single-threaded process, so we don't need to worry about locks. Suppose also that the code is running on an x86 system and that the compiler doesn't do any reordering.

I understand that the only reordering x86 allows is of a read with an older write (reads may be reordered with older writes to different locations, but not with older writes to the same location). But it's not clear to me whether call/ret instructions count as write/read instructions. So here are my questions:

  1. On x86 systems, is "call" treated as a write instruction? I assume so, since call pushes the return address onto the stack, but I couldn't find official documentation that says so. Please help confirm.

  2. For the same reason, is "ret" treated as a read instruction (since it pops the return address from the stack)?

  3. Can the "ret" instruction be reordered within the function? For example, can (3) be executed before (2) in the asm code below? This doesn't make sense to me, but "ret" is not a serializing instruction, and I didn't find anything in the Intel manual saying that "ret" cannot be reordered.

  4. In the code above, can (1) be executed before (4)? Presumably, the read instruction (1) can be reordered ahead of the write instruction (4). The "call" instruction may have a "jmp" part, but with speculative execution .... So I feel it can happen, but I hope someone more familiar with this issue can confirm.

  5. In the code above, can (5) be executed before (2)? If "ret" is considered to be a read instruction, then I assume it cannot happen. But again, I hope someone can confirm this.

In case the assembly code for func1() is needed, it should be something like:

mov    %gs:0x24,%eax          # (1)
orl    $0x8,%gs:0x24          # (2)
retq                          # (3)

Please help. Thanks!

Solution

Out-of-order execution (OoOE) can reorder anything, but it preserves the illusion that your code executed in program order. The cardinal rule of OoOE is that you don't break single-threaded programs. The hardware tracks dependencies so that each instruction can execute as soon as its inputs and an execution unit are ready, while the results still look as if everything happened in program order.


You appear to be confusing OoOE on a single core with the order in which loads and stores become globally visible to other cores. (The store buffer decouples the two.)

If you have one thread observing the stack memory of another thread running on another core, then yes, the store generated by call (pushing a return address) will be ordered with respect to other stores.

However, out-of-order execution in the thread running this code can actually execute call and ret instructions while a store is delayed on a cache miss, or while a long dependency chain is executing. Multiple cache misses can be in flight at once. The memory-order buffer just has to make sure that later stores don't actually become globally visible until after earlier stores, to preserve x86's memory ordering semantics.
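
To make the distinction concrete: the one reordering that x86 lets other cores observe is a later load taking its value before an earlier store to a different location has left the store buffer. Below is a minimal sketch, assuming C11 `<stdatomic.h>`, of how code that did care about cross-core visibility would rule that out. The names `shared_x`, `shared_y` and `publish_then_read` are hypothetical and do not come from the question.

```c
#include <stdatomic.h>

/* Hypothetical shared variables, for illustration only. */
static _Atomic int shared_x;
static _Atomic int shared_y;

/*
 * Store to shared_y, then load shared_x.
 * Without the fence, x86 may satisfy the load while the store is still
 * waiting in the store buffer (StoreLoad reordering as seen by other cores).
 * The seq_cst fence asks for a full barrier; GCC and Clang typically emit
 * mfence or a lock-prefixed instruction for it on x86.
 */
int publish_then_read(int value)
{
    atomic_store_explicit(&shared_y, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&shared_x, memory_order_relaxed);
}
```

None of this is needed for the single-threaded case in the question: a core always sees its own stores in program order, whatever the store buffer is doing.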


If you have a specific question about hardware reordering, you should probably post asm code, not C code, because C++ compilers can reorder at compile time based on the C++ memory model, which doesn't change when compiling for a strongly-ordered target like x86.
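
For reference, a compilable version of the question's code might look like the sketch below. The definition of `compiler_mem_barrier()` is an assumption, since the question doesn't show it; an empty `asm` statement with a `"memory"` clobber is the usual GNU C idiom. `FLAG` is taken to be 0x8 to match the `$0x8` in the asm, and `run_sequence` is a hypothetical wrapper standing in for the relevant part of `main`.

```c
/* Assumed definition: stops GCC/Clang from moving memory accesses across
 * this point at compile time. It emits no instruction, so it places no
 * constraint on what the CPU does at run time. */
#define compiler_mem_barrier() __asm__ __volatile__("" ::: "memory")

#define FLAG 0x8   /* assumed; matches the $0x8 in the question's asm */

int x = 0;
int y = 0;

int __attribute__((noinline)) func1(void)
{
    int prev = x;   /* (1) load */
    x |= FLAG;      /* (2) read-modify-write */
    return prev;    /* (3) */
}

/* Hypothetical wrapper standing in for the relevant part of main(). */
int run_sequence(void)
{
    int tmp;

    y = 5;                    /* (4) */
    compiler_mem_barrier();
    func1();
    compiler_mem_barrier();
    tmp = y;                  /* (5) */
    return tmp;
}
```

Whatever order the hardware picks internally for (1) through (5), `run_sequence()` returns 5 and leaves `x` equal to `FLAG`, which is the single-threaded guarantee described above.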

See also "How does memory reordering help processors and compilers?" (a Java question, but my answer isn't Java-specific).


re: your edit

This answer was already assuming your function was noinline, and that you were talking about ASM that looked like your C, not what a compiler would actually generate from your code.

mov    %gs:0x24,%eax          # (1)
orl    $0x8,%gs:0x24          # (2)
retq                          # (3)

So x is actually in thread-local storage, not a plain global int x. This doesn't actually matter for out-of-order execution, though; a load with a %gs segment override is still a load.
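
A sketch of a C declaration that could lead to asm of this shape, assuming `x` was declared thread-local with C11 `_Thread_local` (GNU C also accepts `__thread`). The segment register and offset the compiler picks depend on the target and ABI, so `%gs:0x24` in particular is not guaranteed. `FLAG` as 0x8 is taken from the asm.

```c
#define FLAG 0x8              /* matches the $0x8 in the asm above */

static _Thread_local int x;   /* assumed declaration: one copy per thread */

int __attribute__((noinline)) func1(void)
{
    int prev = x;   /* (1) load from this thread's copy of x */
    x |= FLAG;      /* (2) read-modify-write of the same location */
    return prev;    /* (3) */
}
```

The body is the same as with a plain global; only the addressing mode of the generated load and store changes.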
