调用堆栈究竟是如何工作的? [英] How exactly does the callstack work?

查看:30
本文介绍了调用堆栈究竟是如何工作的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图更深入地了解编程语言的低级操作是如何工作的,尤其是它们如何与 OS/CPU 交互.我可能已经阅读了 Stack Overflow 上每个堆栈/堆相关线程中的每个答案,它们都很出色.但是还有一件事我还没有完全理解.

在伪代码中考虑这个函数,它往往是有效的 Rust 代码;-)

fn foo() {让 a = 1;让 b = 2;让 c = 3;让 d = 4;//第 X 行做某事(a,b);做另一件事(c,d);}

这就是我假设堆栈在第 X 行的样子:

堆栈+-------------+|1 |b +-------------+|2 |c +-------------+|3 |d +-------------+|4 |+------------+

现在,我读到的关于堆栈如何工作的所有内容都是它严格遵守 LIFO 规则(后进先出).就像 .NET、Java 或任何其他编程语言中的堆栈数据类型一样.

但如果是这样,那么在 X 行之后会发生什么?因为很明显,我们接下来需要处理 ab,但这意味着 OS/CPU (?) 必须弹出 dc 首先回到 ab.但是它会在脚下射击自己,因为它需要在下一行中使用 cd.

那么,我想知道幕后究竟发生了什么?

另一个相关问题.考虑我们传递对其他函数之一的引用,如下所示:

fn foo() {让 a = 1;让 b = 2;让 c = 3;让 d = 4;//第 X 行doSomething(&a, &b);做另一件事(c,d);}

根据我的理解,这意味着 doSomething 中的参数本质上指向相同的内存地址,如 abfoo 中.但话又说回来,这意味着在我们到达 ab 发生之前,不会弹出堆栈.

那两个案例让我觉得我还没有完全掌握究竟堆栈是如何工作的,以及它如何严格遵循LIFO规则.

解决方案

调用栈也可以称为帧栈.
在LIFO原则之后堆叠的不是局部变量,而是被调用函数的整个栈帧(调用").局部变量与所谓的 给我们

主要:pushq %rbpmovq %rsp, %rbpsubq 16 美元,%rspmovl std::cin, %edi调用 std::basic_istream>::get()movb %al, -1(%rbp)movsbl -1(%rbp), %eaxmovl %eax, %esimovl std::cout, %edi调用 [... char 的插入运算符,长的东西... ]movl $0, %eax离开退

.. 用于 main.我将代码分成三个小节.函数序言由前三个操作组成:

  • 基指针被压入堆栈.
  • 栈指针保存在基指针中
  • 减去堆栈指针为局部变量腾出空间.

然后将cin移入EDI寄存器2并调用get;返回值在 EAX 中.

到目前为止一切顺利.现在有趣的事情发生了:

EAX 的低位字节,由 8 位寄存器 AL 指定,被取出并存储在基指针之后的字节:即 -1(%rbp),基指针的偏移量为-1.这个字节是我们的变量c.偏移量为负,因为堆栈在 x86 上向下增长.下一个操作将 c 存储在 EAX 中:EAX 移动到 ESI,cout 移动到 EDI,然后使用 cout 调用插入运算符,然后c 是参数.

最后,

  • main 的返回值存储在 EAX: 0 中.这是因为隐含的 return 语句.您可能还会看到 xorl rax rax 而不是 movl.
  • 离开并返回呼叫站点.leave 是这个结尾的缩写和隐含的
    • 用基指针替换堆栈指针和
    • 弹出基指针.

在执行此操作和 ret 之后,框架已经有效地弹出,尽管调用者仍然需要清理参数,因为我们正在使用 cdecl 调用约定.其他约定,例如stdcall,要求被调用者整理,例如通过将字节数传递给 ret.

帧指针省略

也可以不使用基/帧指针的偏移量,而是使用堆栈指针 (ESB) 的偏移量.这使得 EBP 寄存器原本包含可用于任意使用的帧指针值 - 但它可以使 在某些机器上无法调试,并且将隐式关闭某些功能.在为只有少数寄存器的处理器(包括 x86)进行编译时,它特别有用.

这种优化被称为 FPO(帧指针省略),由 GCC 中的 -fomit-frame-pointer 和 Clang 中的 -Oy 设置;请注意,当且仅当调试仍然可能时,它由每个优化级别 > 0 隐式触发,因为除此之外它没有任何成本.如需更多信息,请参阅此处此处.

<小时>

1 正如评论中所指出的,帧指针大概是指向返回地址之后的地址.

2 请注意,以 R 开头的寄存器是以 E 开头的寄存器的 64 位对应物.EAX 指定 RAX 的四个低位字节.为清楚起见,我使用了 32 位寄存器的名称.

I'm trying to get a deeper understanding of how the low level operations of programming languages work and especially how they interact with the OS/CPU. I've probably read every answer in every stack/heap related thread here on Stack Overflow, and they are all brilliant. But there is still one thing that I didn't fully understand yet.

Consider this function in pseudo code which tends to be valid Rust code ;-)

fn foo() {
    let a = 1;
    let b = 2;
    let c = 3;
    let d = 4;

    // line X

    doSomething(a, b);
    doAnotherThing(c, d);
}

This is how I assume the stack to look like on line X:

Stack

a +-------------+
  | 1           | 
b +-------------+     
  | 2           |  
c +-------------+
  | 3           | 
d +-------------+     
  | 4           | 
  +-------------+ 

Now, everything I've read about how the stack works is that it strictly obeys LIFO rules (last in, first out). Just like a stack datatype in .NET, Java or any other programming language.

But if that's the case, then what happens after line X? Because obviously, the next thing we need is to work with a and b, but that would mean that the OS/CPU (?) has to pop out d and c first to get back to a and b. But then it would shoot itself in the foot, because it needs c and d in the next line.

So, I wonder what exactly happens behind the scenes?

Another related question. Consider we pass a reference to one of the other functions like this:

fn foo() {
    let a = 1;
    let b = 2;
    let c = 3;
    let d = 4;

    // line X

    doSomething(&a, &b);
    doAnotherThing(c, d);
}

From how I understand things, this would mean that the parameters in doSomething are essentially pointing to the same memory address like a and b in foo. But then again this means that there is no pop up the stack until we get to a and b happening.

Those two cases make me think that I haven't fully grasped how exactly the stack works and how it strictly follows the LIFO rules.

解决方案

The call stack could also be called a frame stack.
The things that are stacked after the LIFO principle are not the local variables but the entire stack frames ("calls") of the functions being called. The local variables are pushed and popped together with those frames in the so-called function prologue and epilogue, respectively.

Inside the frame the order of the variables is completely unspecified; Compilers "reorder" the positions of local variables inside a frame appropriately to optimize their alignment so the processor can fetch them as quickly as possible. The crucial fact is that the offset of the variables relative to some fixed address is constant throughout the lifetime of the frame - so it suffices to take an anchor address, say, the address of the frame itself, and work with offsets of that address to the variables. Such an anchor address is actually contained in the so-called base or frame pointer which is stored in the EBP register. The offsets, on the other hand, are clearly known at compile time and are therefore hardcoded into the machine code.

This graphic from Wikipedia shows what the typical call stack is structured like1:

Add the offset of a variable we want to access to the address contained in the frame pointer and we get the address of our variable. So shortly said, the code just accesses them directly via constant compile-time offsets from the base pointer; It's simple pointer arithmetic.

Example

#include <iostream>

int main()
{
    char c = std::cin.get();
    std::cout << c;
}

gcc.godbolt.org gives us

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp

    movl    std::cin, %edi
    call    std::basic_istream<char, std::char_traits<char> >::get()
    movb    %al, -1(%rbp)
    movsbl  -1(%rbp), %eax
    movl    %eax, %esi
    movl    std::cout, %edi
    call    [... the insertion operator for char, long thing... ]

    movl    $0, %eax
    leave
    ret

.. for main. I divided the code into three subsections. The function prologue consists of the first three operations:

  • Base pointer is pushed onto the stack.
  • The stack pointer is saved in the base pointer
  • The stack pointer is subtracted to make room for local variables.

Then cin is moved into the EDI register2 and get is called; The return value is in EAX.

So far so good. Now the interesting thing happens:

The low-order byte of EAX, designated by the 8-bit register AL, is taken and stored in the byte right after the base pointer: That is -1(%rbp), the offset of the base pointer is -1. This byte is our variable c. The offset is negative because the stack grows downwards on x86. The next operation stores c in EAX: EAX is moved to ESI, cout is moved to EDI and then the insertion operator is called with cout and c being the arguments.

Finally,

  • The return value of main is stored in EAX: 0. That is because of the implicit return statement. You might also see xorl rax rax instead of movl.
  • leave and return to the call site. leave is abbreviating this epilogue and implicitly
    • Replaces the stack pointer with the base pointer and
    • Pops the base pointer.

After this operation and ret have been performed, the frame has effectively been popped, although the caller still has to clean up the arguments as we're using the cdecl calling convention. Other conventions, e.g. stdcall, require the callee to tidy up, e.g. by passing the amount of bytes to ret.

Frame Pointer Omission

It is also possible not to use offsets from the base/frame pointer but from the stack pointer (ESB) instead. This makes the EBP-register that would otherwise contain the frame pointer value available for arbitrary use - but it can make debugging impossible on some machines, and will be implicitly turned off for some functions. It is particularly useful when compiling for processors with only few registers, including x86.

This optimization is known as FPO (frame pointer omission) and set by -fomit-frame-pointer in GCC and -Oy in Clang; note that it is implicitly triggered by every optimization level > 0 if and only if debugging is still possible, since it doesn't have any costs apart from that. For further information see here and here.


1 As pointed out in the comments, the frame pointer is presumably meant to point to the address after the return address.

2 Note that the registers that start with R are the 64-bit counterparts of the ones that start with E. EAX designates the four low-order bytes of RAX. I used the names of the 32-bit registers for clarity.

这篇关于调用堆栈究竟是如何工作的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆