Compilers: Understanding assembly code generated from small programs

Question

I'm self-studying how compilers work. I'm learning by reading the disassembly of GCC-generated code from small 64-bit Linux programs.

I wrote this C program:

#include <stdio.h>

int main()
{
    for(int i=0;i<10;i++){
        int k=0;
    }
}

After using objdump I get:

00000000004004d6 <main>:
  4004d6:       55                      push   rbp
  4004d7:       48 89 e5                mov    rbp,rsp
  4004da:       c7 45 f8 00 00 00 00    mov    DWORD PTR [rbp-0x8],0x0
  4004e1:       eb 0b                   jmp    4004ee <main+0x18>
  4004e3:       c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
  4004ea:       83 45 f8 01             add    DWORD PTR [rbp-0x8],0x1
  4004ee:       83 7d f8 09             cmp    DWORD PTR [rbp-0x8],0x9
  4004f2:       7e ef                   jle    4004e3 <main+0xd>
  4004f4:       b8 00 00 00 00          mov    eax,0x0
  4004f9:       5d                      pop    rbp
  4004fa:       c3                      ret    
  4004fb:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

Now I have some doubts.

  1. What is that NOP at the end for, and why is it there? (alignment?)

  2. I'm compiling with gcc -Wall <program.c>. Why am I not getting the warning "control reaches end of non-void function"?

  3. Why doesn't the compiler allocate space on the stack with sub rsp,0x10? Why doesn't it use the rbp register for referencing local stack data?

    PS: If I call a function (like printf) in the for loop, why does the compiler suddenly generate sub rsp,0x10? Why does it still reference local data with the rsp register? I expect the generated code to reference local stack data with rbp!

Answer

Regarding the second question: since the C99 standard, the main function is allowed to omit an explicit return 0, and reaching the end of main behaves as if it returned 0, which is why GCC emits mov eax,0x0 before the ret and no warning. Note that this applies only to main, not to any other function.

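As for the third question, the rbp register acts as the frame pointer, and the generated code does address its locals through it: in the listing, [rbp-0x8] is i and [rbp-0x4] is k.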

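Lastly, the PS. The sub rsp,0x10 reserves 16 bytes of stack for main's own locals; the stack grows downward, so subtracting from rsp allocates space rather than removing it. It is not space for printf's arguments, because on 64-bit Linux the first six integer or pointer arguments are passed in registers. In your original program main calls nothing, so the compiler can leave rsp alone and keep i and k in the 128-byte "red zone" below rsp that the System V AMD64 ABI reserves for exactly this purpose. Once main calls another function, the locals have to sit at or above rsp, and the 8 bytes they need are rounded up to 16 to keep the stack 16-byte aligned. The locals are still addressed relative to rbp in both cases.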

If you're serious about learning how compilers work in general, and possibly want to write your own (it's fun! :)), then I suggest you invest in some books on the theory and practice of it. The Dragon Book is an excellent addition to any programmer's bookshelf.
