Understand the assembly code generated by a simple C program


Question

I am trying to understand the assembly-level code for a simple C program by inspecting it with gdb's disassembler.

Here is the C code:

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

Here is the disassembly of main and function:

(gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>:    push   %ebp
0x08048429 <main+1>:    mov    %esp,%ebp
0x0804842b <main+3>:    and    $0xfffffff0,%esp
0x0804842e <main+6>:    sub    $0x10,%esp
0x08048431 <main+9>:    movl   $0x3,0x8(%esp)
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function>
0x0804844d <main+37>:   leave  
0x0804844e <main+38>:   ret
End of assembler dump.

(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>:    push   %ebp
0x08048405 <function+1>:    mov    %esp,%ebp
0x08048407 <function+3>:    sub    $0x28,%esp
0x0804840a <function+6>:    mov    %gs:0x14,%eax
0x08048410 <function+12>:   mov    %eax,-0xc(%ebp)
0x08048413 <function+15>:   xor    %eax,%eax
0x08048415 <function+17>:   mov    -0xc(%ebp),%eax
0x08048418 <function+20>:   xor    %gs:0x14,%eax
0x0804841f <function+27>:   je     0x8048426 <function+34>
0x08048421 <function+29>:   call   0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>:   leave  
0x08048427 <function+35>:   ret    
End of assembler dump.

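I am seeking answers to the following:

  1. How does the addressing work? I mean (main+0), (main+1), (main+3).
  2. In main, why is and $0xfffffff0,%esp being used?
  3. In function, why are mov %gs:0x14,%eax and mov %eax,-0xc(%ebp) being used?
  4. If someone can explain what happens step by step, that would be greatly appreciated.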

Recommended answer

The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on is that each instruction takes up a variable number of bytes. For example:

main+0: push %ebp

is a one-byte instruction, so the next instruction is at main+1. On the other hand,

main+3: and $0xfffffff0,%esp

is a three-byte instruction, so the next instruction after that is at main+6.

And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation is as follows.

Instruction length depends not only on the opcode (such as movl) but also on the addressing modes of the operands (the things the opcode is operating on). I haven't checked specifically for your code but I suspect the

movl $0x1,(%esp)

instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:

movl $0x2,0x4(%esp)

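requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4. The listing in the question bears this out: the instruction at main+17 occupies 8 bytes (main+17 to main+25), while the one at main+25 occupies 7 (main+25 to main+32).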

In fact, here's a debug session showing what I mean:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700      MOV     WORD PTR [DI],0007
0B52:0104 C745020800    MOV     WORD PTR [DI+02],0008
0B52:0109 C745000700    MOV     WORD PTR [DI+00],0007
-q
c:\pax> _

You can see that the second instruction, with an offset, is actually different from the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and actually has a different encoding: c745 instead of c705.

You can also see that the first and third instructions can be encoded in two different ways even though they basically do the same thing.

The and $0xfffffff0,%esp instruction is a way to force esp onto a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors are more efficient if they follow the alignment rules (such as a 4-byte value being aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.

After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16-byte boundary, because the mask clears the low four bits.

The gs: prefix simply means to use the gs segment register to access memory rather than the default.

The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) to form an address, and then store the value of eax at that memory location.

Regarding the explanation of the code: your function, function, is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the aforementioned %gs:0x14 memory location.

It loads the value from that location (probably something like 0xdeadbeef) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted.

Its job, in this case, is nothing. So all you see is the function administration stuff.

Stack set-up occurs between function+0 and function+12. Everything after that is zeroing eax and tearing down the stack frame, including the corruption check.

Similarly, main consists of stack frame set-up, placing the parameters for function on the stack, calling function, tearing down the stack frame and exiting.

Comments have been inserted in the code below:

0x08048428 <main+0>:    push   %ebp                 ; save previous value.
0x08048429 <main+1>:    mov    %esp,%ebp            ; create new stack frame.
0x0804842b <main+3>:    and    $0xfffffff0,%esp     ; align to boundary.
0x0804842e <main+6>:    sub    $0x10,%esp           ; make space on stack.

0x08048431 <main+9>:    movl   $0x3,0x8(%esp)       ; store args for function.
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function> ; and call it.

0x0804844d <main+37>:   leave                       ; tear down frame.
0x0804844e <main+38>:   ret                         ; and exit.

0x08048404 <func+0>:    push   %ebp                 ; save previous value.
0x08048405 <func+1>:    mov    %esp,%ebp            ; create new stack frame.
0x08048407 <func+3>:    sub    $0x28,%esp           ; make space on stack.
0x0804840a <func+6>:    mov    %gs:0x14,%eax        ; get sentinel value.
0x08048410 <func+12>:   mov    %eax,-0xc(%ebp)      ; put on stack.

0x08048413 <func+15>:   xor    %eax,%eax            ; zero eax.

0x08048415 <func+17>:   mov    -0xc(%ebp),%eax      ; get sentinel from stack.
0x08048418 <func+20>:   xor    %gs:0x14,%eax        ; compare with actual.
0x0804841f <func+27>:   je     <func+34>            ; jump if okay.
0x08048421 <func+29>:   call   <_stk_chk_fl>        ; otherwise corrupted stack.
0x08048426 <func+34>:   leave                       ; tear down frame.
0x08048427 <func+35>:   ret                         ; and exit.

I think the reason for the %gs:0x14 may be evident from the above but, just in case, I'll elaborate here.

It uses this value (a sentinel) to put in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:

char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");

then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.

If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value. The je instruction detects this: it jumps only when the result is zero, so a non-zero result falls through to the failure call.

The relevant bit is paraphrased here:

          mov    %gs:0x14,%eax     ; get sentinel value.
          mov    %eax,-0xc(%ebp)   ; put on stack.

          ;; Weave your function
          ;;   magic here.

          mov    -0xc(%ebp),%eax   ; get sentinel back from stack.
          xor    %gs:0x14,%eax     ; compare with original value.
          je     stack_ok          ; zero/equal means no corruption.
          call   stack_bad         ; otherwise corrupted stack.
stack_ok: leave                    ; tear down frame.
