Understand the assembly code generated by a simple C program


Question


I am trying to understand the assembly-level code for a simple C program by inspecting it with gdb's disassembler.

Following is the C code:

#include <stdio.h>

void function(int a, int b, int c) {
   char buffer1[5];
   char buffer2[10];
}

void main() {
  function(1,2,3);
}

Following is the disassembly for both main and function:

(gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>:    push   %ebp
0x08048429 <main+1>:    mov    %esp,%ebp
0x0804842b <main+3>:    and    $0xfffffff0,%esp
0x0804842e <main+6>:    sub    $0x10,%esp
0x08048431 <main+9>:    movl   $0x3,0x8(%esp)
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function>
0x0804844d <main+37>:   leave  
0x0804844e <main+38>:   ret
End of assembler dump.

(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>:    push   %ebp
0x08048405 <function+1>:    mov    %esp,%ebp
0x08048407 <function+3>:    sub    $0x28,%esp
0x0804840a <function+6>:    mov    %gs:0x14,%eax
0x08048410 <function+12>:   mov    %eax,-0xc(%ebp)
0x08048413 <function+15>:   xor    %eax,%eax
0x08048415 <function+17>:   mov    -0xc(%ebp),%eax
0x08048418 <function+20>:   xor    %gs:0x14,%eax
0x0804841f <function+27>:   je     0x8048426 <function+34>
0x08048421 <function+29>:   call   0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>:   leave  
0x08048427 <function+35>:   ret    
End of assembler dump.

I am seeking answers to the following:

  1. How does the addressing work? I mean (main+0), (main+1), (main+3).
  2. In main, why is and $0xfffffff0,%esp being used?
  3. In function, why are mov %gs:0x14,%eax and mov %eax,-0xc(%ebp) being used?
  4. If someone can explain what is happening step by step, that would be greatly appreciated.

Answer

The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on is that each instruction takes up a variable number of bytes. For example:

main+0: push %ebp

is a one-byte instruction so the next instruction is at main+1. On the other hand,

main+3: and $0xfffffff0,%esp

is a three-byte instruction so the next instruction after that is at main+6.
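The byte lengths can be read straight off the listing by subtracting consecutive offsets. Here is a minimal sketch in C; the names `main_offsets` and `instruction_lengths` are made up for this illustration, and the offsets are simply the `<main+N>` labels from the question:

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of each instruction in main, copied from the <main+N> labels. */
const unsigned main_offsets[] = {0, 1, 3, 6, 9, 17, 25, 32, 37, 38};

/*
 * Fill lengths[i] with the size in bytes of instruction i, computed as the
 * gap to the next offset. The last instruction has no successor in the
 * listing, so only n - 1 lengths are produced. Returns that count.
 */
size_t instruction_lengths(const unsigned *offsets, size_t n, unsigned *lengths)
{
    assert(offsets != NULL && lengths != NULL);
    if (n < 2)
        return 0;
    for (size_t i = 0; i + 1 < n; i++) {
        assert(offsets[i + 1] > offsets[i]);   /* offsets must increase */
        lengths[i] = offsets[i + 1] - offsets[i];
    }
    return n - 1;
}
```

For the listing in the question this gives 1, 2, 3, 3, 8, 8, 7, 5 and 1 bytes; the final ret has no following label, so its length cannot be derived this way.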

And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation is as follows.

Instruction length depends not only on the opcode (such as movl) but also on the addressing modes of the operands (the things the opcode is operating on). I haven't checked specifically for your code but I suspect the

movl $0x1,(%esp)

instruction is probably shorter because there's no offset involved - it just uses esp as the address. Whereas something like:

movl $0x2,0x4(%esp)

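requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4. Your listing bears this out: the two movl instructions with an offset occupy 8 bytes each (main+9 to main+17 and main+17 to main+25), while movl $0x1,(%esp) occupies 7 (main+25 to main+32).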

In fact, here's a debug session showing what I mean:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

c:\pax> debug
-a
0B52:0100 mov word ptr [di],7
0B52:0104 mov word ptr [di+2],8
0B52:0109 mov word ptr [di+0],7
0B52:010E
-u100,10d
0B52:0100 C7050700      MOV     WORD PTR [DI],0007
0B52:0104 C745020800    MOV     WORD PTR [DI+02],0008
0B52:0109 C745000700    MOV     WORD PTR [DI+00],0007
-q
c:\pax> _

You can see that the second instruction, with an offset, is actually different from the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and has a different encoding: c745 instead of c705.

You can also see that the first and third instructions are two different encodings of what is basically the same operation.
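The difference between c705 and c745 sits in the second byte, the ModRM byte, whose top two bits say whether a displacement follows. The sketch below decodes it under 16-bit addressing, which is the mode the debug session uses; the struct and function names are hypothetical helpers, not part of any tool:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm split_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/* Number of displacement bytes implied by a ModRM byte under 16-bit addressing. */
unsigned disp_bytes_16(uint8_t byte)
{
    struct modrm m = split_modrm(byte);
    switch (m.mod) {
    case 0:  return m.rm == 6 ? 2 : 0;  /* rm=110 with mod=00 is a bare disp16 */
    case 1:  return 1;                  /* 8-bit displacement */
    case 2:  return 2;                  /* 16-bit displacement */
    default: return 0;                  /* mod=11: register operand */
    }
}

/* Length of "C7 /0 iw" (mov word ptr [mem],imm16): opcode + ModRM + disp + imm16. */
unsigned mov_imm16_length(uint8_t modrm_byte)
{
    assert(split_modrm(modrm_byte).reg == 0);  /* C7 /0 requires reg == 0 */
    return 1 + 1 + disp_bytes_16(modrm_byte) + 2;
}
```

Here `mov_imm16_length(0x05)` is 4 and `mov_imm16_length(0x45)` is 5, matching the 4-byte and 5-byte encodings in the session. Both bytes have rm = 101 ([DI]); only the mod field differs.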


The and $0xfffffff0,%esp instruction is a way to force esp to be on a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors will be more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary). Some modern processors will even raise a fault if you don't follow these rules.

After this instruction, you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16-byte boundary.
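The masking itself is easy to model in C. This is only a sketch of the arithmetic, with a hypothetical function name and made-up stack pointer values; it does not touch the real stack pointer:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "and $0xfffffff0,%esp": clear the low four bits of the value. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & UINT32_C(0xfffffff0);
    assert(aligned <= esp);        /* never moves the stack pointer upward */
    assert(aligned % 16 == 0);     /* always a multiple of 16 */
    return aligned;
}
```

For an illustrative value such as 0xbffff3ac the result is 0xbffff3a0, and a value that is already aligned comes back unchanged. Because the stack grows downward on x86, rounding down only ever reserves a few extra bytes.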


The gs: prefix simply means to use the gs segment register to access memory rather than the default.

The instruction mov %eax,-0xc(%ebp) means to take the contents of the ebp register, subtract 12 (0xc) and then put the value of eax into that memory location.


Re the explanation of the code. Your function function is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking which uses the aforementioned %gs:0x14 memory location.

It loads the value from that location into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted. The value 0xdeadbeef is used below purely as an illustration; in practice the sentinel is typically a value chosen at program start-up rather than a fixed constant.

Its job, in this case, is nothing. So all you see is the function administration stuff.

Stack set-up occurs between function+0 and function+12. After that, eax is zeroed (function returns void, so this most likely just scrubs the copy of the sentinel from the register rather than setting a return code), and everything else is tearing down the stack frame, including the corruption check.

Similarly, main consists of stack frame set-up, placing the parameters for function on the stack, calling function, tearing down the stack frame and exiting.

Comments have been inserted into the code below:

0x08048428 <main+0>:    push   %ebp                 ; save previous value.
0x08048429 <main+1>:    mov    %esp,%ebp            ; create new stack frame.
0x0804842b <main+3>:    and    $0xfffffff0,%esp     ; align to boundary.
0x0804842e <main+6>:    sub    $0x10,%esp           ; make space on stack.

0x08048431 <main+9>:    movl   $0x3,0x8(%esp)       ; push values for function.
0x08048439 <main+17>:   movl   $0x2,0x4(%esp)
0x08048441 <main+25>:   movl   $0x1,(%esp)
0x08048448 <main+32>:   call   0x8048404 <function> ; and call it.

0x0804844d <main+37>:   leave                       ; tear down frame.
0x0804844e <main+38>:   ret                         ; and exit.

0x08048404 <func+0>:    push   %ebp                 ; save previous value.
0x08048405 <func+1>:    mov    %esp,%ebp            ; create new stack frame.
0x08048407 <func+3>:    sub    $0x28,%esp           ; make space on stack.
0x0804840a <func+6>:    mov    %gs:0x14,%eax        ; get sentinel value.
0x08048410 <func+12>:   mov    %eax,-0xc(%ebp)      ; put on stack.

0x08048413 <func+15>:   xor    %eax,%eax            ; zero eax (likely scrubs sentinel copy).

0x08048415 <func+17>:   mov    -0xc(%ebp),%eax      ; get sentinel from stack.
0x08048418 <func+20>:   xor    %gs:0x14,%eax        ; compare with actual.
0x0804841f <func+27>:   je     <func+34>            ; jump if okay.
0x08048421 <func+29>:   call   <_stk_chk_fl>        ; otherwise corrupted stack.
0x08048426 <func+34>:   leave                       ; tear down frame.
0x08048427 <func+35>:   ret                         ; and exit.


I think the reason for the %gs:0x14 may be evident from above but, just in case, I'll elaborate here.

This value (a sentinel) is placed in the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case:

char buffer1[5];
strcpy (buffer1, "Hello there, my name is Pax.");

then the sentinel will be overwritten and the check at the end of the function will detect that, calling the failure function to let you know, and then probably aborting so as to avoid any other problems.
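To see the effect without actually smashing a real stack, the frame can be modeled as a plain byte array. In this sketch the frame size, the sentinel's offset and the function name are all assumptions chosen for illustration; the real layout is up to the compiler:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE      64   /* hypothetical frame size for this model */
#define BUFFER_OFFSET    0   /* where the nominal 5-byte buffer starts */
#define SENTINEL_OFFSET 16   /* hypothetical slot for the sentinel */

/*
 * Copy `len` bytes of `src` into the modeled buffer with no regard for the
 * buffer's nominal 5-byte size (as strcpy would), then report whether the
 * sentinel survived. Returns 1 if intact, 0 if overwritten.
 */
int sentinel_survives(const char *src, size_t len, uint32_t sentinel)
{
    unsigned char frame[FRAME_SIZE] = {0};
    uint32_t stored;

    assert(BUFFER_OFFSET + len <= FRAME_SIZE);  /* keep the model itself in bounds */

    memcpy(frame + SENTINEL_OFFSET, &sentinel, sizeof sentinel);
    memcpy(frame + BUFFER_OFFSET, src, len);    /* the "overflowing" write */
    memcpy(&stored, frame + SENTINEL_OFFSET, sizeof stored);

    return stored == sentinel;
}
```

Copying the 29 bytes of "Hello there, my name is Pax." (including the terminating NUL) runs well past offset 16 and clobbers the sentinel, whereas a short string such as "Hi" leaves it untouched.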

If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value which is detected in the code with the je instruction.

The relevant bit is paraphrased here:

          mov    %gs:0x14,%eax     ; get sentinel value.
          mov    %eax,-0xc(%ebp)   ; put on stack.

          ;; Weave your function
          ;;   magic here.

          mov    -0xc(%ebp),%eax   ; get sentinel back from stack.
          xor    %gs:0x14,%eax     ; compare with original value.
          je     stack_ok          ; zero/equal means no corruption.
          call   stack_bad         ; otherwise corrupted stack.
stack_ok: leave                    ; tear down frame.
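The comparison at the end of that sequence can be sketched in C as well. The function names are hypothetical, and 0xdeadbeef is used only as a stand-in sentinel:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of "xor %gs:0x14,%eax": XOR the copy read back from the stack
 * with the reference value. A zero result means the two match.
 */
uint32_t sentinel_xor(uint32_t from_stack, uint32_t reference)
{
    return from_stack ^ reference;
}

/* Returns 1 when the je would be taken (stack_ok), 0 when stack_bad is called. */
int stack_ok(uint32_t from_stack, uint32_t reference)
{
    return sentinel_xor(from_stack, reference) == 0;
}
```

XOR is used as an equality test here: any differing bit leaves a non-zero result, so the zero flag is clear and the je falls through to the failure call.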
