Waste in memory allocation for local variables


Question

This is my program:

void test_function(int a, int b, int c, int d){
    int flag;
    char buffer[10];

    flag = 31337;
    buffer[0] = 'A';
}

int main() {
    test_function(1, 2, 3, 4);
}

I compile this program with the debug option:

gcc -g my_program.c

I use gdb and disassemble test_function with Intel syntax:

(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x08048344 <test_function+0>:   push   ebp
0x08048345 <test_function+1>:   mov    ebp,esp
0x08048347 <test_function+3>:   sub    esp,0x28
0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>:  leave  
0x08048356 <test_function+18>:  ret    
End of assembler dump.

Then I disassemble main:

(gdb) disassemble main
Dump of assembler code for function main:
0x08048357 <main+0>:    push   ebp
0x08048358 <main+1>:    mov    ebp,esp
0x0804835a <main+3>:    sub    esp,0x18
0x0804835d <main+6>:    and    esp,0xfffffff0
0x08048360 <main+9>:    mov    eax,0x0
0x08048365 <main+14>:   sub    esp,eax
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
0x08048386 <main+47>:   call   0x8048344 <test_function>
0x0804838b <main+52>:   leave  
0x0804838c <main+53>:   ret    
End of assembler dump.

I place a breakpoint at address 0x08048355 (the leave instruction of test_function) and run the program.

I examine the stack like this:

(gdb) x/16w $esp
0xbffff7d0:     0x00000041      0x08049548      0xbffff7e8      0x08048249
0xbffff7e0:     0xb7f9f729      0xb7fd6ff4      0xbffff818      0x00007a69
0xbffff7f0:     0xb7fd6ff4      0xbffff8ac      0xbffff818      0x0804838b
0xbffff800:     0x00000001      0x00000002      0x00000003      0x00000004

0x0804838b is the return address, 0xbffff818 is the saved frame pointer (main's ebp), and the flag variable is stored 12 bytes below the saved frame pointer. Why 12?

I don't understand this instruction:

0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69

Why isn't the variable's value 0x00007a69 stored at ebp-4, where 0xbffff8ac sits instead?

Same question for buffer. Why 40?

Aren't we wasting memory? Are 0xb7fd6ff4 0xbffff8ac and 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x08049548 0xbffff7e8 0x08048249 unused?

This is the output of the command gcc -Q -v -g my_program.c:

Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1)
 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/cc1 -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 notesearch.c -dumpbase notesearch.c -auxbase notesearch -g -version -o /tmp/ccGT0kTf.s
GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1) (i486-linux-gnu)
        compiled by GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129473
options passed:  -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6
 -auxbase -g
options enabled:  -fpeephole -ffunction-cse -fkeep-static-consts
 -fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
 -fbranch-count-reg -fcommon -fgnu-linker -fargument-alias
 -fzero-initialized-in-bss -fident -fmath-errno -ftrapping-math -m80387
 -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
 -maccumulate-outgoing-args -mcpu=pentiumpro -march=i486
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/include
 /usr/include
End of search list.
 gnu_dev_major gnu_dev_minor gnu_dev_makedev stat lstat fstat mknod fatal ec_malloc dump main print_notes find_user_note search_note
Execution times (seconds)
 preprocessing         :   0.00 ( 0%) usr   0.01 (25%) sys   0.00 ( 0%) wall
 lexical analysis      :   0.00 ( 0%) usr   0.01 (25%) sys   0.00 ( 0%) wall
 parser                :   0.02 (100%) usr   0.01 (25%) sys   0.00 ( 0%) wall
 TOTAL                 :   0.02             0.04             0.00
 as -V -Qy -o /tmp/ccugTYeu.o /tmp/ccGT0kTf.s
GNU assembler version 2.17.50 (i486-linux-gnu) using BFD version 2.17.50 20070103 Ubuntu
 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crt1.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crti.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtbegin.o -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6 -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../.. /tmp/ccugTYeu.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtend.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crtn.o

NOTE: I am reading the book "The art of exploitation" and I use the VM provided with the book.

Recommended answer

The compiler is trying to maintain 16-byte alignment on the stack. This also applies to 32-bit code these days (not just 64-bit). The idea is that at the point just before a CALL instruction executes, the stack must be aligned to a 16-byte boundary.

Because you compiled without optimizations, there are some extraneous instructions.

0x0804835a <main+3>:    sub    esp,0x18        ; Allocate local stack space
0x0804835d <main+6>:    and    esp,0xfffffff0  ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>:    mov    eax,0x0         ; Extraneous, not needed
0x08048365 <main+14>:   sub    esp,eax         ; Extraneous, not needed

After the last instruction above, ESP is 16-byte aligned. The arguments for the call are then stored starting at the top of the stack, at ESP. That is done with:

0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1

The CALL then pushes a 4-byte return address onto the stack. After the call we reach these instructions:

0x08048344 <test_function+0>:   push   ebp     ; 4 bytes pushed on stack
0x08048345 <test_function+1>:   mov    ebp,esp ; Setup stackframe

This pushes another 4 bytes onto the stack. Together with the 4 bytes of the return address, the stack is now misaligned by 8 bytes. To reach 16-byte alignment again, an additional 8 bytes must be wasted on the stack. That is why this statement allocates an extra 8 bytes:

0x08048347 <test_function+3>:   sub    esp,0x28

  • 0x08 bytes are already on the stack because of the return address (4 bytes) and EBP (4 bytes)
  • 0x08 bytes of padding are needed to bring the stack back to 16-byte alignment
  • 0x20 bytes (32 bytes) are needed for local variable allocation. 32 is evenly divisible by 16, so alignment is maintained

The second and third numbers above added together give 0x28, the value computed by the compiler and used in sub esp,0x28.

0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69

So why [ebp-12] in this instruction? The first 8 bytes, [ebp-8] through [ebp-1], are the alignment bytes used to get the stack 16-byte aligned. The local data then appears on the stack after that. In this case, [ebp-12] through [ebp-9] are the 4 bytes of the 32-bit integer flag.

Then we have this for updating buffer[0] with the character 'A':

0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
      

The oddity, then, is why a 10-byte array of characters would occupy [ebp-40] (the beginning of the array) through [ebp-13], which is 28 bytes. The best guess I can make is that the compiler decided it could treat the 10-byte character array as a 128-bit (16-byte) vector. This would force the compiler to align the buffer on a 16-byte boundary and pad the array out to 16 bytes (128 bits). From the compiler's perspective, your code seems to behave much as if it had been defined as:

#include <xmmintrin.h>
void test_function(int a, int b, int c, int d){
    int flag;
    union {
        char buffer[10];
        __m128 m128buffer;      // 16-byte variable that needs to be 16-byte aligned
    } bufu;

    flag = 31337;
    bufu.buffer[0] = 'A';
}
      

The output on GodBolt for GCC 4.9.0, generating 32-bit code with SSE2 enabled, is as follows:

test_function:
        push    ebp     #
        mov     ebp, esp  #,
        sub     esp, 40   #,same as: sub esp,0x28
        mov     DWORD PTR [ebp-12], 31337 # flag,
        mov     BYTE PTR [ebp-40], 65     # bufu.buffer,
        leave
        ret
      

This looks very similar to your disassembly in GDB.

If you had compiled with optimizations (such as -O1, -O2, or -O3), the optimizer could have simplified test_function because it is a leaf function in your example. A leaf function is one that doesn't call another function, so the compiler could have applied certain shortcuts.

As for why the character array seems to be aligned to a 16-byte boundary and padded to 16 bytes: that probably can't be answered with certainty until we know which GCC version you are using (gcc --version will tell you). It would also be useful to know your OS and OS version. Even better would be to add the output of this command to your question: gcc -Q -v -g my_program.c
