Can ASLR randomization be different per function?

Question

I have the following code snippet:

#include <inttypes.h>
#include <stdio.h>

uint64_t
esp_func(void)
{
  __asm__("movl %esp, %eax");
}

int
main()
{
  uint32_t esp = 0;

  __asm__("\t movl %%esp,%0" : "=r"(esp));

  printf("esp: 0x%08x\n", esp);
  printf("esp: 0x%08lx\n", esp_func());
  return 0;
}

It prints the following over multiple executions:

❯ clang -g  esp.c && ./a.out
esp: 0xbd3b7670
esp: 0x7f8c1c2c5140

❯ clang -g  esp.c && ./a.out
esp: 0x403c9040
esp: 0x7f9ee8bd8140

❯ clang -g  esp.c && ./a.out
esp: 0xb59b70f0
esp: 0x7fe301f8c140

❯ clang -g  esp.c && ./a.out
esp: 0x6efa4110
esp: 0x7fd95941f140

❯ clang -g  esp.c && ./a.out
esp: 0x144e72b0
esp: 0x7f246d4ef140

esp_func shows that ASLR is active with 28 bits of entropy, which makes sense on my modern Linux kernel.

What doesn't make sense is the first value: why is it drastically different?

I took a look at the assembly and it looks weird...

// From main
0x00001150      55             push rbp
0x00001151      4889e5         mov rbp, rsp
0x00001154      4883ec10       sub rsp, 0x10
0x00001158      c745fc000000.  mov dword [rbp-0x4], 0
0x0000115f      c745f8000000.  mov dword [rbp-0x8], 0
0x00001166      89e0           mov eax, esp            ; Move esp to eax
0x00001168      8945f8         mov dword [rbp-0x8], eax ; Assign eax to my variable `esp`
0x0000116b      8b75f8         mov esi, dword [rbp-0x8]
0x0000116e      488d3d8f0e00.  lea rdi, [0x00002004]
0x00001175      b000           mov al, 0
0x00001177      e8b4feffff     call sym.imp.printf     ; For whatever reason, the value in [rbp-0x8]
                                                       ; is assigned here. Why?


// From esp_func
0x00001140      55             push rbp
0x00001141      4889e5         mov rbp, rsp
0x00001144      89e0           mov eax, esp             ; Move esp to eax (same instruction as above)
0x00001146      488b45f8       mov rax, qword [rbp-0x8] ; This changes everything. What is this?
0x0000114a      5d             pop rbp
0x0000114b      c3             ret
0x0000114c      0f1f4000       nop dword [rax]

So my question is: what is in [rbp-0x8], how did it get there, and why are the two values different?

Answer

No, stack ASLR happens once, at program startup. Relative adjustments to RSP between functions are fixed at compile time, and are just the small constants needed to make space for each function's local variables. (C99 variable-length arrays and alloca do make runtime-variable adjustments to RSP, but not random ones.)
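A minimal sketch of that point, assuming GCC or clang and a conventional call stack. The names distance_from and frame_distance are hypothetical. The absolute addresses of the two locals change from run to run under ASLR, but the distance between them is decided by the compiler, so it should be the same on every call and every run of the same binary.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: receives the address of a local in the caller's
   frame and returns its distance from a local in this frame.
   noinline keeps this a real call with its own stack frame. */
__attribute__((noinline)) intptr_t distance_from(uintptr_t caller_local)
{
    volatile char marker = 0;
    return (intptr_t)(caller_local - (uintptr_t)&marker);
}

/* Distance in bytes between a local here and a local one call deeper. */
__attribute__((noinline)) intptr_t frame_distance(void)
{
    volatile char marker = 0;
    return distance_from((uintptr_t)&marker);
}

/* Prints one absolute stack address (varies with ASLR) next to the
   frame distance (fixed at compile time). */
void print_frame_info(void)
{
    volatile char marker = 0;
    printf("local at 0x%" PRIxPTR ", frame distance %" PRIdPTR " bytes\n",
           (uintptr_t)&marker, frame_distance());
}
```

Calling print_frame_info() from main and running the program several times should show a different address each time next to an unchanged distance.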

Your program contains undefined behaviour, and its second line of output isn't actually RSP; instead it is some value left behind by the previous printf call (it appears to be a stack address, so its high bits do vary with ASLR). It tells you nothing about stack-pointer differences between functions, only how not to use GNU C inline asm.

The first value is the current ESP, printed correctly, but that is only the low 32 bits of the 64-bit RSP.
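To illustrate the truncation, here is a small sketch. The helper name low32 is hypothetical, and the full 64-bit address in the comment is a made-up value chosen so that its low half matches the first sample output in the question.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit value. Reading ESP instead of RSP
   in 64-bit mode loses the high half in the same way. */
uint32_t low32(uint64_t value)
{
    return (uint32_t)value;
}

/* Example with an assumed RSP value of 0x00007ffcbd3b7670:
   low32(0x00007ffcbd3b7670) == 0xbd3b7670, so the 0x00007ffc prefix that
   marks it as a typical Linux stack address is gone. */
```

This is why the first printed value looks "drastically different": the recognisable high bits of the stack address were never captured.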

Falling off the end of a non-void function is not safe, and using the return value is undefined behaviour. Any caller that uses the return value of esp_func() necessarily triggers UB, so the compiler is free to leave whatever it wants in RAX.

If you want to write mov %rsp, %rax / ret, then write that function in pure asm, or mov into an "=r"(tmp) local variable and return it. Using GNU C inline asm to modify RAX without telling the compiler about it doesn't change anything; the compiler still sees this as a function with no return value.

MSVC inline asm is different: it is apparently supported to use _asm{ mov eax, 123 } or something similar and then fall off the end of a non-void function, and MSVC will respect that as the function return value even when inlining. GNU C inline asm doesn't need silly hacks like that: if you want your asm to interact with C values, use Extended asm with an output constraint, like you're doing in main. Remember that GNU C inline asm is not parsed by the compiler; it just emits the template string as part of its asm output, to be assembled later.
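A sketch of the Extended-asm approach applied to a function, assuming GCC or clang. The name read_stack_pointer is hypothetical. The x86-64 branch uses an "=r" output constraint and an ordinary return statement, so the compiler knows where the value comes from. The fallback branch for other targets only approximates the stack pointer with the frame address, and is there just to keep the sketch compilable elsewhere.

```c
#include <stdint.h>

/* Returns the stack pointer as seen inside this function (or inside the
   caller, if the compiler inlines it). */
uint64_t read_stack_pointer(void)
{
#if defined(__x86_64__)
    uint64_t sp;
    /* The output operand tells the compiler that sp is written by the asm.
       volatile asks it to re-run the asm on every call instead of reusing
       an earlier result. */
    __asm__ volatile("movq %%rsp, %0" : "=r"(sp));
    return sp;
#else
    /* Assumption: on non-x86-64 targets the frame address is a good enough
       stand-in for this illustration. */
    return (uint64_t)(uintptr_t)__builtin_frame_address(0);
#endif
}
```

Because the function has a real return statement, passing read_stack_pointer() to printf (with PRIx64 from <inttypes.h> as the conversion) involves no undefined behaviour.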

I don't know exactly why clang reloads a return value from the stack, but that is just an artifact of clang internals and of how it does code-gen with optimization disabled. It is allowed to do this because of the undefined behaviour. It is a non-void function, so it needs to have a return value. The simplest thing would be to just emit a ret, and that is what some compilers happen to do with optimization enabled, but even that doesn't fix the problem because of inter-procedural optimization.

It is actually undefined behaviour in C to use the return value of a function that didn't return one. This applies at the C level; using inline asm that modifies a register without telling the compiler about it doesn't change anything as far as the compiler is concerned. Therefore your program as a whole contains UB, because it passes the result to printf. That's why the compiler is allowed to compile it this way: your code was already broken. In practice it's just returning some garbage from stack memory.

TL:DR: this is not a valid way to emit mov %rsp, %rax / ret as the asm definition for a function.

(C++ strengthens this: there it is UB to fall off the end of a non-void function in the first place, but in C it is legal as long as the caller doesn't use the return value. If you compile the same source as C++ with optimization, g++ doesn't even emit a ret instruction after your inline asm template. C probably permits this in order to support its legacy default-int return type for functions declared without a return type.)

This UB is also why the modified version from your comments (with the printf format strings fixed), compiled with optimization enabled (https://godbolt.org/z/sE7e84), prints "surprisingly" different "RSP" values: the second one isn't using RSP at all.

#include <inttypes.h>
#include <stdio.h>

uint64_t __attribute__((noinline)) rsp_func(void)
{
  __asm__("movq %rsp, %rax");
}  // UB if return value used

int main()
{
  uint64_t rsp = 0;

  __asm__("\t movq %%rsp,%0" : "=r"(rsp));

  printf("rsp: 0x%08lx\n", rsp);
  printf("rsp: 0x%08lx\n", rsp_func());   // UB here
  return 0;
}

Example output:

Compiler stderr
<source>:7:1: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
Program returned: 0
Program stdout

rsp: 0x7fff5c472f30
rsp: 0x7f4b811b7170

The clang -O3 asm output shows that the compiler-visible UB was a problem. Even though you used noinline, the compiler can still see the function body and try to do inter-procedural optimization. In this case, the UB led it to give up and not emit a mov %rax, %rsi between call rsp_func and call printf, so it prints whatever value the previous printf happened to leave in RSI.

# from the Godbolt link
rsp_func:                               # @rsp_func
        mov     rax, rsp
        ret
main:                                   # @main
        push    rax
        mov     rsi, rsp
        mov     edi, offset .L.str
        xor     eax, eax
        call    printf
        call    rsp_func               # return value ignored because of UB.
        mov     edi, offset .L.str
        xor     eax, eax
        call    printf                 # printf("0x%08lx\n", garbage in RSI left from last printf)
        xor     eax, eax
        pop     rcx
        ret
.L.str:
        .asciz  "rsp: 0x%08lx\n"

GNU C Basic asm (without constraints) is not useful for anything, except as the body of an __attribute__((naked)) function.
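For completeness, a sketch of that one legitimate use, assuming x86-64 and a compiler that supports the naked attribute on this target (clang, or GCC 8 and later). The name rsp_at_entry is hypothetical. In a naked function the compiler emits no prologue or epilogue, so the Basic asm statement is the whole function body and must include its own ret. The fallback branch for other targets is only there to keep the sketch compilable.

```c
#include <stdint.h>

#if defined(__x86_64__)
/* The entire function is these two instructions. The value returned is
   RSP at function entry, which points at the return address pushed by
   the caller's call instruction. */
__attribute__((naked)) uint64_t rsp_at_entry(void)
{
    __asm__("movq %rsp, %rax\n\t"
            "ret");
}
#else
/* Assumption: elsewhere, approximate with the frame address. */
uint64_t rsp_at_entry(void)
{
    return (uint64_t)(uintptr_t)__builtin_frame_address(0);
}
#endif
```

Unlike the esp_func in the question, the compiler does not generate its own return sequence here, so nothing overwrites RAX after the asm runs.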

Don't assume the compiler will do what you expect when there is UB visible to it at compile time. (When UB isn't visible at compile time, the compiler has to make code that would work for some callers or callees, and you get the asm you expected. But compile-time-visible UB means all bets are off.)
