glibc scanf segmentation faults when called from a function that doesn't align RSP

Problem description

When compiling the following code:

global main
extern printf, scanf

section .data
   msg: db "Enter a number: ",10,0
   format:db "%d",0

section .bss
   number resb 4

section .text
main:
   mov rdi, msg
   mov al, 0
   call printf

   mov rsi, number
   mov rdi, format
   mov al, 0
   call scanf

   mov rdi,format
   mov rsi,[number]
   inc rsi
   mov rax,0
   call printf 

   ret

with:

nasm -f elf64 example.asm -o example.o
gcc -no-pie -m64 example.o -o example

and then running

./example

it runs and prints "Enter a number: ", but then crashes and prints: Segmentation fault (core dumped)

So printf works fine but scanf does not. What am I doing wrong with scanf?

Recommended answer

Use sub rsp, 8 / add rsp, 8 at the start/end of your function to re-align the stack to 16 bytes before your function does a call.

Or better, push/pop a dummy register, e.g. push rdx / pop rcx, or a call-preserved register like RBP that you actually wanted to save anyway. The total change to RSP, counting all pushes and sub rsp instructions from function entry to any call, must be an odd multiple of 8,
i.e. 8 + 16*n bytes for whole number n.
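
As an illustration of the fix described above, here is a minimal sketch of the question's program with one push in main, so that RSP is 16-byte aligned at every call. It assumes NASM syntax, Linux x86-64 and the same non-PIE build commands as in the question. Using RBP as the pushed register and loading number as a 32-bit value (scanf with %d writes only 4 bytes) are choices made for this sketch, not part of the original answer.

```nasm
; Sketch only: the question's program with the stack realigned in main.
; Assumes: nasm -f elf64, linked with gcc -no-pie (absolute addresses are used).
global main
extern printf, scanf

section .data
   msg:    db "Enter a number: ", 10, 0
   format: db "%d", 0

section .bss
   number: resd 1             ; one 32-bit int for scanf("%d")

section .text
main:
   push rbp                   ; entry RSP % 16 == 8; after this push RSP % 16 == 0

   mov  rdi, msg
   xor  eax, eax              ; AL = 0: no vector-register arguments
   call printf

   mov  rdi, format
   mov  rsi, number           ; pointer to the int that scanf fills in
   xor  eax, eax
   call scanf

   mov  rdi, format
   mov  esi, [number]         ; load only the 4 bytes that scanf wrote
   inc  esi
   xor  eax, eax
   call printf

   pop  rbp                   ; undo the push so ret finds the return address
   xor  eax, eax              ; return 0 from main
   ret
```

The single push changes RSP by 8 bytes, which is the n = 0 case of the 8 + 16*n rule.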

On function entry, RSP is 8 bytes away from 16-byte alignment because the call pushed an 8-byte return address. See "Printing floating point numbers from x86-64 seems to require %rbp to be saved", "main and stack alignment", and "Calling printf in x86_64 using GNU assembler". This is an ABI requirement that you used to be able to get away with violating when there weren't any FP args for printf, but not any more.
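
The entry-alignment claim can be checked with a tiny program. This is a sketch under the assumption that main is called by ABI-conforming startup code (as glibc's is); the file name align.asm is hypothetical.

```nasm
; Sketch only: report RSP mod 16 at entry to main as the process exit status.
; Build (hypothetical file name): nasm -f elf64 align.asm -o align.o
;                                 gcc -no-pie align.o -o align
global main

section .text
main:
   mov  eax, esp              ; the low 32 bits of RSP contain the low 4 bits we need
   and  eax, 15               ; EAX = RSP mod 16 at function entry
   ret                        ; main's return value becomes the exit status
```

Running ./align and then echo $? should report 8 when the caller followed the ABI, since the call instruction pushed an 8-byte return address onto a 16-byte aligned stack.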

See also "Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?"

gcc's code-gen for glibc scanf now depends on 16-byte stack alignment even when AL == 0.

It seems to have auto-vectorized copying 16 bytes somewhere in __GI__IO_vfscanf, which regular scanf calls after spilling its register args to the stack (see footnote 1). (The many similar ways to call scanf share one big implementation as a back end to the various libc entry points like scanf, fscanf, etc.)

I downloaded Ubuntu 18.04's libc6 binary package: https://packages.ubuntu.com/bionic/amd64/libc6/download and extracted the files (with 7z x blah.deb and tar xf data.tar, because 7z knows how to extract a lot of file formats).

I can reproduce your bug with LD_LIBRARY_PATH=/tmp/bionic-libc/lib/x86_64-linux-gnu ./bad-printf, and, as it turns out, also with the system glibc 2.27-3 on my Arch Linux desktop.

With GDB, I ran it on your program and did set env LD_LIBRARY_PATH /tmp/bionic-libc/lib/x86_64-linux-gnu, then run. With layout reg, the disassembly window looks like this at the point where it received SIGSEGV:

   0x7ffff786b49a <_IO_vfscanf+602>   cmp    r12b,0x25
   0x7ffff786b49e <_IO_vfscanf+606>   jne    0x7ffff786b3ff <_IO_vfscanf+447>
   0x7ffff786b4a4 <_IO_vfscanf+612>   mov    rax,QWORD PTR [rbp-0x460]
   0x7ffff786b4ab <_IO_vfscanf+619>   add    rax,QWORD PTR [rbp-0x458]
   0x7ffff786b4b2 <_IO_vfscanf+626>   movq   xmm0,QWORD PTR [rbp-0x460]
   0x7ffff786b4ba <_IO_vfscanf+634>   mov    DWORD PTR [rbp-0x678],0x0
   0x7ffff786b4c4 <_IO_vfscanf+644>   mov    QWORD PTR [rbp-0x608],rax
   0x7ffff786b4cb <_IO_vfscanf+651>   movzx  eax,BYTE PTR [rbx+0x1]
   0x7ffff786b4cf <_IO_vfscanf+655>   movhps xmm0,QWORD PTR [rbp-0x608]
  >0x7ffff786b4d6 <_IO_vfscanf+662>   movaps XMMWORD PTR [rbp-0x470],xmm0

So it copied two 8-byte objects to the stack, using movq + movhps to load and movaps to store. But with the stack misaligned, movaps [rbp-0x470],xmm0 faults.
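
The difference between aligned and unaligned 16-byte stores can be seen in isolation, without glibc. The following is a minimal sketch, not code from scanf: it assumes RSP % 16 == 8 on entry to main (the normal case under the ABI) and writes into the red zone below RSP, which is permitted in a leaf function.

```nasm
; Sketch only: movaps requires a 16-byte aligned address, movups does not.
; Assumes RSP % 16 == 8 on entry, i.e. main was called by ABI-conforming code.
global main

section .text
main:
   pxor   xmm0, xmm0          ; some 16-byte value to store
   movups [rsp-16], xmm0      ; unaligned store: allowed at any address
   movaps [rsp-24], xmm0      ; (RSP-24) % 16 == 0: aligned, so this is fine
   movaps [rsp-16], xmm0      ; (RSP-16) % 16 == 8: raises #GP, which Linux reports as SIGSEGV
   xor    eax, eax            ; not reached under the assumption above
   ret
```

This is the same failure mode as the movaps in the disassembly: the instruction itself is valid, and only the address it is given is misaligned.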

I didn't grab a debug build to find out exactly which part of the C source turned into this, but the function is written in C and compiled by GCC with optimization enabled. GCC has always been allowed to do this, but only recently did it get smart enough to take better advantage of SSE2 this way.

Footnote 1: printf / scanf with AL != 0 has always required 16-byte alignment, because gcc's code-gen for variadic functions uses test al,al / je to decide whether to spill the full 16-byte XMM regs xmm0..7, and in that case it spills them with aligned stores. __m128i can be an argument to a variadic function, not just double, and gcc doesn't check whether the function ever actually reads any 16-byte FP args.
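
The spill pattern described in footnote 1 looks roughly like the following hand-written sketch. It imitates the general shape of gcc's prologue for a variadic function and is not copied from glibc; the name variadic_sketch, the frame size and the stack offsets are assumptions chosen for illustration.

```nasm
; Sketch only: the shape of a gcc-style variadic prologue (register save area).
; variadic_sketch is a hypothetical function; the offsets are illustrative.
global variadic_sketch

section .text
variadic_sketch:
   sub    rsp, 0xd8           ; 216 = 8 + 16*13, so RSP % 16 == 0 if the caller obeyed the ABI
   mov    [rsp+0x28], rsi     ; integer register args are spilled unconditionally
   mov    [rsp+0x30], rdx
   mov    [rsp+0x38], rcx
   mov    [rsp+0x40], r8
   mov    [rsp+0x48], r9
   test   al, al              ; AL = upper bound on the number of XMM regs holding args
   je     .no_fp              ; AL == 0: skip the vector spills entirely
   movaps [rsp+0x50], xmm0    ; aligned stores: these fault if RSP was misaligned
   movaps [rsp+0x60], xmm1
   movaps [rsp+0x70], xmm2
   movaps [rsp+0x80], xmm3
   movaps [rsp+0x90], xmm4
   movaps [rsp+0xa0], xmm5
   movaps [rsp+0xb0], xmm6
   movaps [rsp+0xc0], xmm7
.no_fp:
   add    rsp, 0xd8           ; a real function would read its arguments before this point
   ret
```

With AL == 0 the movaps stores are skipped, which is why a misaligned call with no FP args used to go unnoticed until other code in the callee started relying on alignment as well.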
