What is "Code" in Linux Kernel crash messages?

Posted: 2021/4/24 21:13:49. Tags: linux, linux-kernel, x86, crash

Question

I have the following stack trace and crash information after the Linux kernel failed to load:

[    3.684670] ------------[ cut here ]------------
[    3.695507] Bad FPU state detected at fpu__clear+0x91/0xc2, reinitializing FPU registers.
[    3.695508] traps: No user code available.
[    3.704745] invalid opcode: 0000 [#1] PREEMPT
[    3.715304] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.50-android-x86-geeb7e76-dirty #1
[    3.724594] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM21 09/01/2017
[    3.732622] EIP: ex_handler_fprestore+0x2e/0x65
[    3.737807] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[    3.759027] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[    3.766169] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: df4c7c60
[    3.773316] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[    3.781044] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[    3.788184] Call Trace:
[    3.791026]  ? fpu__clear+0x91/0xc2
[    3.795037]  fixup_exception+0x61/0x6e
[    3.799348]  do_trap+0x35/0xe9
[    3.802864]  do_invalid_op+0xd9f/0x108a
[    3.807269]  ? atime_needs_update+0x68/0xf5
[    3.812058]  ? touch_atime+0x37/0xbd
[    3.816168]  ? __check_object_size+0x83/0x123
[    3.821153]  ? fpu__clear+0x8e/0xc2
[    3.825166]  ? generic_file_read_iter+0x28d/0x723
[    3.830544]  ? generic_file_read_iter+0x28d/0x723
[    3.835931]  ? __vfs_read+0xe9/0x11f
[    3.840043]  common_exception+0x105/0x10e
[    3.844634] EIP: fpu__clear+0x91/0xc2
[    3.848840] Code: eb 05 e8 b4 f2 fd ff ff 0d 98 a8 99 c1 74 3b 90 8d 74 26 00 eb 07 90 8d 74 26 00 eb 1c 83 c8 ff bf c0 8c a2 c1 89 c2 0f c7 1f <a1> f4 8b a2 c1 ff 0d 98 a8 99 1
[    3.870070] EAX: ffffffff EBX: df4c5900 ECX: 00000000 EDX: ffffffff
[    3.877210] ESI: df4c5900 EDI: c1a28cc0 EBP: df4c7e4c ESP: df4c7e40
[    3.884356] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010286
[    3.892085]  ? do_alignment_check+0x1a/0x1a
[    3.896878]  ? common_exception+0x105/0x10e
[    3.901674]  flush_thread+0x33/0x37
[    3.905684]  flush_old_exec+0x540/0x5f9
[    3.910085]  load_elf_binary+0x24b/0xec1
[    3.914584]  ? pick_next_task_fair+0xdf/0x13a
[    3.919575]  ? __schedule+0x4bb/0x63f
[    3.923780]  ? sched_debug_header+0x45/0x40a
[    3.928666]  ? preempt_schedule+0x2d/0x3c
[    3.933266]  search_binary_handler+0x89/0x1ac
[    3.938259]  load_script+0x184/0x19f
[    3.942366]  search_binary_handler+0x89/0x1ac
[    3.947354]  __do_execve_file+0x454/0x668
[    3.951954]  do_execve+0x1b/0x1d
[    3.955673]  run_init_process+0x31/0x36
[    3.960082]  ? rest_init+0x99/0x99
[    3.963992]  kernel_init+0x5e/0xdf
[    3.967905]  ret_from_fork+0x19/0x30
[    3.972014] Modules linked in:
[    3.975542] ---[ end trace 7d27fceeb3852a38 ]---
[    3.980823] EIP: ex_handler_fprestore+0x2e/0x65
[    3.986014] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[    4.007247] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[    4.014387] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: c1afa3b0
[    4.021536] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[    4.029265] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[    4.036413] note: swapper[1] exited with preempt_count 1

What does the Code mean? Also can I know the exact x86 instruction (not the C function) that caused the kernel to crash?

Edit: Updated the code. I was trying to run Linux in a virtualized environment.

Answer

Code is a hexdump of x86 machine code (presumably 32-bit mode from a legacy 32-bit kernel since it only dumped 32-bit register contents).

The byte marked with <> is where EIP points, so it's the first byte of the faulting instruction inside ex_handler_fprestore.
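
As a minimal sketch, assuming the line format shown in the question (space-separated hex bytes after "Code:", exactly one byte wrapped in <...>), the marked byte can be located mechanically. The helper name parse_code_line is made up for this example.

```python
from __future__ import annotations

# "Code:" line from the question (the scraped copy is cut off after "eb").
SAMPLE = ("Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 "
          "c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f")


def parse_code_line(line: str) -> tuple[bytes, int]:
    """Return (raw bytes, index of the <..>-marked byte) for an oops "Code:" line."""
    raw = bytearray()
    marked = -1
    for field in line.split("Code:", 1)[1].split():
        if field.startswith("<") and field.endswith(">"):
            marked = len(raw)  # index of the byte EIP points at
            field = field[1:-1]
        if len(field) != 2:
            break  # a cut-off dump may end in a partial byte such as "f"
        raw.append(int(field, 16))
    if marked < 0:
        raise ValueError('no <..>-marked byte found in "Code:" line')
    return bytes(raw), marked


if __name__ == "__main__":
    code, idx = parse_code_line(SAMPLE)
    print(idx, f"{code[idx]:02x} {code[idx + 1]:02x}")  # 42 0f 0b
    # EIP is ex_handler_fprestore+0x2e, so the dump starts idx bytes earlier.
    print(f"dump starts at ex_handler_fprestore+{0x2e - idx:#x}")  # +0x4
```

The last line only combines numbers from the log (EIP offset 0x2e, 42 bytes before the marker), so the 55 byte sits at ex_handler_fprestore+0x5.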

Feed it to a disassembler such as https://defuse.ca/online-x86-assembler.htm#disassembly2, or to Linux's own crash-dump decoding script (https://elixir.bootlin.com/linux/latest/source/scripts/decodecode).
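
If you'd rather disassemble locally, one option (my assumption, not something the answer prescribes) is to write the bytes to a file and run GNU objdump on it as raw 32-bit code. The sketch only writes the file and prints the command; -D, -b binary, -m i386 and -M intel are standard GNU objdump options, while the helper names and the temp-file name are made up.

```python
from __future__ import annotations

import shlex
import tempfile
from pathlib import Path

# The question's "Code:" line (the scraped copy is cut off after "eb").
SAMPLE = ("Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 "
          "c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f")


def code_bytes(line: str) -> bytes:
    """Raw bytes of an oops "Code:" line, with the <..> marker stripped."""
    out = bytearray()
    for field in line.split("Code:", 1)[1].split():
        field = field.strip("<>")
        if len(field) != 2:
            break  # stop at a truncated trailing byte
        out.append(int(field, 16))
    return bytes(out)


def objdump_argv(path: str) -> list[str]:
    """GNU objdump command line for a raw blob of 32-bit x86 code, Intel syntax."""
    return ["objdump", "-D", "-b", "binary", "-m", "i386", "-M", "intel", path]


if __name__ == "__main__":
    blob = code_bytes(SAMPLE)[1:]  # skip the leading 00 byte, as the answer's listing does
    path = Path(tempfile.gettempdir()) / "oops-code.bin"
    path.write_bytes(blob)
    print(shlex.join(objdump_argv(str(path))))
```

Starting the blob at the 55 byte makes objdump's offsets match the numbering used in the answer's listing.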

Remember that x86 machine code uses a variable-length encoding that can't be unambiguously decoded backwards. But this is compiler-generated code, so at least we can assume there aren't supposed to be overlapping instructions or static data mixed in with the code (x86 gains nothing from that). If we find the start of a function in compiler-generated code, the rest of the instructions will all be "sane".

The 00 byte looks like part of a previous instruction or padding between functions. Decoding from there would give add BYTE PTR [ebp-0x77],dl, which is plausible, but the in eax,0x57 that follows isn't plausible in a non-driver function.
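
The add BYTE PTR [ebp-0x77],dl decoding can be checked by hand from the ModRM byte. This is a minimal sketch assuming 32-bit addressing and covering only the one form needed here (opcode 00 /r with a [reg+disp8] operand); decode_00_disp8 is a hypothetical helper, not a real disassembler.

```python
from __future__ import annotations

REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_00_disp8(code: bytes) -> tuple[str, int]:
    """Decode ADD r/m8, r8 (opcode 00 /r) when ModRM selects [reg32+disp8].

    Returns the Intel-syntax text and the instruction length in bytes.
    """
    if code[0] != 0x00:
        raise ValueError("expected opcode 0x00")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:  # only [reg+disp8] without a SIB byte
        raise NotImplementedError("sketch handles mod=01, rm!=100 only")
    disp = int.from_bytes(code[2:3], "little", signed=True)
    sign = "-" if disp < 0 else "+"
    return f"add BYTE PTR [{REG32[rm]}{sign}{abs(disp):#x}],{REG8[reg]}", 3


if __name__ == "__main__":
    dump = bytes.fromhex("00 55 89 e5 57")
    text, length = decode_00_disp8(dump)
    print(text)  # add BYTE PTR [ebp-0x77],dl
    # The next opcode, e5, is IN EAX, imm8, i.e. "in eax,0x57".
    print(f"next opcode: {dump[length]:#04x}")  # 0xe5
```

ModRM 0x55 splits into mod=01 (disp8 follows), reg=010 (dl) and rm=101 (ebp), and the disp8 byte 0x89 is -0x77 as a signed value.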

Much more likely is that the 0x89 byte is the opcode of a MOV instruction.

If we drop the 00 byte and start from 55 (which is push ebp), we get a normal function body including the stack-frame setup prologue you'd expect if compiled with -Os or -fno-omit-frame-pointer.

In general, you can drop bytes one at a time until you get a sane-looking decoding that at least has an instruction-boundary on the faulting instruction. (But some experience is required for "sane-looking"; disassembly may have gotten in sync by chance after starting wrong. That's not rare for x86 machine code.)
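
The "drop bytes one at a time" search can be partly automated. The sketch below is a toy length decoder that only knows a small subset of 32-bit x86 opcodes (enough for this dump; anything else stops decoding), so treat it as an assumption-laden illustration rather than a substitute for a real disassembler. It lists which start offsets put an instruction boundary on the <>-marked byte. That check is necessary, not sufficient: as warned above, a wrong start can fall back into sync, and here starting at the 00 byte re-synchronizes after two bogus instructions.

```python
from __future__ import annotations

# The question's Code: bytes with the <> marker removed; index 42 was marked.
CODE = bytes.fromhex(
    "00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16"
    " c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 0f 0b 58 5a 90 8d"
    " 74 26 00 eb"
)
MARKED = 42

# Opcode classes for a small subset of 32-bit x86 (no prefixes handled).
ALU = range(0x00, 0x40, 8)  # rows of add/or/adc/sbb/and/sub/xor/cmp
NO_OPERAND = set(range(0x40, 0x60)) | {0x90, 0xC3, 0xCC}  # inc/dec/push/pop reg, nop, ret, int3
IMM8 = {b + 4 for b in ALU} | set(range(0x70, 0x80)) | {0x6A, 0xA8, 0xE4, 0xE5, 0xEB}
IMM32 = {b + 5 for b in ALU} | set(range(0xB8, 0xC0)) | {0x68, 0xA1, 0xA3, 0xA9, 0xE8, 0xE9}
MODRM = {b + k for b in ALU for k in range(4)} | {0x84, 0x85, 0x88, 0x89, 0x8A, 0x8B, 0x8D, 0xFF}
MODRM_IMM8 = {0x80, 0x83, 0xC0, 0xC1, 0xC6}
MODRM_IMM32 = {0x81, 0xC7}


def modrm_len(code: bytes, i: int) -> int:
    """Length of the ModRM byte at code[i] plus any SIB byte and displacement."""
    mod, rm = code[i] >> 6, code[i] & 7
    n = 1
    if mod != 3 and rm == 4:  # a SIB byte follows
        n += 1
        if mod == 0 and (code[i + 1] & 7) == 5:  # SIB without base register: disp32
            n += 4
    if mod == 0 and rm == 5:
        n += 4  # absolute [disp32]
    elif mod == 1:
        n += 1  # disp8
    elif mod == 2:
        n += 4  # disp32
    return n


def insn_len(code: bytes, i: int) -> int | None:
    """Length of the instruction at code[i], or None if the toy table can't tell."""
    try:
        op = code[i]
        if op in NO_OPERAND:
            return 1
        if op in IMM8:
            return 2
        if op in IMM32:
            return 5
        if op in MODRM:
            return 1 + modrm_len(code, i + 1)
        if op in MODRM_IMM8:
            return 2 + modrm_len(code, i + 1)
        if op in MODRM_IMM32:
            return 5 + modrm_len(code, i + 1)
        if op == 0x0F:
            op2 = code[i + 1]
            if op2 == 0x0B:
                return 2  # ud2
            if 0x80 <= op2 <= 0x8F:
                return 6  # jcc rel32
            if op2 in {0x1F, 0xB6, 0xB7, 0xBE, 0xBF, 0xC7}:
                return 2 + modrm_len(code, i + 2)
    except IndexError:
        pass  # ModRM/SIB byte lies past the end of the dump
    return None


def boundaries(code: bytes, start: int) -> list[int]:
    """Instruction start offsets when decoding linearly from `start`."""
    out = []
    i = start
    while i < len(code):
        out.append(i)
        n = insn_len(code, i)
        if n is None or i + n > len(code):
            break  # unknown opcode, or instruction cut off by the dump
        i += n
    return out


def sync_starts(code: bytes, marked: int, max_skip: int = 8) -> list[int]:
    """Start offsets that put an instruction boundary exactly on the marked byte."""
    return [s for s in range(min(max_skip, marked) + 1) if marked in boundaries(code, s)]


if __name__ == "__main__":
    print(sync_starts(CODE, MARKED))  # [0, 1, 2, 3, 4, 5, 8]
    print(boundaries(CODE, 0)[:4])    # [0, 3, 5, 8]: bogus add, bogus in, then in sync
    print(boundaries(CODE, 1)[:4])    # [1, 2, 4, 5]: push ebp, mov ebp,esp, push edi, ...
```

Starts 6 and 7 stop at the 0x16 byte (not in the toy table) before reaching byte 42; every other start up to 8 passes, which is why judging "sane-looking" still takes a human.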

# skipped the 00 byte which would desync decoding
0:  55                      push   ebp
1:  89 e5                   mov    ebp,esp
3:  57                      push   edi
4:  8b 48 04                mov    ecx,DWORD PTR [eax+0x4]      # EAX = 1st function arg, ECX = tmp
7:  8d 44 08 04             lea    eax,[eax+ecx*1+0x4]
b:  89 42 30                mov    DWORD PTR [edx+0x30],eax     # EDX = 2nd function arg
e:  80 3d e7 fb a0 c1 00    cmp    BYTE PTR ds:0xc1a0fbe7,0x0
15: 75 16                   jne    0x2d
17: c6 05 e7 fb a0 c1 01    mov    BYTE PTR ds:0xc1a0fbe7,0x1
1e: 50                      push   eax
1f: 68 b4 38 87 c1          push   0xc18738b4
24: e8 98 ba 00 00          call   0xbac1
29: 0f 0b                   ud2                     ### <=== EIP points here

# bytes after the ud2: the jne at 0x15 targets 0x2d, so at least the code from 0x2d on is reachable
2b: 58                      pop    eax
2c: 5a                      pop    edx
2d: 90                      nop
2e: 8d 74 26 00             lea    esi,[esi+eiz*1+0x0]
32: eb                      .byte 0xeb
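
The call and jne targets in the listing are "address of the next instruction plus the signed displacement". A small Python sketch (rel_target is a hypothetical helper) reproduces them. These targets are relative to the listing's offset 0 (the push ebp byte), not absolute kernel addresses, since the log only gives EIP symbolically.

```python
def rel_target(offset: int, insn: bytes) -> int:
    """Target of a relative call/jmp/jcc located at `offset`.

    x86 relative branches are measured from the end of the instruction:
    target = offset + instruction length + signed displacement.
    """
    op = insn[0]
    if op in (0xE8, 0xE9):  # call rel32 / jmp rel32
        disp_size = 4
    elif op == 0xEB or 0x70 <= op <= 0x7F:  # jmp rel8 / jcc rel8
        disp_size = 1
    else:
        raise ValueError(f"not a relative branch opcode: {op:#04x}")
    disp = int.from_bytes(insn[1:1 + disp_size], "little", signed=True)
    return offset + 1 + disp_size + disp


if __name__ == "__main__":
    print(hex(rel_target(0x15, bytes.fromhex("75 16"))))           # 0x2d (jne)
    print(hex(rel_target(0x24, bytes.fromhex("e8 98 ba 00 00"))))  # 0xbac1 (call)
```

So the jne at 0x15 lands on the nop at 0x2d, after the ud2 and the two pops.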

So this function appears to end with a call to a noreturn function with stack args. (32-bit x86 Linux kernels are built with -mregparm=3, so the first 3 args are passed in EAX, EDX, ECX in that order. Since this call passes its args on the stack, either the callee is not regparm (GCC passes all arguments of a variadic function on the stack, even with regparm) or it has more than 3 args. You can see that ex_handler_fprestore itself takes EAX and EDX as incoming args: it reads them before writing them.)
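
The register/stack split can be written down as a tiny lookup. This is a sketch under simplifying assumptions (every argument is a 4-byte int or pointer, no struct or 64-bit arguments); arg_location is a hypothetical helper. It encodes GCC's documented regparm behavior: up to three integer args in EAX, EDX, ECX, and all args on the stack for variadic functions.

```python
REGPARM_REGS = ("eax", "edx", "ecx")


def arg_location(index: int, regparm: int = 3, variadic: bool = False) -> str:
    """Where 0-based argument `index` lives on entry to a 32-bit x86 function.

    Assumes every argument is a 4-byte int or pointer. With GCC regparm(n)
    (n <= 3), the first n args are in EAX, EDX, ECX; variadic functions get
    all args on the stack. Stack slots are relative to ESP on entry, where
    [esp] holds the return address.
    """
    n_regs = 0 if variadic else max(0, min(regparm, 3))
    if index < n_regs:
        return REGPARM_REGS[index]
    return f"[esp+{4 * (index - n_regs + 1):#x}]"


if __name__ == "__main__":
    print([arg_location(i) for i in range(4)])  # ['eax', 'edx', 'ecx', '[esp+0x4]']
    print(arg_location(0, variadic=True), arg_location(1, variadic=True))  # [esp+0x4] [esp+0x8]
```

Applied to the listing: push eax followed by push 0xc18738b4 means that on entry to the callee, [esp+0x4] holds 0xc18738b4 (argument 0, pushed last) and [esp+0x8] holds the pushed EAX value (argument 1).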

But it's not a jmp tailcall for some reason; maybe for exception backtracing it wants this function's stack frame on the stack. (Which might explain the push ebp / mov ebp,esp even if this kernel was built with -fomit-frame-pointer as part of -O2.)

You'd have to look at the C source for ex_handler_fprestore to guess why that might be.

ud2 is an illegal instruction. The compiler (or inline asm?) put it there so it would fault if the function returned. It's a clear sign that this path of execution is supposed to be unreachable, or that it is meant to trap intentionally as an assert()-style mechanism. (In Linux, look for BUG_ON(). On x86 kernels of this vintage, WARN_ON() and WARN_ONCE() also compile to ud2, but the trap handler recognizes those as warnings and resumes execution after the ud2.)