How does the kernel get an executable binary file running under Linux?

Question

How does the kernel get an executable binary file running under Linux?

It seems like a simple question, but can anyone help me dig deeper? How is the file loaded into memory, and how does the code start executing?

Can anyone walk me through what happens, step by step?

Recommended answer

Best moments of the exec system call on Linux 4.0

The best way to find all of that out is to step-debug the kernel with GDB and QEMU: How to debug the Linux kernel with GDB and QEMU?

  • fs/exec.c defines the system call at SYSCALL_DEFINE3(execve

Simply forwards to do_execve.
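For orientation, here is a minimal sketch of the user-space side that ends up in that system call. The helper name spawn_and_wait is hypothetical and only used for this illustration; fork, execve and waitpid are the standard POSIX calls.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, replace its image with `path` via execve(2), and return the
 * child's exit status, or -1 on failure.
 * spawn_and_wait is a hypothetical helper name, not a library function.
 */
int spawn_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };  /* empty environment, for simplicity */
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on success execve never returns to this code. */
        execve(path, argv, envp);
        perror("execve");
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with "/bin/sh" and the arguments { "sh", "-c", "exit 7", NULL } should hand back the shell's exit status, assuming /bin/sh exists on the system.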

do_execve

Forwards to do_execveat_common.

do_execveat_common

To find the next major function, track where the return value retval is last modified.

Starts building a struct linux_binprm *bprm to describe the program, and passes it to exec_binprm to execute.

exec_binprm

Once again, follow the return value to find the next major call.

search_binary_handler

  • Handlers are determined by the first magic bytes of the executable.

The two most common handlers are those for interpreted files (#! magic) and for ELF (\x7fELF magic), but there are others built into the kernel, e.g. a.out. Users can also register their own through /proc/sys/fs/binfmt_misc.
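As a rough user-space sketch of the magic-byte check for those two formats: the enum and the function classify_magic below are hypothetical names. In the kernel there is no central classifier like this; each handler inspects the bytes itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for this illustration only. */
enum exec_format { FMT_UNKNOWN, FMT_SCRIPT, FMT_ELF };

/* Classify a file by its leading bytes. */
enum exec_format classify_magic(const unsigned char *buf, size_t len)
{
    /* Interpreted files start with the two characters "#!". */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return FMT_SCRIPT;

    /*
     * ELF files start with 0x7f 'E' 'L' 'F'. The literal is split in two
     * so that the 'E' is not parsed as part of the \x hex escape.
     */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    return FMT_UNKNOWN;
}
```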

The ELF handler is defined at fs/binfmt_elf.c.

See also: Why do people write the #!/usr/bin/env python shebang on the first line of a Python script?

The formats list contains all the handlers.

Each handler file contains something like:

static int __init init_elf_binfmt(void)
{
    register_binfmt(&elf_format);
    return 0;
}

and elf_format is a struct linux_binfmt defined in that file.

__init is magic: it places that code in a special init section that the kernel can discard once boot completes, and the accompanying core_initcall(init_elf_binfmt) in the same file is what gets the function called when the kernel starts: What does __init mean in the Linux kernel code?

Linker-level dependency injection!
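A loose user-space analogy, not the kernel mechanism itself: GCC and Clang offer __attribute__((constructor)), which also relies on the toolchain collecting functions into a special section and running them before main. All toy_* names below are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_FORMATS 8

/* Toy stand-in for struct linux_binfmt. */
struct toy_binfmt {
    const char *name;
};

static const struct toy_binfmt *toy_formats[TOY_MAX_FORMATS];
static size_t toy_format_count;

/* Toy stand-in for register_binfmt(): append to a global list. */
static void toy_register_binfmt(const struct toy_binfmt *fmt)
{
    if (toy_format_count < TOY_MAX_FORMATS)
        toy_formats[toy_format_count++] = fmt;
}

static const struct toy_binfmt toy_elf_format = { "elf" };

/*
 * GCC/Clang extension: this function runs before main(), without anything
 * calling it explicitly. That is the analogy to a kernel initcall.
 */
__attribute__((constructor))
static void toy_init_elf_binfmt(void)
{
    toy_register_binfmt(&toy_elf_format);
}

/* Accessors so callers can inspect the registry. */
size_t toy_registered_formats(void)
{
    return toy_format_count;
}

const char *toy_format_name(size_t i)
{
    return i < toy_format_count ? toy_formats[i]->name : NULL;
}
```

By the time main starts, the registry already holds the "elf" entry, even though no code in main registered it.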

There is also a recursion counter, in case an interpreter executes itself infinitely.

Try this:

echo '#!/tmp/a' > /tmp/a
chmod +x /tmp/a
/tmp/a
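The idea behind that counter can be sketched with a toy model. The table sim_fs, the limit SIM_MAX_DEPTH and the function sim_exec are all assumptions made for this illustration; the kernel's real limit and bookkeeping live in fs/exec.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX_DEPTH 5  /* assumed limit, for illustration only */

/* Toy file: a path plus the interpreter named on its #! line, if any. */
struct sim_file {
    const char *path;
    const char *interp;  /* NULL means "native binary" */
};

static const struct sim_file sim_fs[] = {
    { "/bin/sh",       NULL      },
    { "/tmp/hello.sh", "/bin/sh" },
    { "/tmp/a",        "/tmp/a"  },  /* the self-referencing script above */
};

static const struct sim_file *sim_lookup(const char *path)
{
    for (size_t i = 0; i < sizeof(sim_fs) / sizeof(sim_fs[0]); i++)
        if (strcmp(sim_fs[i].path, path) == 0)
            return &sim_fs[i];
    return NULL;
}

/* Follow interpreter references, giving up once the depth limit is hit. */
int sim_exec(const char *path, int depth)
{
    if (depth > SIM_MAX_DEPTH)
        return -ELOOP;

    const struct sim_file *f = sim_lookup(path);
    if (f == NULL)
        return -ENOENT;
    if (f->interp == NULL)
        return 0;  /* a native binary would be loaded here */

    return sim_exec(f->interp, depth + 1);
}
```

In this model sim_exec("/tmp/a", 0) bottoms out with -ELOOP instead of recursing forever, while sim_exec("/tmp/hello.sh", 0) resolves to the native /bin/sh and returns 0.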

  • Once again we follow the return value to see what comes next, and see that it comes from:

    retval = fmt->load_binary(bprm);
    

    where load_binary is defined for each handler on the struct: C-style polymorphism.
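A minimal sketch of that pattern, assuming toy stand-ins for the kernel structures: demo_binprm, demo_binfmt and demo_search_binary_handler are hypothetical names. Each handler returns -ENOEXEC when the file is not in its format, and the loop moves on to the next one.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct linux_binprm: just the first bytes of the file. */
struct demo_binprm {
    const unsigned char *buf;
    size_t len;
};

/* Toy stand-in for struct linux_binfmt: one function pointer per handler. */
struct demo_binfmt {
    const char *name;
    int (*load_binary)(struct demo_binprm *bprm);
};

static int demo_load_script(struct demo_binprm *bprm)
{
    if (bprm->len < 2 || bprm->buf[0] != '#' || bprm->buf[1] != '!')
        return -ENOEXEC;
    return 0;  /* a real handler would now exec the interpreter */
}

static int demo_load_elf(struct demo_binprm *bprm)
{
    if (bprm->len < 4 || memcmp(bprm->buf, "\x7f" "ELF", 4) != 0)
        return -ENOEXEC;
    return 0;  /* a real handler would now map the segments */
}

static const struct demo_binfmt demo_formats[] = {
    { "script", demo_load_script },
    { "elf",    demo_load_elf    },
};

/* Try each handler in turn; the first one that does not say -ENOEXEC wins. */
int demo_search_binary_handler(struct demo_binprm *bprm, const char **matched)
{
    size_t n = sizeof(demo_formats) / sizeof(demo_formats[0]);

    for (size_t i = 0; i < n; i++) {
        int retval = demo_formats[i].load_binary(bprm);

        if (retval != -ENOEXEC) {
            if (matched != NULL)
                *matched = demo_formats[i].name;
            return retval;
        }
    }
    return -ENOEXEC;
}
```

The caller never needs to know which format it is dealing with: the function pointer on each struct does the dispatch.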

    fs/binfmt_elf.c:load_binary

    Does the actual work:

    • parse the ELF file according to the ELF specification, here is an overview of the ELF file format: How to make an executable ELF file in Linux using a hex editor?
    • set up the process initial program state based on the parsed ELF file, most notably:
      • initial register setup in a struct pt_regs
      • initial virtual memory setup, the memory is specified in the ELF segments: What's the difference of section and segment in ELF file format
      • call start_thread, which marks the process as ready to be scheduled by the scheduler
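To make the parsing step a bit more concrete, here is a user-space sketch that pulls the entry point out of a 64-bit ELF header using the definitions from glibc's <elf.h>. The function name elf64_entry_point is hypothetical, and the sketch assumes the file uses the host's byte order. For a statically linked executable this entry address is what ends up as the initial PC.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the entry point from an in-memory copy of a 64-bit ELF file.
 * Returns 0 and stores the address in *entry, or -1 if the buffer does not
 * start with a valid 64-bit ELF header.
 * Assumption: the file has the same endianness as the host.
 */
int elf64_entry_point(const unsigned char *buf, size_t len, uint64_t *entry)
{
    Elf64_Ehdr ehdr;

    if (len < sizeof(ehdr))
        return -1;
    memcpy(&ehdr, buf, sizeof(ehdr));  /* copy to avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *entry = ehdr.e_entry;
    return 0;
}
```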

        Eventually the scheduler decides to run the process, and it must then jump to the PC address stored in struct pt_regs while also moving to a less privileged CPU state such as Ring 3 / EL0: What are Ring 0 and Ring 3 in the context of operating systems?

        The scheduler gets woken up periodically by clock hardware that generates interrupts at intervals configured earlier by the kernel, for example the old x86 PIT or the ARM timer. The kernel also registers handlers that run the scheduler code when the timer interrupts fire.

        TODO: continue source analysis further. What I expect to happen next:

        • the kernel parses the INTERP header of the ELF to find the dynamic loader (usually set to /lib64/ld-linux-x86-64.so.2).
        • if it is present:
          • the kernel mmaps the dynamic loader and the ELF to be executed into memory
          • the dynamic loader is started, taking a pointer to the ELF in memory.
          • now in userland, the loader somehow parses the ELF headers to find the required libraries, and does dlopen on them
          • dlopen uses a configurable search path to find those libraries (ldd and friends), mmaps them into memory, and somehow informs the ELF where to find its missing symbols
          • the loader calls the _start of the ELF

            otherwise, the kernel loads the executable into memory directly without the dynamic loader.

            It must therefore in particular check whether the executable is PIE, and if it is, place it in memory at a random location: What is the -fPIE option for position-independent executables in gcc and ld?
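The two checks discussed above, looking for an INTERP program header and telling whether the file is position independent, can be sketched in user space with glibc's <elf.h>. The function names below are hypothetical, the file is assumed to be a 64-bit ELF in host byte order, and this is not the kernel's code. Note that ET_DYN also covers shared libraries, so it marks a PIE candidate rather than proving the file is a PIE executable.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy and validate the ELF header; returns 0 on success, -1 otherwise. */
static int interp_read_ehdr(const unsigned char *buf, size_t len,
                            Elf64_Ehdr *ehdr)
{
    if (len < sizeof(*ehdr))
        return -1;
    memcpy(ehdr, buf, sizeof(*ehdr));
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/*
 * Copy the PT_INTERP path into out.
 * Returns 1 if found, 0 if the file has no PT_INTERP, -1 if malformed.
 */
int elf64_find_interp(const unsigned char *buf, size_t len,
                      char *out, size_t outlen)
{
    Elf64_Ehdr ehdr;

    if (interp_read_ehdr(buf, len, &ehdr) != 0)
        return -1;
    if (ehdr.e_phentsize != sizeof(Elf64_Phdr) || ehdr.e_phoff > len)
        return -1;

    for (unsigned int i = 0; i < ehdr.e_phnum; i++) {
        Elf64_Phdr phdr;
        uint64_t off = ehdr.e_phoff + (uint64_t)i * sizeof(phdr);

        if (off > len || len - off < sizeof(phdr))
            return -1;
        memcpy(&phdr, buf + off, sizeof(phdr));
        if (phdr.p_type != PT_INTERP)
            continue;

        /* The segment content is a NUL-terminated path. */
        if (phdr.p_filesz == 0 || phdr.p_filesz > outlen ||
            phdr.p_offset > len || len - phdr.p_offset < phdr.p_filesz)
            return -1;
        memcpy(out, buf + phdr.p_offset, phdr.p_filesz);
        if (out[phdr.p_filesz - 1] != '\0')
            return -1;
        return 1;
    }
    return 0;
}

/* Returns 1 for ET_DYN (PIE or shared object), 0 otherwise, -1 if invalid. */
int elf64_is_pie_candidate(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr ehdr;

    if (interp_read_ehdr(buf, len, &ehdr) != 0)
        return -1;
    return ehdr.e_type == ET_DYN;
}
```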
