Function calls from fork() to do_fork()


Question

After going through some text and source code, I realized that fork, vfork and clone are all executed through do_fork in fork.c, with different parameters.

But how exactly does fork() call do_fork()?

Which functions are called when fork() is invoked?

What is the step-by-step sequence from fork() to do_fork()?

Recommended answer

libc's implementation of fork() and other system calls contains special processor instructions that invoke a system call. System call invocation is architecture-specific, and can be quite a complex topic.
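
To see this from user space without writing assembly, here is a minimal sketch that uses glibc's generic syscall() wrapper, which loads the system call number and arguments into the registers the architecture expects and then executes the trap instruction. The helper names raw_fork and fork_and_wait are made up for this illustration, and the sketch assumes Linux with glibc:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for a new process without going
 * through glibc's fork() wrapper. */
static pid_t raw_fork(void)
{
#ifdef SYS_fork
    return (pid_t)syscall(SYS_fork);
#else
    /* Some architectures have no fork system call; clone with only
     * SIGCHLD as flags and no new stack behaves like fork. */
    return (pid_t)syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
#endif
}

/* Hypothetical helper: forks, lets the child exit with `code`, and
 * returns the exit status seen by the parent (or -1 on error). */
static int fork_and_wait(int code)
{
    int status = 0;
    pid_t pid = raw_fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: leave immediately, touch nothing else */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_wait(7) should hand 7 back to the parent. Because the raw call skips the bookkeeping that glibc's own fork() does, the child here does nothing except call _exit().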

Let's begin with a "simple" example, MIPS:

On MIPS, system calls are invoked via the SYSCALL instruction. So libc's implementation of fork() ends up putting some arguments in registers, the system call number in register v0, and issuing a syscall instruction.

On MIPS, this causes a SYSCALL_EXCEPTION (exception number 8). When booting, the kernel associates exception 8 with a handling routine in arch/mips/kernel/traps.c:trap_init():

set_except_vector(8, handle_sys);

So when the CPU receives an exception 8 because a program has issued a syscall instruction, the CPU transitions into kernel mode and begins executing the handler handle_sys in /usr/src/linux/arch/mips/kernel/scall*.S (there are several files for the different 32/64-bit kernelspace/userspace combinations). That routine looks up the system call number in the system call table and jumps to the appropriate sys_...() function, in this example sys_fork().
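
The table lookup can be pictured with a small C model. This is only a sketch of the idea, not the kernel's code: the real handler is written in assembly, and every name and number below (toy_call_table, TOY_NR_fork and so on) is invented for the illustration.

```c
#include <stddef.h>

/* Illustrative only: names, numbers and return values are made up. */
typedef long (*syscall_fn)(void);

static long toy_sys_exit(void) { return 100; }
static long toy_sys_fork(void) { return 200; }

enum { TOY_NR_exit = 1, TOY_NR_fork = 2, TOY_NR_max = 3 };
#define TOY_ENOSYS 38

/* One slot per system call number; unused slots stay NULL. */
static const syscall_fn toy_call_table[TOY_NR_max] = {
    [TOY_NR_exit] = toy_sys_exit,
    [TOY_NR_fork] = toy_sys_fork,
};

/* Plays the role of the exception handler: validate the number taken
 * from the register, then jump through the table. */
static long toy_handle_sys(unsigned long nr)
{
    if (nr >= TOY_NR_max || toy_call_table[nr] == NULL)
        return -TOY_ENOSYS;
    return toy_call_table[nr]();
}
```

With this model, toy_handle_sys(TOY_NR_fork) reaches toy_sys_fork(), while an unknown number is rejected before any jump takes place.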

Now, x86 is more complicated. Traditionally, Linux used interrupt 0x80 to invoke system calls. This is associated with an x86 gate in arch/x86/kernel/traps_*.c:trap_init():

set_system_gate(SYSCALL_VECTOR,&system_call);

An x86 processor has several levels (rings) of privilege (since the 80286). It is only possible to enter a lower-numbered ring (= more privilege) through predefined gates, which are special kinds of segment descriptors set up by the kernel. So, when an int 0x80 is executed, an interrupt is generated, the CPU looks up a special table called the IDT (Interrupt Descriptor Table), sees that it has a gate (a trap gate in x86, an interrupt gate in x86-64), and transitions into ring 0, beginning the execution of the system_call or ia32_syscall handler in arch/x86/kernel/entry_32.S or arch/x86/ia32/ia32entry.S (for x86 and x86_64 respectively).
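
The privilege check behind a gate can be sketched as a toy model. The struct and function names are hypothetical. The sketch only encodes the rule that a software interrupt is delivered when the caller's ring number (CPL) is not greater than the gate's DPL, and it assumes the system call gate is installed with DPL 3 while ordinary gates use DPL 0.

```c
#include <stdbool.h>

/* Toy model of one IDT entry; field names are illustrative only. */
struct toy_gate {
    unsigned dpl;          /* highest ring number allowed to use `int n` here */
    unsigned target_ring;  /* ring the handler runs in */
};

/* A software interrupt goes through only if CPL <= gate DPL;
 * otherwise the CPU raises a general protection fault instead. */
static bool toy_int_allowed(unsigned cpl, const struct toy_gate *gate)
{
    return cpl <= gate->dpl;
}

/* Returns the ring the CPU ends up in, or -1 if the gate refuses entry. */
static int toy_enter_through(unsigned cpl, const struct toy_gate *gate)
{
    return toy_int_allowed(cpl, gate) ? (int)gate->target_ring : -1;
}
```

Under these assumptions, a ring 3 program can reach ring 0 through a gate whose DPL is 3, but not through a gate whose DPL is 0.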

But since the Pentium II there is an alternative way to invoke a system call: the SYSENTER instruction (AMD also has its own SYSCALL instruction). This is a more efficient way to invoke a system call. The handler for this "newer" mechanism is set in arch/x86/vdso/vdso32-setup.c:syscall32_cpu_init():

#ifdef CONFIG_X86_64
[...]
void syscall32_cpu_init(void)
{
    if (use_sysenter < 0)
            use_sysenter = (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL);

    /* Load these always in case some future AMD CPU supports
       SYSENTER from compat mode too. */
    checking_wrmsrl(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
    checking_wrmsrl(MSR_IA32_SYSENTER_ESP, 0ULL);
    checking_wrmsrl(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target);

    wrmsrl(MSR_CSTAR, ia32_cstar_target);
}
[...]
#else
[...]
void enable_sep_cpu(void)
{
    int cpu = get_cpu();
    struct tss_struct *tss = &per_cpu(init_tss, cpu);

    if (!boot_cpu_has(X86_FEATURE_SEP)) {
            put_cpu();
            return;
    }

    tss->x86_tss.ss1 = __KERNEL_CS;
    tss->x86_tss.sp1 = sizeof(struct tss_struct) + (unsigned long) tss;
    wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
    wrmsr(MSR_IA32_SYSENTER_ESP, tss->x86_tss.sp1, 0);
    wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long) ia32_sysenter_target, 0);
    put_cpu();
}
[...]
#endif  /* CONFIG_X86_64 */

The above uses model-specific registers (MSRs) to do the setup. The handler routines are ia32_sysenter_target and ia32_cstar_target (the latter only on x86_64), in arch/x86/kernel/entry_32.S or arch/x86/ia32/ia32entry.S.

Selecting which system call mechanism to use

The Linux kernel and glibc have a mechanism to choose between the different ways to invoke a system call.

The kernel sets up a virtual shared library for each process, called the VDSO (virtual dynamic shared object), which you can see in the output of cat /proc/<pid>/maps:

$ cat /proc/self/maps
08048000-0804c000 r-xp 00000000 03:04 1553592    /bin/cat
0804c000-0804d000 rw-p 00003000 03:04 1553592    /bin/cat
[...]
b7ee8000-b7ee9000 r-xp b7ee8000 00:00 0          [vdso]
[...]
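
The same lookup can be done from a program. The following is a minimal sketch, assuming a Linux /proc filesystem and the line layout shown above. The helper names parse_vdso_line and find_vdso are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` is the [vdso] entry of a maps file,
 * store its address range and return 1; otherwise return 0. */
static int parse_vdso_line(const char *line,
                           unsigned long *start, unsigned long *end)
{
    if (strstr(line, "[vdso]") == NULL)
        return 0;
    return sscanf(line, "%lx-%lx", start, end) == 2;
}

/* Hypothetical helper: scan /proc/self/maps for the vdso mapping of
 * the calling process. Returns 1 if found, 0 otherwise. */
static int find_vdso(unsigned long *start, unsigned long *end)
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_vdso_line(line, start, end);
    fclose(f);
    return found;
}
```

The address differs between processes and between runs when address space randomization is enabled, so the range is looked up at run time instead of being hard-coded.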

This vdso, among other things, contains an appropriate system call invocation sequence for the CPU in use, e.g.:

ffffe414 <__kernel_vsyscall>:
ffffe414:       51                      push   %ecx        ; \
ffffe415:       52                      push   %edx        ; > save registers
ffffe416:       55                      push   %ebp        ; /
ffffe417:       89 e5                   mov    %esp,%ebp   ; save stack pointer
ffffe419:       0f 34                   sysenter           ; invoke system call
ffffe41b:       90                      nop
ffffe41c:       90                      nop                ; the kernel will usually
ffffe41d:       90                      nop                ; return to the insn just
ffffe41e:       90                      nop                ; past the jmp, but if the
ffffe41f:       90                      nop                ; system call was interrupted
ffffe420:       90                      nop                ; and needs to be restarted
ffffe421:       90                      nop                ; it will return to this jmp
ffffe422:       eb f3                   jmp    ffffe417 <__kernel_vsyscall+0x3>
ffffe424:       5d                      pop    %ebp        ; \
ffffe425:       5a                      pop    %edx        ; > restore registers
ffffe426:       59                      pop    %ecx        ; /
ffffe427:       c3                      ret                ; return to caller

In arch/x86/vdso/vdso32/ there are implementations using int 0x80, sysenter and syscall; the kernel selects the appropriate one.

To let userspace know that there is a vdso, and where it is located, the kernel sets the AT_SYSINFO and AT_SYSINFO_EHDR entries in the auxiliary vector (auxv, the 4th argument to main(), after argc, argv and envp, which is used to pass some information from the kernel to newly started processes). AT_SYSINFO_EHDR points to the ELF header of the vdso; AT_SYSINFO points to the vsyscall implementation:

$ LD_SHOW_AUXV=1 id    # tell the dynamic linker ld.so to output auxv values
AT_SYSINFO:      0xb7fd4414
AT_SYSINFO_EHDR: 0xb7fd4000
[...]
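
A program can read the same entry itself. Below is a small sketch that assumes glibc 2.16 or later, which provides getauxval() in <sys/auxv.h>. The function name vdso_has_elf_magic is invented for this example. It checks that the address really starts with the ELF magic bytes.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper. Returns 1 if AT_SYSINFO_EHDR points at an ELF
 * header, 0 if it points at something else, and -1 if the kernel
 * supplied no such entry (getauxval() then returns 0). */
static int vdso_has_elf_magic(void)
{
    unsigned long ehdr = getauxval(AT_SYSINFO_EHDR);

    if (ehdr == 0)
        return -1;
    /* ELFMAG is "\177ELF" and SELFMAG is its length, both from <elf.h>. */
    return memcmp((const void *)ehdr, ELFMAG, SELFMAG) == 0;
}
```

On a kernel that maps a vdso the result should be 1. The -1 case covers old kernels or environments that do not provide one.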

glibc uses this information to locate the vsyscall. It stores it in the dynamic loader global _dl_sysinfo, e.g.:

glibc-2.16.0/elf/dl-support.c:_dl_aux_init():
#ifdef NEED_DL_SYSINFO
  case AT_SYSINFO:
    GL(dl_sysinfo) = av->a_un.a_val;
    break;
#endif
#if defined NEED_DL_SYSINFO || defined NEED_DL_SYSINFO_DSO
  case AT_SYSINFO_EHDR:
    GL(dl_sysinfo_dso) = (void *) av->a_un.a_val;
    break;
#endif

glibc-2.16.0/elf/dl-sysdep.c:_dl_sysdep_start()

glibc-2.16.0/elf/rtld.c:dl_main:
GLRO(dl_sysinfo) = GLRO(dl_sysinfo_dso)->e_entry + l->l_addr;

and in a field in the header of the TCB (thread control block):

glibc-2.16.0/nptl/sysdeps/i386/tls.h

_head->sysinfo = GLRO(dl_sysinfo)

If the kernel is old and doesn't provide a vdso, glibc provides a default implementation for _dl_sysinfo:

.hidden _dl_sysinfo_int80:
int $0x80
ret
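
The fallback logic amounts to a function pointer with a default value. The following toy model is only a sketch of that idea: the toy_ names are invented, and the two entry functions merely report which path was taken instead of entering the kernel.

```c
#include <stddef.h>

enum toy_path { TOY_VIA_INT80, TOY_VIA_VSYSCALL };

typedef enum toy_path (*toy_entry_fn)(void);

/* Stand-ins for the two entry sequences; they only report their identity. */
static enum toy_path toy_int80_entry(void)    { return TOY_VIA_INT80; }
static enum toy_path toy_vsyscall_entry(void) { return TOY_VIA_VSYSCALL; }

/* Default used when the kernel passed no AT_SYSINFO entry. */
static toy_entry_fn toy_dl_sysinfo = toy_int80_entry;

/* Models the auxv scan: override the default only if the kernel
 * supplied an entry point. */
static void toy_aux_init(toy_entry_fn from_kernel)
{
    if (from_kernel != NULL)
        toy_dl_sysinfo = from_kernel;
}

/* Models the indirect call through the stored pointer. */
static enum toy_path toy_enter_kernel(void)
{
    return toy_dl_sysinfo();
}
```

Callers always go through the pointer, so they do not need to know whether the kernel provided a faster entry sequence or the int 0x80 default is still in place.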

When a program is compiled against glibc, depending on circumstances, a choice is made between different ways of invoking a system call:

glibc-2.16.0/sysdeps/unix/sysv/linux/i386/sysdep.h:
/* The original calling convention for system calls on Linux/i386 is
   to use int $0x80.  */
#ifdef I386_USE_SYSENTER
# ifdef SHARED
#  define ENTER_KERNEL call *%gs:SYSINFO_OFFSET
# else
#  define ENTER_KERNEL call *_dl_sysinfo
# endif
#else
# define ENTER_KERNEL int $0x80
#endif

  • int 0x80 ← the traditional way
  • call *%gs:offsetof(tcb_head_t, sysinfo) ← %gs points to the TCB, so this jumps indirectly through the pointer to vsyscall stored in the TCB. This is preferred for objects compiled as PIC. It requires TLS initialization: for dynamic executables, TLS is initialized by ld.so; for static PIE executables, it is initialized by __libc_setup_tls().
  • call *_dl_sysinfo ← this jumps indirectly through the global variable. It requires relocation of _dl_sysinfo, so it is avoided for objects compiled as PIC.

So, in x86:

                             fork()
                               ↓
      int 0x80 / call *%gs:0x10 / call *_dl_sysinfo 
        |                ↓              ↓
        |       (in vdso) int 0x80 / sysenter / syscall
        ↓                ↓              ↓            ↓
            system_call     | ia32_sysenter_target | ia32_cstar_target
                                ↓
                             sys_fork()
      
