Function calls from fork() to do_fork()


Question

After going through some text and source code, I realized that fork, vfork and clone are all executed through do_fork() in fork.c, with different parameters.

But how exactly does fork() call do_fork()?

Which functions are called when fork() is invoked?

What is the step-by-step sequence from fork() to do_fork()?

Recommended answer

libc's implementation of fork() and of other system calls contains special processor instructions that invoke a system call. System call invocation is architecture-specific and can be quite a complex topic.
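To make the "wrapper around a system call" idea concrete, here is a minimal sketch that creates a child without calling libc's fork(), by going through the generic syscall(2) entry point instead. The helper names raw_fork() and raw_fork_exit_code() are made up for this illustration, and the fallback to clone with SIGCHLD is an assumption for architectures that have no fork system call number. Note that bypassing the wrapper also skips whatever bookkeeping libc does around fork(), so this is for experimentation only.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: enter the kernel directly instead of via fork(). */
static pid_t raw_fork(void)
{
#ifdef SYS_fork
    return (pid_t)syscall(SYS_fork);
#else
    /* Assumption: with only SIGCHLD set and all other arguments zero,
       clone behaves like fork on architectures without a fork syscall. */
    return (pid_t)syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);
#endif
}

/* Fork via the raw system call, let the child exit with `code`, and
   return the exit code the parent observes (-1 on any failure). */
int raw_fork_exit_code(int code)
{
    pid_t pid = raw_fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: leave immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling raw_fork_exit_code(7) from main() should give back 7 if the child was created and reaped normally.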

Let's begin with a "simple" example, MIPS:

On MIPS, system calls are invoked via the SYSCALL instruction. So libc's implementation of fork() ends up putting some arguments in some registers, putting the system call number in register v0, and issuing a syscall instruction.

On MIPS, this causes a SYSCALL_EXCEPTION (exception number 8). At boot, the kernel associates exception 8 with a handling routine in arch/mips/kernel/traps.c:trap_init():

set_except_vector(8, handle_sys);

So when the CPU raises exception 8 because a program has issued a syscall instruction, it transitions into kernel mode and begins executing the handler handle_sys in /usr/src/linux/arch/mips/kernel/scall*.S (there are several files for the different 32/64-bit kernelspace/userspace combinations). That routine looks up the system call number in the system call table and jumps to the appropriate sys_...() function, in this example sys_fork().
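The table lookup performed by the handler can be pictured with a small userspace model. This is only a sketch of the dispatch idea, not kernel code: every toy_* name, the numbering, and the pretend return values are invented for the illustration.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*toy_syscall_fn)(void);

/* Stand-ins for real handlers; the return values are arbitrary. */
static long toy_sys_exit(void) { return 0; }
static long toy_sys_fork(void) { return 1234; }   /* pretend child PID */

/* Invented numbering, unrelated to any real architecture's table. */
enum { TOY_NR_EXIT = 1, TOY_NR_FORK = 2, TOY_NR_MAX = 3 };

/* The "system call table": the call number indexes an array of functions. */
static const toy_syscall_fn toy_sys_call_table[TOY_NR_MAX] = {
    [TOY_NR_EXIT] = toy_sys_exit,
    [TOY_NR_FORK] = toy_sys_fork,
};

/* What the entry handler does conceptually: validate the number taken
   from the register, then jump through the table. */
long toy_dispatch(unsigned long nr)
{
    if (nr >= TOY_NR_MAX || toy_sys_call_table[nr] == NULL)
        return -ENOSYS;              /* unknown system call */
    return toy_sys_call_table[nr]();
}
```

Here toy_dispatch(TOY_NR_FORK) reaches toy_sys_fork(), while an out-of-range or unassigned number yields -ENOSYS, which mirrors how an invalid system call number is rejected.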

Now, x86 is more complicated. Traditionally, Linux used interrupt 0x80 to invoke system calls. This is associated with an x86 gate in arch/x86/kernel/traps_*.c:trap_init():

set_system_gate(SYSCALL_VECTOR,&system_call);

An x86 processor has had several privilege levels (rings) since the 80286. Control can be transferred to a lower-numbered ring (= more privilege) only through predefined gates, which are special kinds of descriptors set up by the kernel. So when an int 0x80 instruction is executed, an interrupt is generated, the CPU looks up a special table called the IDT (Interrupt Descriptor Table), sees that the entry is a gate (a trap gate on x86, an interrupt gate on x86-64), and transitions into ring 0, beginning execution of the system_call/ia32_syscall handler in arch/x86/kernel/entry_32.S / arch/x86/ia32/ia32entry.S (for x86 and x86_64 respectively).

But since the Pentium II there is an alternative way to invoke a system call: the SYSENTER instruction (AMD also has its own SYSCALL instruction). This is a more efficient way to invoke a system call. The handler for this "newer" mechanism is set up in arch/x86/vdso/vdso32-setup.c, in syscall32_cpu_init() on 64-bit kernels and in enable_sep_cpu() on 32-bit kernels:

#ifdef CONFIG_X86_64
[...]
void syscall32_cpu_init(void)
{
    if (use_sysenter < 0)
            use_sysenter = (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL);

    /* Load these always in case some future AMD CPU supports
       SYSENTER from compat mode too. */
    checking_wrmsrl(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
    checking_wrmsrl(MSR_IA32_SYSENTER_ESP, 0ULL);
    checking_wrmsrl(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target);

    wrmsrl(MSR_CSTAR, ia32_cstar_target);
}
[...]
#else
[...]
void enable_sep_cpu(void)
{
    int cpu = get_cpu();
    struct tss_struct *tss = &per_cpu(init_tss, cpu);

    if (!boot_cpu_has(X86_FEATURE_SEP)) {
            put_cpu();
            return;
    }

    tss->x86_tss.ss1 = __KERNEL_CS;
    tss->x86_tss.sp1 = sizeof(struct tss_struct) + (unsigned long) tss;
    wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
    wrmsr(MSR_IA32_SYSENTER_ESP, tss->x86_tss.sp1, 0);
    wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long) ia32_sysenter_target, 0);
    put_cpu();
}
[...]
#endif  /* CONFIG_X86_64 */

The above uses model-specific registers (MSRs) to do the setup. The handler routines are ia32_sysenter_target and ia32_cstar_target (the latter only on x86_64), in arch/x86/kernel/entry_32.S or arch/x86/ia32/ia32entry.S.
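The boot_cpu_has(X86_FEATURE_SEP) test in enable_sep_cpu() corresponds to a CPUID feature bit that can also be read from userspace. The sketch below assumes GCC or Clang (for <cpuid.h> and __get_cpuid) and relies on SEP being bit 11 of EDX in CPUID leaf 1; the function name cpu_reports_sep() is made up for this example. It only reports what CPUID advertises and says nothing about which mechanism the kernel actually selected.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID advertises SYSENTER/SYSEXIT (SEP), 0 if it does not,
   and -1 when not built for x86 or when CPUID leaf 1 is unavailable. */
int cpu_reports_sep(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)((edx >> 11) & 1u);   /* CPUID.01H:EDX bit 11 = SEP */
#else
    return -1;
#endif
}
```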

Choosing which system call mechanism to use

The Linux kernel and glibc have a mechanism to choose between the different ways to invoke a system call.

The kernel sets up a virtual shared library for each process, called the VDSO (virtual dynamic shared object), which you can see in the output of cat /proc/<pid>/maps:

$ cat /proc/self/maps
08048000-0804c000 r-xp 00000000 03:04 1553592    /bin/cat
0804c000-0804d000 rw-p 00003000 03:04 1553592    /bin/cat
[...]
b7ee8000-b7ee9000 r-xp b7ee8000 00:00 0          [vdso]
[...]
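The same check can be done from inside a program. This is a minimal sketch that scans /proc/self/maps for the [vdso] line; the function name self_has_vdso_mapping() is invented here, and the code assumes a Linux system with /proc mounted.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the current process has a [vdso] mapping, 0 if it does not,
   and -1 if /proc/self/maps cannot be opened. */
int self_has_vdso_mapping(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[vdso]") != NULL) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```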

Among other things, this vdso contains a system call invocation sequence appropriate for the CPU in use, e.g.:

ffffe414 <__kernel_vsyscall>:
ffffe414:       51                      push   %ecx        ; \
ffffe415:       52                      push   %edx        ; > save registers
ffffe416:       55                      push   %ebp        ; /
ffffe417:       89 e5                   mov    %esp,%ebp   ; save stack pointer
ffffe419:       0f 34                   sysenter           ; invoke system call
ffffe41b:       90                      nop
ffffe41c:       90                      nop                ; the kernel will usually
ffffe41d:       90                      nop                ; return to the insn just
ffffe41e:       90                      nop                ; past the jmp, but if the
ffffe41f:       90                      nop                ; system call was interrupted
ffffe420:       90                      nop                ; and needs to be restarted
ffffe421:       90                      nop                ; it will return to this jmp
ffffe422:       eb f3                   jmp    ffffe417 <__kernel_vsyscall+0x3>
ffffe424:       5d                      pop    %ebp        ; \
ffffe425:       5a                      pop    %edx        ; > restore registers
ffffe426:       59                      pop    %ecx        ; /
ffffe427:       c3                      ret                ; return to caller

In arch/x86/vdso/vdso32/ there are implementations using int 0x80, sysenter and syscall; the kernel selects the appropriate one.

To let userspace know that there is a vdso, and where it is located, the kernel sets AT_SYSINFO and AT_SYSINFO_EHDR entries in the auxiliary vector (auxv, which the kernel places on the initial process stack after argc, argv and envp, and which is used to pass some information from the kernel to newly started processes). AT_SYSINFO_EHDR points to the ELF header of the vdso, and AT_SYSINFO points to the vsyscall implementation:

$ LD_SHOW_AUXV=1 id    # tell the dynamic linker ld.so to output auxv values
AT_SYSINFO:      0xb7fd4414
AT_SYSINFO_EHDR: 0xb7fd4000
[...]
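A program can read the same auxv entry itself. The sketch below uses getauxval() from <sys/auxv.h> (available in glibc 2.16 and later) to fetch AT_SYSINFO_EHDR and checks that the address really starts with the ELF magic bytes. The function name vdso_has_elf_magic() is made up for this example. AT_SYSINFO is not looked up here because not every architecture provides it.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns 1 if AT_SYSINFO_EHDR points at something that begins with the
   ELF magic, 0 if it points elsewhere, and -1 if the kernel supplied no
   AT_SYSINFO_EHDR entry (getauxval() returns 0 in that case). */
int vdso_has_elf_magic(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    if (base == 0)
        return -1;
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```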

glibc uses this information to locate the vsyscall. It stores it in the dynamic loader global _dl_sysinfo, e.g.:

glibc-2.16.0/elf/dl-support.c:_dl_aux_init():
#ifdef NEED_DL_SYSINFO
  case AT_SYSINFO:
    GL(dl_sysinfo) = av->a_un.a_val;
    break;
#endif
#if defined NEED_DL_SYSINFO || defined NEED_DL_SYSINFO_DSO
  case AT_SYSINFO_EHDR:
    GL(dl_sysinfo_dso) = (void *) av->a_un.a_val;
    break;
#endif

glibc-2.16.0/elf/dl-sysdep.c:_dl_sysdep_start()

glibc-2.16.0/elf/rtld.c:dl_main:
GLRO(dl_sysinfo) = GLRO(dl_sysinfo_dso)->e_entry + l->l_addr;

and in a field in the header of the TCB (thread control block):

glibc-2.16.0/nptl/sysdeps/i386/tls.h

_head->sysinfo = GLRO(dl_sysinfo)

If the kernel is old and doesn't provide a vdso, glibc provides a default implementation for _dl_sysinfo:

    .hidden _dl_sysinfo_int80
_dl_sysinfo_int80:
    int $0x80
    ret
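The "default unless the kernel says otherwise" pattern can be modelled in plain C with a function pointer. This is only a toy: the toy_* names are invented, and the strings stand in for the real entry sequences, which of course cannot be written as ordinary C functions.

```c
#include <stddef.h>

typedef const char *(*toy_entry_fn)(void);

/* Stand-ins for the two ways of entering the kernel. */
const char *toy_int80_entry(void) { return "int 0x80"; }
const char *toy_vdso_entry(void)  { return "vdso"; }

/* Model of _dl_sysinfo: starts out pointing at the int 0x80 fallback. */
static toy_entry_fn toy_dl_sysinfo = toy_int80_entry;

/* Model of the auxv scan: override the default only when the kernel
   actually supplied an entry point (non-NULL here). */
void toy_aux_init(toy_entry_fn from_kernel)
{
    if (from_kernel != NULL)
        toy_dl_sysinfo = from_kernel;
}

/* Model of "call *_dl_sysinfo": an indirect call through the global. */
const char *toy_enter_kernel(void)
{
    return toy_dl_sysinfo();
}
```

Before toy_aux_init() is called with a real pointer, toy_enter_kernel() goes through the int 0x80 stand-in. Afterwards it goes through whatever the "kernel" provided.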

When a program is compiled against glibc, depending on circumstances, a choice is made between different ways of invoking a system call:

glibc-2.16.0/sysdeps/unix/sysv/linux/i386/sysdep.h:
/* The original calling convention for system calls on Linux/i386 is
   to use int $0x80.  */
#ifdef I386_USE_SYSENTER
# ifdef SHARED
#  define ENTER_KERNEL call *%gs:SYSINFO_OFFSET
# else
#  define ENTER_KERNEL call *_dl_sysinfo
# endif
#else
# define ENTER_KERNEL int $0x80
#endif

  • int 0x80 ← the traditional way
  • call *%gs:offsetof(tcb_head_t, sysinfo) ← %gs points to the TCB, so this jumps indirectly through the pointer to vsyscall stored in the TCB. This is preferred for objects compiled as PIC. It requires TLS initialization: for dynamic executables, TLS is initialized by ld.so; for static PIE executables, TLS is initialized by __libc_setup_tls().
  • call *_dl_sysinfo ← this jumps indirectly through the global variable. It requires relocation of _dl_sysinfo, so it is avoided for objects compiled as PIC.

So, in x86:

                             fork()
                               ↓
      int 0x80 / call *%gs:0x10 / call *_dl_sysinfo 
        |                ↓              ↓
        |       (in vdso) int 0x80 / sysenter / syscall
        ↓                ↓              ↓            ↓
            system_call     | ia32_sysenter_target | ia32_cstar_target
                                ↓
                             sys_fork()
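The last hop, from sys_fork() to do_fork(), is the one the question asks about. As far as I recall from kernel sources of this era, sys_fork(), sys_vfork() and sys_clone() are thin wrappers that differ mainly in the flags they pass to do_fork(); the exact argument list varies by kernel version and architecture. The userspace model below reproduces only the flag choice. All model_* names are invented, the real do_fork() takes more parameters, and the flag constants are taken from <linux/sched.h> and <signal.h>.

```c
#include <linux/sched.h>   /* CLONE_VM, CLONE_VFORK */
#include <signal.h>        /* SIGCHLD */

/* Model of do_fork(): just hand back the flags it was asked to use. */
static unsigned long model_do_fork(unsigned long clone_flags)
{
    return clone_flags;
}

/* fork: a plain copy that sends SIGCHLD to the parent on exit. */
unsigned long model_sys_fork(void)
{
    return model_do_fork(SIGCHLD);
}

/* vfork: share the address space and suspend the parent until the
   child execs or exits. */
unsigned long model_sys_vfork(void)
{
    return model_do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD);
}

/* clone: the caller chooses the flags. */
unsigned long model_sys_clone(unsigned long clone_flags)
{
    return model_do_fork(clone_flags);
}
```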
      
