Who sets the RIP register when you call the clone syscall?

Problem description

I am trying to implement a minimal kernel and I am trying to implement the clone syscall. In the man pages you can see the clone syscall defined as such:

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
                 /* pid_t *parent_tid, void *tls, pid_t *child_tid */ );

As you can see, it receives a function pointer. If you read the man page more closely you can actually see that the actual syscall implementation in the kernel does not receive a function pointer:

long clone(unsigned long flags, void *stack,
                      int *parent_tid, int *child_tid,
                      unsigned long tls);

So, my question is, who modifies the RIP register after a thread is created? Is it the libc?

I found this code in glibc: https://elixir.bootlin.com/glibc/latest/source/sysdeps/unix/sysv/linux/x86_64/clone.S but I am not sure at what point the function is actually called.

Extra information:

When looking at the clone.S source code you can see that it jumps to a thread_start branch after the syscall. On the branch after the clone syscall (so only the child does this) it pops the function address and the arguments from the stack. Who actually pushed these arguments and the function address on the stack? I guess it has to happen somewhere in the kernel because at the point of the syscall instruction they were not there.

Here is some gdb output:

Right before the syscall:

[-------------------------------------code-------------------------------------]
   0x7ffff7d8af22 <clone+34>:   mov    r8,r9
   0x7ffff7d8af25 <clone+37>:   mov    r10,QWORD PTR [rsp+0x8]
   0x7ffff7d8af2a <clone+42>:   mov    eax,0x38
=> 0x7ffff7d8af2f <clone+47>:   syscall 
   0x7ffff7d8af31 <clone+49>:   test   rax,rax
   0x7ffff7d8af34 <clone+52>:   jl     0x7ffff7d8af49 <clone+73>
   0x7ffff7d8af36 <clone+54>:   je     0x7ffff7d8af39 <clone+57>
   0x7ffff7d8af38 <clone+56>:   ret
Guessed arguments:
arg[0]: 0x3d0f00 
arg[1]: 0x7ffff8020b60 --> 0x7ffff7d3fb30 (<do_something>:  push   rbx)
arg[2]: 0x7fffffffda90 --> 0x0 
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffda78 --> 0x7ffff7d3f52c (<main+172>:    pop    rsi)
0008| 0x7fffffffda80 --> 0x7fffffffda94 --> 0x73658b0000000000 
0016| 0x7fffffffda88 --> 0x7fffffffda94 --> 0x73658b0000000000 
0024| 0x7fffffffda90 --> 0x0 
0032| 0x7fffffffda98 --> 0x492e085573658b00 
0040| 0x7fffffffdaa0 --> 0x7ffff7d3f0d0 (<_init>:   sub    rsp,0x8)
0048| 0x7fffffffdaa8 --> 0x7ffff7d40830 (<__libc_csu_init>: push   r15)
0056| 0x7fffffffdab0 --> 0x7ffff7d408d0 (<__libc_csu_fini>: push   rbp)
[------------------------------------------------------------------------------]

After the syscall instruction on the child thread (check the top of the stack - this does not happen on the parent's thread):

[-------------------------------------code-------------------------------------]
   0x7ffff7d8af25 <clone+37>:   mov    r10,QWORD PTR [rsp+0x8]
   0x7ffff7d8af2a <clone+42>:   mov    eax,0x38
   0x7ffff7d8af2f <clone+47>:   syscall 
=> 0x7ffff7d8af31 <clone+49>:   test   rax,rax
   0x7ffff7d8af34 <clone+52>:   jl     0x7ffff7d8af49 <clone+73>
   0x7ffff7d8af36 <clone+54>:   je     0x7ffff7d8af39 <clone+57>
   0x7ffff7d8af38 <clone+56>:   ret    
   0x7ffff7d8af39 <clone+57>:   xor    ebp,ebp
[------------------------------------stack-------------------------------------]
0000| 0x7ffff8020b60 --> 0x7ffff7d3fb30 (<do_something>:    push   rbx)
0008| 0x7ffff8020b68 --> 0x7ffff7dd5add --> 0x4c414d0074736574 ('test')
0016| 0x7ffff8020b70 --> 0x0 
0024| 0x7ffff8020b78 --> 0x411 
0032| 0x7ffff8020b80 ("Parameters: 0x7ffff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94
")
0040| 0x7ffff8020b88 ("rs: 0x7ffff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94
")
0048| 0x7ffff8020b90 ("fff7d3fb30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94
")
0056| 0x7ffff8020b98 ("30 4001536 0x7ffff8020b70 0x7fffffffda90 0x7ffff8000b60 0x7fffffffda94
")
[------------------------------------------------------------------------------]

Solution

Normally, when the computer boots, Linux sets up an MSR (Model Specific Register) for use by the syscall assembly instruction. Executing syscall loads the RIP register with the address stored in that MSR and enters kernel mode. As stated in Intel's 64-ia-32-architectures-software-developer-vol-2b-manual:

SYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR

Once in kernel mode, the kernel looks at the values passed in registers to determine what the syscall is asking for: on x86-64, RAX holds the system call number and RDI, RSI, RDX, R10, R8 and R9 hold the arguments. The kernel then invokes one of the sys_XXX functions whose prototypes are in linux/syscalls.h (https://elixir.bootlin.com/linux/latest/source/include/linux/syscalls.h#L217). The definition of sys_clone is in kernel/fork.c.

SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
         int __user *, parent_tidptr,
         int __user *, child_tidptr,
         unsigned long, tls)
#endif /* closes an #ifdef chain of alternative argument orders omitted from this excerpt */
{
    return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);
}

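The SYSCALL_DEFINE5 macro takes its first argument and prefixes sys_ to it, so this function is actually sys_clone, and it calls _do_fork. (On recent x86-64 kernels the generated entry point carries an extra __x64_ prefix, as the __x64_##sym pattern in the table further down shows.)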

This means there isn't really a clone() function that glibc calls to enter the kernel. The kernel is entered with the syscall instruction, which jumps to the address specified in the MSR; the code at that address then invokes one of the syscalls in the sys_call_table.

The entry point to the kernel for x86 is here: https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/entry_64.S. If you scroll down you'll see the line: call *sys_call_table(, %rax, 8). In other words, it calls the sys_call_table entry indexed by the syscall number in RAX. The implementation of the sys_call_table is here: https://elixir.bootlin.com/linux/latest/source/arch/x86/entry/syscall_64.c#L20.

// SPDX-License-Identifier: GPL-2.0
/* System call table for x86-64. */

#include <linux/linkage.h>
#include <linux/sys.h>
#include <linux/cache.h>
#include <linux/syscalls.h>
#include <asm/unistd.h>
#include <asm/syscall.h>

#define __SYSCALL_X32(nr, sym)
#define __SYSCALL_COMMON(nr, sym) __SYSCALL_64(nr, sym)

#define __SYSCALL_64(nr, sym) extern long __x64_##sym(const struct pt_regs *);
#include <asm/syscalls_64.h>
#undef __SYSCALL_64

#define __SYSCALL_64(nr, sym) [nr] = __x64_##sym,

asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
    /*
     * Smells like a compiler bug -- it doesn't work
     * when the & below is removed.
     */
    [0 ... __NR_syscall_max] = &__x64_sys_ni_syscall,
#include <asm/syscalls_64.h>
};

I recommend you read the following: https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-2.html. That page states:

As you can see, we include the asm/syscalls_64.h header at the end of the array. This header file is generated by the special script at arch/x86/entry/syscalls/syscalltbl.sh and generates our header file from the syscall table (https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/syscalls/syscall_64.tbl).

...

...

So, after this, our sys_call_table takes the following form:

asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
   [0 ... __NR_syscall_max] = &sys_ni_syscall,
   [0] = sys_read,
   [1] = sys_write,
   [2] = sys_open,
   ...
   ...
   ...
};

Once the table has been generated, the syscall assembly instruction ends up jumping to one of its entries. For clone() that entry is sys_clone(), which itself calls _do_fork(), defined as follows:

long _do_fork(unsigned long clone_flags,
          unsigned long stack_start,
          unsigned long stack_size,
          int __user *parent_tidptr,
          int __user *child_tidptr,
          unsigned long tls)
{
    struct task_struct *p;
    int trace = 0;
    long nr;

    /*
     * Determine whether and which event to report to ptracer.  When
     * called from kernel_thread or CLONE_UNTRACED is explicitly
     * requested, no event is reported; otherwise, report if the event
     * for the type of forking is enabled.
     */
    if (!(clone_flags & CLONE_UNTRACED)) {
        if (clone_flags & CLONE_VFORK)
            trace = PTRACE_EVENT_VFORK;
        else if ((clone_flags & CSIGNAL) != SIGCHLD)
            trace = PTRACE_EVENT_CLONE;
        else
            trace = PTRACE_EVENT_FORK;

        if (likely(!ptrace_event_enabled(current, trace)))
            trace = 0;
    }

    p = copy_process(clone_flags, stack_start, stack_size,
             child_tidptr, NULL, trace, tls);
    /*
     * Do this prior waking up the new thread - the thread pointer
     * might get invalid after that point, if the thread exits quickly.
     */
    if (!IS_ERR(p)) {
        struct completion vfork;
        struct pid *pid;

        trace_sched_process_fork(current, p);

        pid = get_task_pid(p, PIDTYPE_PID);
        nr = pid_vnr(pid);

        if (clone_flags & CLONE_PARENT_SETTID)
            put_user(nr, parent_tidptr);

        if (clone_flags & CLONE_VFORK) {
            p->vfork_done = &vfork;
            init_completion(&vfork);
            get_task_struct(p);
        }

        wake_up_new_task(p);

        /* forking complete and child started to run, tell ptracer */
        if (unlikely(trace))
            ptrace_event_pid(trace, pid);

        if (clone_flags & CLONE_VFORK) {
            if (!wait_for_vfork_done(p, &vfork))
                ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
        }

        put_pid(pid);
    } else {
        nr = PTR_ERR(p);
    }
    return nr;
}

It calls wake_up_new_task(), which puts the task on the runqueue and wakes it. I'm surprised that it wakes the task immediately; I would have guessed that the scheduler would do that instead, and that the task would be given a high priority so it runs as soon as possible. The kernel itself doesn't have to receive a function pointer because, as stated on the man page for clone():

The raw clone() system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. As such, the fn and arg arguments of the clone() wrapper function are omitted.

The child continues execution from the point where the syscall was made. I don't understand the mechanism exactly, but in the end the child continues execution as a new thread. The parent thread (which created the new child thread) returns, while the child thread jumps to the specified function instead.

I think it works with the following lines (on the link you provided):

testq   %rax,%rax
jl  SYSCALL_ERROR_LABEL
jz  L(thread_start) //Child jumps to thread_start

ret //Parent returns to where it was

Because rax is a 64-bit register, they use the 'q' form of the test instruction in GNU assembler syntax. The code tests the value in rax: if it is less than zero, there was an error; if it is zero, it jumps to thread_start; if it is positive (the case of the parent thread, which receives the child's TID), execution continues and returns. The new thread starts with rax set to 0, which makes it possible to differentiate between the parent and the child thread.

EDIT

As stated on the link you provided,

The parameters are passed in register and on the stack from userland:
rdi: fn
rsi: child_stack
rdx: flags
rcx: arg
r8d: TID field in parent
r9d: thread pointer

So when your program executes the following lines:

/* Insert the argument onto the new stack.  */
subq    $16,%rsi
movq    %rcx,8(%rsi)

/* Save the function pointer.  It will be popped off in the
      child in the ebx frobbing below.  */
movq    %rdi,0(%rsi)

it places the function pointer and the argument on the new stack. Then it calls the kernel, which itself doesn't have to push anything onto the stack. The kernel just receives the new stack as an argument and makes the child thread's RSP register point to it. I would guess this happens in the copy_process() function (called from _do_fork()) along the lines of:

retval = copy_thread_tls(clone_flags, stack_start, stack_size, p, tls);
if (retval)
    goto bad_fork_cleanup_io;

It seems to be done in the copy_thread_tls() function, which itself calls copy_thread(). copy_thread() has its prototype in include/linux/sched.h and is defined per architecture. I'm not sure where it is defined for x86.
