How does syscall know where to jump?

Question

How does Linux determine the address of another process to execute with a syscall, as in this example?

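mov rax, 59          ; system call number (59 is execve on x86-64 Linux)
mov rdi, progName    ; first argument: address of the program path
syscall              ; enter the kernel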

There seems to be some confusion about my question. To clarify: I am asking how syscall works, independently of the registers or arguments passed. How does it know where to jump, where to return, and so on when another process is called?

Recommended answer

syscall

The syscall instruction is really just an Intel/AMD CPU instruction. Here is the synopsis:

IF (CS.L ≠ 1 ) or (IA32_EFER.LMA ≠ 1) or (IA32_EFER.SCE ≠ 1)
  THEN #UD;
FI;
RCX ← RIP;
RIP ← IA32_LSTAR;
R11 ← RFLAGS;
RFLAGS ← RFLAGS AND NOT(IA32_FMASK);
CS.Selector ← IA32_STAR[47:32] AND FFFCH
CS.Base ← 0;
CS.Limit ← FFFFFH;
CS.Type ← 11;
CS.S ← 1;
CS.DPL ← 0;
CS.P ← 1;
CS.L ← 1;
CS.D ← 0;
CS.G ← 1;
CPL ← 0;
SS.Selector ← IA32_STAR[47:32] + 8;
SS.Base ← 0;
SS.Limit ← FFFFFH;
SS.Type ← 3;
SS.S ← 1;
SS.DPL ← 0;
SS.P ← 1;
SS.B ← 1;
SS.G ← 1;

The most important part is the pair of operations that save and replace the RIP register:

RCX ← RIP
RIP ← IA32_LSTAR

In other words, there must be code at the address stored in IA32_LSTAR (a model-specific register that the kernel sets up at boot), and RCX holds the return address.

The CS and SS segments are also adjusted so that the kernel code runs at CPU privilege level 0 (the most privileged level).

As the first lines of the synopsis show, a #UD (undefined opcode) exception is raised if syscall is not enabled (IA32_EFER.SCE is clear), if the CPU is not in 64-bit mode, or if the instruction doesn't exist on that CPU.
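To make the register shuffle concrete, here is a minimal Python sketch that models only the parts of the synopsis discussed above. The `Cpu` class, the `syscall` function, and all the numeric values are hypothetical stand-ins for illustration; the segment-register updates are left out.

```python
from dataclasses import dataclass


class InvalidOpcode(Exception):
    """Stands in for the #UD exception."""


@dataclass
class Cpu:
    rip: int                  # already points at the instruction after syscall
    rflags: int
    rcx: int = 0
    r11: int = 0
    cpl: int = 3              # current privilege level: 3 = user mode
    long_mode: bool = True    # models CS.L = 1 and IA32_EFER.LMA = 1
    sce: bool = True          # models IA32_EFER.SCE = 1


def syscall(cpu: Cpu, lstar: int, fmask: int) -> None:
    """Apply the RIP/RCX/R11/RFLAGS/CPL effects from the synopsis."""
    if not (cpu.long_mode and cpu.sce):
        raise InvalidOpcode("#UD")
    cpu.rcx = cpu.rip         # RCX <- RIP (the return address)
    cpu.rip = lstar           # RIP <- IA32_LSTAR (the kernel entry point)
    cpu.r11 = cpu.rflags      # R11 <- RFLAGS
    cpu.rflags &= ~fmask      # RFLAGS <- RFLAGS AND NOT(IA32_FMASK)
    cpu.cpl = 0               # now running at privilege level 0


# Made-up addresses and flag values, chosen only for the demonstration.
cpu = Cpu(rip=0x401007, rflags=0x246)
syscall(cpu, lstar=0xFFFFFFFF81A00000, fmask=0x200)
print(hex(cpu.rcx), hex(cpu.rip), cpu.cpl)
```

The point of the sketch is that user code never supplies the jump target: it comes from a register that only privileged code can write.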

The value in RAX (59 in the example, which is execve on x86-64) is just an index into a table of kernel function pointers. First the kernel does a bounds check (and returns -ENOSYS if RAX > __NR_syscall_max), then dispatches to (in C syntax) sys_call_table[rax](rdi, rsi, rdx, r10, r8, r9);

; Intel-syntax translation of Linux 4.12 syscall entry point
       ...                 ; save user-space registers etc.
    call   [sys_call_table + rax * 8]       ; dispatch to sys_execve() or whatever kernel C function

;;; execve probably won't return via this path, but most other calls will
       ...                 ; restore registers except RAX return value, and return to user-space

Modern Linux is more complicated in practice because of workarounds for x86 vulnerabilities like Meltdown and L1TF, which change the page tables so that most kernel memory isn't mapped while user space is running. The above code is a literal translation (from AT&T syntax) of call *sys_call_table(, %rax, 8) from ENTRY(entry_SYSCALL_64) in Linux 4.12 arch/x86/entry/entry_64.S (before Spectre/Meltdown mitigations were added). Also related: "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" has more details about the kernel side of system-call dispatching.
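The bounds check and table dispatch described above can be sketched in Python as follows. This is an illustration under stated assumptions, not kernel code: the handler bodies are dummies, `do_syscall` is a made-up name, and the table is cut off at 59 just to keep the example small. The numbers 1 (write) and 59 (execve) are the real x86-64 Linux syscall numbers.

```python
import errno


def sys_ni_syscall(*args):
    """Placeholder for table slots with no implementation."""
    return -errno.ENOSYS


def sys_write(fd, buf, count, *unused):
    return count              # dummy: pretend every byte was written


def sys_execve(filename, argv, envp, *unused):
    return 0                  # dummy: the real call replaces the process image


NR_SYSCALL_MAX = 59           # hypothetical limit for this sketch
sys_call_table = [sys_ni_syscall] * (NR_SYSCALL_MAX + 1)
sys_call_table[1] = sys_write
sys_call_table[59] = sys_execve


def do_syscall(rax, rdi=0, rsi=0, rdx=0, r10=0, r8=0, r9=0):
    # Registers are unsigned in hardware; Python ints are not, so negative
    # values are rejected explicitly as well.
    if rax < 0 or rax > NR_SYSCALL_MAX:
        return -errno.ENOSYS
    return sys_call_table[rax](rdi, rsi, rdx, r10, r8, r9)


print(do_syscall(1, 1, 0, 5))     # dispatches to sys_write
print(do_syscall(60))             # out of range
```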

The instruction is said to be fast. This is because in the old days one had to use a software interrupt instruction such as int 0x80. An interrupt uses the kernel stack, pushes several registers onto it, and uses the rather slow iret to leave the exception state and return to the address just after the interrupt. This is generally much slower.

With syscall you may be able to avoid most of that overhead. However, for what you're asking, this does not really matter.

Another instruction used alongside syscall is swapgs. It gives the kernel a way to access its own data and stack. See the Intel/AMD documentation on these instructions for more details.

The Linux system has what it calls a task table. Each process and each thread within a process is actually called a task.

When you create a new process, Linux creates a task. For that to work, it runs code that does things such as:

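  • make sure the executable exists
  • set up a new task (including parsing the ELF program headers from that executable to create memory mappings in the newly created virtual address space)
  • allocate a stack buffer
  • load the first few blocks of the executable (as an optimization for demand paging), allocating some physical pages for the virtual pages to map to
  • set up the start address in the task (the ELF entry point from the executable)
  • mark the task as ready (a.k.a. runnable)

This is of course super simplified.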

The start address is defined in your ELF binary. The kernel really only needs to determine that one address, save it as the task's current RIP, and "return" to user space. The normal demand-paging mechanism takes care of the rest: if the code is not yet loaded, the CPU generates a #PF page-fault exception and the kernel loads the necessary code at that point. In most cases, though, the loader will already have loaded some part of the program as an optimization to avoid that initial page fault.

(A #PF on a page that isn't mapped would result in the kernel delivering a SIGSEGV segfault signal to your process, but a "valid" page fault is handled silently by the kernel.)
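As an illustration of where that start address lives, the following Python sketch reads the e_entry field from a 64-bit little-endian ELF header. The helper name `elf64_entry_point` and the hand-built header bytes are assumptions made for this example; only the field layout (16-byte e_ident, then e_type, e_machine, e_version, e_entry) comes from the ELF format itself.

```python
import struct


def elf64_entry_point(header: bytes) -> int:
    """Return e_entry from the start of a 64-bit little-endian ELF file."""
    e_ident, e_type, e_machine, e_version, e_entry = struct.unpack_from(
        "<16sHHIQ", header
    )
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if e_ident[4] != 2:       # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("not a 64-bit ELF file")
    if e_ident[5] != 1:       # EI_DATA: 1 means little-endian
        raise ValueError("not a little-endian ELF file")
    return e_entry


# A fake header built by hand; the entry address 0x401000 is made up.
fake_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
fake_header = struct.pack("<16sHHIQ", fake_ident, 2, 62, 1, 0x401000)
print(hex(elf64_entry_point(fake_header)))
```

Passing the first 64 bytes of a real executable to the same function should give the address that ends up in the new task's RIP, assuming the file is a 64-bit little-endian ELF.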

All new processes usually get loaded at the same virtual address (ignoring PIE + ASLR). This is possible because we use the MMU (Memory Management Unit). That coprocessor translates memory addresses between virtual address spaces and physical address space.

(Editor's note: the MMU isn't really a coprocessor; in modern CPUs the virtual-memory logic is tightly integrated into each core, alongside the L1 instruction/data caches. Some ancient CPUs did use an external MMU chip, though.)

So now we understand that all processes are loaded at the same virtual address (0x400000 is the default chosen by ld under Linux). To determine the real physical address, we use the MMU. How does the kernel decide on that physical address? Well, it has a memory allocation function. It's that simple.
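A toy model may help show how two processes can use the same virtual address without colliding. In this Python sketch the `AddressSpace` class, the single-level page table, and the frame numbers are all hypothetical simplifications; real x86-64 page tables have several levels and are walked by hardware.

```python
PAGE_SIZE = 4096


class AddressSpace:
    """One page table per process: virtual page number -> physical frame."""

    def __init__(self):
        self.page_table = {}

    def map_page(self, vaddr: int, frame: int) -> None:
        self.page_table[vaddr // PAGE_SIZE] = frame

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise LookupError("#PF: page not mapped")
        return self.page_table[vpn] * PAGE_SIZE + offset


# Both processes are "loaded" at 0x400000, but in different physical frames.
proc_a = AddressSpace()
proc_b = AddressSpace()
proc_a.map_page(0x400000, frame=0x1234)
proc_b.map_page(0x400000, frame=0x5678)

print(hex(proc_a.translate(0x400010)))
print(hex(proc_b.translate(0x400010)))
```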

It calls a malloc()-like function that searches for a memory block not currently in use and creates (a.k.a. loads) the process at that location. If no memory block is currently available, the kernel tries to swap something out of memory. If that fails, the creation of the process fails.

When creating a process, it allocates fairly large blocks of memory to start with. It is not unusual to allocate 1 MB or 2 MB buffers to start a new process. This makes things go a lot faster.

Also, if the program is already running and you start it again, much of the memory used by the running instance can be reused. In that case the kernel does not allocate or load those parts. It uses the MMU to share the pages that can be common to both instances of the process. In most cases the code part of the process can be shared since it is read-only, and data that is marked read-only can be shared as well. Data that is not marked read-only can still be shared as long as it hasn't been modified; in that case it is marked copy-on-write.
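The copy-on-write sharing described above can be sketched with a few lines of Python. The `Frame` and `Process` classes and the user count are invented for this illustration; in a real kernel the write is caught by a page fault on a write-protected page rather than by an explicit check.

```python
class Frame:
    """A physical page plus a count of the processes mapping it."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.users = 1


class Process:
    def __init__(self):
        self.pages = {}       # virtual page number -> Frame

    def share_from(self, other: "Process") -> None:
        # Map the same frames instead of allocating and loading new ones.
        for vpn, frame in other.pages.items():
            frame.users += 1
            self.pages[vpn] = frame

    def write(self, vpn: int, offset: int, value: int) -> None:
        frame = self.pages[vpn]
        if frame.users > 1:   # still shared: copy before modifying
            frame.users -= 1
            frame = Frame(bytes(frame.data))
            self.pages[vpn] = frame
        frame.data[offset] = value


first = Process()
first.pages[0] = Frame(b"data")
second = Process()
second.share_from(first)
shared_before = second.pages[0] is first.pages[0]

second.write(0, 0, ord("D"))  # triggers the private copy
print(shared_before, bytes(first.pages[0].data), bytes(second.pages[0].data))
```

Read-only code pages never take the write path, which is why they can stay shared for the lifetime of both instances.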
