How do system calls work?


Question



I understand that a user can own a process, and that each process has an address space (the set of valid memory locations the process can reference). I know that a process can invoke a system call and pass parameters to it, just like any other library function. This seems to suggest that system calls execute within the process's address space, sharing its memory, and so on. But perhaps this is only an illusion created by the fact that, in a high-level programming language, a system call looks like any other function when a process calls it.

But now let me dig a step deeper and look more closely at what happens under the hood. How does the compiler compile a system call? It presumably pushes the system call name and the parameters supplied by the process onto a stack, and then emits an assembly instruction, say "TRAP" or something similar: basically the assembly instruction that raises a software interrupt.

This TRAP assembly instruction is executed by the hardware, which first toggles the mode bit from user to kernel and then sets the instruction pointer to, say, the beginning of the interrupt service routines. From that point on, the ISR executes in kernel mode; it picks up the parameters from the stack (possible because the kernel can access any memory location, even ones owned by user processes), executes the system call, and in the end relinquishes the CPU, which again toggles the mode bit, and the user process resumes from where it left off.

Is my understanding correct?

Attached is a rough diagram of my understanding:

Solution

Your understanding is pretty close; the trick is that most compilers will never write system calls, because the functions that programs call (e.g. getpid(2), chdir(2), etc.) are actually provided by the standard C library. The standard C library contains the code for the system call, whether it is invoked via INT 0x80 or SYSENTER. It'd be a strange program that makes system calls without a library doing the work. (Even perl provides a syscall() function that can make system calls directly! Crazy, right?)

Next, memory. The operating system kernel sometimes has easy address-space access to the user process's memory. Of course, protection modes differ, and user-supplied data must be copied into the kernel's protected address space to prevent modification of that data while the system call is in flight:

static int do_getname(const char __user *filename, char *page)
{
    int retval;
    unsigned long len = PATH_MAX;

    if (!segment_eq(get_fs(), KERNEL_DS)) {
        if ((unsigned long) filename >= TASK_SIZE)
            return -EFAULT;
        if (TASK_SIZE - (unsigned long) filename < PATH_MAX)
            len = TASK_SIZE - (unsigned long) filename;
    }

    retval = strncpy_from_user(page, filename, len);
    if (retval > 0) {
        if (retval < len)
            return 0;
        return -ENAMETOOLONG;
    } else if (!retval)
        retval = -ENOENT;
    return retval;
}

This, while it isn't a system call itself, is a helper function, called by system call implementations, that copies filenames into the kernel's address space. It checks that the entire filename resides within the user's data range, calls a function that copies the string in from user space, and performs some sanity checks before returning.

get_fs() and similar functions are remnants from Linux's x86-roots. The functions have working implementations for all architectures, but the names remain archaic.

All the extra work with segments is because the kernel and userspace might share some portion of the available address space. On a 32-bit platform (where the numbers are easy to comprehend), the kernel will typically have one gigabyte of virtual address space, and user processes will typically have three gigabytes of virtual address space.

When a process calls into the kernel, the kernel will 'fix up' the page table permissions to allow itself access to the whole range, and it gets the benefit of pre-filled TLB entries for user-provided memory. Great success. But when the kernel must context switch back to userspace, it has to flush the TLB to remove the cached privileges on kernel address space pages.

But the trick is, one gigabyte of virtual address space is not sufficient for all kernel data structures on huge machines. Maintaining the metadata of cached filesystems and block device drivers, networking stacks, and the memory mappings for all the processes on the system, can take a huge amount of data.

So different 'splits' are available: two gigs for user, two gigs for kernel, one gig for user, three gigs for kernel, etc. As the space for the kernel goes up, the space for user processes goes down. So there is a 4:4 memory split that gives four gigabytes to the user process, four gigabytes to the kernel, and the kernel must fiddle with segment descriptors to be able to access user memory. The TLB is flushed entering and exiting system calls, which is a pretty significant speed penalty. But it lets the kernel maintain significantly larger data structures.

The much larger page tables and address ranges of 64-bit platforms probably make all of the preceding look quaint. I sure hope so, anyway.
