C Standard Library Functions vs. System Calls. Which is `open()`?


Question

I know that fopen() is in the C standard library, so I can certainly call fopen() from a C program. What confuses me is why I can call open() as well. open() is supposed to be a system call, so it should not be a C function in the standard library. Since I can call open() successfully, am I calling a C function or a system call?

Recommended answer

EJP's comments on the question and Steve Summit's answer are exactly to the point: open() is both a syscall and a function in the C library (strictly speaking, it is specified by POSIX rather than by the ISO C standard). fopen() is a standard C library function that sets up a file handle, a data structure of type FILE that contains additional state such as optional buffering, and internally calls open() as well.
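To make the relationship between a FILE handle and the underlying descriptor concrete, here is a minimal sketch for a hosted POSIX system. The function name write_both_ways() and its path argument are illustrative, not part of the original answer; fileno() is the POSIX call that returns the descriptor a FILE wraps.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes one line through stdio and one through the raw descriptor that
 * the FILE handle wraps. Returns 0 on success, -1 on failure.
 * The function name and the path argument are illustrative. */
int write_both_ways(const char *path)
{
    static const char raw_line[] = "via write()\n";
    const size_t raw_len = strlen(raw_line);

    FILE *out = fopen(path, "w");
    if (!out)
        return -1;

    /* The descriptor that fopen() obtained from the kernel. */
    int fd = fileno(out);
    if (fd == -1) {
        fclose(out);
        return -1;
    }

    /* fputs() only fills the FILE buffer; fflush() pushes it to the kernel
     * so that it lands in the file before the unbuffered write below. */
    if (fputs("via fputs()\n", out) == EOF || fflush(out) == EOF) {
        fclose(out);
        return -1;
    }

    /* write() bypasses the FILE buffer and talks to the descriptor directly. */
    if (write(fd, raw_line, raw_len) != (ssize_t)raw_len) {
        fclose(out);
        return -1;
    }

    return fclose(out) == 0 ? 0 : -1;
}
```

Calling write_both_ways("demo.txt") should leave both lines in the file in order, which suggests that the FILE handle is a buffering layer on top of the same descriptor that write() uses.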

To help illustrate this, here is hello.c, an example Hello world program written in C for Linux on 64-bit x86 (the x86-64, also known as AMD64, architecture) that does not use the standard C library at all.

First, hello.c needs to define some macros with inline assembly so that we can invoke syscalls. These are very architecture- and operating-system-dependent, which is why this only works in Linux on the x86-64 architecture:

/* Freestanding Hello World example in Linux on x86-64.
 * Compile using
 *      gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
*/
#define STDOUT_FILENO 1
#define EXIT_SUCCESS  0

#ifndef __x86_64__
#error  This program only works on x86_64 architecture!
#endif

#define SYS_write    1
#define SYS_exit    60

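/* For syscalls that do not return, such as exit; no result is captured.
 * The syscall instruction itself clobbers rcx and r11; the "memory"
 * clobber tells the compiler that the kernel may read or write memory. */
#define SYSCALL1_NORET(nr, arg1) \
    __asm__ __volatile__ ( "syscall\n\t" \
            : \
            : "a" (nr), "D" (arg1) \
            : "rcx", "r11", "memory" )

/* Three-argument syscall; the kernel's result comes back in rax. */
#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
    __asm__ __volatile__ ( "syscall\n\t" \
            : "=a" (retval) \
            : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) \
            : "rcx", "r11", "memory" )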

The "Freestanding" in the comment at the beginning of the file refers to a "freestanding execution environment", which is the case when no C library is available at all. The Linux kernel, for example, is written the same way. The normal environment we are familiar with is called a "hosted execution environment", by the way.
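As a small aside, a program can check which of the two environments it was compiled for. The sketch below relies on the __STDC_HOSTED__ macro from ISO C (C99 and later), which GCC sets to 0 when -ffreestanding is given; the function name is_hosted() is made up for illustration.

```c
/* Returns 1 when compiled for a hosted environment, 0 when freestanding.
 * is_hosted() is an illustrative name, not part of hello.c. */
int is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;   /* a full C library is expected to be available */
#else
    return 0;   /* for example, built with gcc -ffreestanding */
#endif
}
```

Because the check happens in the preprocessor, the function needs no headers and can be compiled in either environment.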

Next, we can define two functions, or "wrappers", around the syscalls:

static inline void my_exit(int retval)
{
    SYSCALL1_NORET(SYS_exit, retval);
}

static inline int my_write(int fd, const void *data, int len)
{
    int retval;

    if (fd == -1 || !data || len < 0)
        return -1;

    SYSCALL3(retval, SYS_write, fd, data, len);

    if (retval < 0)
        return -1;

    return retval;
}

Above, my_exit() is roughly equivalent to the C library's exit() function (closer still to _exit(), since it runs no cleanup handlers), and my_write() to write().

The C language does not define any way to perform a syscall, which is why we always need a "wrapper" function of some sort. (The GNU C library does provide a syscall() function that lets us perform any syscall we wish, but the point of this example is to not use the C library at all.)
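For comparison, this is roughly what the hosted counterpart looks like when the C library's generic syscall() function is used instead of hand-written assembly. It is only a sketch for Linux with glibc (or a compatible libc); raw_write() is a hypothetical helper name, while syscall() and SYS_write come from <unistd.h> and <sys/syscall.h>.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* ask glibc to declare syscall() */
#endif
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Writes a NUL-terminated string to fd by invoking the write syscall
 * through the C library's generic syscall() wrapper.
 * Returns the number of bytes written, or -1 with errno set on error.
 * raw_write() is an illustrative name. */
long raw_write(int fd, const char *text)
{
    return syscall(SYS_write, fd, text, strlen(text));
}
```

A call such as raw_write(STDOUT_FILENO, "Hello, world!\n") behaves like my_write() above, except that the register setup and the syscall instruction live inside the C library rather than in our own macro.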

The wrapper functions always involve a bit of (inline) assembly. Again, since C has no built-in way to do a syscall, we need to "extend" the language by adding some assembly code. This (inline) assembly, together with the syscall numbers, is what makes this example operating-system- and architecture-dependent. And yes: the GNU C library, for example, contains the equivalent wrappers for quite a few architectures.

Some of the functions in the C library do not use any syscalls. We need one of those as well, the equivalent of strlen():

static inline int my_strlen(const char *str)
{
    int len = 0L;

    if (!str)
        return -1;

    while (*str++)
        len++;

    return len;
}

Note that NULL is not used anywhere in the above code, because it is a macro defined by the C library. Instead, I'm relying on "logical null": (!pointer) is true if and only if pointer is a zero pointer, which is what NULL is on all architectures in Linux. I could have defined NULL myself, but I didn't, in the hope that somebody might notice its absence.

Finally, main() itself is something the GNU C library calls; in Linux, the actual entry point of the binary is called _start. The _start is provided by the hosted runtime environment; it initializes the C library data structures and does other similar preparations. Our example program is so simple that we do not need any of that, so we can just put our simple main program into _start instead:

void _start(void)
{
    const char *msg = "Hello, world!\n";
    my_write(STDOUT_FILENO, msg, my_strlen(msg));
    my_exit(EXIT_SUCCESS);
}

If you put all of the above together, and compile it using

gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello

per the comment at the start of the file, you will end up with a small (about two kilobytes) static binary that, when run,

./hello

outputs

Hello, world!

You can use file hello to examine the contents of the file. If file size really mattered, you could run strip hello to remove all (unneeded) symbols, reducing the file size further to about one and a half kilobytes. (It will make the object dump less interesting, however, so check out the next step before you do that.)

We can use objdump -x hello to examine the sections in the file:

hello:     file format elf64-x86-64
hello
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004001e1

Program Header:
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
         filesz 0x00000000000002f0 memsz 0x00000000000002f0 flags r-x
    NOTE off    0x0000000000000120 vaddr 0x0000000000400120 paddr 0x0000000000400120 align 2**2
         filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--
EH_FRAME off    0x000000000000022c vaddr 0x000000000040022c paddr 0x000000000040022c align 2**2
         filesz 0x000000000000002c memsz 0x000000000000002c flags r--
   STACK off    0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
         filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .note.gnu.build-id 00000024  0000000000400120  0000000000400120  00000120  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         000000d9  0000000000400144  0000000000400144  00000144  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata       0000000f  000000000040021d  000000000040021d  0000021d  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .eh_frame_hdr 0000002c  000000000040022c  000000000040022c  0000022c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .eh_frame     00000098  0000000000400258  0000000000400258  00000258  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .comment      00000034  0000000000000000  0000000000000000  000002f0  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
0000000000400120 l    d  .note.gnu.build-id     0000000000000000 .note.gnu.build-id
0000000000400144 l    d  .text  0000000000000000 .text
000000000040021d l    d  .rodata        0000000000000000 .rodata
000000000040022c l    d  .eh_frame_hdr  0000000000000000 .eh_frame_hdr
0000000000400258 l    d  .eh_frame      0000000000000000 .eh_frame
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 l    df *ABS*  0000000000000000 hello.c
0000000000400144 l     F .text  0000000000000016 my_exit
000000000040015a l     F .text  000000000000004e my_write
00000000004001a8 l     F .text  0000000000000039 my_strlen
0000000000000000 l    df *ABS*  0000000000000000 
000000000040022c l       .eh_frame_hdr  0000000000000000 __GNU_EH_FRAME_HDR
00000000004001e1 g     F .text  000000000000003c _start
0000000000601000 g       .eh_frame      0000000000000000 __bss_start
0000000000601000 g       .eh_frame      0000000000000000 _edata
0000000000601000 g       .eh_frame      0000000000000000 _end

The .text section contains our code, and .rodata holds immutable constants; here, just the Hello, world! string literal. The rest of the sections are things the linker adds and the system uses. We can see that we have 0xf = 15 bytes of read-only data and 0xd9 = 217 bytes of code; the rest of the file (about a kilobyte or so) is ELF metadata added by the linker for the kernel to use when executing this binary.

We can even examine the actual assembly code contained in hello by running objdump -d hello:

hello:     file format elf64-x86-64


Disassembly of section .text:

0000000000400144 <my_exit>:
  400144:       55                      push   %rbp
  400145:       48 89 e5                mov    %rsp,%rbp
  400148:       89 7d fc                mov    %edi,-0x4(%rbp)
  40014b:       b8 3c 00 00 00          mov    $0x3c,%eax
  400150:       8b 55 fc                mov    -0x4(%rbp),%edx
  400153:       89 d7                   mov    %edx,%edi
  400155:       0f 05                   syscall 
  400157:       90                      nop
  400158:       5d                      pop    %rbp
  400159:       c3                      retq   

000000000040015a <my_write>:
  40015a:       55                      push   %rbp
  40015b:       48 89 e5                mov    %rsp,%rbp
  40015e:       89 7d ec                mov    %edi,-0x14(%rbp)
  400161:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  400165:       89 55 e8                mov    %edx,-0x18(%rbp)
  400168:       83 7d ec ff             cmpl   $0xffffffff,-0x14(%rbp)
  40016c:       74 0d                   je     40017b <my_write+0x21>
  40016e:       48 83 7d e0 00          cmpq   $0x0,-0x20(%rbp)
  400173:       74 06                   je     40017b <my_write+0x21>
  400175:       83 7d e8 00             cmpl   $0x0,-0x18(%rbp)
  400179:       79 07                   jns    400182 <my_write+0x28>
  40017b:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  400180:       eb 24                   jmp    4001a6 <my_write+0x4c>
  400182:       b8 01 00 00 00          mov    $0x1,%eax
  400187:       8b 7d ec                mov    -0x14(%rbp),%edi
  40018a:       48 8b 75 e0             mov    -0x20(%rbp),%rsi
  40018e:       8b 55 e8                mov    -0x18(%rbp),%edx
  400191:       0f 05                   syscall 
  400193:       89 45 fc                mov    %eax,-0x4(%rbp)
  400196:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  40019a:       79 07                   jns    4001a3 <my_write+0x49>
  40019c:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  4001a1:       eb 03                   jmp    4001a6 <my_write+0x4c>
  4001a3:       8b 45 fc                mov    -0x4(%rbp),%eax
  4001a6:       5d                      pop    %rbp
  4001a7:       c3                      retq   

00000000004001a8 <my_strlen>:
  4001a8:       55                      push   %rbp
  4001a9:       48 89 e5                mov    %rsp,%rbp
  4001ac:       48 89 7d e8             mov    %rdi,-0x18(%rbp)
  4001b0:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4001b7:       48 83 7d e8 00          cmpq   $0x0,-0x18(%rbp)
  4001bc:       75 0b                   jne    4001c9 <my_strlen+0x21>
  4001be:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  4001c3:       eb 1a                   jmp    4001df <my_strlen+0x37>
  4001c5:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4001c9:       48 8b 45 e8             mov    -0x18(%rbp),%rax
  4001cd:       48 8d 50 01             lea    0x1(%rax),%rdx
  4001d1:       48 89 55 e8             mov    %rdx,-0x18(%rbp)
  4001d5:       0f b6 00                movzbl (%rax),%eax
  4001d8:       84 c0                   test   %al,%al
  4001da:       75 e9                   jne    4001c5 <my_strlen+0x1d>
  4001dc:       8b 45 fc                mov    -0x4(%rbp),%eax
  4001df:       5d                      pop    %rbp
  4001e0:       c3                      retq   

00000000004001e1 <_start>:
  4001e1:       55                      push   %rbp
  4001e2:       48 89 e5                mov    %rsp,%rbp
  4001e5:       48 83 ec 10             sub    $0x10,%rsp
  4001e9:       48 c7 45 f8 1d 02 40    movq   $0x40021d,-0x8(%rbp)
  4001f0:       00 
  4001f1:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4001f5:       48 89 c7                mov    %rax,%rdi
  4001f8:       e8 ab ff ff ff          callq  4001a8 <my_strlen>
  4001fd:       89 c2                   mov    %eax,%edx
  4001ff:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  400203:       48 89 c6                mov    %rax,%rsi
  400206:       bf 01 00 00 00          mov    $0x1,%edi
  40020b:       e8 4a ff ff ff          callq  40015a <my_write>
  400210:       bf 00 00 00 00          mov    $0x0,%edi
  400215:       e8 2a ff ff ff          callq  400144 <my_exit>
  40021a:       90                      nop
  40021b:       c9                      leaveq 
  40021c:       c3                      retq  

The assembly itself is not really that interesting, except that in my_write and my_exit you can see how the inline assembly generated by the SYSCALL...() macros just loads the variables into specific registers and then performs the "do syscall" step, which on x86-64 happens to be an assembly instruction that is also called syscall. On the 32-bit x86 architecture it is int $0x80, and something else again on other architectures.

There is a final wrinkle, related to the reason why I used the prefix my_ for the functions analogous to those in the C library: the C compiler can provide optimized shortcuts for some C library functions. For GCC, these are listed in the GCC documentation on built-in functions; the list includes strlen().

This means we do not actually need the my_strlen() function, because we can use the optimized __builtin_strlen() function GCC provides, even in a freestanding environment. The built-ins are usually heavily optimized; in the case of __builtin_strlen() on x86-64 using GCC-5.4.0, it optimizes to just a couple of register loads and a repnz scasb %es:(%rdi),%al instruction (which looks long, but actually takes just two bytes).
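A minimal sketch of that substitution, assuming GCC or a compatible compiler such as Clang: builtin_len() is a hypothetical replacement for my_strlen() and needs no headers at all. Depending on the compiler version and options, the built-in may still be compiled into an ordinary call to strlen(), so the generated code is worth checking with objdump -d in a freestanding build.

```c
/* Drop-in alternative to my_strlen() that uses the compiler built-in.
 * builtin_len() is an illustrative name; __builtin_strlen() is provided
 * by GCC and Clang without including any header.
 * Unlike my_strlen(), it does not accept a null pointer. */
int builtin_len(const char *str)
{
    return (int)__builtin_strlen(str);
}
```

With this, the call in _start could read my_write(STDOUT_FILENO, msg, builtin_len(msg)).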

In other words, the final wrinkle is that there is a third type of function, compiler built-ins, which are provided by the compiler in optimized form (but otherwise behave just like the functions provided by the C library), depending on the compiler options and architecture used.

If we were to expand the above example so that it opens a file and writes Hello, world! into it, and compared the low-level POSIX approach (open()/write()/close(), declared in fcntl.h and unistd.h) with the standard I/O approach from stdio.h (fopen()/fputs()/fclose()), we'd find that the major difference is that the FILE handle used by standard I/O contains a lot of extra state. That extra state makes standard file handles quite versatile, just not useful in such a trivial example, and it is most visible in how buffering is handled. On the assembly level, we'd still see the same syscalls (open, write, close) being used.
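The two approaches could be sketched side by side as follows, for a hosted POSIX system. The function names write_lowlevel() and write_stdio() are invented for this illustration, and the 0644 permission bits are an arbitrary choice.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Low-level approach: each call maps closely onto one syscall.
 * Returns 0 on success, -1 on failure. */
int write_lowlevel(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(text);
    ssize_t written = write(fd, text, len);
    int closed = close(fd);

    return (written == (ssize_t)len && closed == 0) ? 0 : -1;
}

/* Standard I/O approach: the text goes into the FILE buffer first and
 * reaches the kernel when the buffer is flushed, here by fclose().
 * Returns 0 on success, -1 on failure. */
int write_stdio(const char *path, const char *text)
{
    FILE *out = fopen(path, "w");
    if (!out)
        return -1;

    int ok = (fputs(text, out) != EOF);
    if (fclose(out) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Both functions should produce identical files. Running a small program that calls them under strace would be one way to see that each variant ends up issuing an open-type syscall (often openat on current Linux systems), then write, then close.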

Even though at first glance the ELF format (used for binaries in Linux) contains a lot of "unneeded stuff" (about a kilobyte for our example program above), it is actually a very powerful format. Together with the dynamic loader in Linux, it provides a way to auto-load libraries when a program starts (using the LD_PRELOAD environment variable), and to interpose functions in other libraries: essentially, to replace them with new ones while still being able to call the original version of the interposed function. These mechanisms enable lots of useful tricks, fixes, experiments, and debugging methods.
