How is line buffering implemented for C stdio input streams?

Question

I understand that fully buffered input can be implemented by issuing a single read syscall for a block of data possibly larger than required by the application. But I don't understand how line buffering could ever be applied to input without support from the kernel. I imagine one would have to read a block of data and then look for newlines, but if so, what is the difference with full buffering?

More specifically:

Suppose I have an input stream FILE* in. Is there any difference between the following, with regard to how the stdio library retrieves bytes from the operating system to fill its buffer?

  • Line buffered: setvbuf(in, NULL, _IOLBF, BUFSIZ)
  • Fully buffered: setvbuf(in, NULL, _IOFBF, BUFSIZ)

If so, what is that difference?

Recommended answer

A FILE struct has a default internal buffer. Once the stream has been opened with fopen, each fread, fgets, etc. is served from that buffer, which the stdio layer fills with a read(2) call.

When you call fgets, it copies data into your buffer, pulling it from the internal buffer until a newline is found. If no newline is found, the stream's internal buffer is replenished with another read(2) call, and the scan for a newline and the filling of your buffer continue.

This can repeat a number of times [particularly true if you're doing fread]. Whatever is left over is available for the next stream read operation (e.g. fread, fgets, fgetc).
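
The refill loop described above can be modeled in a few lines. This is only a toy sketch: struct toy_stream, toy_getc, and toy_gets are names made up for illustration, and no real libc lays out FILE or implements fgets exactly this way. It does show the key idea, which is that one read(2) may pull in several lines and later calls are served from memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Toy model of a stdio input stream; NOT the layout of any real FILE. */
struct toy_stream {
    int    fd;          /* underlying file descriptor */
    char   buf[4096];   /* internal buffer, filled by read(2) */
    size_t pos;         /* index of the next unread byte in buf */
    size_t len;         /* number of valid bytes in buf */
};

/* Return the next byte, refilling the buffer with one read(2) when empty. */
static int toy_getc(struct toy_stream *s)
{
    if (s->pos == s->len) {
        ssize_t n = read(s->fd, s->buf, sizeof s->buf);
        if (n <= 0)
            return -1;              /* end of file or error */
        s->pos = 0;
        s->len = (size_t)n;
    }
    return (unsigned char)s->buf[s->pos++];
}

/* fgets-like: copy up to size-1 bytes into dst, stopping after a newline. */
static char *toy_gets(char *dst, size_t size, struct toy_stream *s)
{
    size_t i = 0;
    int c;

    if (size == 0)
        return NULL;
    while (i + 1 < size && (c = toy_getc(s)) != -1) {
        dst[i++] = (char)c;
        if (c == '\n')
            break;
    }
    if (i == 0)
        return NULL;                /* nothing read before EOF or error */
    dst[i] = '\0';
    return dst;
}
```

To try it, initialize a stream with struct toy_stream s = { .fd = some_fd }; and call toy_gets(line, sizeof line, &s) in a loop. Note that nothing in the refill path looks at a buffering mode: the request size is always the buffer size.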

You can set the size of the stream buffer with setvbuf (setlinebuf only switches a stream to line buffering; it takes no size argument). For efficiency, the typical default size is on the order of the machine page size [IIRC].

So, the stream buffer "stays one step ahead of you", so to speak. It operates much like a ring queue [in effect, if not in actuality].

I don't know for sure, but line buffering [or any buffering mode] mostly matters for output streams (e.g. stdout is line buffered by default when it is attached to a terminal). Line buffering says: if you see a newline, do an implied fflush. Full buffering means do the fflush when the buffer is full. Unbuffered means do the fflush on every character.

If you open an output logfile, you get full buffering [most efficient], so if your program crashes, you might lose the last N lines of output (i.e. they're still pending in the buffer). You can set line buffering so that the last trace line survives a program crash.
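
The logfile case can be checked with a small experiment. The sketch below writes one line through a stream in a chosen mode and then asks, before any fflush or fclose, how many bytes have actually reached the file. The helper name bytes_on_disk_after_line and the /tmp path template are assumptions made for this example; the result reflects how common implementations behave, since the C standard only describes the buffering modes as intentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write one line to a fresh temporary file through a stream in the given
 * buffering mode (_IOLBF or _IOFBF) and report how many bytes have reached
 * the file BEFORE any fflush/fclose. Returns -1 on setup failure.
 */
static long bytes_on_disk_after_line(int mode)
{
    char path[] = "/tmp/linebuf-demo-XXXXXX";
    struct stat st;
    long visible = -1;
    int fd = mkstemp(path);
    FILE *log;

    if (fd < 0)
        return -1;
    log = fdopen(fd, "w");
    if (log == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(log, NULL, mode, BUFSIZ) == 0) {
        fputs("trace: about to crash\n", log);   /* deliberately no fflush */
        if (stat(path, &st) == 0)
            visible = (long)st.st_size;
    }

    fclose(log);
    unlink(path);
    return visible;
}
```

With _IOLBF the whole 22-byte line should already be in the file, whereas with _IOFBF one would expect 0 bytes, because the line is still sitting in the stream buffer. That pending data is exactly what is lost if the process dies before the buffer is flushed.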

On input, line buffering doesn't have any meaning for a file [AFAICT]. stdio just tries to read in the most efficient size possible (e.g. the stream buffer size).

I think the important point is that, on input, you don't know where the newline is beforehand, so _IOLBF operates like any other mode, because it has to. That is, stdio does a read(2) of up to the stream buffer size (or the amount needed to fulfill the outstanding fread). In other words, the only things that matter are the internal buffer size and the size/count parameters of the fread, not the buffering mode.
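
One way to test this claim on your own system is to read a single line and then ask the kernel where the file descriptor's offset ended up, since that offset shows how much stdio really fetched. The sketch below is an experiment under assumptions: readahead_after_one_line is a made-up helper name, the /tmp path template is arbitrary, and the outcome depends on the libc in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file holding three short lines, read ONE line with
 * fgets in the given buffering mode, then return the file descriptor's
 * offset as seen by the kernel. Returns -1 on failure.
 */
static long readahead_after_one_line(int mode)
{
    static const char data[] = "first\nsecond\nthird\n";
    char path[] = "/tmp/readahead-demo-XXXXXX";
    char line[64];
    long offset = -1;
    int fd = mkstemp(path);
    FILE *in;

    if (fd < 0)
        return -1;
    unlink(path);                   /* the file disappears once closed */
    if (write(fd, data, sizeof data - 1) != (ssize_t)(sizeof data - 1) ||
        lseek(fd, 0, SEEK_SET) != 0 ||
        (in = fdopen(fd, "r")) == NULL) {
        close(fd);
        return -1;
    }

    /* Read exactly one line, then see how far the kernel offset moved. */
    if (setvbuf(in, NULL, mode, BUFSIZ) == 0 &&
        fgets(line, sizeof line, in) != NULL)
        offset = (long)lseek(fileno(in), 0, SEEK_CUR);

    fclose(in);
    return offset;
}
```

If the reasoning above holds, readahead_after_one_line(_IOLBF) and readahead_after_one_line(_IOFBF) return the same value, and for a file this small that value is likely the whole file size rather than the 6 bytes of the first line.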

For a TTY device (e.g. stdin), the stream will wait for a newline regardless of the stream mode [unless you use tcsetattr or a TIOC* ioctl on the underlying file descriptor (e.g. 0) to set char-at-a-time, aka raw, mode]. That's because the TTY device canonical processing layer [in the kernel] will hold up the read (e.g. that's why you can type backspace, etc. without the application having to deal with it).
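
Whether a descriptor is currently subject to that canonical processing can be queried through termios. A minimal sketch, assuming a POSIX system; tty_is_canonical is a hypothetical helper name, not a library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/*
 * Report whether fd is a terminal in canonical (line-at-a-time) mode.
 * Returns 1 if ICANON is set, 0 if it is cleared (e.g. raw mode), and
 * -1 if fd is not a terminal or tcgetattr fails.
 */
static int tty_is_canonical(int fd)
{
    struct termios t;

    if (!isatty(fd) || tcgetattr(fd, &t) < 0)
        return -1;
    return (t.c_lflag & ICANON) ? 1 : 0;
}
```

Calling tty_is_canonical(STDIN_FILENO) from an interactive shell would normally return 1. Changing the stdio buffering mode with setvbuf leaves this flag untouched, since the flag lives in the kernel's terminal settings rather than in the stream.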

However, fgets on a TTY stream does not end up blocking on a full-buffer read, and as far as I know typical stdio implementations need no special select/poll handling for this. In canonical mode, a read(2) on a TTY returns as soon as one complete line is available and returns at most that line, even if the caller asked for more bytes. fgets then finds the newline in the stream buffer and returns; if the data runs out before a newline is seen (e.g. a very long line), stdio simply issues another read(2). In other words, the expected behavior on stdin falls out of the kernel's short reads: a 4096-byte read does not block when the user has entered 10 chars + newline.
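
That fgets does not wait for a full buffer can be observed without a terminal. The sketch below uses a pipe as a stand-in, which is an assumption of this example: a pipe is not a TTY and has no canonical line assembly, but it shares the relevant property that read(2) returns whatever is available instead of waiting for the requested count. fgets_from_open_pipe is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Put a short line into a pipe, keep the write end OPEN (so no EOF can
 * end the read early), and read the line back through a stdio stream.
 * Returns the length of the line fgets produced, or -1 on failure.
 * The text must end in '\n' and be shorter than 128 bytes; otherwise
 * fgets would keep waiting for more input.
 */
static long fgets_from_open_pipe(const char *text)
{
    int p[2];
    char line[128];
    long got = -1;
    FILE *in;
    size_t n = strlen(text);

    if (pipe(p) < 0)
        return -1;
    in = fdopen(p[0], "r");
    if (in == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* The stream buffer is far larger than n, yet fgets returns at once. */
    if (write(p[1], text, n) == (ssize_t)n &&
        fgets(line, sizeof line, in) != NULL)
        got = (long)strlen(line);

    fclose(in);                     /* also closes p[0] */
    close(p[1]);
    return got;
}
```

For example, fgets_from_open_pipe("0123456789\n") returns 11 rather than hanging: the underlying read(2) asked for a full buffer but came back with the 11 bytes that were available.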

Update:

To answer your second round of followup questions:

"I see the tty subsystem and the stdio code running in the process as completely independent. The only way they interface is by the process issuing read syscalls; these may block or not, and this is what depends on the tty settings."

Normally, that is true. Most applications do not try to adjust the TTY layer settings. An app can do so if it wishes, but not via any stream/stdio functions.

"But the process is completely unaware of those settings and can't change them."

Again, normally true. But, again, the process can change them.

"If we're on the same page, what you're saying implies that a setvbuf call will change the buffering policy of the tty device, and I find that hard to reconcile with my understanding of Unix I/O."

No, setvbuf only sets the stream buffer size and policy. It has nothing to do with the kernel at all. The kernel only sees read(2) and has no idea whether the app called it directly or whether the stream did it on behalf of fread [or fgets]. setvbuf does not affect the TTY layer in any way.

In a normal app that loops on fgetc while the user types abcdef\n, the first fgetc will block [in the driver] until the newline is entered. This is the TTY canonical processing layer at work. When the newline is entered, the read(2) issued by that fgetc returns 7. The first fgetc then returns, and the remaining six complete rapidly, being fulfilled from the stream's internal buffer.

However ...

More sophisticated apps may change the TTY layer policy via ioctl(fileno(stdin),TIOC*,...). The stream will not be aware of this, so one must be careful when doing so. Thus, if a process wants, it can fully control the TTY layer behind the file descriptor, but it must do so manually via the ioctl.

Using the ioctl to modify [or even disable] TTY canonical processing [aka "TTY raw mode"] can be used by applications that need true char-at-a-time input. For example, vim, emacs, getkey, etc.

While an application can intermix raw mode and a stdio stream [and do so effectively], the normal usage is to either use streams in their normal mode/usage or bypass the stdio layer entirely, do ioctl(0,TIOC*,...) and then do read(2) directly.

Here's a sample getkey program:

// getkey -- wait for user input

#include <stdio.h>
#include <fcntl.h>
#include <termios.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <string.h>
#include <errno.h>

/* Print an error message to stderr and exit (standard C99 variadic macro). */
#define sysfault(...) \
    do { \
        fprintf(stderr, __VA_ARGS__); \
        exit(1); \
    } while (0)

int
main(int argc,char **argv)
{
    int fd;
    int remain;
    int err;
    int oflag;
    int stdflg;
    char *cp;
    struct termios tiold;
    struct termios tinew;
    int len;
    char buf[1];
    int code;

    --argc;
    ++argv;

    stdflg = 0;

    for (;  argc > 0;  --argc, ++argv) {
        cp = *argv;
        if (*cp != '-')
            break;

        switch (cp[1]) {
        case 's':
            stdflg = 1;
            break;
        }
    }

    printf("using %s\n",stdflg ? "fgetc" : "read");

    fd = fileno(stdin);

    /* Make the descriptor non-blocking so read/fgetc return at once when no key is pending. */
    oflag = fcntl(fd,F_GETFL);
    fcntl(fd,F_SETFL,oflag | O_NONBLOCK);

    err = tcgetattr(fd,&tiold);
    if (err < 0)
        sysfault("getkey: tcgetattr failure -- %s\n",strerror(errno));

    tinew = tiold;

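    /* Switch the terminal to raw mode by hand; the #else branch does the same via cfmakeraw (a non-POSIX extension). */
#if 1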
    tinew.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
        INLCR | IGNCR | ICRNL | IXON);
    tinew.c_oflag &= ~OPOST;
    tinew.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tinew.c_cflag &= ~(CSIZE | PARENB);
    tinew.c_cflag |= CS8;

#else
    cfmakeraw(&tinew);
#endif

    /* Disabled: with O_NONBLOCK set, VMIN/VTIME polling is not needed. */
#if 0
    tinew.c_cc[VMIN] = 0;
    tinew.c_cc[VTIME] = 0;
#endif

    err = tcsetattr(fd,TCSAFLUSH,&tinew);
    if (err < 0)
        sysfault("getkey: tcsetattr failure -- %s\n",strerror(errno));

    for (remain = 9;  remain > 0;  --remain) {
        printf("\rHit any key within %d seconds to abort ...",remain);
        fflush(stdout);

        sleep(1);

        if (stdflg) {
            len = fgetc(stdin);
            if (len != EOF)
                break;
            /* No key yet: the non-blocking read failed, so reset the stream's error flag. */
            clearerr(stdin);
        }
        else {
            len = read(fd,buf,sizeof(buf));
            if (len > 0)
                break;
        }
    }

    tcsetattr(fd,TCSAFLUSH,&tiold);
    fcntl(fd,F_SETFL,oflag);

    code = (remain > 0);

    printf("\n");
    printf("%s (%d remaining) ...\n",code ? "abort" : "normal",remain);

    return code;
}
