Why does an fread loop require an extra Ctrl+D to signal EOF with glibc?

Question

Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Doing cat | patch instead doesn't have this oddity. Also, if I press Ctrl+D before typing any real input at all, it doesn't have this oddity.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:

#include <stdio.h>

int main(void) {
    char buf[4096];
    size_t charsread;
    while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
        printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
    }
    printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
    return 0;
}

When compiling and running the above program exactly as-is, here's a timeline of events:

  1. My program calls fread.
  2. fread calls the read system call.
  3. I type "asdf".
  4. I press Enter.
  5. The read system call returns 5.
  6. fread calls the read system call again.
  7. I press Ctrl+D.
  8. The read system call returns 0.
  9. fread returns 5.
  10. My program prints Read 5 bytes. EOF: 1. Error: 0.
  11. My program calls fread again.
  12. fread calls the read system call.
  13. I press Ctrl+D again.
  14. The read system call returns 0.
  15. fread returns 0.
  16. My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.

Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?

UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. @Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).

Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?

Recommended answer

I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (commit 2cc7bad). Relevant quotes from the C standard:

The byte input/output functions — those functions described in this subclause that perform input/output: [...], fread

The byte input functions read characters from the stream as if by successive calls to the fgetc function.

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.

(Emphasis mine.)

The following program demonstrates the bug with fgetc:

#include <stdio.h>

int main(void) {
    while(fgetc(stdin) != EOF) {
        puts("Read and discarded a character from stdin");
    }
    puts("fgetc(stdin) returned EOF");
    if(!feof(stdin)) {
        /* Included only for completeness. Doesn't occur in my testing. */
        puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
        return 1;
    }
    if(fgetc(stdin) != EOF) {
        /* This happens with glibc in my testing. */
        puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
        return 1;
    }
    /* This happens with musl in my testing. */
    puts("No standard violation detected");
    return 0;
}

To demonstrate the bug:

  1. Compile the program and execute it.
  2. Press Ctrl+D.
  3. Press Enter.

The exact bug is that if the end-of-file stream indicator is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.

Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It's previously been reported as glibc bug #1190 and has been fixed since commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.
