Why multiple EOF enters to end program?

Question

I'm trying to understand the behavior of my code. I expect Ctrl-D to make the program print the array and exit; however, it takes three presses, and the program enters the while loop after the second press.

#include <stdio.h>
#include <stdlib.h>

void unyon(int p, int q);
int connected(int p, int q);

int main(int argc, char *argv[]) {
    int c, p, q, i, size, *ptr;

    scanf("%d", &size);

    ptr = malloc(size * sizeof(int));

    while((c = getchar()) != EOF){
        scanf("%d", &p);
        scanf("%d", &q);

        printf("p = %d, q = %d\n", p, q);
    }

    for(i = 0; i < size; ++i)
        printf("%d\n", *ptr + i);

    free(ptr);
    return 0;
}

I read the post here, but I don't quite understand it: How to end scanf by entering only one EOF

After reading that, I'm expecting the first Ctrl-D to clear the buffer, and then I'm expecting c = getchar() to pick up the second Ctrl-D and jump out. Instead the second Ctrl-D enters the loop and prints p and q, and it takes a third Ctrl-D to drop out.

This is made more confusing by the fact that the code below drops out on the first Ctrl-D:

#include <stdio.h>

main() {

    int c, nl;

    nl = 0;
    while((c = getchar()) != EOF)
        if (c == '\n')
            ++nl;
    printf("%d\n", nl);
}


Recommended answer

Let's strip the program down:

scanf("%d", &size);             // Statement 1
while((c = getchar()) != EOF){  //           2
    scanf("%d", &p);            //           3
    scanf("%d", &q);            //           4
}

That is definitely not the way to go; we'll get to the correct usage in a bit. For now, let's just analyze what happens. It's important to understand precisely how scanf works. The %d format code causes it to first skip over any whitespace characters, and then read characters as long as they can form part of a decimal integer. Eventually some character will be read which is not part of a decimal integer, most likely a newline. Because the format string is now finished, that unused character is pushed back onto the stream.

So when the call to getchar is made, getchar will read and return the newline character which terminated the integer. Inside the loop, there are then two calls to scanf("%d"), each of which will behave as indicated above: skip whitespace if any, read a decimal integer, and push the unused character back onto the input stream.
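The push-back behaviour can be observed directly. The following is a minimal sketch, not part of the original program: stream_from and char_after_int are hypothetical helper names, and a temporary file stands in for the terminal so the input is fixed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`,
   standing in for what the user would type. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

/* Read one integer with "%d", then return the character that follows it.
   Returns EOF if no integer could be read, or if nothing follows it. */
static int char_after_int(FILE *in, int *value) {
    assert(in != NULL && value != NULL);
    if (fscanf(in, "%d", value) != 1)
        return EOF;
    return fgetc(in);   /* the character "%d" stopped at and pushed back */
}
```

Given the input "42" followed by a newline, char_after_int stores 42 and returns the newline, which is exactly the character that getchar picks up in statement 2.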

Now, let's suppose that you run the program, enter the number 42 followed by the Enter key, and then press Ctrl-D to close the input stream.

The 42 will be read by statement 1, and (as mentioned above) the newline will be read by statement 2. So when statement 3 is executed, there is no more data to be read. Because end-of-file is signaled before any digit is read, scanf will return EOF. However, the code does not test the return value of scanf; it goes on to statement 4.
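To make the sequence concrete, here is a small sketch that records what each of the four statements returns for that input. The names stream_from, struct trace and run_statements are made up for illustration, and the input comes from a temporary file rather than a terminal. On a regular file, reading again at end-of-file simply reports end-of-file again, so the extra Ctrl-D presses discussed here only show up at a terminal.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

/* Return values of statements 1 to 4 from the stripped-down program. */
struct trace {
    int s1;  /* scanf("%d", &size) */
    int c;   /* getchar()          */
    int s3;  /* scanf("%d", &p)    */
    int s4;  /* scanf("%d", &q)    */
};

static struct trace run_statements(FILE *in) {
    struct trace t;
    int size = 0, p = 0, q = 0;
    assert(in != NULL);
    t.s1 = fscanf(in, "%d", &size);  /* statement 1: reads 42          */
    t.c  = fgetc(in);                /* statement 2: reads the newline */
    t.s3 = fscanf(in, "%d", &p);     /* statement 3: nothing left      */
    t.s4 = fscanf(in, "%d", &q);     /* statement 4: nothing left      */
    return t;
}
```

For the input "42" plus a newline, the recorded values are 1, '\n', EOF and EOF. The original program ignores the last two, which is why it keeps going.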

What should happen at this point is that the scanf in statement 4 should immediately return EOF without attempting to read more input. That's what the C standard says should happen, and it is what POSIX says should happen. Once end-of-file has been signaled on a stream, any input request should immediately return EOF until the end-of-file indicator is manually cleared. (See below for standards quotes.)

But glibc, for reasons we won't go into just yet, does not conform to the standard. It attempts another read. And so the user must enter another Ctrl-D, which will cause the scanf at statement 4 to return EOF. Again, the code does not check the return value, so it continues with the while loop and calls getchar again at statement 2. Because of the same bug, getchar does not immediately return EOF, but instead attempts to read a character from the terminal. So the user must now type a third Ctrl-D to cause getchar to return EOF. Finally, the code checks a return value, and the while loop terminates.

So that is the explanation of what is happening. Now, it is easy to see at least one mistake in the code: the return value of scanf is never checked. Not only does this mean that EOF is missed, it also means that input errors are ignored. (scanf would have returned 0 if the input could not be parsed as an integer.) That's serious, because if scanf cannot successfully match the format code, nothing is stored in the corresponding argument, so unless you initialized it yourself its value is indeterminate and must not be used.

In short: Always check return values from *scanf. (And other I/O library functions.)
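One way to act on that advice is to wrap the call so the three possible outcomes cannot be confused. This is only a sketch: read_status, read_int and stream_from are illustrative names that do not appear in the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

enum read_status {
    READ_OK,         /* one integer was converted and stored          */
    READ_BAD_INPUT,  /* input is present but does not form an integer */
    READ_END         /* end of input (or a read error) came first     */
};

/* Read one integer and classify the result of fscanf. */
static enum read_status read_int(FILE *in, int *out) {
    int rc;
    assert(in != NULL && out != NULL);
    rc = fscanf(in, "%d", out);
    if (rc == 1)
        return READ_OK;
    if (rc == EOF)
        return READ_END;
    return READ_BAD_INPUT;  /* rc == 0: *out was not assigned */
}
```

A caller can then stop cleanly on READ_END and report an error on READ_BAD_INPUT, rather than carrying on with a value that was never stored.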

But there is a more subtle mistake as well, which makes little difference in this case but could, in general, be serious. The character read by getchar in statement 2 is simply discarded, regardless of what it was. Normally it will be whitespace, so it doesn't matter that it is discarded, but you don't actually know that, because the character is thrown away without being examined. Maybe it was a comma. Maybe it was a letter. Maybe it matters what it was.

It is bad style to rely on the assumption that whatever character is read by the getchar at statement 2 is unimportant. If you really need to peek at the next character, you should push it back onto the input stream, just as scanf does:

while ((c = getchar()) != EOF) {
  ungetc(c, stdin);  /* Put c back into the input stream */
  ...
}
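The same idea can be packaged as a small function. This is a sketch with made-up names (peek_char, stream_from); it relies only on the standard guarantee that one character of push-back is available.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

/* Look at the next character without consuming it.
   Returns EOF if there is nothing left to read. */
static int peek_char(FILE *in) {
    int c;
    assert(in != NULL);
    c = fgetc(in);
    if (c != EOF)
        ungetc(c, in);  /* put it back so the next read sees it again */
    return c;
}
```

Calling peek_char twice in a row returns the same character, and a following fgetc or fscanf still sees it.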

But actually, that test is not what you want at all. As we have already seen, it is extremely unlikely that getchar will return EOF at this point. (It's possible, but very unlikely.) Much more probable is that getchar will read a newline character, even though the next scanf will encounter end-of-file. So there was absolutely no point in peeking at the next character; the correct solution is to check the return value of scanf, as indicated above.

Putting that together, what you really want here is something more like:

/* No reason to use two scanf calls to read two consecutive numbers */
while ((count = scanf("%d%d", &p, &q)) == 2) {
  /* Do something with p and q */
}
if (count != EOF) {
  /* Invalid format. Issue an error message, at least */
}
/* Do whatever needs to be done at the end of input. */
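For something that compiles on its own, the loop above might be wrapped as follows. This is a sketch under assumptions: pair_result, read_pairs and stream_from are invented names, and the function reads from any FILE * so it can be tried on a file as well as on stdin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

struct pair_result {
    int pairs;      /* number of complete (p, q) pairs read               */
    int clean_eof;  /* 1 if input ended exactly on a pair boundary, else 0 */
};

/* Read pairs of integers until the input ends or stops matching. */
static struct pair_result read_pairs(FILE *in) {
    struct pair_result r = {0, 0};
    int p, q, count;
    assert(in != NULL);
    while ((count = fscanf(in, "%d%d", &p, &q)) == 2) {
        printf("p = %d, q = %d\n", p, q);
        ++r.pairs;
    }
    /* count is EOF only if nothing at all could be converted before the
       input ended; 0 or 1 means malformed or incomplete input. */
    r.clean_eof = (count == EOF);
    return r;
}
```

Calling read_pairs(stdin) at a terminal makes a single scanf call per iteration, and its return value is tested every time, so end-of-file is never silently skipped over.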






Finally, let's examine glibc's behaviour. There is a very long-standing bug report linked to by an answer to the question cited in the OP (https://stackoverflow.com/a/19890073/1566221). If you take the trouble to read through to the most recent post in the bugzilla thread, you'll find a link to a discussion on the glibc developer mailing list.

Let me give the TL;DR version, and save you the trouble of digital archaeology. Since C99, the standard has been clear that EOF is "sticky". §7.21.3/11 states that all input is performed as though successive bytes were read by fgetc:


... the byte input functions read characters from the stream as if by successive calls to the fgetc function.

And §7.21.7.1/3 states that fgetc returns EOF immediately if the stream's end-of-file indicator is set:


If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

So once the end-of-file indicator is set because end of file was detected, subsequent input operations must immediately return EOF without attempting to read from the stream. Various things can clear the end-of-file indicator, including clearerr, fseek, and ungetc; once the end-of-file indicator has been cleared, the next input function call will again attempt to read from the stream.
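The indicator itself can be inspected with feof. The sketch below uses invented names (eof_flags, observe_eof_indicator, stream_from) and a temporary file; it records the indicator after reading to the end, after clearerr, and after a successful ungetc.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary stream pre-loaded with `text`. */
static FILE *stream_from(const char *text) {
    FILE *f = tmpfile();
    assert(f != NULL);
    fputs(text, f);
    rewind(f);
    return f;
}

/* State of the end-of-file indicator (1 = set) at three points. */
struct eof_flags {
    int after_drain;
    int after_clearerr;
    int after_ungetc;
};

static struct eof_flags observe_eof_indicator(FILE *in) {
    struct eof_flags s;
    assert(in != NULL);
    while (fgetc(in) != EOF) {
        /* discard everything up to end of file */
    }
    s.after_drain = feof(in) != 0;     /* set by the failed read        */
    clearerr(in);
    s.after_clearerr = feof(in) != 0;  /* cleared explicitly            */
    (void)fgetc(in);                   /* hits end of file, sets it again */
    ungetc('x', in);
    s.after_ungetc = feof(in) != 0;    /* a successful ungetc clears it */
    return s;
}
```

After the call, the pushed-back 'x' is still waiting on the stream, so the next fgetc returns it instead of EOF.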

However, it wasn't always like that. Before C99, the result of reading from a stream which had already returned EOF was unspecified, and different standard libraries chose to handle it in different ways.

So a decision was made not to change glibc to conform to the (then) new standard, but rather to maintain compatibility with certain other C libraries, notably Solaris. (A comment in the glibc source is quoted in the bug report.)

Although there is a compelling argument (at least, compelling to me) that fixing the bug is not likely to break anything important, there is still a certain reluctance to do anything about it. And so, here we are, ten years later, with a still-open bug report and a non-conforming implementation.
