How to read / parse input in C? The FAQ

Question

I have problems with my C program when I try to read / parse input.

Help?


This is a FAQ entry.

StackOverflow has many questions related to reading input in C, with answers usually focussed on the specific problem of that particular user without really painting the whole picture.

This is an attempt to cover a number of common mistakes comprehensively, so this specific family of questions can be answered simply by marking them as duplicates of this one:

  • Why does the last line print twice?
  • Why does my scanf("%d", ...) / scanf("%c", ...) fail?
  • Why does gets() crash?
  • ...

The answer is marked as community wiki. Feel free to improve and (cautiously) extend.

Answer

The Beginner's C Input Primer

  • Text mode vs. Binary mode
  • Check fopen() for failure
  • Pitfalls
    • Check any functions you call for success
    • EOF, or "why does the last line print twice"
    • Do not use gets(), ever
    • Do not use fflush() on stdin or any other stream open for reading, ever
    • Do not use *scanf() for potentially malformed input
    • When *scanf() does not work as expected
  • Read, then parse
    • Read (part of) a line of input via fgets()
    • Parse the line in-memory
  • Clean Up

Text mode vs. Binary mode

A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.

A "text mode" stream may do a number of transformations, including (but not limited to):

  • removal of spaces immediately before a line-end;
  • changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
  • adding, altering, or deleting characters that are neither printing characters (isprint(c) is true), horizontal tabs, nor new-lines.

It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.
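As a minimal sketch of how the mode is selected: the choice is made through the mode string passed to fopen(), where a trailing "b" requests binary mode. The helper name count_bytes below is hypothetical and only serves as illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in a file, reading it in binary
   mode ("rb") so that no newline translation takes place.
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* "r" would request text mode instead */
    if (!fp)
        return -1;

    long n = 0;
    while (fgetc(fp) != EOF)
        n++;

    int failed = ferror(fp);        /* tell a read error apart from EOF */
    fclose(fp);
    return failed ? -1 : n;
}
```

On POSIX systems the "b" makes no difference; on Windows, opening the same file with "r" would report each "\r\n" pair as a single '\n'.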

Check fopen() for failure

The attempt to open a file may fail for various reasons -- lack of permissions, or file not found being the most common ones. In this case, fopen() will return a NULL pointer. Always check whether fopen returned a NULL pointer before attempting to read from or write to the file.

When fopen fails, it usually sets the global errno variable to indicate why it failed. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee to do it.) errno is a code number which can be compared against constants in errno.h, but in simple programs, usually all you need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen; if you don't do that, you will be very confused when the problem is that the filename isn't what you thought it was.

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        // alternatively, just `perror(argv[1])`
        fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }

    // read from fp here

    fclose(fp);
    return 0;
}

Pitfalls

Check any functions you call for success

This should be obvious. But do check the documentation of any function you call for its return value and error handling, and check for those conditions.

These are errors that are easy to handle when you catch the condition early, but lead to lots of head-scratching if you do not.

EOF, or "why does the last line print twice"

The function feof() returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:

// BROKEN CODE
while (!feof(fp)) {
    fgets(buffer, BUFFER_SIZE, fp);
    printf("%s", buffer);
}

This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not set.

EOF only gets set when you attempt to read past the last character!

So the code above loops once more, fgets() fails to read another line, sets EOF and leaves the contents of buffer untouched, which then gets printed again.

Instead, check whether fgets failed directly:

// GOOD CODE
while (fgets(buffer, BUFFER_SIZE, fp)) {
    printf("%s", buffer);
}
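Once the loop ends, fgets() has returned NULL either because the end of the input was reached or because a read error occurred. A minimal sketch of telling the two apart with ferror() follows; the helper name copy_lines and the buffer size are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copy every line of `in` to `out`.
   Returns 0 if the loop ended at end-of-file, -1 on a read or write error. */
int copy_lines(FILE *in, FILE *out)
{
    char buffer[256];

    while (fgets(buffer, sizeof buffer, in)) {
        if (fputs(buffer, out) == EOF)
            return -1;              /* write error */
    }

    /* fgets() returned NULL: find out why. */
    return ferror(in) ? -1 : 0;
}
```

Here feof() and ferror() are only consulted after a read has already failed, which is the situation they are meant for.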

Do not use gets(), ever

There is no way to use this function safely. Because of this, it has been removed from the language with the advent of C11.
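A common replacement is fgets() plus removal of the trailing newline, which gets() used to strip. The following is only a sketch; the helper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `fp` into `buf` (capacity `size`)
   and strip the trailing newline, if there is one.
   Returns buf on success, NULL on end-of-file or error. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (!fgets(buf, size, fp))
        return NULL;

    /* strcspn() yields the index of the first '\n', or the string length
       if there is none, so this is safe either way. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Unlike gets(), this never writes more than size bytes. A line longer than the buffer is not discarded: the rest of it is returned by the next call.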

Do not use fflush() on stdin or any other stream open for reading, ever

Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behaviour. It does have well-defined behaviour in POSIX and in MSVC, but neither of those makes it discard user input that has not yet been read.

Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:

int c;
do c = getchar(); while (c != EOF && c != '\n');

Do not use *scanf() for potentially malformed input

Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.

But the purpose of *scanf() is really to read bulk data that can, to some degree, be relied upon to be in a predefined format. (Such as data written by another program.)

Even then *scanf() can trip the unobservant:

  • Using a format string that in some way can be influenced by the user is a gaping security hole.
  • If the input does not match the expected format, *scanf() immediately stops parsing, leaving any remaining arguments unassigned (and uninitialized, unless you initialized them yourself).
  • It will tell you how many assignments it has successfully done -- which is why you should check its return value (see above) -- but not where exactly it stopped parsing the input, making graceful error recovery difficult.
  • It skips any leading whitespace in the input, except when it does not ([, c, and n conversions). (See next paragraph.)
  • It has somewhat peculiar behaviour in some corner cases.
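To illustrate the points about the return value, here is a minimal sketch that accepts a result only when every expected conversion succeeded. It uses sscanf() on a string for simplicity; the helper name parse_size and the "WIDTHxHEIGHT" format are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: parse "WIDTHxHEIGHT" (e.g. "640x480") from a string.
   Returns 1 only if both conversions succeeded; on failure the output
   parameters are left untouched. */
int parse_size(const char *text, int *width, int *height)
{
    int w, h;

    /* sscanf() returns the number of successful assignments, or EOF if
       the input ended before the first conversion. */
    if (sscanf(text, "%dx%d", &w, &h) != 2)
        return 0;

    *width = w;
    *height = h;
    return 1;
}
```

Note that an input such as "640x480junk" is still accepted, because the return value does not reveal where parsing stopped.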

When *scanf() does not work as expected

A frequent problem with *scanf() is unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.

Reading a number ("%d" et al.), or a string ("%s"), stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline is still the first pending input character: %c reads that newline instead of the character you expected, and %[ fails to match unless its scanset includes the newline.

You can skip over the newline in the input by explicitly reading it, e.g. via fgetc(), or by adding a space to your *scanf() format string. (A single whitespace character in the format string matches any amount of whitespace in the input, including none.)
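A minimal sketch of the format-string approach, using sscanf() on a string so the effect is easy to reproduce. The helper name parse_number_and_choice is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer followed by a single
   non-whitespace character, e.g. "42\ny". The space before %c skips any
   whitespace (including the newline) left over after the number.
   Returns 1 if both values were read, 0 otherwise. */
int parse_number_and_choice(const char *text, int *number, char *choice)
{
    return sscanf(text, "%d %c", number, choice) == 2;
}
```

With the format "%d%c" (no space), the same input would store the newline character in choice.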

Read, then parse

We just advised against using *scanf() except when you really, positively, know what you are doing. So, what to use as a replacement?

Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.

Read (part of) a line of input via fgets()

fgets() takes the size of your buffer as a parameter and reads at most one byte fewer than that (leaving room for the terminating null character), avoiding overflow of your buffer. If the input line did fit into your buffer completely, the last character before the terminating null will be the newline ('\n'). If it did not all fit, you are looking at a partially-read line.
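The newline check described above can be sketched as follows. The helper name read_line_checked and its return convention are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read up to one line into buf.
   Returns  1 if buf holds a complete line (newline stripped),
            0 if no newline was found (the line did not fit, or the
              input ended without a final newline),
           -1 on end-of-file or read error. */
int read_line_checked(char *buf, int size, FILE *fp)
{
    if (!fgets(buf, size, fp))
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;
}
```

When the result is 0, the caller can keep calling to fetch the rest of the line, or read and discard characters up to the next newline.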

Parse the line in-memory

Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.

But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.
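A minimal sketch of using strtol() this way: the end pointer shows where parsing stopped, and errno reports an out-of-range value. The helper name parse_long and its policy of rejecting trailing characters are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert the whole of `text` to a long (base 10).
   Returns 1 on success; 0 if text contains no number, has trailing
   characters after the number, or holds a value out of range for long. */
int parse_long(const char *text, long *out)
{
    char *end;

    errno = 0;                      /* strtol() only sets errno on error */
    long value = strtol(text, &end, 10);

    if (end == text)                /* nothing was converted */
        return 0;
    if (errno == ERANGE)            /* too large or too small for long */
        return 0;
    if (*end != '\0')               /* parsing stopped before the end */
        return 0;

    *out = value;
    return 1;
}
```

Like *scanf(), strtol() skips leading whitespace, so " -17" is accepted. A line that still ends in '\n' is rejected by this sketch, so strip the newline first or relax the final check.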

Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like trying to make sense of the input.

And if all else fails, you have the whole line available to print a helpful error message for the user.

Clean Up

Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.

fclose(fp);
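Because closing an output stream flushes its buffers, fclose() itself can fail, and its return value deserves the same check as any other call. The following is a sketch only; the helper name save_text is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to the file at `path`.
   Returns 0 on success, -1 on failure. fclose() is checked too, because
   buffered data may only be written out at that point. */
int save_text(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    int ok = (fputs(text, fp) != EOF);

    /* Close the stream even if writing failed, to avoid a resource leak. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```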
