How to read / parse input in C? The FAQ


Question

I have problems with my C program when I try to read / parse input. Help?

This is a FAQ entry.

StackOverflow has many questions related to reading input in C, with answers usually focussed on the specific problem of that particular user without really painting the whole picture.

This is an attempt to cover the subject comprehensively, with special attention to common mistakes, so most of those questions can be answered simply by marking them as duplicates of this one:


  • Why does the last line print twice?
  • Why does my scanf("%d", ...) / scanf("%c", ...) fail?
  • Why does gets() crash?
  • ...

The answer is marked as community wiki. Feel free to improve and (cautiously) extend.

Recommended answer


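  • Text mode vs. binary mode
  • Check fopen() for failure
  • Pitfalls
    • Check any function you call for success
    • EOF, or why the last line prints twice
    • Do not use gets(), ever
    • Do not use *scanf() for potentially malformed input
    • When *scanf() does not work as expected
    • Read (part of) a line of input via fgets()
    • Parse the line in-memory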

    Text mode vs. binary mode

    A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.

    A "text mode" stream may do a number of transformations, including (but not limited to):


    • removal of spaces immediately before a line-end;
    • changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
    • adding, altering, or deleting characters that are neither printing characters (isprint( c ) == true), horizontal tabs, or new-lines.

    It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.

    Check fopen() for failure

    The attempt to open a file may fail for various reasons -- lack of permissions, or file not found being the most common ones. In this case, fopen() will return a NULL pointer.

    It may set the global errno variable, the value of which can be turned into a plain-text error message using perror(); this is a requirement by POSIX, not the C language, so it may not work on every platform.

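    #include <stdio.h>
    #include <errno.h>

    int main( void )
    {
        errno = 0;
        FILE * fp = fopen( "file.txt", "rb" );
        if ( fp != NULL )
        {
            // ready to read
            // Close the stream only if it was opened successfully;
            // passing a NULL pointer to fclose() is undefined behaviour.
            fclose( fp );
        }
        else
        {
            // If supported by fopen(), will print a message
            // *why* fopen() failed, exactly.
            perror( "fopen() failed" );
            return 1;
        }
        return 0;
    }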
    

    Pitfalls

    Check any function you call for success

    This should be obvious. But do check the documentation of any function you call for their return value and error handling, and check for those conditions.

    These are errors that are easy to handle when you catch the condition early, but lead to lots of head-scratching if you do not.

    EOF, or "why does the last line print twice"

    The function feof( FILE * stream ) returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:

    // BROKEN CODE
    while ( ! feof( fp ) )
    {
        fgets( buffer, BUFFER_SIZE, fp );
        puts( buffer );
    }
    

    This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not set.

    EOF only gets set when you attempt to read past the last character!

    So the code above loops once more, fgets() fails to read another line, sets EOF and leaves the contents of buffer untouched, which then gets printed again.

    So, check for EOF after the read, but before processing. The simplest way is to test the return value of the read itself, since fgets() returns NULL when it could not read anything:

    // GOOD CODE
    while ( fgets( buffer, BUFFER_SIZE, fp ) != NULL )
    {
        puts( buffer );
    }
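
The timing of the EOF indicator can be observed directly. The following is a minimal sketch, not part of the original answer: the helper name is hypothetical, and it assumes tmpfile() can create a temporary file on your platform.

```c
#include <stdio.h>

/* Hypothetical helper: checks that the EOF indicator is set only after
 * a read has failed, not after the last line has been read.
 * Returns 1 if that holds, 0 if not, -1 if the test could not run. */
int eof_set_only_after_failed_read( void )
{
    char buffer[ 16 ];
    int before, after;
    FILE * fp = tmpfile();

    if ( fp == NULL )
    {
        return -1;
    }

    fputs( "last line\n", fp );
    rewind( fp );

    /* Reads the whole line, up to and including the final newline. */
    if ( fgets( buffer, sizeof buffer, fp ) == NULL )
    {
        fclose( fp );
        return -1;
    }
    before = feof( fp );    /* nothing has tried to read past the end yet */

    /* This read finds no more input; only now is the indicator set. */
    if ( fgets( buffer, sizeof buffer, fp ) != NULL )
    {
        fclose( fp );
        return -1;
    }
    after = feof( fp );

    fclose( fp );
    return ( before == 0 && after != 0 );
}
```

A loop controlled by feof() sees the "before" state after the last line and therefore runs one extra time.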
    

    Do not use gets(), ever

    There is no way to use this function safely. Because of this, it has been removed from the language with the advent of C11.

    Do not use *scanf() for potentially malformed input

    Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.

    But the purpose of *scanf() is really to read bulk data that can be somewhat relied upon being in a predefined format. (Such as being written by another program.)

    Even then, *scanf() can trip the unobservant:

    • Using a format string that in some way can be influenced by the user is a gaping security hole.
    • If the input does not match the expected format, *scanf() immediately stops parsing, leaving any remaining arguments uninitialized.
    • It will tell you how many assignments it has successfully done, but not where exactly it stopped parsing the input, making graceful error recovery difficult.
    • It skips any leading whitespaces in the input, except when it does not ([, c, and n conversions). (See next paragraph.)
    • It has somewhat peculiar behaviour in some corner cases.
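
If you do use a *scanf() function, the points above suggest at least comparing its return value against the number of conversions you expect. A minimal sketch, assuming input of the form "x y"; parse_point() is a hypothetical helper, and sscanf() is used so the example works on a string already in memory:

```c
#include <stdio.h>

/* Hypothetical helper: parses two integers from text.
 * Returns 0 on success, -1 if fewer than two values were converted.
 * On failure, *x and *y are left untouched. */
int parse_point( const char * text, int * x, int * y )
{
    int tx, ty;

    /* sscanf() returns the number of successful assignments,
     * or EOF if the input ended before the first conversion. */
    if ( sscanf( text, "%d %d", &tx, &ty ) != 2 )
    {
        return -1;
    }

    *x = tx;
    *y = ty;
    return 0;
}
```

Parsing into temporaries first means a partial match never leaves the caller with one updated value and one stale one.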

    When *scanf() does not work as expected

    A frequent problem with *scanf() is when there is an unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.

    Reading a number ("%d" et al.), or a string ("%s"), stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline is still the first pending input character: %c reads that newline instead of the character you expected, and %[ fails to match unless its scanset includes the newline.

    You can skip over the newline in the input, by explicitly reading it e.g. via fgetc(), or by adding a whitespace to your *scanf() format string. (A single whitespace in the format string matches any number of whitespace in the input.)
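
The effect of that single space can be shown with sscanf() on a string that mimics a number followed by Enter and then a character. This is a sketch only; char_after_number() is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper: reads a number and the character following it.
 * If skip_ws is zero, the format is "%d%c"; otherwise it is "%d %c".
 * Returns the character read, or '\0' if the input did not match. */
char char_after_number( const char * text, int skip_ws )
{
    int number;
    char c = '\0';
    int converted;

    if ( skip_ws )
    {
        /* The space in the format consumes any pending whitespace. */
        converted = sscanf( text, "%d %c", &number, &c );
    }
    else
    {
        /* %c does not skip whitespace; it takes the very next character. */
        converted = sscanf( text, "%d%c", &number, &c );
    }

    return ( converted == 2 ) ? c : '\0';
}
```

For the input "42\nx", the first format stores the newline in c, while the second stores 'x'.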

    However, do not call fflush() on your input stream if you intend to remain portable. This is well-defined only for POSIX platforms; in plain C, calling fflush() on an input stream is undefined behaviour.
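
A portable way to get rid of pending input is to read and discard it explicitly with fgetc(). A minimal sketch, with discard_line() as a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: reads and discards characters from fp up to and
 * including the next newline.
 * Returns 0 if a newline was consumed, EOF if the stream ended first. */
int discard_line( FILE * fp )
{
    int c;   /* int, not char, so that EOF can be told apart from data */

    while ( ( c = fgetc( fp ) ) != EOF )
    {
        if ( c == '\n' )
        {
            return 0;
        }
    }

    return EOF;
}
```

Calling discard_line( stdin ) after a failed conversion drops the rest of the offending line, so the next read starts on fresh input.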

    We just advised against using *scanf() except when you really, positively, know what you are doing. So, what to use as a replacement?

    Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.

    Read (part of) a line of input via fgets()

    fgets() has a parameter for limiting its input to at most that many bytes (including the terminating null character), avoiding overflow of your buffer. If the input line did fit into your buffer completely, the last character in your buffer will be the newline ('\n'). If it is not, you are looking at a partially-read line (or at the last line of an input that does not end with a newline).
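
Checking for that trailing newline could look like the following sketch. read_line() and its return convention are assumptions made for illustration, not part of the original answer:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads (part of) a line into buffer.
 * Returns  1 if a complete line was read (the newline is stripped),
 *          0 if the line did not fit, or the input ended without a newline,
 *         -1 if nothing could be read (EOF or error). */
int read_line( FILE * fp, char * buffer, int size )
{
    char * newline;

    if ( fgets( buffer, size, fp ) == NULL )
    {
        return -1;
    }

    newline = strchr( buffer, '\n' );
    if ( newline == NULL )
    {
        return 0;    /* partial read: the rest is still in the stream */
    }

    *newline = '\0'; /* complete line: drop the newline */
    return 1;
}
```

When the function returns 0, the caller can decide whether to call it again to collect the rest of the line, or to reject the input as too long.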

    Parse the line in-memory

    Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.

    But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.
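
Both properties show up in a small wrapper around strtol(). This is a sketch under stated assumptions: parse_long() and its error codes are hypothetical, and it treats a single trailing newline (as left by fgets()) as acceptable.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to a long in base 10.
 * Returns  0 on success (*out is set),
 *         -1 if no digits were found,
 *         -2 if the value does not fit into a long,
 *         -3 if unparsed characters follow the number. */
int parse_long( const char * text, long * out )
{
    char * end;
    long value;

    errno = 0;
    value = strtol( text, &end, 10 );

    if ( end == text )
    {
        return -1;   /* strtol() did not consume anything */
    }
    if ( errno == ERANGE )
    {
        return -2;   /* result was clamped to LONG_MIN / LONG_MAX */
    }
    if ( *end != '\0' && *end != '\n' )
    {
        return -3;   /* end points at the first character not parsed */
    }

    *out = value;
    return 0;
}
```

Because end points into the original string, an error message can report exactly which part of the line was not understood.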

    Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like while trying to make sense of the input.

    And if all else fails, you have the whole line available to print a helpful error message for the user.

    fclose( fp );
    
