Difference between %ms and %s in scanf


Question


Reading the scanf manual I encounter this line:

An optional 'm' character. This is used with string conversions (%s, %c, %[),

Can someone explain it with a simple example showing the difference, and why such an option is needed in some cases?

Answer

The C Standard does not define such an optional character in the scanf() formats.

The GNU C library (glibc) does define an optional a indicator, described this way in the man page for scanf:

An optional a character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call).

The caller should subsequently free this buffer when it is no longer required. This is a GNU extension; C99 employs the a character as a conversion specifier (and it can also be used as such in the GNU implementation).

The NOTES section of the man page says:

The a modifier is not available if the program is compiled with gcc -std=c99 or gcc -D_ISOC99_SOURCE (unless _GNU_SOURCE is also specified), in which case the a is interpreted as a specifier for floating-point numbers (see above).

Since version 2.7, glibc also provides the m modifier for the same purpose as the a modifier. The m modifier has the following advantages:

  • It may also be applied to %c conversion specifiers (e.g., %3mc).

  • It avoids ambiguity with respect to the %a floating-point conversion specifier (and is unaffected by gcc -std=c99 etc.)

  • It is specified in the upcoming revision of the POSIX.1 standard.

The online Linux manual page at http://linux.die.net/man/3/scanf documents only the m form of this option:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.

The POSIX standard documents this extension in its POSIX.1-2008 edition (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/fscanf.html ):

The %c, %s, and %[ conversion specifiers shall accept an optional assignment-allocation character m, which shall cause a memory buffer to be allocated to hold the string converted including a terminating null character. In such a case, the argument corresponding to the conversion specifier should be a reference to a pointer variable that will receive a pointer to the allocated buffer. The system shall allocate a buffer as if malloc() had been called. The application shall be responsible for freeing the memory after usage. If there is insufficient memory to allocate a buffer, the function shall set errno to [ENOMEM] and a conversion error shall result. If the function returns EOF, any memory successfully allocated for parameters using assignment-allocation character m by this call shall be freed before the function returns.

Using this extension, you could write:

    char *p;
    scanf("%ms", &p);

This causes scanf to parse a word from standard input and allocate enough memory to store its characters plus a terminating '\0'. A pointer to the allocated array is stored into p and scanf() returns 1, unless no non-whitespace characters can be read from stdin. The caller must later pass p to free().
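A minimal sketch of the same idea with the return value checked and the buffer released, assuming a C library that supports the m modifier (glibc 2.7 or later, or another POSIX.1-2008 implementation). It uses sscanf() on a string rather than scanf() on stdin so that it is easy to try out; the helper name first_word is hypothetical and not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated copy of the first
 * whitespace-delimited word in `input`, or NULL if there is no word.
 * Assumes the C library supports the POSIX.1-2008 `m` modifier. */
char *first_word(const char *input)
{
    char *word = NULL;

    /* %ms makes sscanf allocate a buffer large enough for the word and
     * store its address in `word`; no size limit has to be guessed. */
    if (sscanf(input, "%ms", &word) != 1)
        return NULL;   /* no word converted, so there is nothing to free */

    return word;       /* the caller is responsible for calling free() */
}
```

A caller would write something like `char *w = first_word("  hello world");`, use w (here the string "hello"), and then call `free(w)`. With plain %s the caller would instead have to supply a buffer and a field width up front.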

It is entirely possible that other systems use m for similar semantics or for something else entirely. Extensions that are not part of the C Standard are not portable and should be used very carefully, and documented as such, in circumstances where a standard approach is cumbersome, impractical, or altogether impossible.

Note that parsing a word of arbitrary size is indeed impossible with the standard version of scanf(). You can only parse a word up to a maximum size, and you should specify the maximum number of characters to store before the '\0':

    char buffer[20];
    scanf("%19s", buffer);

But this does not tell you how many more characters are available to parse in standard input. In any case, omitting the maximum number of characters invokes undefined behavior if the input is long enough, and specially crafted input may even be used by an attacker to compromise your program:

    char buffer[20];
    scanf("%s", buffer); // potential undefined behavior,
                         // that could be exploited by an attacker.
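For comparison, here is a sketch of the bounded standard approach that also reports whether the word was cut off. It relies only on standard C: the %n directive stores the number of characters consumed so far, which lets the caller look at what follows the converted word. The helper name read_bounded_word and its return convention are assumptions made for illustration, and it reads from a string via sscanf() rather than from stdin.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies at most 19 characters of the first word of
 * `input` into `buffer`, which must hold at least 20 bytes.
 * Returns 1 if the whole word fit, 0 if it was truncated,
 * and -1 if no word was found. */
int read_bounded_word(const char *input, char buffer[20])
{
    int consumed = 0;

    /* %19s never writes more than 19 characters plus the '\0';
     * %n records how many input characters were consumed. */
    if (sscanf(input, "%19s%n", buffer, &consumed) != 1)
        return -1;

    /* If the next character is not whitespace or the end of the string,
     * the word continued past the 19-character limit. */
    unsigned char next = (unsigned char)input[consumed];
    return (next == '\0' || isspace(next)) ? 1 : 0;
}
```

This stays within the buffer for any input, but the caller still has to decide what to do with a truncated word, which is exactly the bookkeeping that %ms removes.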
