Characters extracted by istream >> double

Question

Sample code at Coliru:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    double d; std::string s;

    std::istringstream iss("234cdefipxngh");
    iss >> d;
    iss.clear();
    iss >> s;
    std::cout << d << ", '" << s << "'\n";
}

I'm reading off N3337 here (presumably that is the same as C++11). In [istream.formatted.arithmetic] we have (paraphrased):

operator>>(double& val);

As in the case of the inserters, these extractors depend on the locale’s num_get<> (22.4.2.1) object to perform parsing the input stream data. These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:

typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
iostate err = iostate::goodbit;
use_facet< numget >(loc).get(*this, 0, *this, err, val);
setstate(err);

Looking over to 22.4.2.1:

The details of this operation occur in three stages
— Stage 1: Determine a conversion specifier
— Stage 2: Extract characters from in and determine a corresponding char value for the format expected by the conversion specification determined in stage 1.
— Stage 3: Store results

The description of Stage 2 is too long to paste here in full. However, it clearly says that all characters should be extracted before conversion is attempted, and further that exactly the following characters should be extracted:

  • any of 0123456789abcdefxABCDEFX+-
  • The locale's decimal_point()
  • The locale's thousands_sep()

Finally, the rules for Stage 3 include:

— For a floating-point value, the function strtold.

The numeric value to be stored can be one of:

— zero, if the conversion function fails to convert the entire field.

This all seems to clearly specify that the output of my code should be 0, 'ipxngh'. However, it actually outputs something else.

Is this a compiler/library bug? Is there any provision that I'm overlooking for a locale to change the behaviour of Stage 2? (In another question, someone posted an example of a system that does actually extract the characters, but it also extracts ipxn, of which i, p and n are not in the list specified in N3337.)

Update

As pointed out by perreal, this text from Stage 2 is relevant:

If discard is true, then if ’.’ has not yet been accumulated, then the position of the character is remembered, but the character is otherwise ignored. Otherwise, if ’.’ has already been accumulated, the character is discarded and Stage 2 terminates. If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.

If the character is either discarded or accumulated then in is advanced by ++in and processing returns to the beginning of stage 2.

So Stage 2 can terminate on a character that is in the list of allowed characters but is not valid at that point for %g. The wording doesn't say exactly what "allowed" means, but presumably it refers to the definition of fscanf in C99, which allows:

  • a nonempty sequence of decimal digits optionally containing a decimal-point character, then an optional exponent part as defined in 6.4.4.2;
  • a 0x or 0X, then a nonempty sequence of hexadecimal digits optionally containing a decimal-point character, then an optional binary exponent part as defined in 6.4.4.2;
  • INF or INFINITY, ignoring case
  • NAN or NAN(n-char-sequence_opt), ignoring case in the NAN part, where an n-char-sequence is a sequence of digits and nondigits (letters and underscore);

and also

In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.

So the Coliru output is actually correct; in fact, the processing must attempt to validate the sequence of characters extracted so far as valid input to %g while extracting each character.

Next question: is it permitted, as in the thread I linked to earlier, to accept i, n, p, etc. in Stage 2?

These are valid characters for %g; however, they are not in the list of atoms that Stage 2 is allowed to read (i.e. c == 0 in terms of my latest quote, so the character is neither discarded nor accumulated).

Solution

This is a mess, because it's likely that neither gcc/libstdc++'s nor clang/libc++'s implementation is conforming. It's unclear what "a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1" means, but I think the use of the phrase "next character" indicates that the check should be context-sensitive (i.e., dependent on the characters that have already been accumulated), and so an attempt to parse, e.g., "21abc" should stop when 'a' is encountered. This is consistent with the discussion in LWG issue 2041, which added this sentence back to the standard after it had been deleted during the drafting of C++11. libc++'s failure to stop there is bug 17782.

libstdc++, on the other hand, refuses to parse "0xABp-4" past the 0, which is clearly nonconforming (it should parse "0xAB" as a hexfloat, as the C99 fscanf specification for %g clearly allows).

Accepting i, p and n is not allowed by the standard; see LWG issue 2381.

The standard describes the processing very precisely: it must be done "as if" by the specified code fragment, which does not accept those characters. Compare the resolution of LWG issue 221, which added x and X to the list of characters because num_get, as then described, would not otherwise parse 0x for integer inputs.

As an extension, clang/libc++ accepts "inf" and "nan" along with hexfloats, but not "infinity"; see bug 19611.
