Why does std::basic_istream::ignore() extract more characters than specified?


Problem description

I have the following code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    stringstream buffer("1234567890 ");
    cout << "pos-before: " << buffer.tellg() << endl;
    buffer.ignore(10, ' ');
    cout << "pos-after: " << buffer.tellg() << endl;
    cout << "eof: " << buffer.eof() << endl;
}

It produces the following output:

pos-before: 0
pos-after: 11
eof: 0

I would expect pos-after to be 10, not 11. According to the specification, ignore() should stop as soon as any one of the following conditions is met:

  1. count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max().
  2. An end-of-file condition occurs in the input sequence, in which case the function calls setstate(eofbit).
  3. The next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof().

In this case I expect rule 1 to trigger before all the other rules, so that extraction stops when the stream position is 10.

Execution shows that this is not the case. What did I misunderstand?

I also tried a variation of the code where I ignore only 9 characters. In this case the output is as expected:

pos-before: 0
pos-after: 9
eof: 0

So it looks like, once ignore() has extracted count characters, it still checks whether the next character is the delimiter and, if it is, extracts it too. I can reproduce this with g++ and clang++.

I also tried this variation of the code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    cout << "--- 10x get\n";
    stringstream buffer("1234567890");
    cout << "pos-before: " << buffer.tellg() << '\n';
    for(int i=0; i<10; ++i)
        buffer.get();
    cout << "pos-after: " << buffer.tellg() << '\n';
    cout << "eof: " << buffer.eof() << '\n';
    
    cout << "--- ignore(10)\n";
    stringstream buffer2("1234567890");
    cout << "pos-before: " << buffer2.tellg() << '\n';
    buffer2.ignore(10);
    cout << "pos-after: " << buffer2.tellg() << '\n';
    cout << "eof: " << buffer2.eof() << '\n';
}

The result is:

--- 10x get
pos-before: 0
pos-after: 10
eof: 0
--- ignore(10)
pos-before: 0
pos-after: -1
eof: 1

We see that using ignore() produces an end-of-file condition on the stream, which indicates that ignore() did try to extract a character after having extracted 10 characters. But in this case the third condition is disabled (the default delim is Traits::eof()), so ignore() should not have tried to look at the next character.

Recommended answer

cppreference is notorious for this: you should generally not rely on it for corner cases in the language, and should refer to the spec instead, which says:

Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:

  • n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far;
  • end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
  • traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).

Using "any of" here instead of "one of" makes it clear that ignore will stop if more than one of the conditions applies. That's essentially the issue here: both the first and third conditions apply, which brings up an underspecified corner case. The third condition states that the next available character (the one that matches the delimiter) will also be extracted.
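To make the two readings concrete, here is a toy model over a plain string. It is only an illustration, not the code of any real standard library: model_ignore and its extract_delim_after_count flag are hypothetical names. With the flag set, the delimiter test still fires once the count has been reached, which matches the output quoted in the question; with it cleared, the count acts as a hard limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model (NOT library code): returns the read position after "ignoring"
// up to n characters of s, with delim as the delimiter.
// extract_delim_after_count is a hypothetical switch: when true, a delimiter
// sitting right after the n-th character is consumed as well.
std::size_t model_ignore(const std::string& s, std::size_t n, char delim,
                         bool extract_delim_after_count) {
    std::size_t pos = 0;
    // Skip until n characters are consumed, the input ends,
    // or the next character is the delimiter.
    while (pos < n && pos < s.size() && s[pos] != delim) {
        ++pos;
    }
    const bool count_reached = (pos == n);
    const bool next_is_delim = (pos < s.size() && s[pos] == delim);
    // The delimiter is always consumed when it is found before the count
    // runs out; when the count has run out, it depends on the switch.
    if (next_is_delim && (!count_reached || extract_delim_after_count)) {
        ++pos;
    }
    return pos;
}
```

For the input "1234567890 " with n = 10 and delim = ' ', the model returns 11 when the flag is true and 10 when it is false. With n = 9 both settings return 9, because the character after the ninth one is not the delimiter. Which reading a particular library follows may depend on the implementation and its version.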

So this is exactly what the library is doing in this case: the third condition applies, so it extracts the character. The fact that the first condition also applies is immaterial.
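If the count has to be a hard upper bound regardless of how the library resolves this corner case, one option is a small hand-written loop in place of ignore(). This is a minimal sketch, assuming a hypothetical helper named ignore_at_most (not part of the standard library); it reads one character at a time with get(), so it will likely be slower than ignore() on large inputs.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper (not a standard function): discards at most n
// characters from in, stopping early once delim has been extracted or the
// input ends. It never looks at the character after the n-th one.
// Returns the number of characters discarded.
std::streamsize ignore_at_most(std::istream& in, std::streamsize n, char delim) {
    assert(n >= 0);
    using traits = std::istream::traits_type;
    std::streamsize extracted = 0;
    while (extracted < n) {
        const std::istream::int_type c = in.get();
        if (traits::eq_int_type(c, traits::eof())) {
            break;  // get() has already set eofbit and failbit
        }
        ++extracted;
        if (traits::eq_int_type(c, traits::to_int_type(delim))) {
            break;  // the delimiter itself was extracted and discarded
        }
    }
    return extracted;
}
```

Called on a stream holding "1234567890 " with n = 10 and delim = ' ', the helper discards exactly 10 characters, leaves tellg() at 10, and does not set eofbit, because the loop ends before the eleventh character is ever read.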
