Performance overhead of C++ string tokenization via istringstream


Question

      I would like to know what's the performance overhead of

      std::string line, word;
      while (std::getline(std::cin, line))
      {
          std::istringstream istream(line);   // fresh stream holding a copy of line
          while (istream >> word)
          {
              // parse word here
          }
      }
      

      I think this is the standard C++ way to tokenize input.

      To be specific:

      • Is each line copied three times: first via getline, then via the istream constructor, and finally via operator>> for each word?
      • Would frequent construction and destruction of istream be an issue? What is the equivalent implementation if I define istream before the outer while loop?

      Thanks!

      Update:

      An equivalent implementation

      std::string line, word;
      std::stringstream stream;
      while (std::getline(std::cin, line))
      {
          stream.clear();     // resets the state flags only, not the buffered text
          stream << line;
          while (stream >> word)
          {
              // parse word here
          }
      }
      

      uses a stream as a local stack that pushes lines and pops out words. This would get rid of the possibly frequent constructor and destructor calls in the previous version, and make use of the stream's internal buffering (is this point correct?).
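
One caveat about the reuse pattern above: `clear()` only resets the stream's state flags (such as `eofbit`), so it does not discard the buffered characters, and each `stream << line` appends to a buffer that keeps growing. A minimal sketch of a reuse variant that replaces the contents with `str()` instead is shown below. The helper name `tokenize_lines` is hypothetical, and it reads from a generic `std::istream&` rather than `std::cin` so that it can be tried on in-memory input.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits every line of `in` into whitespace-separated
// words, reusing a single std::istringstream across all lines.
std::vector<std::string> tokenize_lines(std::istream& in)
{
    std::vector<std::string> words;
    std::string line, word;
    std::istringstream stream;          // constructed once, outside the loop
    while (std::getline(in, line))
    {
        stream.clear();                 // reset eofbit/failbit left by the previous line
        stream.str(line);               // replace (not append to) the buffered text
        while (stream >> word)
        {
            words.push_back(word);      // stands in for "parse word here"
        }
    }
    return words;
}
```

Note that `str(line)` still copies the line into the stream, so this variant only avoids the repeated construction and destruction; whether that matters in practice would have to be measured.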

      Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.

      Answer

      Unfortunately, iostreams is not for performance-intensive work. The problem is not copying things in memory (copying strings is fast), it's virtual function dispatches, potentially to the tune of several indirect function calls per character.
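
If the per-character stream overhead does turn out to matter, one option is to scan the line directly with `std::string` search functions and skip the stream altogether. The following is only a sketch under assumptions: `split_whitespace` is a hypothetical helper name, and the delimiter set is the whitespace of the default "C" locale, so it matches `operator>>` only for that case. It has not been benchmarked here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on whitespace by scanning the string
// directly, without constructing any stream object.
std::vector<std::string> split_whitespace(const std::string& line)
{
    const char* const delims = " \t\n\v\f\r";   // assumed delimiter set ("C" locale whitespace)
    std::vector<std::string> words;
    std::string::size_type begin = line.find_first_not_of(delims);
    while (begin != std::string::npos)
    {
        const std::string::size_type end = line.find_first_of(delims, begin);
        if (end == std::string::npos)
        {
            words.push_back(line.substr(begin));            // last word runs to the end of the line
            break;
        }
        words.push_back(line.substr(begin, end - begin));   // copy one word
        begin = line.find_first_not_of(delims, end);        // skip the delimiters that follow it
    }
    return words;
}
```

Each word is still copied once into the result, which, as noted above, is usually not the expensive part.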

      As for your question about copying, yes, as written everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)

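      Using move semantics (std::move, available since C++11), you can eliminate the extraneous copy into the stream, provided the standard library offers the istringstream constructor that takes an rvalue string. That overload was added in C++20; with earlier standards the constructor takes a const reference and still copies: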

      // needs <iostream>, <sstream>, <string>, <utility>
      std::string line, word;
      while (std::getline(std::cin, line)) // initialize line
      {
          // move data from line into istream (so it's no longer in line):
          std::istringstream istream(std::move(line));
          while (istream >> word)
          {
              // parse word here
          }
      }
      

      All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
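
As a rough illustration of "measure first", the sketch below times one tokenizing variant with `std::chrono::steady_clock`. The names `count_words_stream` and `timed` are hypothetical, and a single wall-clock run like this is only a starting point; a profiler, or repeated runs on representative input with optimizations enabled, would give more trustworthy numbers.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Counts the words in `lines` using a fresh std::istringstream per line,
// as in the first loop of the question.
std::size_t count_words_stream(const std::vector<std::string>& lines)
{
    std::size_t count = 0;
    std::string word;
    for (const std::string& line : lines)
    {
        std::istringstream stream(line);
        while (stream >> word)
        {
            ++count;
        }
    }
    return count;
}

// Hypothetical timing helper: runs fn(lines) once, stores the elapsed
// wall-clock time in microseconds in `elapsed_us`, and returns fn's result.
template <typename Fn>
std::size_t timed(Fn fn, const std::vector<std::string>& lines, long long& elapsed_us)
{
    const auto start = std::chrono::steady_clock::now();
    const std::size_t result = fn(lines);
    const auto stop = std::chrono::steady_clock::now();
    elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    return result;
}
```

Calling `timed(count_words_stream, lines, us)` and then timing an alternative tokenizer on the same `lines` gives a first comparison; returning the word count also makes it easy to check that both variants agree.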
