Performance overhead of C++ string tokenization via istringstream
I would like to know what the performance overhead is of
    string line, word;
    while (std::getline(cin, line))
    {
        istringstream istream(line);
        while (istream >> word)
        {
            // parse word here
        }
    }
I think this is the standard C++ way to tokenize input.
To be specific:
- Is each line copied three times: first via getline, then via the istringstream constructor, and last via operator>> for each word?
- Would frequent construction and destruction of istream be an issue? What's the equivalent implementation if I define istream before the outer while loop?
Thanks!
Update:
An equivalent implementation
    string line, word;
    stringstream stream;
    while (std::getline(cin, line))
    {
        stream.clear();
        stream << line;
        while (stream >> word)
        {
            // parse word here
        }
    }
uses a stream as a local stack that pushes lines and pops out words. This would get rid of the possibly frequent constructor and destructor calls of the previous version, and would take advantage of the stream's internal buffering (is this point correct?).
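One caveat about the update above: stream << line appends to the stream's existing buffer rather than replacing it, so the buffer keeps growing over the lifetime of the loop. A minimal sketch of an alternative way to reuse a single stream, assuming the goal is simply "one stream object, new contents per line", is to call str(line) followed by clear(). The function name tokenize_lines and the choice to collect the words into a vector are illustrative assumptions, not part of the original question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits every line of `in` into whitespace-separated
// words, reusing one istringstream instead of constructing one per line.
std::vector<std::string> tokenize_lines(std::istream& in)
{
    std::vector<std::string> words;
    std::string line, word;
    std::istringstream stream;          // constructed once, outside the loop
    while (std::getline(in, line))
    {
        stream.str(line);               // replace the buffer contents (copies line)
        stream.clear();                 // reset eofbit/failbit left by the last >>
        while (stream >> word)
            words.push_back(word);      // stand-in for "parse word here"
    }
    return words;
}
```

This still copies each line into the stream, so it only removes the per-line construction and destruction, not the copy. Whether that makes a measurable difference depends on the standard library implementation and should be checked with a profiler.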
Alternative solutions might be to extend std::string to support operator<< and operator>>, or to extend iostream to support something like locate_new_line. Just brainstorming here.
Unfortunately, iostreams is not designed for performance-intensive work. The problem is not copying things in memory (copying strings is fast); it's virtual function dispatch, potentially to the tune of several indirect function calls per character.
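If profiling does show the stream layer to be the bottleneck, one option is to tokenize the line directly with std::string's own search functions, which involve no virtual calls. The following is a rough sketch, not a drop-in replacement: the name split_whitespace is made up for illustration, and it assumes a fixed set of ASCII whitespace characters, whereas operator>> consults the stream's locale.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on runs of ASCII whitespace without
// going through an iostream.
std::vector<std::string> split_whitespace(const std::string& line)
{
    static const char* const kSpace = " \t\n\r\f\v";   // assumed delimiter set
    const std::size_t npos = std::string::npos;

    std::vector<std::string> words;
    std::size_t begin = line.find_first_not_of(kSpace);
    while (begin != npos)
    {
        // end is the first whitespace after the word, or npos at end of line
        const std::size_t end = line.find_first_of(kSpace, begin);
        const std::size_t len = (end == npos) ? npos : end - begin;
        words.push_back(line.substr(begin, len));
        begin = (end == npos) ? npos : line.find_first_not_of(kSpace, end);
    }
    return words;
}
```

Each word is still copied once into the result, just as operator>> copies it into the output string.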
As for your question about copying: yes, as written, everything gets copied when you initialize a new stringstream. (Characters also get copied from the stream to the output string by getline or >>, but that obviously can't be prevented.)
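Using move semantics (std::move, available since C++11), you can eliminate the extraneous copy. Note that the istringstream constructor taking an rvalue string was only added in C++20; with earlier standards the std::move below compiles but the string is still copied: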
    string line, word;
    while (std::getline(cin, line)) // initialize line
    {   // move data from line into istream (so it's no longer in line):
        istringstream istream( std::move( line ) );
        while (istream >> word)
        {
            // parse word here
        }
    }
All that said, performance is only an issue if a measurement tool tells you it is. Iostreams is flexible and robust, and filebuf is basically fast enough, so you can prototype the code so it works and then optimize the bottlenecks without rewriting everything.
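To act on the "measure first" advice without a full profiler, a crude wall-clock timing can already tell you whether tokenization matters for your input. The sketch below is only an illustration: count_words_stream and time_us are hypothetical helper names, and a single timed run like this is noisy, so a real profiler or a repeated benchmark is more trustworthy.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Counts words in `text` using the getline + istringstream approach
// from the question.
std::size_t count_words_stream(const std::string& text)
{
    std::istringstream in(text);
    std::string line, word;
    std::size_t count = 0;
    while (std::getline(in, line))
    {
        std::istringstream ls(line);    // one stream per line, as in the question
        while (ls >> word)
            ++count;
    }
    return count;
}

// Runs f() once and returns the elapsed wall-clock time in microseconds.
template <class F>
long long time_us(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A typical use would be to call time_us with a lambda that runs count_words_stream on representative input, then do the same for any alternative tokenizer and compare the two numbers.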