Reading formatted lines from stringstream performantly
To load a large number of vertices from a text file, I'm loading the whole file into memory and then I'd like to scan each line for the three floats. The following works, but I'd like to know whether it's good or wasteful.
std::stringstream sstr;
sstr << file.rdbuf(); // file is an ifstream
for (uint i=0; i<num_verts; ++i)
{
sstr.getline(line, bufsz);
std::istringstream iss(line);
iss >> verts[i].x >> verts[i].y >> verts[i].z;
}
On the reference site cplusplus.com I found the following for istringstream:
(2) initialization constructor: Constructs an istringstream object with a copy of str as content.
So if istringstream is really copying each string upon construction, then that's very wasteful, especially since I already have a stringstream in the first place.
Compared with sscanf, the above is indeed quite slow: 1.94 seconds vs. 0.56 seconds for sscanf.
Is the string being copied upon initialization of iss?
How would one read formatted values while simultaneously advancing in the string line-wise with just the stringstream?
Or, less specifically: what's the C++ method for the above case that performs as well as sscanf?
OK, following up on my comment, I ran some tests for you.
Code:
bloated_read.cpp:
#include <fstream>
#include <sstream>
#include <vector>
struct Vert {
double x,y,z;
};
int main(){
const int num_verts = 1000000;
const int buff_size = 1024;
std::vector<Vert> verts(num_verts);
std::ifstream file("numbers");
std::stringstream sstr;
sstr << file.rdbuf(); // file is an ifstream
char *line = new char [buff_size];
for (int i=0; i<num_verts; ++i)
{
sstr.getline(line, buff_size);
std::istringstream iss(line);
iss >> verts[i].x >> verts[i].y >> verts[i].z;
}
return 0;
}
Please note that I skipped the common includes and the Vert definition in the following code.
slimmed_read.cpp:
int main(){
const int num_verts = 1000000;
std::vector<Vert> verts(num_verts);
std::ifstream file("numbers");
for (int i=0; i<num_verts; ++i)
{
file >> verts[i].x >> verts[i].y >> verts[i].z;
}
return 0;
}
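The same direct extraction should also work on the in-memory stringstream from the question, without building an istringstream per line. The following is only a sketch under that assumption and was not part of the timings: read_verts is a hypothetical helper name, and the ignore call is one possible way to advance to the next line after the three values.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <vector>

struct Vert { double x, y, z; };

// Hypothetical helper: reads up to max_verts lines of "x y z" from any
// istream, for example the std::stringstream filled via file.rdbuf().
std::vector<Vert> read_verts(std::istream& in, std::size_t max_verts) {
    std::vector<Vert> verts;
    verts.reserve(max_verts);
    Vert v;
    while (verts.size() < max_verts && (in >> v.x >> v.y >> v.z)) {
        verts.push_back(v);
        // Discard whatever is left on the current line, including '\n'.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return verts;
}
```

Calling read_verts(sstr, num_verts) on the filled stringstream would then replace the getline plus istringstream loop. Extraction stops at the first line that does not start with three numbers.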
sscanf_read.cpp:
#include <cstdio> //sscanf
int main(){
const int num_verts = 1000000;
const int buff_size = 1024;
std::vector<Vert> verts(num_verts);
std::ifstream file("numbers");
char *line = new char [buff_size];
for (int i=0; i<num_verts; ++i)
{
file.getline(line, buff_size);
sscanf(line, "%lf %lf %lf", &verts[i].x, &verts[i].y, &verts[i].z ); // %lf because the Vert members are double
}
return 0;
}
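One detail worth keeping in mind with sscanf: it returns the number of fields it converted, and for double members the conversion specifier is %lf (%f expects a float pointer). A minimal sketch of a checked per-line parse, where parse_vert is a hypothetical helper name and not part of the benchmark:

```cpp
#include <cassert>
#include <cstdio>

struct Vert { double x, y, z; };

// Hypothetical helper: returns true only if all three fields were converted.
// %lf is the scanf conversion for double; %f would expect float*.
bool parse_vert(const char* line, Vert& out) {
    return std::sscanf(line, "%lf %lf %lf", &out.x, &out.y, &out.z) == 3;
}
```

In the loop above this could be used as if (!parse_vert(line, verts[i])) { /* handle the malformed line */ }.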
Results:
I ran two tests, with num_verts set to 10^5 and 10^6, changing the input file contents and the corresponding line in the code each time. Total wall-clock running time:
size | bloated | slim | sscanf
10^5 | 0.401s | 0.242s | 0.190s
10^6 | 4.041s | 2.392s | 1.896s
Seems consistent. You can shave off about 40% by using ifstream directly, and the sscanf version is roughly 25% faster again (2.392s vs. 1.896s at 10^6) when used to parse the line. The file for 10^6 had 56,444 KB, so the total parsing throughput was 29.07 MB/s. I would say that there is still some room for improvement; however, on my system I think this is borderline HDD bound.
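The throughput figure can be reproduced from the table, assuming 1 MB = 1024 KB (that assumption is what yields the quoted number). A small sketch, with throughput_mb_per_s as a hypothetical helper name:

```cpp
#include <cassert>

// Throughput in MB/s for a file of size_kb kilobytes parsed in `seconds`,
// taking 1 MB = 1024 KB (an assumption that matches the quoted figure).
double throughput_mb_per_s(double size_kb, double seconds) {
    return size_kb / 1024.0 / seconds;
}
```

throughput_mb_per_s(56444.0, 1.896) gives about 29.07 for the sscanf run. By the same arithmetic the bloated and slim runs come out at roughly 13.6 MB/s and 23.0 MB/s.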
Conclusions:
I would go for the sscanf version if further speed-ups are not needed. It is still fairly simple to implement and understand. Also check the speed of your HDDs to make sure it is worth a shot.
One should also note that there is no mixing between the C and C++ styles of accessing data: data is extracted into a buffer using C++ facilities, and C functions are used to read the buffer. I guess it's OK.
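If further speed-ups do turn out to be necessary, one option that might be worth benchmarking is to walk the in-memory buffer with strtod instead of parsing line by line. This is a sketch only: it was not measured here, parse_verts is a hypothetical name, it ignores line boundaries (any whitespace separates values), and strtod depends on the current C locale for the decimal point.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

struct Vert { double x, y, z; };

// Hypothetical helper: parses whitespace-separated doubles from a buffer,
// three per vertex. Stops at the first token that is not a number or at
// the end of the buffer; an incomplete trailing vertex is dropped.
std::vector<Vert> parse_verts(const std::string& buffer) {
    std::vector<Vert> verts;
    const char* p = buffer.c_str();
    for (;;) {
        double vals[3];
        for (int k = 0; k < 3; ++k) {
            char* end = nullptr;
            vals[k] = std::strtod(p, &end);  // skips leading whitespace
            if (end == p) return verts;      // no conversion was possible
            p = end;
        }
        verts.push_back({vals[0], vals[1], vals[2]});
    }
}
```

The buffer could be filled from the file in one read, for example from the str() of the stringstream used in the question.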