Reading formatted lines from stringstream performantly

Question


For loading large amounts of vertices from a text file, I'm loading the whole file into memory and then I'd like to scan each line for the three floats. The following works, but I'd like to know whether it's good or wasteful.

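// line is a char buffer of size bufsz; verts holds num_verts vertices
std::stringstream sstr;
sstr << file.rdbuf();    // file is an ifstream

for (unsigned i = 0; i < num_verts; ++i)
{
    sstr.getline(line, bufsz);
    std::istringstream iss(line);    // a new stream is constructed for every line
    iss >> verts[i].x >> verts[i].y >> verts[i].z;
}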

On the reference site cplusplus.com I found the following for istringstream:

(2) initialization constructor: Constructs an istringstream object with a copy of str as content.

So if istringstream is really copying each string upon construction, then that's very wasteful, especially since I already have a stringstream in the first place.
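A small sketch to check the copying behaviour directly (both function names are hypothetical, not from the original code): the first function modifies the source string after constructing the istringstream, and the stream still reads the original text, which is consistent with the constructor taking a copy. The second shows a common mitigation, reusing one stream via str() and clear(); this avoids constructing a new stream per line, although str() still copies each line into the stream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Constructs an istringstream, then changes the source string.
// Because the stream holds its own copy, it still reads the old text.
int read_after_modifying_source() {
    std::string text = "1 2 3";
    std::istringstream iss(text);   // iss stores a copy of "1 2 3"
    text[0] = '9';                  // does not affect what iss will read
    int first = 0;
    iss >> first;
    return first;                   // the original first value, 1
}

// Reuses a single istringstream for many lines: clear() resets the
// eof/fail flags from the previous line, str() replaces the contents
// (this is still a copy, but no new stream object is constructed).
double sum_first_values(const std::vector<std::string>& lines) {
    std::istringstream iss;
    double sum = 0.0;
    for (const std::string& line : lines) {
        iss.clear();
        iss.str(line);
        double x = 0.0;
        if (iss >> x) sum += x;
    }
    return sum;
}
```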

Compared with sscanf, the above is indeed quite slow: 1.94 seconds vs. 0.56s for sscanf.

Is the string being copied upon initialization of iss?

How would one read formatted values while simultaneously advancing in the string line-wise with just the stringstream?

Or, less specifically: what is the C++ way to handle the above case that performs as well as sscanf?
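On reading with just the stringstream: operator>> skips leading whitespace, including newlines, so the vertices can be extracted directly from sstr without a per-line istringstream. This is a minimal sketch, assuming each line starts with three numbers; Vert mirrors the struct used in the answer below, and read_verts is a hypothetical name. ignore() discards whatever remains on each line, so trailing text after the third number is tolerated.

```cpp
#include <cstddef>
#include <limits>
#include <sstream>
#include <vector>

struct Vert { double x, y, z; };   // same layout as in the answer below

// Extracts up to max_verts vertices straight from the stream.
// operator>> skips whitespace (newlines included), and ignore()
// drops the rest of the current line before the next extraction.
std::vector<Vert> read_verts(std::istream& in, std::size_t max_verts) {
    std::vector<Vert> verts;
    verts.reserve(max_verts);
    Vert v{};
    while (verts.size() < max_verts && in >> v.x >> v.y >> v.z) {
        verts.push_back(v);
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return verts;
}
```

With the question's setup this would be called as read_verts(sstr, num_verts) after sstr << file.rdbuf().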

Solution

OK, following up on my comment, I ran some tests for you.

Code:

bloated_read.cpp:

#include <fstream>
#include <sstream>
#include <vector>

struct Vert {
    double x, y, z;
};

int main(){
    const int num_verts = 1000000;
    const int buff_size = 1024;
    std::vector<Vert> verts(num_verts);
    std::ifstream file("numbers");
    std::stringstream sstr;
    sstr << file.rdbuf();    // copy the whole file into memory
    char *line = new char [buff_size];

    for (int i=0; i<num_verts; ++i)
    {
        sstr.getline(line, buff_size);
        std::istringstream iss(line);    // new stream (and string copy) per line
        iss >> verts[i].x >> verts[i].y >> verts[i].z;
    }

    delete[] line;
    return 0;
}

Please note that I skipped the common includes and the Vert definition in the following programs.

slimmed_read.cpp:

int main(){
    const int num_verts = 1000000;
    std::vector<Vert> verts(num_verts);
    std::ifstream file("numbers");

    for (int i=0; i<num_verts; ++i)
    {
        file >> verts[i].x >> verts[i].y >> verts[i].z;
    }

    return 0;
}

sscanf_read.cpp:

#include <cstdio> // sscanf
int main(){
    const int num_verts = 1000000;
    const int buff_size = 1024;
    std::vector<Vert> verts(num_verts);
    std::ifstream file("numbers");
    char *line = new char [buff_size];

    for (int i=0; i<num_verts; ++i)
    {
        file.getline(line, buff_size);
        // %lf, not %f: the Vert members are double
        sscanf(line, "%lf %lf %lf", &verts[i].x, &verts[i].y, &verts[i].z);
    }

    delete[] line;
    return 0;
}

Results:

I ran two tests, with num_verts = 10^5 and 10^6, changing the input file contents and the corresponding line in the code accordingly. Total wall-clock running time:

size | bloated | slim   | sscanf
10^5 |  0.401s | 0.242s | 0.190s
10^6 |  4.041s | 2.392s | 1.896s

The results seem consistent. Reading from the ifstream directly shaves off about 40% of the run time, and parsing each line with sscanf gains roughly another 25%. The 10^6 file was 56,444 KB, so the overall parsing throughput was 29.07 MB/s. I would say there is still some room for improvement, but on my system I think this is close to being HDD-bound.
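For reference, these figures follow from the 10^6 row of the table; the percentages are rounded, and the 29.07 MB/s value comes out when 1 MB is taken as 1024 KB:

```latex
\begin{aligned}
\frac{t_{\text{slim}}}{t_{\text{bloated}}} &= \frac{2.392\,\text{s}}{4.041\,\text{s}} \approx 0.59 && (\approx 41\%\ \text{less time})\\
\frac{t_{\text{slim}}}{t_{\text{sscanf}}} &= \frac{2.392\,\text{s}}{1.896\,\text{s}} \approx 1.26 && (\approx 26\%\ \text{faster})\\
\text{throughput} &= \frac{56\,444\,\text{KB}}{1.896\,\text{s}} \approx 29\,770\,\text{KB/s} \approx 29.07\,\text{MB/s} && (1\,\text{MB} = 1024\,\text{KB})
\end{aligned}
```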

Conclusions:

I would go with the sscanf version if further speed-ups are not needed. It is still fairly simple to implement and understand. Also check the speed of your HDD to make sure further optimization is worth the effort.
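If further speed-ups are needed, one option that stays close to the sscanf approach but drops the per-line step is to parse the whole in-memory buffer with std::strtod, letting its end pointer advance through the text. This is a minimal sketch that was not part of the benchmark above, so no timing is claimed for it. It assumes every vertex is exactly three whitespace-separated numbers (line breaks are treated as ordinary whitespace), parse_verts is a hypothetical name, and, like sscanf, strtod uses the current C locale's decimal point.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct Vert { double x, y, z; };

// Parses consecutive number triples directly from an in-memory buffer,
// without copying individual lines. std::strtod skips leading whitespace
// (newlines included) and sets `end` to where it stopped; if nothing
// could be converted, `end` equals the input pointer and we stop.
std::vector<Vert> parse_verts(const std::string& buf, std::size_t max_verts) {
    std::vector<Vert> verts;
    verts.reserve(max_verts);
    const char* p = buf.c_str();
    while (verts.size() < max_verts) {
        Vert v;
        char* end = nullptr;
        v.x = std::strtod(p, &end); if (end == p) break; p = end;
        v.y = std::strtod(p, &end); if (end == p) break; p = end;
        v.z = std::strtod(p, &end); if (end == p) break; p = end;
        verts.push_back(v);
    }
    return verts;
}
```

The buffer can be filled as in the question, e.g. with sstr << file.rdbuf() followed by std::string buf = sstr.str(), which copies the file contents once.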

One should also note that this does not really mix C and C++ styles of data access: the data is extracted into a buffer using C++ facilities, and C functions are used only to parse that buffer. I guess that's OK.
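A sketch of that split using std::string instead of a raw new[] buffer (read_verts_sscanf is a hypothetical name): std::getline does the C++ extraction, std::sscanf does the parsing, and its return value is checked so that lines without three numbers are skipped instead of leaving uninitialized values behind.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

struct Vert { double x, y, z; };

// C++ side: std::getline reads each line into a std::string, so there is
// no fixed-size buffer to manage. C side: std::sscanf parses the line;
// "%lf" matches double, and a return value of 3 means all fields converted.
std::vector<Vert> read_verts_sscanf(std::istream& in) {
    std::vector<Vert> verts;
    std::string line;
    while (std::getline(in, line)) {
        Vert v;
        if (std::sscanf(line.c_str(), "%lf %lf %lf", &v.x, &v.y, &v.z) == 3)
            verts.push_back(v);   // skip malformed lines
    }
    return verts;
}
```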
