Fastest way to get data from a CSV in C++
Question
I have a large CSV file (approximately 75 MB) of this kind:
1,2,4
5,2,0
1,6,3
8,3,1
...
I store the data with this code:
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    char c; // to eat the commas
    int x, y, z;
    std::vector<int> xv, yv, zv;

    std::ifstream file("data.csv");
    std::string line;
    // Read one line at a time, then parse it through a string stream
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        ss >> x >> c >> y >> c >> z;
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }
    return 0;
}
On this large CSV (~75 MB), it takes:
real 0m7.389s
user 0m7.232s
sys 0m0.132s
That's too much!
Recently, using a Sublime Text snippet, I found another way to read a file:
#include <iostream>
#include <vector>
#include <cstdio>

int main()
{
    std::vector<char> v;
    if (FILE *fp = fopen("data.csv", "r")) {
        char buf[1024];
        // Append the file contents to v, 1024 bytes at a time
        while (size_t len = fread(buf, 1, sizeof(buf), fp))
            v.insert(v.end(), buf, buf + len);
        fclose(fp);
    }
}
On the same large CSV (~75 MB), this takes (without parsing the data):
real 0m0.118s
user 0m0.036s
sys 0m0.080s
That's a huge difference in time!
The question is: how can I get the data from the vector of chars into the 3 vectors quickly? I don't know how to do it any faster than with the first approach.
Thanks a lot! ^^
Recommended answer
Of course your second version will be much faster: it merely reads the file into memory without parsing the values in it. The equivalent of the first version using C-style I/O would be along the lines of
if (FILE *fp = fopen("data.csv", "r")) {
    while (fscanf(fp, "%d,%d,%d", &x, &y, &z) == 3) {
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }
    fclose(fp);
}
which, for me, is about three times faster than the C++-style version. But a C++ version without the intermediate stringstream
while (file >> x >> c >> y >> c >> z) {
    xv.push_back(x);
    yv.push_back(y);
    zv.push_back(z);
}
is almost as fast.
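The question also asks how to get from the already-loaded vector of chars to the three vectors. One possible approach, not part of the original answer, is to walk the buffer with std::from_chars (C++17), which is locale-independent and does not go through a stream. The sketch below is a minimal illustration under stated assumptions: the helper name parse_csv_buffer is hypothetical, every row is assumed to hold exactly three base-10 integers separated by commas, and parsing simply stops at the first malformed row. It has not been timed against the versions above.

```cpp
#include <charconv>      // std::from_chars (requires C++17)
#include <cstddef>
#include <system_error>  // std::errc
#include <vector>

// Hypothetical helper (not from the original answer): parses "x,y,z" rows
// from an in-memory buffer and appends the values to xv, yv and zv.
// Returns the number of complete rows parsed; stops at the first bad row.
std::size_t parse_csv_buffer(const std::vector<char>& buf,
                             std::vector<int>& xv,
                             std::vector<int>& yv,
                             std::vector<int>& zv)
{
    const char* p = buf.data();
    const char* end = p + buf.size();
    std::size_t rows = 0;

    while (p < end) {
        int vals[3];
        for (int i = 0; i < 3; ++i) {
            // from_chars does not skip whitespace; it reads one integer at p
            auto [next, ec] = std::from_chars(p, end, vals[i]);
            if (ec != std::errc()) return rows;  // non-numeric or out of range
            p = next;
            if (i < 2) {
                if (p == end || *p != ',') return rows;  // missing comma
                ++p;  // eat the comma
            }
        }
        xv.push_back(vals[0]);
        yv.push_back(vals[1]);
        zv.push_back(vals[2]);
        ++rows;

        // Skip the line terminator ("\n" or "\r\n") and any blank lines
        while (p < end && (*p == '\n' || *p == '\r')) ++p;
    }
    return rows;
}
```

With the reading code from the question, this would be called as parse_csv_buffer(v, xv, yv, zv) once v has been filled. If the number of rows is roughly known in advance, calling reserve on the three vectors beforehand may avoid some reallocations.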