How to read a file into a vector elegantly and efficiently?
Question
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

// Builds the vector directly from a pair of stream-buffer iterators.
vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    return
    {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

// Reads the file in 1024-byte chunks and appends each chunk to the vector.
vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];

    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf),
            back_inserter(coll));
    }

    // The last read may be partial; gcount() says how many bytes it delivered.
    copy(begin(buf), begin(buf) + fin.gcount(),
        back_inserter(coll));

    return coll;
}

int main()
{
    f1();
    f2();
}
Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() is less efficient than f2().
So, my question is:
Will mainstream C++ compilers optimize f1() to make it as fast as f2()?
Update:
I used a 130M file to test in release mode (Visual Studio 2015 with Clang 3.8):
f1() takes 1614 ms, while f2() takes 616 ms.
f2() is faster than f1().
What a sad result!
Recommended answer
I've checked your code on my side using mingw482.
Out of curiosity, I added an additional function f3 with the following implementation:
inline vector<char> f3()
{
    ifstream fin{ filepath, ios::binary };
    fin.seekg (0, fin.end);
    size_t len = fin.tellg();
    fin.seekg (0, fin.beg);
    vector<char> coll(len);
    fin.read(coll.data(), len);
    return coll;
}
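One caveat with f3 as written: tellg() returns -1 on failure (for example when the file could not be opened), and converting that to size_t produces a huge length. A slightly more defensive variant might look like the sketch below. The name read_whole and the path parameter are assumptions made for illustration; they are not part of the original answer, and this variant was not part of the timings.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical defensive variant of f3: size the vector once, read once,
// and return an empty vector if the file cannot be opened or measured.
std::vector<char> read_whole(const std::string& path)
{
    std::ifstream fin{ path, std::ios::binary };
    if (!fin)
        return {};                                  // could not open the file

    fin.seekg(0, std::ios::end);
    const std::streamoff end = fin.tellg();         // -1 on failure
    if (end <= 0)
        return {};                                  // failure or empty file
    fin.seekg(0, std::ios::beg);

    std::vector<char> coll(static_cast<std::size_t>(end));
    fin.read(coll.data(), static_cast<std::streamsize>(end));
    coll.resize(static_cast<std::size_t>(fin.gcount()));  // keep only the bytes actually read
    return coll;
}
```

Calling read_whole("input.txt") would then play the role of f3() in the comparison.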
I tested using a file about 90M long. On my platform the results were a bit different from yours:
- f1(): ~850 ms
- f2(): ~600 ms
- f3(): ~70 ms
The results were calculated as the mean of 10 consecutive file reads.
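The answer does not show its timing code, so the following is only a sketch of how such a mean could be measured. The helper name mean_ms and its signature are assumptions made for illustration. Note that with consecutive reads of the same file, later runs are likely served from the operating system's file cache, which affects all candidates alike.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: mean wall-clock time in milliseconds over `runs`
// consecutive calls of `reader`.
double mean_ms(const std::function<std::vector<char>()>& reader, int runs = 10)
{
    using clock = std::chrono::steady_clock;
    double total_ms = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        const std::vector<char> data = reader();   // the call being timed
        const auto stop = clock::now();
        (void)data;                                // the contents are not inspected here
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

For example, mean_ms(f1), mean_ms(f2) and mean_ms(f3) would give one figure per function, comparable in spirit to the numbers listed above.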
The f3 function takes the least time because vector<char> coll(len); allocates all the required memory up front, so no further reallocations are needed. As for back_inserter, it requires the type to have a push_back member function, which for vector triggers a reallocation whenever the capacity is exceeded. As described in the docs:
push_back: "This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity."
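To make the reallocation behaviour concrete, here is a small sketch that counts how often the capacity changes while appending with push_back, with and without reserving first. The function name count_reallocations is hypothetical, and the exact count without reserve depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many times push_back changes the vector's
// capacity (i.e. reallocates) while appending n chars.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<char> v;
    if (reserve_first)
        v.reserve(n);                          // one allocation up front

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back('x');
        if (v.capacity() != last_capacity)     // capacity changed: storage was reallocated
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because the capacity already covers all n elements. Without it the vector reallocates repeatedly as it grows, and each reallocation copies the existing contents.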
Between the f1 and f2 implementations, the latter is slightly faster. Only f2 uses back_inserter explicitly, but f1's range constructor is given input iterators (istreambuf_iterator), so it cannot know the element count in advance and likewise has to grow the vector as it goes. f2 is probably faster because it reads the file in chunks, which allows some buffering to take place.
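When the file size cannot be queried up front, a chunked read can still avoid the per-character back_inserter copy by reading each chunk straight into the vector's own storage. The following is a sketch only: the name read_chunked, the path parameter and the 64 KiB default chunk size are assumptions, and it has not been benchmarked against f1, f2 or f3.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical variant of f2: grow the vector by one chunk, read directly
// into the new space, then trim whatever the read did not fill.
std::vector<char> read_chunked(const std::string& path, std::size_t chunk = 64 * 1024)
{
    std::vector<char> coll;
    std::ifstream fin{ path, std::ios::binary };
    if (!fin || chunk == 0)
        return coll;

    for (;;)
    {
        const std::size_t old_size = coll.size();
        coll.resize(old_size + chunk);                    // make room for one more chunk
        fin.read(coll.data() + old_size, static_cast<std::streamsize>(chunk));
        const std::size_t got = static_cast<std::size_t>(fin.gcount());
        coll.resize(old_size + got);                      // drop the unused tail
        if (got < chunk)                                  // short read: end of file or error
            break;
    }
    return coll;
}
```

The vector still reallocates as it grows, so this is expected to sit somewhere between f2 and f3 rather than replace f3 when the size is known.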