How to read a file into a vector elegantly and efficiently?


Question

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    return
    {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];
    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf),
            back_inserter(coll));
    }

    copy(begin(buf), begin(buf) + fin.gcount(),
        back_inserter(coll));

    return coll;
}

int main()
{
    f1();
    f2();
}

Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() is less efficient than f2().

So, my question is:

Will mainstream C++ compilers optimize f1() to make it as fast as f2()?

Update:

I tested with a 130 MB file in release mode (Visual Studio 2015 with Clang 3.8):

f1() takes 1614 ms, while f2() takes 616 ms.

f2() is faster than f1().

What a sad result!

Recommended answer

I've checked your code on my side using mingw482. Out of curiosity, I added an additional function f3 with the following implementation:

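// Note: filepath is assumed to be defined elsewhere
// (for example as "input.txt", the file used in the question).
inline vector<char> f3()
{
    ifstream fin{ filepath, ios::binary };
    fin.seekg(0, fin.end);                           // jump to the end to learn the file size
    size_t len = static_cast<size_t>(fin.tellg());   // no error handling: tellg() returns -1 on failure
    fin.seekg(0, fin.beg);                           // rewind before reading

    vector<char> coll(len);                          // allocate the whole buffer once
    fin.read(coll.data(), static_cast<streamsize>(len));
    return coll;
}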

I tested using a file about 90M long. On my platform the results were a bit different from yours.


  • f1(): ~850 ms

  • f2(): ~600 ms

  • f3(): ~70 ms

The results were calculated as the mean of 10 consecutive file reads.
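A measurement like this could be reproduced with a small timing helper. The following is only a sketch: mean_ms is a hypothetical name that does not appear in the original answer, and the answer does not say how the timing was actually done.

```cpp
#include <chrono>

// Hypothetical timing helper (not from the original answer): calls f()
// `runs` times in a row and returns the mean wall-clock duration of a
// single call in milliseconds.
template <typename F>
double mean_ms(F&& f, int runs = 10)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i)
        f();
    const std::chrono::duration<double, std::milli> total = clock::now() - start;
    return total.count() / runs;
}
```

It would be called as, for example, mean_ms([] { f1(); }). Note that consecutive reads of the same file are usually served from the operating system's file cache, so numbers obtained this way mostly reflect the cost of the C++ code rather than disk speed.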

f3 takes the least time because vector<char> coll(len); allocates all the required memory up front, so no further reallocations are needed. back_inserter, on the other hand, requires the container to have a push_back member function, which for vector triggers a reallocation whenever the capacity is exceeded. As described in the docs:


push_back

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if, and only if, the new vector size surpasses the current vector capacity.
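The effect of the quoted rule can be observed by watching capacity() while appending. The sketch below is an illustration only; count_reallocations is a hypothetical helper, and the exact number of reallocations without reserve depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: appends n chars with push_back and counts how
// often the capacity changes, i.e. how many reallocations take place.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<char> v;
    if (reserve_first)
        v.reserve(n);                 // one allocation up front

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back('x');
        if (v.capacity() != last_capacity) {   // storage was reallocated
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0, because the capacity is never exceeded. Without it, the vector reallocates repeatedly as it grows, and each reallocation copies everything stored so far.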

Of f1 and f2, the latter is somewhat faster, although both grow the vector incrementally: f2 through back_inserter, and f1 through the iterator-range constructor, which cannot know the size in advance because istreambuf_iterator is an input iterator. f2 is probably faster because it reads the file in chunks, which allows some buffering to take place.
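Combining the two observations (allocate once, read in bulk), a slightly more defensive variant of the f3 idea might look like the sketch below. The name read_whole_file and the choice to return an empty vector on failure are assumptions made for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): reads a whole file
// with a single allocation and a single read call. Returns an empty
// vector if the file cannot be opened or its size cannot be determined.
std::vector<char> read_whole_file(const std::string& path)
{
    // ios::ate opens the stream positioned at the end of the file.
    std::ifstream fin(path, std::ios::binary | std::ios::ate);
    if (!fin)
        return {};

    const std::streamoff end = fin.tellg();   // position at end == file size
    if (end <= 0)
        return {};                            // empty file, or tellg() failed (-1)

    fin.seekg(0, std::ios::beg);              // rewind before reading
    std::vector<char> data(static_cast<std::size_t>(end));
    fin.read(data.data(), static_cast<std::streamsize>(data.size()));

    // Keep only the bytes that were actually read.
    data.resize(static_cast<std::size_t>(fin.gcount()));
    return data;
}
```

Like f3, this relies on the stream being seekable and opened in binary mode, so it would not work for pipes or similar inputs. For those, a chunked loop in the style of f2 remains the safer choice.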
