C++: Fast way to load a large txt file into vector<string>

Question

I have a file with ~12,000,000 hex lines, about 1.6 GB in size. Example of the file:

999CBA166262923D53D3EFA72F5C4E8EE1E1FF1E7E33C42D0CE8B73604034580F2
889CBA166262923D53D3EFA72F5C4E8EE1E1FF1E7E33C42D0CE8B73604034580F2

Code example:

#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> buffer;
    ifstream fe1("strings.txt");
    string line1;
    while (getline(fe1, line1)) {
        buffer.push_back(line1);
    }
}

Loading currently takes about 20 minutes. Any suggestions on how to speed it up? Thanks a lot in advance.

Recommended answer

Loading a large text file into std::vector<std::string> is rather inefficient and wasteful because it allocates heap memory for each std::string and re-allocates the vector multiple times as it grows. Lines like the 66-character hex strings above are too long for the small-string optimization of common standard libraries, so each one really does cost a separate allocation. Each of these heap allocations carries heap book-keeping information under the hood (normally 8 bytes per allocation on a 64-bit system), and each line also needs a std::string object (8-32 bytes depending on the standard library), so a file loaded this way takes a lot more space in RAM than on disk.
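Both costs named above (one allocation per line, repeated vector growth) can be reduced with the standard library alone. The following is a minimal sketch, not part of the original answer, assuming C++17: it reads the whole file into a single std::string, counts the newlines so the vector is reserved once, and stores std::string_view slices instead of separate strings. The helper names read_file and split_lines are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads the whole file into one std::string, i.e. one
// heap allocation for all of the text instead of one per line.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    std::string data(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);
    in.read(data.data(), static_cast<std::streamsize>(data.size()));
    return data;
}

// Hypothetical helper: splits `text` into lines without copying them.
// A final line without '\n' is kept, as with a std::getline loop.
// The returned views point into `text`, which must outlive them.
std::vector<std::string_view> split_lines(std::string_view text) {
    std::vector<std::string_view> lines;
    // Count newlines first so the vector is sized once and never reallocates.
    lines.reserve(static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n')) + 1);
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string_view::npos) end = text.size();
        std::string_view line = text.substr(start, end - start);
        // read_file uses binary mode, which keeps '\r'; drop it so CRLF files
        // give the same lines as text-mode reading on Windows.
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
        lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

With the question's file this becomes `std::string data = read_file("strings.txt"); auto lines = split_lines(data);`, keeping `data` alive for as long as `lines` is used. The file is still copied into memory once; memory-mapping, as in the answer's example, avoids even that copy.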

One fast way is to map the file into memory and implement iterators that walk over the lines in place. This sidesteps the issues mentioned above: there is no per-line heap allocation and no vector of string objects.

Working example:

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/iterator/iterator_facade.hpp>
#include <boost/range/iterator_range_core.hpp>

#include <algorithm> // std::find
#include <iostream>

// Forward iterator over the lines of a character buffer. Dereferencing yields
// the range [start of line, one past its '\n'), so no characters are copied.
class LineIterator
    : public boost::iterator_facade<
          LineIterator,
          boost::iterator_range<char const*>,
          boost::iterators::forward_traversal_tag,
          boost::iterator_range<char const*>
          >
{
    char const *p_, *q_; // start of the current line, end of the whole buffer
    boost::iterator_range<char const*> dereference() const { return {p_, this->next()}; }
    bool equal(LineIterator b) const { return p_ == b.p_; }
    void increment() { p_ = this->next(); }
    // Start of the next line: one past the next '\n', or the buffer end if there is none.
    char const* next() const { auto p = std::find(p_, q_, '\n'); return p + (p != q_); }
    friend class boost::iterator_core_access;

public:
    LineIterator(char const* begin, char const* end) : p_(begin), q_(end) {}
};

// Range covering every line of a mapped region.
inline boost::iterator_range<LineIterator> crange(boost::interprocess::mapped_region const& r) {
    auto p = static_cast<char const*>(r.get_address());
    auto q = p + r.get_size();
    return {LineIterator{p, q}, LineIterator{q, q}};
}

// Writes a line, including its trailing '\n' if present, to a stream.
inline std::ostream& operator<<(std::ostream& s, boost::iterator_range<char const*> const& line) {
    return s.write(line.begin(), line.size());
}

int main() {
    // Any text file works here; use "strings.txt" for the file from the question.
    boost::interprocess::file_mapping file("/usr/include/gnu-versions.h", boost::interprocess::read_only);
    boost::interprocess::mapped_region memory(file, boost::interprocess::read_only);

    // Print every line prefixed with its 0-based index.
    unsigned n = 0;
    for(auto line : crange(memory))
        std::cout << n++ << ' ' << line;
}
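For comparison, the same idea can be sketched without Boost, assuming a POSIX system (Linux, macOS) and C++17. This swaps Boost.Interprocess for the POSIX mmap(2) call and is not the answer's code: the file is mapped read-only and each line is recorded as a std::string_view into the mapping, which gives the indexed, vector-like access the question asks for without per-line allocations. The MappedFile class and its lines() member are hypothetical names, and error handling is minimal. Unlike LineIterator above, each view excludes the '\n'.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical RAII wrapper: maps a whole file read-only, unmaps on destruction.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed: " + path);
        struct stat st {};
        if (::fstat(fd, &st) == -1) { ::close(fd); throw std::runtime_error("fstat failed"); }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) { // mmap of length 0 is an error, so skip empty files
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) { ::close(fd); throw std::runtime_error("mmap failed"); }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd); // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() { if (data_) ::munmap(const_cast<char*>(data_), size_); }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // The whole file as one view into the mapping.
    std::string_view text() const { return {data_, size_}; }

    // One view per line (without '\n'); views are valid while *this lives.
    std::vector<std::string_view> lines() const {
        const std::string_view t = text();
        std::vector<std::string_view> out;
        // Reserve once so the vector never reallocates while filling.
        out.reserve(static_cast<std::size_t>(std::count(t.begin(), t.end(), '\n')) + 1);
        std::size_t start = 0;
        while (start < t.size()) {
            std::size_t end = t.find('\n', start);
            if (end == std::string_view::npos) end = t.size();
            out.push_back(t.substr(start, end - start));
            start = end + 1;
        }
        return out;
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

In the question's setting this would be `MappedFile file("strings.txt"); auto lines = file.lines();`. The views become dangling once `file` is destroyed, and the file should not be truncated by another process while it is mapped.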