C++: Fast way to read mapped file into a matrix

Views: 228 | Published: 2016/8/12 17:16:21 | Tags: c++, memory, boost

Problem description

I'm trying to read a memory-mapped file into a matrix. The file looks something like this:

name;phone;city\n
Luigi Rossi;02341567;Milan\n
Mario Bianchi;06567890;Rome\n
....

and it's quite big. The code I've written works properly, but it's not very fast:

#include <cstddef>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/iostreams/device/mapped_file.hpp>

using namespace std;
using boost::iostreams::mapped_file_source;

int main() {

    size_t j=0; // current row
    size_t k=0; // current column

    // One row per line, three string fields per row
    vector< vector<string> > M(10000000, vector<string>(3));

    mapped_file_source file("file.csv");

    // Check if file was successfully opened
    if(file.is_open()) {

      // Get pointer to the data
      const char * c = file.data();

      size_t size=file.size();

      for(size_t i = 0; i < size; i++){

       if(c[i]=='\n'){
        // End of line: move to the next row
        j=j+1;
        k=0;
       }else if(c[i]==';'){
        // Field separator: move to the next column
        k=k+1;
       }else{
        // Append the character to the current field
        M[j][k]+=c[i];
       }
     }//end for

   }//end if

 return 0;

}

Is there a faster way? I've read something about memcpy, but I don't know how to use it to speed up my code.

Recommended answer

I have numerous examples doing this (or something similar) written up on SO.

Let me list the most relevant:

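I've benchmarked quite a few of these. Yes, for sequential reading, plain read/scanf has a slight edge (see, e.g., the answers comparing scanf/iostreams and files against mapping when parsing floats, or showing read to be slightly faster for one-pass sequential reads).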

An interesting approach is to do the parsing lazily (why copy the whole input into memory? What's the point of memory mapping then?). The answer here shows this approach (emulating a multimap there):

Using boost::iostreams::mapped_file_source with std::multimap (approach #2)
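To make the lazy idea concrete, here is a minimal sketch that is not taken from the linked answer. It assumes C++17 and uses std::string_view in place of boost::string_ref; the LazyTable class and its member names are hypothetical. Only the offset of each line start is recorded up front, and a field is located only when it is requested:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: remembers where each line starts instead of
// copying any field data out of the (mapped) buffer.
class LazyTable {
public:
    explicit LazyTable(std::string_view data) : data_(data) {
        std::size_t pos = 0;
        while (pos < data_.size()) {
            starts_.push_back(pos);
            std::size_t nl = data_.find('\n', pos);
            if (nl == std::string_view::npos) break;
            pos = nl + 1;
        }
    }

    std::size_t rows() const { return starts_.size(); }

    // Returns line `row` without its trailing newline.
    std::string_view line(std::size_t row) const {
        assert(row < starts_.size());
        std::size_t begin = starts_[row];
        std::size_t nl = data_.find('\n', begin);
        std::size_t end = (nl == std::string_view::npos) ? data_.size() : nl;
        return data_.substr(begin, end - begin);
    }

    // Scans line `row` for its `col`-th ';'-separated field on demand.
    // Returns an empty view if the line has fewer fields.
    std::string_view field(std::size_t row, std::size_t col) const {
        std::string_view ln = line(row);
        for (std::size_t k = 0; k < col; ++k) {
            std::size_t sep = ln.find(';');
            if (sep == std::string_view::npos) return {};
            ln.remove_prefix(sep + 1);
        }
        return ln.substr(0, ln.find(';'));
    }

private:
    std::string_view data_;
    std::vector<std::size_t> starts_;
};
```

With a mapped file, the table could be built as LazyTable t({file.data(), file.size()}); the views stay valid only while the mapping is open. The trade-off is that every field access rescans its line, which pays off when only a fraction of the fields is ever read.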

In all other cases, consider slamming a Spirit Qi job on it, potentially using boost::string_ref instead of vector<char> (unless the mapped file is not "const", of course).

The string_ref is also shown in the last answer linked before. Another interesting example of this (with lazy conversions to un-escaped string values) is here: How to parse mustache with Boost.Xpressive correctly?

Here's that Qi job slammed on it:

It parses a 994 MiB file of ~32 million lines in 2.9s into a vector of

struct Line {
    boost::string_ref name, city;
    long id;
};

Note that we parse the number, and store the strings by referring to their location in the memory map plus their length (string_ref).
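For readers without Spirit at hand, the same storage idea can be sketched by hand. This is an illustration under assumptions, not the answer's parser: it needs C++17, swaps boost::string_ref for std::string_view, and the parse_lines function is a hypothetical name. Only the numeric field is converted; the two strings remain views into the input buffer:

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

struct Line {
    std::string_view name, city;  // views into the input buffer, no copies
    long id = 0;                  // the only field that is actually converted
};

// Hypothetical hand-rolled parser for "name;id;city" records, one per line.
// Lines that do not match that shape (such as a header) are skipped.
std::vector<Line> parse_lines(std::string_view data) {
    constexpr std::size_t npos = std::string_view::npos;
    std::vector<Line> out;
    while (!data.empty()) {
        // Cut the next line off the front of the remaining input.
        std::size_t nl = data.find('\n');
        std::string_view row = data.substr(0, nl);
        data.remove_prefix(nl == npos ? data.size() : nl + 1);

        std::size_t a = row.find(';');
        if (a == npos) continue;
        std::size_t b = row.find(';', a + 1);
        if (b == npos) continue;
        assert(a < b);  // the second separator always follows the first

        Line ln;
        ln.name = row.substr(0, a);
        ln.city = row.substr(b + 1);
        std::string_view num = row.substr(a + 1, b - a - 1);
        auto res = std::from_chars(num.data(), num.data() + num.size(), ln.id);
        if (res.ec != std::errc() || res.ptr != num.data() + num.size()) continue;
        out.push_back(ln);
    }
    return out;
}
```

As with string_ref, the resulting vector is only usable while the underlying mapping stays open. Converting the middle field to long also drops leading zeros, so a phone number such as 02341567 would be better kept as a view too.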

Live On Coliru

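#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/utility/string_ref.hpp>
#include <cstdlib>   // rand
#include <iostream>  // std::cout
#include <iterator>  // std::distance
#include <vector>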

namespace qi = boost::spirit::qi;
using sref   = boost::string_ref;

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<sref, It, void> {
        static void call(It f, It l, sref& attr) { attr = { f, size_t(std::distance(f,l)) }; }
    };
} } }

struct Line {
    sref name, city;
    long id;
};

BOOST_FUSION_ADAPT_STRUCT(Line, (sref,name)(long,id)(sref,city))

int main() {
    boost::iostreams::mapped_file_source mmap("input.txt");

    using namespace qi;

    std::vector<Line> parsed;
    parsed.reserve(32000000);
    if (phrase_parse(mmap.begin(), mmap.end(), 
                omit[+graph] >> eol >>
                (raw[*~char_(";\r\n")] >> ';' >> long_ >> ';' >> raw[*~char_(";\r\n")]) % eol,
                qi::blank, parsed))
    {
        std::cout << "Parsed " << parsed.size() << " lines\n";
    } else {
        std::cout << "Failed after " << parsed.size() << " lines\n";
    }

    std::cout << "Printing 10 random items:\n";
    for(int i=0; i<10 && !parsed.empty(); ++i) { // skip when nothing parsed (avoids modulo by zero)
        auto& line = parsed[rand() % parsed.size()];
        std::cout << "city: '" << line.city << "', id: " << line.id << ", name: '" << line.name << "'\n";
    }
}

With the input generated like

do grep -v "'" /etc/dictionaries-common/words | sort -R | xargs -d\\n -n 3 | while read a b c; do echo "$a $b;$RANDOM;$c"; done

The output is, e.g.

Parsed 31609499 lines
Printing 10 random items:
city: 'opted', id: 14614, name: 'baronets theosophy'
city: 'denominated', id: 24260, name: 'insignia ophthalmic'
city: 'mademoiselles', id: 10791, name: 'smelter orienting'
city: 'ducked', id: 32155, name: 'encircled flippantly'
city: 'garotte', id: 3080, name: 'keeling South'
city: 'emirs', id: 14511, name: 'Aztecs vindicators'
city: 'characteristically', id: 5473, name: 'constancy Troy'
city: 'savvy', id: 3921, name: 'deafer terrifically'
city: 'misfitted', id: 14617, name: 'Eliot chambray'
city: 'faceless', id: 24481, name: 'shade forwent'
