Simplest way to read a CSV file mapped to memory?

Views: 711 Published: 2016/8/12 17:11:41 Tags: c++ csv boost io memory-mapped-files


Question

When I read from files in C++ (11), I map them into memory using:

// headers: <boost/interprocess/file_mapping.hpp>, <boost/interprocess/mapped_region.hpp>
namespace bip = boost::interprocess;
bip::file_mapping* fm = new bip::file_mapping(path, bip::read_only);
bip::mapped_region* region = new bip::mapped_region(*fm, bip::read_only);
char* bytes = static_cast<char*>(region->get_address());

Which is fine when I wish to read byte by byte extremely fast. However, I have created a CSV file that I would like to map to memory, read line by line, and split each line on the comma.

Is there a way I can do this with a few modifications to my code above?

(I am mapping to memory because I have an awful lot of memory and I do not want any bottleneck with disk/IO streaming.)

Recommended answer

Here's my take on "fast enough". It zips through 116 MiB of CSV (2.5 million lines^[1]) in ~1 second.

The result is then randomly accessible at zero-copy, so there is no overhead (unless pages are swapped out).

For comparison:

that's ~3x faster than a naive wc csv.txt takes on the same file
it's about as fast as the following perl one-liner (which lists the distinct field counts on all lines):

perl -ne '$fields{scalar split /,/}++; END { map { print "$_\n" } keys %fields  }' csv.txt

it's only slower than (LANG=C wc csv.txt), which avoids locale functionality (by about 1.5x)
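For readers who want the same sanity check without perl, here is a minimal C++17 sketch of the idea behind the one-liner: count how many lines have each number of fields. The function name field_count_histogram is made up for this illustration and is not part of the answer. It also does not reproduce perl's split exactly (perl drops trailing empty fields, while this sketch simply counts commas plus one), so treat the two as roughly comparable rather than identical.

```cpp
#include <cstddef>
#include <map>
#include <string_view>

// Hypothetical helper (not from the answer): maps "number of fields on a line"
// to "number of lines that have that many fields".
std::map<std::size_t, std::size_t> field_count_histogram(std::string_view data)
{
    std::map<std::size_t, std::size_t> histogram;
    while (!data.empty()) {
        // Cut off one line; the last line may lack a trailing '\n'.
        std::size_t eol = data.find('\n');
        std::string_view line = data.substr(0, eol);
        data.remove_prefix(eol == std::string_view::npos ? data.size() : eol + 1);

        // n commas separate n + 1 fields (an empty line counts as one empty field).
        std::size_t fields = 1;
        for (char ch : line)
            if (ch == ',') ++fields;
        ++histogram[fields];
    }
    return histogram;
}
```

A well-formed five-column file should produce a single entry with key 5; any other key points at ragged lines.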

Here's the parser in all its glory:

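#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/utility/string_ref.hpp>
#include <vector>

namespace qi = boost::spirit::qi;

using CsvField = boost::string_ref;
using CsvLine  = std::vector<CsvField>;
using CsvFile  = std::vector<CsvLine>;  // keep it simple :)

struct CsvParser : qi::grammar<char const*, CsvFile()> {
    CsvParser() : CsvParser::base_type(lines)
    {
        using namespace qi;
        using boost::phoenix::construct;
        using boost::phoenix::size;
        using boost::phoenix::begin;

        // a field is any run of characters other than ',' or a line break
        field = raw [*~char_(",\r\n")]
            [ _val = construct<CsvField>(begin(_1), size(_1)) ]; // semantic action
        line  = field % ',';
        lines = line  % eol;
    }

  private:
    qi::rule<char const*, CsvField()> field;
    qi::rule<char const*, CsvLine()>  line;
    qi::rule<char const*, CsvFile()>  lines;
};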

The only tricky thing (and the only optimization there) is the semantic action that constructs a CsvField from the source iterator and the number of characters matched.
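To make the "pointer plus length" idea concrete without Spirit, here is a hand-rolled sketch of the same splitting rules. It is not the answer's parser: it assumes C++17 and uses std::string_view as a stand-in for boost::string_ref, and the names Field, Line, File and split_csv are invented for this example. Like the grammar, it does no quote handling.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

using Field = std::string_view;   // stand-in for boost::string_ref
using Line  = std::vector<Field>;
using File  = std::vector<Line>;

// Hypothetical hand-rolled splitter: fields are separated by ',', lines end
// at "\n" or "\r\n". Every Field points into [first, last); nothing is copied.
File split_csv(char const* first, char const* last)
{
    File file;
    Line line;
    char const* field_begin = first;
    for (char const* p = first; p != last; ++p) {
        if (*p != ',' && *p != '\n') continue;

        char const* field_end = p;
        // Strip the '\r' of a "\r\n" line ending from the last field.
        if (*p == '\n' && field_end != field_begin && field_end[-1] == '\r')
            --field_end;

        // The whole trick: a field is just (start pointer, character count).
        line.emplace_back(field_begin, static_cast<std::size_t>(field_end - field_begin));
        field_begin = p + 1;

        if (*p == '\n') {
            file.push_back(std::move(line));
            line.clear();
        }
    }
    // Whatever follows the last separator is the final field of the final line.
    line.emplace_back(field_begin, static_cast<std::size_t>(last - field_begin));
    file.push_back(std::move(line));
    return file;
}
```

It can be called on any contiguous buffer, for instance the mapped bytes and their end. The views are only valid while that buffer stays mapped, and an input that ends with a newline yields a final line holding one empty field.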

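Here's the main:

#include <boost/iostreams/device/mapped_file.hpp>
#include <iostream>

int main()
{
    // maps the file read-only; data() and size() describe the mapped bytes
    boost::iostreams::mapped_file_source csv("csv.txt");

    CsvFile parsed;
    if (qi::parse(csv.data(), csv.data() + csv.size(), CsvParser(), parsed))
    {
        std::cout << (csv.size() >> 20) << " MiB parsed into " << parsed.size() << " lines of CSV values\n";
    }
}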

Prints

116 MiB parsed into 2578421 lines of CSV values

You can use the values just as std::string:

for (int i = 0; i < 10; ++i)
{
    auto l     = rand() % parsed.size();
    auto& line = parsed[l];
    auto c     = rand() % line.size();

    std::cout << "Random field at L:" << l << "\t C:" << c << "\t" << line[c] << "\n";
}

It prints e.g.:

Random field at L:1979500    C:2    sateen's
Random field at L:928192     C:1    sackcloth's
Random field at L:1570275    C:4    accompanist's
Random field at L:479916     C:2    apparel's
Random field at L:767709     C:0    pinks
Random field at L:1174430    C:4    axioms
Random field at L:1209371    C:4    wants
Random field at L:2183367    C:1    Klondikes
Random field at L:2142220    C:1    Anthony
Random field at L:1680066    C:2    pines
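Since the fields are views into the mapped file rather than owning strings, a copy or a number has to be produced explicitly where one is needed. The following is a small sketch under the assumption of C++17, with std::string_view standing in for boost::string_ref; to_owned and to_long are hypothetical helper names, not part of the answer.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Copy a field out of the mapping only when an owning string is really needed.
std::string to_owned(std::string_view field)
{
    return std::string(field);
}

// Parse an integer field in place, without copying.
// Returns std::nullopt unless the whole field is a valid number.
std::optional<long> to_long(std::string_view field)
{
    long value = 0;
    char const* first = field.data();
    char const* last  = first + field.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last)
        return std::nullopt;
    return value;
}
```

Copies made with to_owned stay valid after the file is unmapped, whereas the views themselves do not.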

The fully working sample is here: Live On Coliru

^[1] I created the file by repeatedly appending the output of

while read a && read b && read c && read d && read e
do echo "$a,$b,$c,$d,$e"
done < /etc/dictionaries-common/words

to csv.txt, until it counted 2.5 million lines.
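On a system without that shell setup, roughly the same test data can be produced in C++. This is only a sketch: words_to_csv is a made-up name, and it approximates the loop above rather than matching it exactly (the shell's read also trims whitespace and interprets backslashes, which std::getline does not).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical counterpart of the shell loop: joins each group of five input
// lines into one CSV line. An incomplete final group is dropped.
std::string words_to_csv(std::istream& words)
{
    std::ostringstream csv;
    std::string a, b, c, d, e;
    while (std::getline(words, a) && std::getline(words, b) && std::getline(words, c)
           && std::getline(words, d) && std::getline(words, e))
    {
        csv << a << ',' << b << ',' << c << ',' << d << ',' << e << '\n';
    }
    return csv.str();
}
```

Feeding it a word list through a std::ifstream and appending the returned text to csv.txt several times would give a file of similar shape.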
