How to read/write vector<Chunk*> as memory mapped file(s)?

Problem description

I have a large set of data chunks (~50GB). In my code I have to be able to do the following things:

  1. Repeatedly iterate over all chunks and do some computations on them.

  2. Repeatedly iterate over all chunks and do some computations on them, where in each iteration the order of visited chunks is (as far as possible) randomized.

So far, I have split the data into 10 binary files (created with boost::serialization), which I repeatedly read one after the other, performing the computations on each. For (2), I read the 10 files in random order and process each one sequentially, which is good enough.
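
If individual chunks can be addressed by index (for example through an offset table stored in the file; that is an assumption, the question does not describe such an index), the per-pass random order could be drawn at chunk level instead of file level. A minimal sketch using only the standard library; `shuffled_order` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns a random permutation of the chunk indices 0..n-1.
// Calling it once per pass with the same engine gives a new order each pass.
std::vector<std::size_t> shuffled_order(std::size_t n, std::mt19937& rng) {
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., n-1
  std::shuffle(order.begin(), order.end(), rng);          // uniform permutation
  return order;
}
```

A pass would then look like `for (std::size_t i : shuffled_order(num_chunks, rng)) { /* process chunk i */ }`, where `num_chunks` and the processing step are placeholders.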

However, reading one of the files (using boost::serialization) takes a long time, and I'd like to speed it up.

Can I use memory mapped files instead of boost::serialization?

In particular, I'd have a vector<Chunk*> in each file. I want to be able to read in such a file very, very quickly.

How can I read/write such a vector<Chunk*> data structure? I have looked at boost::interprocess::file_mapping, but I'm not sure how to do it.

I read this (http://boost.cowic.de/rc/pdf/interprocess.pdf), but it doesn't say much about memory mapped files. I think I'd store the vector<Chunk*> first in the mapped memory, then the Chunks themselves. And the vector<Chunk*> would actually become an offset_ptr<Chunk>*, i.e., an array of offset_ptr?
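
The layout proposed above (an index first, then the chunks) can be sketched without storing pointers at all: store byte offsets measured from the start of the file, which remain valid wherever the file is mapped, whereas a raw `Chunk*` written to disk means nothing in the next process. This is a minimal standard-C++ sketch, not `boost::interprocess::offset_ptr`. It assumes each chunk is a flat blob of bytes with no internal pointers (modelled here as `std::string`) and uses host byte order; the layout and the names `pack_chunks`, `chunk_count`, `chunk_at` and `ChunkView` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout (an assumption, not a standard format):
//   [uint64 n][uint64 offset[0..n]][chunk 0 bytes][chunk 1 bytes]...
// offset[i] counts bytes from the start of the buffer; chunk i occupies
// [offset[i], offset[i+1]). Unlike a stored Chunk*, an offset stays valid
// wherever the buffer (or the file it is written to) ends up mapped.
std::vector<char> pack_chunks(const std::vector<std::string>& chunks) {
  const std::uint64_t n = chunks.size();
  std::vector<std::uint64_t> offsets(n + 1);
  offsets[0] = sizeof(std::uint64_t) * (n + 2);  // size of n plus the table
  for (std::uint64_t i = 0; i < n; ++i)
    offsets[i + 1] = offsets[i] + chunks[i].size();

  std::vector<char> buf(offsets[n]);
  std::memcpy(buf.data(), &n, sizeof n);
  std::memcpy(buf.data() + sizeof n, offsets.data(),
              offsets.size() * sizeof(std::uint64_t));
  for (std::uint64_t i = 0; i < n; ++i)
    std::memcpy(buf.data() + offsets[i], chunks[i].data(), chunks[i].size());
  return buf;
}

// Non-owning view of one chunk inside a packed buffer.
struct ChunkView {
  const char* data;
  std::uint64_t size;
};

// Reads the chunk count; memcpy sidesteps alignment concerns on raw bytes.
std::uint64_t chunk_count(const char* base) {
  std::uint64_t n;
  std::memcpy(&n, base, sizeof n);
  return n;
}

// Looks up chunk i through the offset table; no other chunk is touched.
ChunkView chunk_at(const char* base, std::uint64_t i) {
  assert(i < chunk_count(base));
  const char* table = base + sizeof(std::uint64_t);
  std::uint64_t first, last;
  std::memcpy(&first, table + i * sizeof(std::uint64_t), sizeof first);
  std::memcpy(&last, table + (i + 1) * sizeof(std::uint64_t), sizeof last);
  return ChunkView{base + first, last - first};
}
```

If the bytes returned by `pack_chunks` are written to a file and that file is later mapped, `chunk_at(base, i)` with `base` set to the mapped address returns chunk `i` without deserializing anything else.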

Recommended answer

A memory mapped file is just a chunk of memory; like any other memory, it may be organized as bytes, little-endian words, bits, or any other data structure. If portability is a concern (e.g. endianness), some care is needed.
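
One common way to handle the endianness concern is to fix the on-disk byte order and convert explicitly with shifts, which behaves the same on little- and big-endian hosts. A minimal sketch; the helper names `store_le64` and `load_le64` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Writes v as 8 little-endian bytes, independent of the host byte order.
void store_le64(unsigned char* out, std::uint64_t v) {
  for (int i = 0; i < 8; ++i)
    out[i] = static_cast<unsigned char>(v >> (8 * i));  // lowest byte first
}

// Rebuilds the value from 8 little-endian bytes, independent of host order.
std::uint64_t load_le64(const unsigned char* in) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<std::uint64_t>(in[i]) << (8 * i);
  return v;
}
```

The conversion costs a few shifts per field; if the file is only ever read on the kind of machine that wrote it, casting the mapped bytes directly to a struct avoids that cost at the price of portability.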

The following code may be a good starting point:

#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>
#include <iostream>
#include <boost/iostreams/device/mapped_file.hpp>

// One fixed-size record exactly as it is laid out in the file.
struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed)); /* compiler specific (GCC/Clang), but supported
                              in other ways by all major compilers,
                              e.g. #pragma pack on MSVC */

// Fail at compile time if the in-memory layout does not match the file.
static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");
static_assert(offsetof(entry, a) == 0, "entry: Invalid offset for a");
static_assert(offsetof(entry, b) == 4, "entry: Invalid offset for b");

int main() {
  // Map the whole file "map" read-only.
  boost::iostreams::mapped_file_source file("map");
  assert(file.is_open());

  // Treat the mapped bytes as a contiguous array of entries.
  const entry* data_begin = reinterpret_cast<const entry*>(file.data());
  const entry* data_end = data_begin + file.size() / sizeof(entry);

  // '\n' instead of std::endl avoids a flush per record on large files.
  for (const entry* ii = data_begin; ii != data_end; ++ii)
    std::cout << std::hex << ii->a << " " << ii->b << '\n';
  return 0;
}
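
The code above only reads the file. A matching writer could be sketched as follows, assuming the same packed `entry` layout (repeated so the snippet compiles on its own, with GCC/Clang attribute syntax) and host byte order; `write_entries` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Same on-disk record as in the reader; __attribute__((packed)) is
// GCC/Clang syntax (MSVC would use #pragma pack instead).
struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed));
static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");

// Writes the records back to back in host byte order, so the file is a
// flat entry array that the memory-mapped reader can iterate over.
bool write_entries(const std::string& path, const std::vector<entry>& entries) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(reinterpret_cast<const char*>(entries.data()),
            static_cast<std::streamsize>(entries.size() * sizeof(entry)));
  out.close();  // flush now so a failed write is reported below
  return !out.fail();
}
```

For example, `write_entries("map", records)` would produce a file of `records.size() * 12` bytes that the reader above can map directly.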

The data_begin and data_end pointers can be used, like any other iterators, with most STL algorithms.
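
For instance, standard algorithms accept such a pointer range directly. A small sketch with hypothetical helpers `sum_b` and `count_a`; a plain array stands in for the mapped data so the snippet runs without a file:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Same packed record as in the answer (GCC/Clang attribute syntax).
struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed));

// Sums the b fields over [first, last) with std::accumulate.
std::uint64_t sum_b(const entry* first, const entry* last) {
  return std::accumulate(first, last, std::uint64_t{0},
                         [](std::uint64_t acc, const entry& e) { return acc + e.b; });
}

// Counts the records whose a field equals key, using std::count_if.
std::ptrdiff_t count_a(const entry* first, const entry* last, std::uint32_t key) {
  return std::count_if(first, last, [key](const entry& e) { return e.a == key; });
}
```

With the mapping from the answer, `sum_b(data_begin, data_end)` would walk every record in the file.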
