C++ random access iterators for containers with elements loaded on demand


Problem description

I'm currently working on a small project that requires loading messages from a file. The messages are stored sequentially, and the files can become huge, so loading an entire file into memory is not practical.

Therefore we decided to implement a FileReader class that can move quickly to a specific element in the file and load it on request. Typical usage looks something like this:

SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m);    // Try deserializing as SpecificMessage 
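The question does not show how moveTo finds a message quickly. One plausible approach, sketched here under stated assumptions, is to scan the file once and record the offset of every message. The IndexedReader class, the pack helper and the length-prefixed layout are all hypothetical and are not the asker's FileReader or file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed layout (hypothetical): each message is a 4-byte host-endian length
// followed by that many payload bytes.
class IndexedReader {
public:
    explicit IndexedReader(std::istream& in) : in_(in) {
        // One sequential pass records where every message starts.
        std::uint32_t len = 0;
        std::streamoff pos = 0;
        in_.clear();
        in_.seekg(0, std::ios::beg);
        while (in_.read(reinterpret_cast<char*>(&len), sizeof len)) {
            offsets_.push_back(pos);
            pos += static_cast<std::streamoff>(sizeof len) + len;
            in_.seekg(pos, std::ios::beg);   // skip the payload without reading it
        }
        in_.clear();                          // reset eof/fail so later seeks work
    }

    std::size_t size() const { return offsets_.size(); }

    // Seek straight to message n and load only its payload.
    std::string read(std::size_t n) {
        assert(n < offsets_.size());
        std::uint32_t len = 0;
        in_.seekg(offsets_[n], std::ios::beg);
        in_.read(reinterpret_cast<char*>(&len), sizeof len);
        std::string payload(len, '\0');
        in_.read(&payload[0], static_cast<std::streamsize>(len));
        return payload;
    }

private:
    std::istream& in_;
    std::vector<std::streamoff> offsets_;
};

// Demo helper: serialize messages in the assumed layout.
inline std::string pack(const std::vector<std::string>& msgs) {
    std::string out;
    for (const std::string& m : msgs) {
        const std::uint32_t len = static_cast<std::uint32_t>(m.size());
        out.append(reinterpret_cast<const char*>(&len), sizeof len);
        out += m;
    }
    return out;
}
```

With an index like this, random access costs one seek plus one read, and only the offsets stay in memory. The same code should work on a std::ifstream opened in binary mode.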

The FileReader itself works great, so we thought about adding STL-compliant iterator support as well: a random access iterator that provides read-only references to specific messages. It would be used like this:

for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
  // ...
}

Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade to simplify the implementation.

Now my question boils down to this: how do I implement the iterator correctly, given that FileReader does not actually hold a sequence of messages internally but loads them on request?

What we've tried so far:

Storing the message as an iterator member

This approach stores the message in the iterator instance. It works great for simple use cases but fails for more complex ones. For example, std::reverse_iterator has a dereference operation that looks like this:

 reference operator*() const
 {  // return designated value
   _RanIt _Tmp = current;
   return (*--_Tmp);
 }

This breaks our approach, because the returned reference points to a message stored inside the temporary iterator _Tmp, which is destroyed when the function returns.

Making the reference type equal the value type

@DDrmmr suggested in the comments making the reference type equal to the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator, which implements the -> operator as

pointer operator->() const {
  return (&**this);
}

which dereferences itself, calling operator*, which returns a temporary copy, and then returns the address of that temporary.
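One common way around the operator-> problem is to return the message by value from operator* and to return a small proxy object from operator-> that owns its own copy. The sketch below assumes that approach. FakeReader, LazyIter and reversed_contents are hypothetical names, and the in-memory reader only stands in for the real FileReader. The iterator is tagged as random access so that adaptors accept it, although an iterator whose reference type is not a true reference does not strictly meet the pre-C++20 forward iterator requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

struct SpecificMessage { std::string contents; };

// Hypothetical stand-in for FileReader: "loads" message i on request.
struct FakeReader {
    std::vector<std::string> raw;   // pretend these bytes live on disk
    std::size_t size() const { return raw.size(); }
    SpecificMessage load(std::size_t i) const { return SpecificMessage{raw.at(i)}; }
};

class LazyIter {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = SpecificMessage;
    using difference_type   = std::ptrdiff_t;
    using reference         = SpecificMessage;   // by value, so nothing can dangle
    struct pointer {                             // arrow proxy that owns the copy
        SpecificMessage value;
        const SpecificMessage* operator->() const { return &value; }
    };

    LazyIter() = default;
    LazyIter(const FakeReader* r, difference_type pos) : reader_(r), pos_(pos) {}

    reference operator*() const {
        assert(reader_ != nullptr);
        return reader_->load(static_cast<std::size_t>(pos_));
    }
    pointer   operator->() const { return pointer{**this}; }
    reference operator[](difference_type n) const { return *(*this + n); }

    LazyIter& operator++() { ++pos_; return *this; }
    LazyIter& operator--() { --pos_; return *this; }
    LazyIter  operator++(int) { LazyIter t = *this; ++pos_; return t; }
    LazyIter  operator--(int) { LazyIter t = *this; --pos_; return t; }
    LazyIter& operator+=(difference_type n) { pos_ += n; return *this; }
    LazyIter& operator-=(difference_type n) { pos_ -= n; return *this; }

    friend LazyIter operator+(LazyIter it, difference_type n) { return it += n; }
    friend LazyIter operator+(difference_type n, LazyIter it) { return it += n; }
    friend LazyIter operator-(LazyIter it, difference_type n) { return it -= n; }
    friend difference_type operator-(const LazyIter& a, const LazyIter& b) { return a.pos_ - b.pos_; }
    friend bool operator==(const LazyIter& a, const LazyIter& b) { return a.pos_ == b.pos_; }
    friend bool operator!=(const LazyIter& a, const LazyIter& b) { return a.pos_ != b.pos_; }
    friend bool operator<(const LazyIter& a, const LazyIter& b)  { return a.pos_ < b.pos_; }
    friend bool operator>(const LazyIter& a, const LazyIter& b)  { return b < a; }
    friend bool operator<=(const LazyIter& a, const LazyIter& b) { return !(b < a); }
    friend bool operator>=(const LazyIter& a, const LazyIter& b) { return !(a < b); }

private:
    const FakeReader* reader_ = nullptr;
    difference_type   pos_    = 0;
};

// Walk the messages back to front through std::reverse_iterator (C++14 helper).
inline std::vector<std::string> reversed_contents(const FakeReader& fr) {
    std::vector<std::string> out;
    auto first = std::make_reverse_iterator(LazyIter(&fr, static_cast<std::ptrdiff_t>(fr.size())));
    auto last  = std::make_reverse_iterator(LazyIter(&fr, 0));
    for (; first != last; ++first)
        out.push_back((*first).contents);   // *first yields a fresh copy each time
    return out;
}
```

Dereferencing through std::reverse_iterator is safe here because every operator* call returns an independent copy. The sketch uses (*first).contents rather than first->contents on the reverse iterator, since the standard library's reverse_iterator::operator-> differs between language versions and is not guaranteed to work with a proxy type.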

Storing the message externally

Alternatively, I thought about storing the message externally:

SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...

which also seems to be flawed, because

auto iter2 = iter + 2;

leaves both iter2 and iter pointing to the same content.
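The aliasing problem with external storage can be reproduced in a few lines. This is only an illustrative sketch: SlotIter and second_deref_clobbers_first are hypothetical names, and a vector of strings stands in for the file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical iterator that writes each loaded message into caller-owned storage.
struct SlotIter {
    const std::vector<std::string>* source;  // stands in for the file
    std::size_t                     pos;
    std::string*                    slot;    // external storage shared by all copies

    const std::string& operator*() const {
        assert(pos < source->size());
        *slot = (*source)[pos];              // "loading" overwrites the shared slot
        return *slot;
    }
    SlotIter operator+(std::size_t n) const { return SlotIter{source, pos + n, slot}; }
};

// Returns true when dereferencing iter2 changes what the reference from iter shows.
inline bool second_deref_clobbers_first(const std::vector<std::string>& msgs) {
    std::string slot;
    SlotIter iter{&msgs, 0, &slot};
    SlotIter iter2 = iter + 2;

    const std::string& a = *iter;    // a refers to slot, which now holds msgs[0]
    const std::string  before = a;   // remember what a showed
    const std::string& b = *iter2;   // overwrites slot with msgs[2]
    return &a == &b && a != before;  // same object, contents changed under a
}
```

Both references name the same std::string, so any algorithm that holds on to *iter while advancing a copy of the iterator sees its value change.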

Solution

As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:

As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?

Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator to do the deserialization on demand.
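For the POD case, plain pointers into the mapped region already are random access iterators. The sketch below assumes a hypothetical fixed-size record, PodMessage, because the real SpecificMessage layout is not given. It also assumes the file was written with the same layout, padding and byte order. The base pointer would come from the mapping; in the sketch it is an ordinary pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <utility>

// Hypothetical fixed-size record (an assumption for illustration only).
struct PodMessage {
    std::uint32_t id;
    char          text[12];
};
static_assert(std::is_trivially_copyable<PodMessage>::value,
              "records must be safe to read straight from mapped memory");

// View a mapped byte range as [first, last) over whole records.
inline std::pair<const PodMessage*, const PodMessage*>
message_range(const void* base, std::size_t bytes) {
    // The mapping must be suitably aligned for the record type.
    assert(reinterpret_cast<std::uintptr_t>(base) % alignof(PodMessage) == 0);
    const PodMessage* first = static_cast<const PodMessage*>(base);
    return {first, first + bytes / sizeof(PodMessage)};
}

// Standard adaptors work unchanged on raw pointers: fetch the id of the
// n-th record counted from the back.
inline std::uint32_t nth_from_back_id(const void* base, std::size_t bytes, std::size_t n) {
    const auto range = message_range(base, bytes);
    assert(n < static_cast<std::size_t>(range.second - range.first));
    std::reverse_iterator<const PodMessage*> rfirst(range.second);
    return rfirst[static_cast<std::ptrdiff_t>(n)].id;
}
```

Because the iterators are real pointers, operator* yields a true reference into the mapping and none of the dangling-reference problems from the question arise.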

Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>, e.g.), dynamic-size containers (e.g. vector<>) in addition to the fixed-size containers (array<>) - and any combinations of those.

Here's a demo that takes a simple SpecificMessage containing a string and (de)serializes it directly into shared memory:

using blob_t       = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;

The part that interests you would be the consuming part:

bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

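// for fun, let's reverse the blobs
// (the step is clamped so the iterator never advances past rend())
for (It first(table->rbegin()), last(table->rend()); first < last; first += (last - first < 13 ? last - first : 13))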
    std::cout << "blob: '" << first->contents << "'\n";

// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";

So this prints every 13th message in reverse order, followed by a random blob.

Full Demo

The online sample uses the lines of its own source file as "messages".

Live On Coliru

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
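#include <cstdint>   // uint8_t
#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <fstream>   // std::ifstream
#include <iostream>
#include <string>
#include <typeinfo>  // std::bad_cast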

#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>

static char const* DBASE_FNAME = "database.map";

namespace bip = boost::interprocess;

namespace shm {
    using segment_manager = bip::managed_mapped_file::segment_manager;
    template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
    template <typename T> using vector    = bip::vector<T, allocator<T> >;
}

using blob_t       = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;

struct SpecificMessage {
    // for demonstration purposes, just a string; could be anything serialized
    std::string contents;

    // trivial save/load serialization code:
    template <typename Blob>
    friend bool save(Blob& blob, SpecificMessage const& msg) {
        blob.assign(msg.contents.begin(), msg.contents.end());
        return true;
    }

    template <typename Blob>
    friend bool load(Blob const& blob, SpecificMessage& msg) {
        msg.contents.assign(blob.begin(), blob.end());
        return true;
    }
};

template <typename Message> struct LazyLoader {
    using type = Message;

    Message operator()(blob_t const& blob) const {
        Message result;
        if (!load(blob, result)) throw std::bad_cast(); // TODO custom exception
        return result;
    }
};

///////
// for demo, create some database contents
void create_database_file() {
    bip::file_mapping::remove(DBASE_FNAME);
    bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru

    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    std::ifstream ifs("main.cpp");
    std::string line;
    while (std::getline(ifs, line)) {
        table->emplace_back();
        save(table->back(), SpecificMessage { line });
    }

    std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}

///////

void display_random_messages() {
    bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

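    // for fun, let's reverse the blobs
    // (the step is clamped so the iterator never advances past rend())
    for (It first(table->rbegin()), last(table->rend()); first < last; first += (last - first < 13 ? last - first : 13))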
        std::cout << "blob: '" << first->contents << "'\n";

    // any kind of random access is okay, though:
    auto random = rand() % table->size();
    SpecificMessage msg;
    load(table->at(random), msg);
    std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}

int main()
{
#ifndef CONSUMER_ONLY
    create_database_file();
#endif

    srand(time(NULL));
    display_random_messages();
}
