C++ random access iterators for containers with elements loaded on demand
Problem description
I'm currently working on a small project that requires loading messages from a file. The messages are stored sequentially, and the files can become huge, so loading an entire file into memory is not worthwhile.
Therefore we decided to implement a FileReader class that can move quickly to a specific element in the file and load it on request. Typical usage looks something like this:
SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m); // Try deserializing as SpecificMessage
The FileReader itself works great, so we thought about adding STL-compliant iterator support as well: a random access iterator that provides read-only references to specific messages. It would be used like this:
for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
// ...
}
Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade to simplify the implementation.
Now my question boils down to: how do we implement the iterator correctly, given that FileReader does not actually hold a sequence of messages internally but loads them on request?
What we've tried so far:
Storing the message as an iterator member
This approach stores the message in the iterator instance, which works great for simple use cases but fails for more complex ones. For example, std::reverse_iterator has a dereference operation that looks like this:
reference operator*() const
{   // return designated value
    _RanIt _Tmp = current;
    return (*--_Tmp);
}
This breaks our approach, because it returns a reference to a message that is stored inside a temporary iterator.
Making the reference type equal the value type
@DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as
pointer operator->() const {
    return (&**this);
}
This dereferences itself by calling operator*, which would then return a temporary copy, and finally returns the address of that temporary.
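To make the by-value idea concrete, here is a minimal sketch of an iterator that stores only a reader pointer and an index, builds the message on every dereference, and returns a small proxy object from operator-> so that no address of a temporary escapes. DemoReader, LazyIterator and the std::string "messages" are hypothetical stand-ins, not the question's FileReader, and only a subset of the random access operations is shown. Because operator* returns a value rather than a true reference, the sketch conservatively advertises itself as an input iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for FileReader: "loads" message i on request.
struct DemoReader {
    std::vector<std::string> raw;  // pretend each element is one message in the file

    std::string load(std::size_t i) const {
        assert(i < raw.size());
        return raw[i];  // a real reader would seek and deserialize here
    }
};

// Stores only (reader, index); the message is produced when dereferenced.
class LazyIterator {
public:
    using iterator_category = std::input_iterator_tag;  // operator* yields a value, not a reference
    using value_type        = std::string;
    using difference_type   = std::ptrdiff_t;
    using reference         = std::string;              // returned by value

    // Proxy returned by operator->: it owns the loaded value, so the
    // pointer it hands out stays valid until the end of the expression.
    struct pointer {
        std::string value;
        const std::string* operator->() const { return &value; }
    };

    LazyIterator(const DemoReader* reader, difference_type index)
        : reader_(reader), index_(index) {}

    reference operator*() const { return reader_->load(static_cast<std::size_t>(index_)); }
    pointer operator->() const { return pointer{**this}; }

    LazyIterator& operator++() { ++index_; return *this; }
    LazyIterator& operator--() { --index_; return *this; }
    LazyIterator& operator+=(difference_type n) { index_ += n; return *this; }

    friend LazyIterator operator+(LazyIterator it, difference_type n) { return it += n; }
    friend difference_type operator-(const LazyIterator& a, const LazyIterator& b) {
        return a.index_ - b.index_;
    }
    friend bool operator==(const LazyIterator& a, const LazyIterator& b) { return a.index_ == b.index_; }
    friend bool operator!=(const LazyIterator& a, const LazyIterator& b) { return !(a == b); }
    friend bool operator<(const LazyIterator& a, const LazyIterator& b) { return a.index_ < b.index_; }

private:
    const DemoReader* reader_;
    difference_type index_;
};
```

With this sketch, an expression such as (first + 2)->size() reads through the proxy, and dereferencing std::reverse_iterator<LazyIterator>(last) yields a copy of the last message rather than a reference into a temporary iterator. The price is one load per dereference.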
Storing the message externally
Alternatively, I thought about storing the message externally:
SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...
This also seems to be flawed, because
auto iter2 = iter + 2;
would leave both iter2 and iter pointing to the same content.
As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:
As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?
Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost's transform_iterator to do the deserialization on demand.
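For the POD case, a rough sketch of what "iterating over the raw memory" could look like is shown here with the standard library only. PodMessage, PodView and pack are hypothetical names, and the byte buffer stands in for the mapped file region; the real message layout is not given in the question. Records are copied out with memcpy so that the sketch does not depend on the buffer being suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size record; stands in for a POD SpecificMessage.
struct PodMessage {
    std::uint32_t id;
    std::uint32_t payload;
};
static_assert(std::is_trivially_copyable<PodMessage>::value,
              "records must be safe to copy with memcpy");

// Read-only view over a byte range holding back-to-back PodMessage records.
class PodView {
public:
    PodView(const unsigned char* data, std::size_t bytes)
        : data_(data), count_(bytes / sizeof(PodMessage)) {}

    std::size_t size() const { return count_; }

    // Random access: copy record i out of the buffer.
    PodMessage operator[](std::size_t i) const {
        assert(i < count_);
        PodMessage m;
        std::memcpy(&m, data_ + i * sizeof(PodMessage), sizeof m);
        return m;
    }

private:
    const unsigned char* data_;
    std::size_t count_;
};

// Demo helper only: lay records out as bytes, the way a file would hold them.
inline std::vector<unsigned char> pack(const std::vector<PodMessage>& msgs) {
    std::vector<unsigned char> bytes(msgs.size() * sizeof(PodMessage));
    if (!msgs.empty()) std::memcpy(bytes.data(), msgs.data(), bytes.size());
    return bytes;
}
```

Given a buffer produced by pack, PodView view(bytes.data(), bytes.size()) allows view[i] for any i below view.size() without loading anything else. With a real mapped file, the pointer and length would come from the mapping instead.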
Note that we can make the memory mapped file managed, which effectively means that you can use it as a regular heap and store all the standard containers in it. This includes node-based containers (e.g. map<>) and dynamic-size containers (e.g. vector<>) in addition to fixed-size containers (array<>), as well as any combination of those.
Here's a demo that takes a simple SpecificMessage that contains a string, and (de)serializes it directly into shared memory:
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
The part that interests you would be the consuming part:
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

// for fun, let's reverse the blobs
// (the step is clamped so the iterator never moves past rend())
for (It first(table->rbegin()), last(table->rend()); first < last; first += std::min<std::ptrdiff_t>(13, last - first))
    std::cout << "blob: '" << first->contents << "'\n";

// any kind of random access is okay, though:
auto random = std::rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
So this prints every 13th message, in reverse order, followed by a random blob.
Full Demo
The online sample uses the lines of its own source file as "messages".
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>
#include <algorithm>  // std::min
#include <cstddef>    // std::ptrdiff_t
#include <cstdint>    // uint8_t
#include <cstdlib>    // std::rand, std::srand
#include <ctime>      // std::time
#include <fstream>    // std::ifstream
#include <iostream>
#include <string>
#include <typeinfo>   // std::bad_cast

static char const* DBASE_FNAME = "database.map";

namespace bip = boost::interprocess;

namespace shm {
    using segment_manager = bip::managed_mapped_file::segment_manager;
    template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
    template <typename T> using vector    = bip::vector<T, allocator<T> >;
}

using blob_t       = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;

struct SpecificMessage {
    // for demonstration purposes, just a string; could be anything serialized
    std::string contents;

    // trivial save/load serialization code:
    template <typename Blob>
    friend bool save(Blob& blob, SpecificMessage const& msg) {
        blob.assign(msg.contents.begin(), msg.contents.end());
        return true;
    }

    template <typename Blob>
    friend bool load(Blob const& blob, SpecificMessage& msg) {
        msg.contents.assign(blob.begin(), blob.end());
        return true;
    }
};

// Function object for transform_iterator: deserializes a blob on each dereference
template <typename Message> struct LazyLoader {
    using type = Message;

    Message operator()(blob_t const& blob) const {
        Message result;
        if (!load(blob, result)) throw std::bad_cast(); // TODO custom exception
        return result;
    }
};

///////
// for demo, create some database contents
void create_database_file() {
    bip::file_mapping::remove(DBASE_FNAME);
    bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru

    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    std::ifstream ifs("main.cpp");
    std::string line;
    while (std::getline(ifs, line)) {
        table->emplace_back();
        save(table->back(), SpecificMessage { line });
    }

    std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}

///////
void display_random_messages() {
    bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
    shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());

    using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;

    // for fun, let's reverse the blobs
    // (the step is clamped so the iterator never moves past rend())
    for (It first(table->rbegin()), last(table->rend()); first < last; first += std::min<std::ptrdiff_t>(13, last - first))
        std::cout << "blob: '" << first->contents << "'\n";

    // any kind of random access is okay, though:
    auto random = std::rand() % table->size();
    SpecificMessage msg;
    load(table->at(random), msg);
    std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}

int main()
{
#ifndef CONSUMER_ONLY
    create_database_file();
#endif
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    display_random_messages();
}