使用智能指针在具有大量对象集合中创建多个索引 [英] Create multiple indexes into a large collection of objects with smart pointers

查看:142
本文介绍了使用智能指针在具有大量对象集合中创建多个索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在一个大的对象集合中创建多个索引(即,使用不同的键)。对象可以改变,集合可以缩小和增长。我的想法到目前为止:



保持对对象的某种指针的多个集合。
使用set而不是map来更好地封装。
使用unordered_set可以很好地扩展大型数据集。
理想情况下,指针应该是某种形式的智能指针。



我可以使用unique_ptrs的主集合来开始,它管理所有的分配,使用原始指针的二级索引(我将暂时忽略支持函数,但请注意,索引是多重集合,因为它的键在集合中不是唯一的):

  typedef boost :: unordered_set< boost :: unique_ptr< MyObject>,myobject_hash,myobjects_equal> MyObjects; 
typedef boost :: unordered_multiset< const MyObject *,myobject_index2_hash,myobject_index2_equal> MyObjectsIndex2;

用法很简单:

  MyObjects my_objects; 
MyObjectsIndex2 my_objects_index2;

auto it_mo = my_objects.insert(
boost :: unique_ptr< MyObject>(
new MyObject(...)

);
const MyObject * p_mo = it_mo.first-> get();
my_objects_index2.insert(p_mo);

我考虑加入额外的努力来替换索引的原始指针使用const引用的unique_ptrs的主集合。我不知道我可以,虽然,至少不容易。我想我会问,是否有其他人已经走过这条路线,或者有其他建议。



UPDATE



到目前为止的经验教训:


  1. 资料储存库类别非常酷

  2. reference_wrappers

  3. 具有键对象的xx_set数据存储器成员var比xx_map更为节省空间。但是你不能轻易地使用unique_ptr作为c ++ 11中的键。 c ++ 14显然可以通过 std :: set< Key> :: find 更好地设置功能。有关详情,请参见此处。因此,现在,管理原始分配的数据存储区似乎比尝试强制使用unique_ptr作为设置密钥或使用地图增加keyspace存储空间更有意义。

  4. 记住强制键值对于对象的生命周期为const(使用构造函数中提供的const值)


解决方案

p>我认识到你的用例可能不同于我为我的例子设计的,没有更多的细节,我将无法使一个匹配(我还认为,如果你有很多细节,

  #include< iostream> 
#include< map>
#include< set>
#include< memory>
#include< stdexcept>

using namespace std;

class Thing
{
public:
Thing()= default;
Thing(const Thing& other)= default;
Thing(int i,string p,string d):id(i),desc(d),part(p){}

int id;
string desc;
string part;
};

ostream& operator<<<(ostream& out,const Thing& t)
{
if(& t == NULL)out< (空值); //不判断我
else out<< t.id<< :<< t.part<< (<< t.desc<<);
}

class Datastore
{
public:
Datastore()= default;
shared_ptr< const Thing> Add(const Thing& t)
{
if(!(index_bydesc.find(t.desc)== index_bydesc.end()&&
index_bypart.find部分)== index_bypart.end()&&
index_byid.find(t.id)== index_byid.end()))
throw runtime_error(Non-unique insert);
shared_ptr< const Thing> newt = make_shared< const Thing>(t);
weak_ptr< const Thing> weak = weak_ptr< const Thing>(newt);
index_bydesc [newt-> desc] = weak;
index_bypart [newt-> part] = weak;
index_byid [newt-> id] = weak;
store.insert(newt);
return newt;
}

void remove(const Thing& t)
{
shared_ptr< const Thing> p = FindBy_Desc(t.desc);
store.erase(p);
index_bydesc.erase(p-> desc);
index_bypart.erase(p-> part);
index_byid.erase(p-> id);
}

shared_ptr< const Thing> FindBy_Desc(string desc)
{
map< string,weak_ptr< const Thing> > :: iterator iter = index_bydesc.find(desc);
if(iter == index_bydesc.end())return shared_ptr< const Thing>();
return iter-> second.lock();
}

//缺少零件和数量的索引访问器

private:
std :: set< shared_ptr< const Thing> >商店;

std :: map< string,weak_ptr< const Thing> > index_bydesc;
std :: map< string,weak_ptr< const Thing> > index_bypart;
std :: map< int,weak_ptr< const Thing> > index_byid;
};

int main(){
Datastore d;
d.Add(Thing(1,TRNS-A,Automatic transmission));
d.Add(Thing(2,SPKPLG,Spark plugs));
d.Add(Thing(3,HOSE-S,Small hoses));
d.Add(Thing(4,HOSE-L,Large hoses));
d.Add(Thing(5,BATT-P,Primary battery(14.5v nominal)));
d.Add(Thing(6,BATT-S,二次电池(1.5v标称)));
d.Add(Thing(7,CRKSFT,Crank shaft));
d.Add(Thing(8,REAC-F,Fusion reactor power source));

cout<< * d.FindBy_Desc(Crank shaft)<< endl;
d.Remove(* d.FindBy_Desc(Crank shaft));
cout<< * d.FindBy_Desc(Crank shaft)<< endl;
return 0;
}



缺点:




  1. 存储结构是只读的。这是一个必要的缺点,因为如果您修改对象在数据存储区中的索引字段,索引将过时。要修改对象,请将其删除,然后重新添加其他对象。

  2. 所有字段必须是唯一的。这很容易改变,但你需要保持包含 list< Thing> 的地图作为非唯一字段的索引,而不只是包含 Thing

  3. 与使用 std :: map 相关的性能问题。 std :: unordered_map 是巨大数据结构的访问时间更好(固定摊销)的替代方法(与 std :: unordered_set )。



偏差:



  • 为了解决与引用计数相关的性能问题,我们建议您使用地图比设置更好。

  • < ,如果你小心地保持内部一致性在任何时候,你可以放弃所有的智能指针为原始的,并通过引用返回值,你可以通过使用不安全的对象所有权语义填充时进一步改进(即,传递指向数据存储所拥有的堆对象的指针)。更复杂,但最终副本更少,运行时开销更少。


    I am creating multiple indexes (ie, that use different keys) into a large collection of objects. The objects can change, and the collection can shrink and grow. My thoughts so far:

    Keep multiple collections of some kind of pointers to the objects. Use set instead of map for better encapsulation. Use unordered_set to scale well with large data sets. The pointers should ideally all be some form of smart pointer.

    I can start off fairly easily with a master collection of unique_ptrs, which manage all allocations, and secondary indexes that use "raw" pointers (I'll leave out the supporting functions for the moment, but note that the index is a multiset as its key won't be unique across the set):

    typedef boost::unordered_set< boost::unique_ptr<MyObject>,myobject_hash,myobjects_equal > MyObjects;
    typedef boost::unordered_multiset<const MyObject*,myobject_index2_hash,myobject_index2_equal > MyObjectsIndex2;
    

    Usage is straightforward:

    MyObjects my_objects;
    MyObjectsIndex2 my_objects_index2;
    
    auto it_mo = my_objects.insert(
        boost::unique_ptr<MyObject>(
            new MyObject(...)
        )
    );
    const MyObject* p_mo = it_mo.first->get();
    my_objects_index2.insert(p_mo);
    

    I am considering putting in extra effort to replace the index's use of raw pointers with const references to the unique_ptrs of the master collection. I'm not sure I can though, at least not easily. I thought I would ask if anyone else had gone that route already, or had alternative suggestions.

    UPDATE

    Lessons learned so far:

    1. Datastore classes are cool
    2. reference_wrappers are cool
    3. A xx_set with a "key" object datastore member var is more space-efficient than xx_map. BUT... you can't easily use a unique_ptr as a key in c++11. c++14 apparently may have better set functionality via std::set<Key>::find. See here for more details. So for now, a datastore that manages raw allocations seems to make more sense here than trying to force use of a unique_ptr as a set key, or increasing keyspace storage with maps.
    4. Remember to force key values to be const for the life of the object (use const values provided in constructor)

    解决方案

    I recognize that your use case is probably different from the one I contrived for my example, and without a lot more details I won't be able to make one that matches (I also think that if you had a lot of details you would be able to find a solution yourself).

    #include <iostream>
    #include <map>
    #include <set>
    #include <memory>
    #include <stdexcept>
    
    using namespace std;
    
    class Thing
    {
    public:
        Thing() = default;
        Thing(const Thing &other) = default;
        Thing(int i, string p, string d) : id(i), desc(d), part(p) {}
    
        int    id;
        string desc;
        string part;
    };
    
    ostream &operator<<(ostream &out, const Thing &t)
    {
        if (&t == NULL) out << "(NULL)"; // don't judge me
        else out << t.id << ": " << t.part << " (" << t.desc << ")";
    }
    
    class Datastore
    {
    public:
        Datastore() = default;
        shared_ptr<const Thing> Add(const Thing &t)
        {
            if (!(index_bydesc.find(t.desc) == index_bydesc.end() &&
                  index_bypart.find(t.part) == index_bypart.end() &&
                  index_byid.find(t.id) == index_byid.end()))
                throw runtime_error("Non-unique insert");
            shared_ptr<const Thing> newt = make_shared<const Thing>(t);
            weak_ptr<const Thing> weak = weak_ptr<const Thing>(newt);
            index_bydesc[newt->desc] = weak;
            index_bypart[newt->part] = weak;
            index_byid[newt->id] = weak;
            store.insert(newt);
            return newt;
        }
    
        void Remove(const Thing &t)
        {
            shared_ptr<const Thing> p = FindBy_Desc(t.desc);
            store.erase(p);
            index_bydesc.erase(p->desc);
            index_bypart.erase(p->part);
            index_byid.erase(p->id);
        }
    
        shared_ptr<const Thing> FindBy_Desc(string desc)
        {
            map<string, weak_ptr<const Thing> >::iterator iter = index_bydesc.find(desc);
            if (iter == index_bydesc.end()) return shared_ptr<const Thing>();
            return iter->second.lock();
        }
    
        // index accessors for part and quantity omitted
    
    private:
        std::set<shared_ptr<const Thing> > store;
    
        std::map<string, weak_ptr<const Thing> > index_bydesc;
        std::map<string, weak_ptr<const Thing> > index_bypart;
        std::map<int, weak_ptr<const Thing> > index_byid;
    };
    
    int main() {
        Datastore d;
        d.Add(Thing(1, "TRNS-A", "Automatic transmission"));
        d.Add(Thing(2, "SPKPLG", "Spark plugs"));
        d.Add(Thing(3, "HOSE-S", "Small hoses"));
        d.Add(Thing(4, "HOSE-L", "Large hoses"));
        d.Add(Thing(5, "BATT-P", "Primary battery (14.5v nominal)"));
        d.Add(Thing(6, "BATT-S", "Secondary batteries (1.5v nominal)"));
        d.Add(Thing(7, "CRKSFT", "Crank shaft"));
        d.Add(Thing(8, "REAC-F", "Fusion reactor power source"));
    
        cout << *d.FindBy_Desc("Crank shaft") << endl;
        d.Remove(*d.FindBy_Desc("Crank shaft"));
        cout << *d.FindBy_Desc("Crank shaft") << endl;
        return 0;
    }
    

    Shortcomings:

    1. Storage structure is read-only. This is a necessary shortcoming, because the index will become outdated if you modify the indexed fields of an object while it's in the datastore. To modify an object, remove it and then re-add another one.
    2. All fields must be unique. This is easily changeable, but you need to keep maps containing list<Thing> as indices for non-unique fields, not just maps containing Thing.
    3. Performance problems relating to using std::map. std::unordered_map is an alternative with better (constant amortized) access times for huge data structures (the same as std::unordered_set).

    Deviations:

    1. Given that you have a clear key-value relationship here, I think you'd be better off with a map than a set.
    2. In order to solve performance problems relating to reference counting, if you are careful to maintain the internal consistency at all times, you could forgo all smart pointers for raw ones, and return values via references, and you could achieve further improvements by using unsafe object ownership semantics when filling it (i.e., pass it pointers to heap objects that the Datastore then takes ownership of). More complex, but ultimately fewer copies and less runtime overhead.

    这篇关于使用智能指针在具有大量对象集合中创建多个索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆