C ++关于boost :: unordered_map& amp; amp;提高::哈希 [英] C++ some questions on boost::unordered_map & boost::hash

查看:178
本文介绍了C ++关于boost :: unordered_map& amp; amp;提高::哈希的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近才开始着手进行boost和它的容器,并且我在web和stackoverflow上阅读了一些文章,其中boost :: unordered_map是大集合中表现最快的容器。
所以,我有这个类的状态,它在容器中必须是唯一的(没有重复),容器中会有数百万甚至数十亿个状态。
因此,我一直在试图对其进行小尺寸和尽可能少的计算来优化它。我之前使用了boost :: ptr_vector,但是当我在stackoverflow上阅读时,只要没有那么多的对象,矢量就很好。
在我的情况中,国家描述机器人的感知信息,所以可能有很多状态,因此快速查找是最重要的。
按照unordered_map的增强文档认识到我可以做两件事来加快速度:使用hash_function,并使用相等运算符根据hash_function来比较状态。
所以,我实现了一个私有hash()函数,它接收状态信息并使用boost :: hash_combine创建一个std :: size_t散列值。
运算符==基本上比较了状态的散列值。
So:


  • 是std :: size_t足以覆盖数十亿可能的hash_function
    组合吗?为了避免重复的状态,我打算使用
    他们的hash_values。

  • 在创建state_map时,我应该使用关键状态*还是哈希
    值?
    即: boost :: unordered_map< State *,std :: size_t> state_map;

    boost :: unordered_map< std :: size_t,State *> state_map;


  • 使用boost :: unordered_map :: iterator =
    state_map.find()比通过boost :: ptr_vector和
    比较每个迭代器的关键值更快吗?

  • 最后,关于如何优化这样的一个技巧的任何提示或技巧

























    $ b $很少有答案,一个不使用boost,但C ++ 0X,另一个不使用unordered_set,但说实话,我仍然想看看boost :: unordered_set如何与散列函数一起使用。
    我跟随了boost的文档并且实现了,但是我仍然无法弄清楚如何使用有序集合的boost的哈希函数。

    你说的不是你可以做的事情来加速事情;相反,它们是您类型的强制性要求,可以作为无序映射的元素类型,也适用于无序集(您可能更想要)。


  • 您需要提供一个相等运算符来比较对象,而不是散列值。平等的关键在于区分具有相同散列的元素。

  • size_t 整数类型,x86上32位和64位64位。既然你想要数十亿元素,这意味着许多千兆字节的数据,我认为你有一个坚实的x64机器。无论如何。

  • 最重要的是你的散列函数是,即碰撞少。


  • 你需要一个集合,而不是一个地图。将对象直接放在集合中: std :: unordered_set< State> 。如果您将映射到某物,请使用地图,即指定其他地方。如果可以的话,使用C ++ 0x而不是boost。

  • 使用 hash_combine





    宝贝示例:

      struct State 
    {
    inline bool operator ==(const State&)const;
    / *东西* /
    };

    命名空间标准
    {
    模板<> struct hash< State>
    {
    inline std :: size_t operator()(const State& s)const
    {
    / *您的散列算法在这里* /
    }
    };

    $ b std :: size_t Foo(const State& s){/ *一些代码* /}

    int main()
    {
    std :: unordered_set< State>状态; //不需要额外的数据
    std :: unordered_set< State,Foo>状态; //另一个散列函数
    }


    I've only recently started dwelling into boost and it's containers, and I read a few articles on the web and on stackoverflow that a boost::unordered_map is the fastest performing container for big collections. So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container. Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it. In my case, the State descibes sensorimotor information from a robot, so there can be an enormous amount of states, and therefore fast lookup is of topemost priority. Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function. So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value. The operator== compares basically the state's hash values. So:

    • is std::size_t enough to cover billions of possible hash_function combinations ? In order to avoid duplicate states I intend to use their hash_values.

    • When creating a state_map, should I use as key the State* or the hash value ? i.e: boost::unordered_map<State*,std::size_t> state_map; Or boost::unordered_map<std::size_t,State*> state_map;

    • Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value ?

    • Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.

    EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function. I have followed boost's documentation and implemented, but I still cannot figure out how to use the hash function of boost with the ordered set.

    解决方案

    This is a bit muddled.

    • What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).

    • You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.

    • size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.

    • What's crucial is that your hash function is good, i.e. has few collisions.

    • You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.

    • Using hash_combine is good.


    Baby example:

    struct State
    {
      inline bool operator==(const State &) const;
      /* Stuff */
    };
    
    namespace std
    {
      template <> struct hash<State>
      {
        inline std::size_t operator()(const State & s) const
        {
          /* your hash algorithm here */
        }
      };
    }
    
    std::size_t Foo(const State & s) { /* some code */ }
    
    int main()
    {
      std::unordered_set<State> states; // no extra data needed
      std::unordered_set<State, Foo> states; // another hash function
    }
    

    这篇关于C ++关于boost :: unordered_map&amp; amp; amp;提高::哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆