C ++关于boost :: unordered_map& amp; amp;提高::哈希 [英] C++ some questions on boost::unordered_map & boost::hash
问题描述
我最近才开始着手进行boost和它的容器,并且我在web和stackoverflow上阅读了一些文章,其中boost :: unordered_map是大集合中表现最快的容器。
所以,我有这个类的状态,它在容器中必须是唯一的(没有重复),容器中会有数百万甚至数十亿个状态。
因此,我一直在试图对其进行小尺寸和尽可能少的计算来优化它。我之前使用了boost :: ptr_vector,但是当我在stackoverflow上阅读时,只要没有那么多的对象,矢量就很好。
在我的情况中,国家描述机器人的感知信息,所以可能有很多状态,因此快速查找是最重要的。
按照unordered_map的增强文档认识到我可以做两件事来加快速度:使用hash_function,并使用相等运算符根据hash_function来比较状态。
所以,我实现了一个私有hash()函数,它接收状态信息并使用boost :: hash_combine创建一个std :: size_t散列值。
运算符==基本上比较了状态的散列值。
So:
-
是std :: size_t足以覆盖数十亿可能的hash_function
组合吗?为了避免重复的状态,我打算使用
他们的hash_values。
-
在创建state_map时,我应该使用关键状态*还是哈希
值?
即:boost :: unordered_map< State *,std :: size_t> state_map;
或
boost :: unordered_map< std :: size_t,State *> state_map;
-
使用boost :: unordered_map :: iterator =
state_map.find()比通过boost :: ptr_vector和
比较每个迭代器的关键值更快吗? 最后,关于如何优化这样的一个技巧的任何提示或技巧 -
您需要提供一个相等运算符来比较对象,而不是散列值。平等的关键在于区分具有相同散列的元素。
-
size_t
整数类型,x86上32位和64位64位。既然你想要数十亿元素,这意味着许多千兆字节的数据,我认为你有一个坚实的x64机器。无论如何。
-
最重要的是你的散列函数是好,即碰撞少。
-
你需要一个集合,而不是一个地图。将对象直接放在集合中:
std :: unordered_set< State>
。如果您将映射到某物,请使用地图,即指定其他地方。如果可以的话,使用C ++ 0x而不是boost。 使用
$ b $很少有答案,一个不使用boost,但C ++ 0X,另一个不使用unordered_set,但说实话,我仍然想看看boost :: unordered_set如何与散列函数一起使用。
我跟随了boost的文档并且实现了,但是我仍然无法弄清楚如何使用有序集合的boost的哈希函数。 你说的不是你可以做的事情来加速事情;相反,它们是您类型的强制性要求,可以作为无序映射的元素类型,也适用于无序集(您可能更想要)。
hash_combine
。宝贝示例:
struct State
{
inline bool operator ==(const State&)const;
/ *东西* /
};
命名空间标准
{
模板<> struct hash< State>
{
inline std :: size_t operator()(const State& s)const
{
/ *您的散列算法在这里* /
}
};
$ b std :: size_t Foo(const State& s){/ *一些代码* /}
int main()
{
std :: unordered_set< State>状态; //不需要额外的数据
std :: unordered_set< State,Foo>状态; //另一个散列函数
}
I've only recently started dwelling into boost and it's containers, and I read a few articles on the web and on stackoverflow that a boost::unordered_map is the fastest performing container for big collections. So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container. Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it. In my case, the State descibes sensorimotor information from a robot, so there can be an enormous amount of states, and therefore fast lookup is of topemost priority. Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function. So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value. The operator== compares basically the state's hash values. So:
is std::size_t enough to cover billions of possible hash_function combinations ? In order to avoid duplicate states I intend to use their hash_values.
When creating a state_map, should I use as key the State* or the hash value ? i.e:
boost::unordered_map<State*,std::size_t> state_map;
Orboost::unordered_map<std::size_t,State*> state_map;
Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value ?
Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.
EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function. I have followed boost's documentation and implemented, but I still cannot figure out how to use the hash function of boost with the ordered set.
This is a bit muddled.
What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).
You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.
size_t
is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.What's crucial is that your hash function is good, i.e. has few collisions.
You want a set, not a map. Put the objects directly in the set:
std::unordered_set<State>
. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.Using
hash_combine
is good.
Baby example:
struct State
{
inline bool operator==(const State &) const;
/* Stuff */
};
namespace std
{
template <> struct hash<State>
{
inline std::size_t operator()(const State & s) const
{
/* your hash algorithm here */
}
};
}
std::size_t Foo(const State & s) { /* some code */ }
int main()
{
std::unordered_set<State> states; // no extra data needed
std::unordered_set<State, Foo> states; // another hash function
}
这篇关于C ++关于boost :: unordered_map& amp; amp;提高::哈希的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!