散列函数到无序容器的存储桶大小? [英] hash function to size of buckets for unordered containers?

查看:128
本文介绍了散列函数到无序容器的存储桶大小?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

例如,要将元素放入无序集合中,我们计算其哈希并将其放入相应的存储桶中.但是,通常我们的存储桶比哈希函数的值范围少得多.存储桶与哈希值的对应关系如何计算?似乎使用了某些函数来反映(0 ... size_t)->(0 ... size_of_buckets-1).但是,即使使用良好的哈希函数,使用此类函数也可能导致大量冲突.

To put an element into, say, an unordered set we calculate its hash and put it to the corresponding bucket. However we usually have many fewer buckets than the range of values of the hash function. How is the correspondence of buckets and hash values calculated? It seems like some function is used reflecting (0 ... size_t) -> (0 ... size_of_buckets - 1). But using such a function could lead to big number of collisions even for good hash function.

推荐答案

我不确定标准中是否定义了std::unordered_map确切的行为.但是,基本原理是这样的:始终使存储桶数大于容器大小乘以一个小数(这个小数是1.0/load_factor).这样,碰撞就很少见了.

I'm not sure whether std::unordered_map exact behavior is defined in the standard. However, the basic principle is this: always keep the number of buckets larger than the size of the container multiplied by a small number (this small number is 1.0/load_factor). This way, collisions should be rare.

对于哈希表,通常有两种计算bucket_index的方法:

For a hash table, usually there are two ways to calculate bucket_index:

    选择桶的数量为2的幂:计算哈希,然后使用位操作提取其一些较低/较高的位.此方法需要一个好的"哈希函数,其中每个位都是随机的 选择
  1. 存储桶数作为质数:计算哈希,然后通过模运算计算出bucket_index.此方法不需要太好"的哈希函数
  1. number of buckets is chosen to be a power of 2: hash is calculated, then some of its lower/higher bits extracted with bit operations. This method needs a "good" hash function, where every bit is random
  2. number of buckets is chosen to be a prime number: hash is calculated, then with a modulo operation, bucket_index is calculated. This method doesn't need a "too good" hash function

对于方法1,如果哈希函数质量不好,则可能会发生很多冲突.对于方法2,即使使用质量不太好的哈希函数,通常也很少发生冲突.但是,方法1通常更快,因为位操作比mod快得多(但是有些技术可以使其更快),而且质量足够好的哈希函数通常很便宜.

For method 1., if the hash function quality is bad, you can get a lot of collisions. For method 2., even with a not-too-good quality hash function, collisions are rare usually. But, method 1. is usually faster, as bit operations are much faster than a mod (but there are techniques to make it faster), and a good-enough quality hash function is usually cheap.

这篇关于散列函数到无序容器的存储桶大小?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆