预期的哈希冲突次数 [英] Expected number of hash collisions
问题描述
我觉得我对这个问题想得太多了,但无论如何……
I feel like I'm way overthinking this problem, but here goes anyway...
我有一个哈希表,它的内部数组中有 M 个槽.我需要在哈希表中插入 N 个元素.假设我有一个散列函数,该函数将 am 元素随机插入一个插槽中,每个插槽的概率相等,那么散列冲突总数的预期值是多少?
I have a hash table with M slots in its internal array. I need to insert N elements into the hash table. Assuming that I have a hash function that randomly inserts am element into a slot with equal probability for each slot, what's the expected value of the total number of hash collisions?
(抱歉,这与其说是编程问题,不如说是数学问题).
(Sorry that this is more of a math question than a programming question).
这是我必须使用 Python 模拟它的一些代码.我得到了数字答案,但无法将其概括为公式并进行解释.
Here's some code I have to simulate it using Python. I'm getting numerical answers, but having trouble generalizing it to a formula and explaining it.
import random
import pdb
N = 5
M = 8
NUM_ITER = 100000
def get_collisions(table):
col = 0
for item in table:
if item > 1:
col += (item-1)
return col
def run():
table = [0 for x in range(M)]
for i in range(N):
table[int(random.random() * M)] += 1
#print table
return get_collisions(table)
# Main
total = 0
for i in range(NUM_ITER):
total += run()
print float(total)/NUM_ITER
推荐答案
你会在这里找到答案:Quora.com.m 个桶和 n 个插入的预期碰撞次数是
You'll find the answer here: Quora.com. The expected number of collisions for m buckets and n inserts is
n - m * (1 - ((m-1)/m)^n)
.
这篇关于预期的哈希冲突次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!