期望的散列冲突数量 [英] Expected number of hash collisions

查看:146
本文介绍了期望的散列冲突数量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我觉得我正在解决这个问题,但是无论如何...



我有一个在其内部阵列中有M个槽的散列表。我需要在散列表中插入N个元素。假设我有一个散列函数,它可以随机地将am元素插入到每个插槽的概率相等的时隙中,那么散列冲突总数的期望值是多少?

(对不起,这是一个数学问题,而不是编程问题)。

编辑:
下面是我必须使用Python模拟它的一些代码。我收到数字答案,但无法将其推广到公式并解释它。

 进口随机数
进口数pdb

N = 5
M = 8

NUM_ITER = 100000

def get_collisions(表):
col = 0
表中的商品:
if item> 1:
col + =(item-1)
return col

def run():
table = [0表示范围内的x(M)] $ (N):
table [int(random.random()* M)] + = 1

#print table
b
$ b返回get_collisions(表)

#主要
$ b总计= 0
(范围内的NUM_ITER):
总计+ =运行()

print float(total)/ NUM_ITER


解决方案

你会在这里找到答案: Quora.com 。对于 m 桶和 n 插入的预期碰撞次数为

n - m *(1 - ((m-1)/ m)^ n)


I feel like I'm way overthinking this problem, but here goes anyway...

I have a hash table with M slots in its internal array. I need to insert N elements into the hash table. Assuming that I have a hash function that randomly inserts am element into a slot with equal probability for each slot, what's the expected value of the total number of hash collisions?

(Sorry that this is more of a math question than a programming question).

Edit: Here's some code I have to simulate it using Python. I'm getting numerical answers, but having trouble generalizing it to a formula and explaining it.

import random
import pdb

N = 5
M = 8

NUM_ITER = 100000

def get_collisions(table):
    col = 0
    for item in table:
        if item > 1:
            col += (item-1)
    return col

def run():
    table = [0 for x in range(M)]

    for i in range(N):
        table[int(random.random() * M)] += 1

    #print table
    return get_collisions(table)

# Main

total = 0
for i in range(NUM_ITER):
    total += run()

print float(total)/NUM_ITER

解决方案

You'll find the answer here: Quora.com. The expected number of collisions for m buckets and n inserts is

n - m * (1 - ((m-1)/m)^n).

这篇关于期望的散列冲突数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆