哈希算法问题 [英] Hash algorithm question

查看:97
本文介绍了哈希算法问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为了清楚起见,我还应该添加保证字符串是唯一的.我使用经典的djb2哈希函数-请参见
http://www.cse.yorku.ca/~oz/hash.html

I should add for clarity that strings are guaranteed to be unique.I use a classic djb2 hash function - see
http://www.cse.yorku.ca/~oz/hash.html

UINT32 gHashCh(BYTE *str)
{
	UINT32 hash = 5381;
	int c;

	while (c = *str++)
	{
		hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
	}

	return hash;
}


问题#1.如果对多个字符串进行计算,我如何计算(数学上)哈希值不唯一的概率?我知道最坏情况下的长度(16个字符)和最大可能的字符串数(比如说1000)
问题2.有人可以给我一个示例,用两个不同的字符串生成上面给出的算法相同的哈希值.


Question #1. How do I calculate (mathematically) the probability that the hash will NOT be unique if calculated on a number of strings? I know the worst case length (16 charactes) and the maximum possible number of strings (lets say, 1000)
Question #2. Can someone give me an example of 2 different strings that produce an identical hash given algorithm above.

推荐答案

A1. 理想"哈希将在哈希空间(2 ^ 32)上均匀随机地分布字符串.如果您有1000个字符串,那么在这种情况下发生冲突的可能性大约为1 - exp[(- 1000 * 999)/(2 * 2^32)],这很小(大约1.16e-4),但在您的情况下可能微不足道.根据输入空间的不同,您选择的哈希可能会达到同样的效果,或者更差,甚至更差.

A2.您可以很快完成此操作.从下面的参考中,如果您选择了超过77000个字符串,则发现冲突的机率要好于50%.编写代码将花费您几分钟,并且可能不到一秒钟的时间即可找到冲突.

[ref] http://en.wikipedia.org/wiki/Birthday_paradox [
A1. An "ideal" hash will spread the strings uniform-randomly over the hash space (2^32). If you have 1000 strings, then your probability of collision in that case is approximately 1 - exp[(- 1000 * 999)/(2 * 2^32)] which is small (about 1.16e-4), but may not be negligible in your case. Your chosen hash may perform as well as that, or a bit worse, or a lot worse, depending on your input space.

A2. You can do this pretty quickly. From the reference below, if you pick a bit over 77000 strings, you will have a better than 50% probability of finding a collision. It''ll take you a few minutes to write the code, and probably less than a second for it to find a collision.

[ref]http://en.wikipedia.org/wiki/Birthday_paradox[^]

Cheers,
Peter

[edit]inserted missing left bracket in formula[/edit]


这篇关于哈希算法问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆