Have a good hash function for a C++ hash table?


Question

I need a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I have already looked around and only found questions asking what a good hash function is "in general". I've considered CRC32 (but where can I find a good implementation?) and a few cryptographic algorithms. My table, though, has very specific requirements.

Here's what the table will be like:

100,000 items max
200,000 capacity (so the load is 0.5)
hashing a 6-character string which is part of an English sentence
     examples: "become"    "and he"    ", not "

The number one priority of my hash table is quick search (retrieval). Quick insertion is not important, but it will come along with quick search. Deletion is not important, and re-hashing is not something I'll be looking into. To handle collisions, I'll probably be using separate chaining as described here. I have already looked at this article, but would like the opinion of those who have handled such a task before.
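To make the requirements above concrete, here is a minimal sketch of a fixed-capacity table with separate chaining. It is only an illustration under stated assumptions: the names (ChainedTable, Key, placeholder_hash), the int value type, and the use of FNV-1a as a stand-in hash are not from the question, and the hash is meant to be swapped for whichever function is finally chosen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <forward_list>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the asker's code.
constexpr std::size_t kKeyLen   = 6;       // every key is exactly 6 characters
constexpr std::size_t kCapacity = 200000;  // 100,000 items max -> load factor 0.5

using Key = std::array<char, kKeyLen>;

// Stand-in hash (64-bit FNV-1a); replace it with the function you settle on.
inline std::size_t placeholder_hash(const Key& k) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (char c : k) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;                  // FNV prime
    }
    return static_cast<std::size_t>(h);
}

class ChainedTable {
public:
    ChainedTable() : buckets_(kCapacity) {}

    // Insert a key/value pair; overwrites the value if the key already exists.
    void insert(const char* key, int value) {
        Key k = make_key(key);
        auto& chain = buckets_[placeholder_hash(k) % kCapacity];
        for (auto& entry : chain) {
            if (entry.key == k) { entry.value = value; return; }
        }
        chain.push_front({k, value});
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const char* key) const {
        Key k = make_key(key);
        const auto& chain = buckets_[placeholder_hash(k) % kCapacity];
        for (const auto& entry : chain) {
            if (entry.key == k) return &entry.value;
        }
        return nullptr;
    }

private:
    struct Entry { Key key; int value; };

    // Copy exactly 6 characters out of a C string into a fixed-size key.
    static Key make_key(const char* s) {
        assert(std::strlen(s) == kKeyLen && "keys must be exactly 6 characters");
        Key k;
        std::memcpy(k.data(), s, kKeyLen);
        return k;
    }

    std::vector<std::forward_list<Entry>> buckets_;
};
```

With a load factor of 0.5 most chains should hold zero or one entry, so a lookup is usually one hash, one bucket access, and one 6-byte comparison. Calling insert("become", 1) and then find("become") returns a pointer to 1, while find on an absent key returns nullptr.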

Solution

Now, assuming you want a hash and want something blazing fast that would work in your case: because your strings are just 6 chars long, you could use this magic:

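#include <cstddef>
#include <cstring>

std::size_t precision = 2; // change the precision with this
std::size_t hash(const char* str)
{
   std::size_t value = 0;
   std::memcpy(&value, str, sizeof value < 6 ? sizeof value : 6); // safe load: no aliasing cast, no read past the key
   return value >> precision;
}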

CRC is for slowpokes ;)

Explanation: This works by reinterpreting the first bytes of the string as a size_t (32 or 64 bits, depending on your platform). The contents of the string are treated as a raw number, with no per-character processing, and the result is then shifted right by precision bits. Tune that value for the best performance; I've found that 2 works well for hashing strings in sets of a few thousand. Note that on a 32-bit build a size_t holds only the first 4 of the 6 characters.

Also, the really neat part is that any decent compiler on modern hardware will typically compile this down to a load and a shift, just a couple of assembly instructions. Hard to beat that ;)
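The answer leaves two steps implicit: folding the shifted value into the table's 200,000 slots, and choosing a good precision. The following is a rough sketch of both, not a definitive implementation. The helper names (raw_key_bits, bucket_index, distinct_buckets) are hypothetical, and reducing with a modulo is an assumption; only the idea of shifting the raw key bytes comes from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper names; only the shift-the-raw-bytes idea comes from the answer.
constexpr std::size_t kKeyLen = 6;

// Load the 6 key bytes into a 64-bit integer without casting the pointer.
inline std::uint64_t raw_key_bits(const std::string& key) {
    assert(key.size() == kKeyLen);
    std::uint64_t bits = 0;
    std::memcpy(&bits, key.data(), kKeyLen);
    return bits;
}

// Shift away `precision` low bits, then fold the result into the table range.
inline std::size_t bucket_index(const std::string& key, unsigned precision,
                                std::size_t capacity) {
    return static_cast<std::size_t>((raw_key_bits(key) >> precision) % capacity);
}

// Count how many distinct buckets a sample of keys occupies; higher means
// fewer collisions for that choice of `precision`.
inline std::size_t distinct_buckets(const std::vector<std::string>& keys,
                                    unsigned precision, std::size_t capacity) {
    std::set<std::size_t> used;
    for (const auto& k : keys) used.insert(bucket_index(k, precision, capacity));
    return used.size();
}
```

One way to tune precision is to run distinct_buckets over a representative sample of your real keys for a few candidate values and keep the one that occupies the most buckets. Be aware that on a little-endian machine the first character sits in the lowest byte, so with a precision of 2, keys that differ only in the lowest two bits of their first character (for example "aecome" and "become") land in the same bucket.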
