Hash function for phone numbers


Problem description


I am building a hash table whose keys are phone numbers (here are some of them):

6948060987
6960780800
6963208768
6944870406
6947279288
6953691771
6956094283
6947092062
6960086297
6947719197
6951516975
6957531584
6969211184
6963238579
6957054322
6952077216
6956907738

The number of entries will be 200, 2000, 20000 and 2000000 and the entries will be unique.

About the size of the table, I am following this answer.

I store each phone number as an array of chars. I noticed that all the numbers begin with 69, so I can skip those two digits in the hash function.

My attempt is to take the sum of the digits modulo the number of cells in the hash table, but it seems (on paper) that this is a bad function, since it produces many collisions.

How should I modify my hash function to get better results (fewer collisions)?
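To make the collision problem concrete, here is a minimal Python sketch of the digit-sum approach described above, run over the sample numbers from the question. The function name `digit_sum_hash` and the table size 257 are assumptions chosen only for illustration. With the leading 69 skipped, eight digits remain, so the sum can never exceed 8 * 9 = 72, and any two numbers whose digits are permutations of each other land in the same cell.

```python
from collections import Counter

# Sample keys copied from the question.
PHONES = [
    "6948060987", "6960780800", "6963208768", "6944870406",
    "6947279288", "6953691771", "6956094283", "6947092062",
    "6960086297", "6947719197", "6951516975", "6957531584",
    "6969211184", "6963238579", "6957054322", "6952077216",
    "6956907738",
]


def digit_sum_hash(phone: str, table_size: int) -> int:
    """Sum the digits after the shared '69' prefix, then reduce modulo table_size."""
    return sum(int(d) for d in phone[2:]) % table_size


if __name__ == "__main__":
    table_size = 257  # hypothetical table size, for illustration only
    buckets = Counter(digit_sum_hash(p, table_size) for p in PHONES)
    print(f"{len(PHONES)} keys fall into {len(buckets)} distinct cells")
    # However large the table is, at most 73 cells (sums 0..72) are ever used.
    print("largest possible digit sum:", 8 * 9)
```

However many cells the table has, only cells 0 through 72 can be reached, so with 2000 or more entries heavy collisions are unavoidable.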

Solution

Why do you need a non-standard hash function at all?

There are plenty of hash functions which are well tested and have known properties which will work fine for any input, thus will also work well for phone numbers, which are after all a subset of ASCII strings. Is your application so time critical that you need to design your own hash function and risk something with more collisions? If not, why not use one of the well known hash functions?

For instance, if you need something with cryptographically demonstrable collision resistance, use SHA-256 (truncated if you want). If you are not worried about an adversary, use something like universal hashing. Unless your problem is very specialised, you will be better off using someone else's well tested hash algorithm than trying to invent one yourself.
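As a rough sketch of the two options mentioned above, the following Python code maps a phone number to a table cell using a truncated SHA-256 digest, and using a hash drawn from the classic universal family h(x) = ((a*x + b) mod p) mod m. The function names, the 8-byte truncation, and the choice of the prime p = 2^61 - 1 are assumptions made for this illustration, not part of the original answer.

```python
import hashlib
import random

# 2**61 - 1 is a Mersenne prime and exceeds every 10-digit phone number.
PRIME = (1 << 61) - 1


def sha256_bucket(phone: str, table_size: int) -> int:
    """Truncate the SHA-256 digest to 8 bytes and reduce it modulo table_size."""
    digest = hashlib.sha256(phone.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % table_size


def make_universal_hash(table_size: int, rng: random.Random):
    """Pick random a, b once and return h(x) = ((a*x + b) mod PRIME) mod table_size."""
    a = rng.randrange(1, PRIME)  # a must be non-zero
    b = rng.randrange(0, PRIME)

    def bucket(phone: str) -> int:
        # The phone number is treated as an integer key.
        return ((a * int(phone) + b) % PRIME) % table_size

    return bucket


if __name__ == "__main__":
    table_size = 4096  # hypothetical table size
    universal = make_universal_hash(table_size, random.Random())
    for phone in ("6948060987", "6960780800"):
        print(phone, sha256_bucket(phone, table_size), universal(phone))
```

The universal hash is much cheaper to compute than SHA-256; its collision guarantee holds in expectation over the random choice of a and b, which is why it suits the case where no adversary picks the keys.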

An even simpler option is the original hash function Perl used, which worked as follows:

# Return the hashed value of a string: $hash = perlhash("key")
# (Defined by the PERL_HASH macro in hv.h)
sub perlhash
{
    my ($key) = @_;
    my $hash = 0;
    # Multiply the running hash by 33, then add each character's code.
    foreach my $char (split //, $key) {
        $hash = $hash * 33 + ord($char);
    }
    return $hash;
}

In English, it takes the current hash value, multiplies it by 33, and adds the ASCII value of the next character. It's not a great hash, but it worked for Perl for a long while.
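For comparison, the same multiply-by-33 scheme can be sketched in Python and reduced to a table index. The names `times33_hash` and `times33_bucket` are hypothetical. Python integers do not overflow, so no masking is needed here; a C version working on a char array would typically rely on the wraparound of a fixed-width unsigned integer instead.

```python
def times33_hash(key: str) -> int:
    """Multiply the running hash by 33 and add each character's code point."""
    h = 0
    for ch in key:
        h = h * 33 + ord(ch)
    return h


def times33_bucket(key: str, table_size: int) -> int:
    """Reduce the hash to a cell index in a table with table_size cells."""
    return times33_hash(key) % table_size


if __name__ == "__main__":
    table_size = 4096  # hypothetical table size
    for phone in ("6948060987", "6960780800"):
        print(phone, times33_bucket(phone, table_size))
```

Unlike a plain digit sum, this hash depends on the position of each character, so numbers that contain the same digits in a different order generally get different hash values.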
