在rabin-karp滚动哈希中选择基数和模质数 [英] Choosing radix and modulus prime in rabin-karp rolling hash

查看:129
本文介绍了在rabin-karp滚动哈希中选择基数和模质数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

维基百科

它说:"a和n的选择对于获得良好的哈希值至关重要;"并指的是与之无关的线性同余生成器文章.我不知道如何选择值.有什么建议吗?

It says, "The choice of a and n is critical to get good hashing;" and refers to a Linear congruential generator article that doesn't feel relevant. I cant figure out how the values are chosen. Any suggestions?

推荐答案

此算法的基础是,度数最多为 d 的非零多项式最多为 d 零.每个长度 k 字符串都有自己的关联多项式度数 k -1,我们通过减去相关字符串的多项式并在 a .如果字符串相等,则结果始终为零.如果字符串不相等,则当且仅当 a 是多项式差的零之一时,结果为零(这是将素数要求置于 n ,因为整数mod n 否则不是字段).

The basis of this algorithm is that a nonzero polynomial of degree at most d has at most d zeros. Each length-k string has its own associated polynomial of degree k - 1, and we screen for possible matches by subtracting the polynomials of the strings in question and evaluating at a. If the strings are equal, then the result is always zero. If the strings are not equal, then the result is zero if and only if a is one of the zeros of the polynomial difference (this is the fact that puts the primality requirement on n, as the integers mod n otherwise would not be a field).

从理论上讲,至少我们希望 a 是随机的,这样一个遗忘的对手就不能产生任何频率的误报.如果我们不希望遇到麻烦,那么最好选择 a ,这样与 a 的乘积就很便宜(例如, a 的二进制扩展em>具有少量的一位).不过,某些选择对典型的字符串集不利(例如 a = 1).我们希望 n 足够大,可以通过随机机会避免误报(概率( k -1)/ n ),但又要足够小,最好是的特殊形式,以便模运算高效.

In theory, at least, we want a to be random so that an oblivious adversary cannot create false positives with any frequency. If we don't expect trouble, then it might be better to choose a so that multiplication by a is cheap (e.g., the binary expansion of a has a small number of one bits). Nevertheless, some choices are bad on typical string sets (e.g., a = 1). We want n to be large enough to avoid false positives (probability (k - 1)/n) by random chance but small enough and preferably of a special form so that the modulo computations are efficient.

这篇关于在rabin-karp滚动哈希中选择基数和模质数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆