是普遍的哈希函数系列,只是为了防止敌方的攻击? [英] Is Universal family of hash functions only to prevent enemy attack?

查看:161
本文介绍了是普遍的哈希函数系列,只是为了防止敌方的攻击?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我的意图只是有一个很好的散列函数,将数据均匀地传播到所有的桶中,那么我不需要提出一系列的哈希函数,我只能用一个好的哈希函数来做,是否正确?

If my intention is only to have a good hash function that spreads data evenly into all of the buckets, then I need not come up with a family of hash functions, I could just do with one good hash function, is that correct?

拥有一系列哈希函数的目的只是为了使敌人更难以构建病理数据集,就像我们随机选择哈希函数一样,他/她没有关于使用哪个散列函数的信息。我的理解是否正确?

The purpose of having a family of hash functions is only to make it harder for the enemy to build a pathological data set as when we pick a hash function randomly, he/she has no information about which hash function is employed. Is my understanding right?

编辑:
由于有人试图关闭不清楚;这个问题是要知道使用Universal系列哈希函数的真正目的。

Since someone is trying to close as unclear; This question is to know the real purpose of employing a Universal family of hash functions.

推荐答案


I可以用一个好的哈希函数来做,这是否正确?

I could just do with one good hash function, is that correct?

正如你稍后在你的问题中所述,一个知道的敌人您正在使用的哈希函数可以准备一个病理数据集。

As you note later in your question, an "enemy" who knows which hash function you're using could prepare a pathological data set.

此外,散列只是将数据存储到表的存储区中的第一步 - 如果您正在实现开放寻址/封闭散列,您还需要选择替代桶来进行碰撞后探测:简单的方法,如线性和二次探测通常提供足够的碰撞避免,并且可能在数学上更简单,因此比重新进行更快,但是它们不会保持概率的下一个探测器在负载因子处找到未使用的桶。使用另一个好的哈希函数(包括另一个来自这样的函数的系列)进行重新表达,所以如果这对你很重要,你可能更喜欢使用一系列哈希函数。

Further, hashing is just the first stage in storing data into your table's buckets - if you're implementing open addressing / closed hashing, you also need to select alternative buckets to probe after collisions: simple approaches like linear and quadratic probing generally provide adequate collision avoidance, and are likely mathematically simpler and therefore faster than rehashing, but they don't maintain a probability of the next probe finding an unused bucket at the load factor. Rehashing with another good hash function (including another from a family of such functions) does, so if that's important to you you may prefer to use a family of hash functions.

还要注意,有时使用内存中散列表来表示磁盘数据上的哪些偏移/扇区被存储,因此使用已有内存数据的额外重做计算可能比更高的概率(线性/二次探测)等待磁盘I / O只发现另一个碰撞。

Note too that sometimes an in-memory hash table is used to say at which offsets/sectors on disk data is stored, so extra rehashing calculations with already-in-memory data may be far more appealing than a higher probability (with linear/quadratic probing) of waiting on disk I/O only to find another collision.

这篇关于是普遍的哈希函数系列,只是为了防止敌方的攻击?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆