Hash function on list independent of order of items in it

Question

I want to have a dictionary that assigns a value to a set of integers.

For example, the key [1 2 3] maps to some value.

The thing is that [3 2 1] needs to be treated the same as [1 2 3] in my case, so if I go with a hashing approach, the two hashes need to be equal.

The set will have 2 to 10 items.

The sum of the items is usually fixed, so we cannot base the hash code on the sum, which is the first natural idea here.

Not a homework task; I am actually facing this problem in my code.

This set is basically an IEnumerable<int> in C#, so any data structure is fine for storing them.

Any help appreciated. Performance is pretty important here too.

An immediate thought: we could sum up the squares of the items and already get a somewhat better hash, but I would still like to hear some thoughts.
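A quick check of the sum-of-squares idea, written as a Python sketch (the language and the name `sum_of_squares_hash` are illustrative choices, not from the question). The value is order-independent, but two different sets with the same sum can still share the same sum of squares:

```python
def sum_of_squares_hash(items):
    """Order-independent because addition is commutative."""
    return sum(x * x for x in items)

# Same items in a different order give the same value.
print(sum_of_squares_hash([1, 2, 3]) == sum_of_squares_hash([3, 2, 1]))  # True

# Different sets with equal sums (12) and equal sums of squares (62) collide.
print(sum_of_squares_hash([1, 5, 6]), sum_of_squares_hash([2, 3, 7]))  # 62 62
```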

Edit: Hmm, really sorry guys. Everyone suggests ordering; it didn't occur to me that I needed to say that sorting and then hashing is actually the solution I currently use, and I am looking for faster alternatives.
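For reference, the "sort, then hash" baseline the asker describes might look like this minimal Python sketch (the helper name `sorted_key` is hypothetical). Sorting produces a canonical order, so any permutation maps to the same dictionary key:

```python
def sorted_key(items):
    """Canonical key: sort once (O(n log n)), then use the tuple as a dict key."""
    return tuple(sorted(items))

table = {}
table[sorted_key([1, 2, 3])] = "some value"
print(table[sorted_key([3, 2, 1])])  # some value
```

With 2 to 10 items the sort is cheap, but it is still an extra pass with comparisons; the answer's approaches aim to replace it with a single order-independent fold.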

Answer

Basically all of the approaches here are instantiations of the same template. Map $x_1, \ldots, x_n$ to $f(x_1) \mathbin{\mathrm{op}} \cdots \mathbin{\mathrm{op}} f(x_n)$, where $\mathrm{op}$ is a commutative, associative operation on some set $X$ and $f$ is a map from items to $X$. This template has been used a couple of times in ways that are provably good.
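The template can be written generically as a fold. This is a Python sketch in which `template_hash`, the example `f`, and the explicit identity element are illustrative assumptions. The asker's sum-of-squares idea is the instance f(x) = x² with op = +:

```python
from functools import reduce
from operator import add, xor

def template_hash(items, f, op, identity):
    """Compute f(x1) op f(x2) op ... op f(xn).

    If op is commutative and associative, the result does not depend on
    the order in which the items are visited.
    """
    return reduce(op, (f(x) for x in items), identity)

cube = lambda x: x ** 3  # arbitrary f, for illustration only
print(template_hash([1, 2, 3], cube, add, 0) == template_hash([3, 1, 2], cube, add, 0))  # True
print(template_hash([1, 2, 3], cube, xor, 0) == template_hash([2, 3, 1], cube, xor, 0))  # True
```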

  • Choose a random large prime $p$ and a random residue $b$ in $[1, p - 1]$. Let $f(x) = b^x \bmod p$ and let op be addition mod $p$. We essentially interpret a set as a polynomial in $b$ and use the Schwartz-Zippel lemma to bound the probability of a collision (that is, the probability that a nonzero polynomial has $b$ as a root mod $p$).

  • Let op be XOR and let f be a randomly chosen table. This is Zobrist hashing, and straightforward linear-algebraic arguments show that it minimizes the expected number of collisions.
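Both list items can be sketched in a few lines of Python. Assumptions, none of them from the answer: a fixed Mersenne prime $2^{61} - 1$ stands in for the random prime (the randomness then comes only from $b$), items are non-negative integers below a hypothetical `MAX_ITEM`, and table entries are 64-bit:

```python
import random

# Item 1: f(x) = b^x mod p, op = addition mod p.
P = (1 << 61) - 1           # assumption: fixed Mersenne prime instead of a random one
B = random.randrange(1, P)  # random residue b in [1, p - 1]

def poly_hash(items):
    # The set read as a polynomial with terms b^x, evaluated mod p.
    return sum(pow(B, x, P) for x in items) % P

# Item 2: Zobrist hashing, f = random table, op = XOR.
MAX_ITEM = 1000  # hypothetical item range [0, MAX_ITEM)
ZOBRIST = [random.getrandbits(64) for _ in range(MAX_ITEM)]

def zobrist_hash(items):
    h = 0
    for x in items:
        h ^= ZOBRIST[x]  # XOR is commutative and associative, so order is irrelevant
    return h

print(poly_hash([1, 2, 3]) == poly_hash([3, 2, 1]))        # True
print(zobrist_hash([1, 2, 3]) == zobrist_hash([3, 2, 1]))  # True
```

One caveat for the XOR variant: a repeated item cancels itself, so [1 1 2] and [2] collide. That is harmless for true sets with distinct items; wrapping addition mod $2^{64}$ avoids it if duplicates can occur.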

Modular exponentiation is slow, so don't use it. As for Zobrist hashing, with 3 million items, the table f probably won't fit into the L2 cache, though each lookup costs at most one main-memory access.
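A rough size estimate behind the L2 remark, assuming each table entry is a 64-bit integer (the entry width is an assumption; the answer does not state it):

```latex
3 \times 10^{6}\ \text{entries} \times 8\ \text{bytes/entry} = 2.4 \times 10^{7}\ \text{bytes} \approx 24\ \text{MB}
```

That is well beyond typical per-core L2 sizes, which are usually on the order of hundreds of KB to a few MB.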

I would instead take Zobrist hashing as a starting point and look for a cheap function f that behaves like a random function. This is essentially the job description of a non-cryptographic pseudorandom generator: I would try computing f(x) by seeding a fast PRG with x and generating one value.
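A minimal Python sketch of that idea. The choice of SplitMix64 as the "fast PRG" is an assumption, since the answer does not name a generator. Each call seeds the generator's state with x and returns its first output, which replaces the Zobrist table lookup:

```python
MASK64 = (1 << 64) - 1

def splitmix64(x):
    """First output of SplitMix64 seeded with x: a cheap, well-mixed f(x)."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def prg_zobrist_hash(items):
    """Zobrist-style XOR of f(x), with the lookup table replaced by splitmix64."""
    h = 0
    for x in items:
        h ^= splitmix64(x)
    return h

print(prg_zobrist_hash([1, 2, 3]) == prg_zobrist_hash([3, 1, 2]))  # True
```

This needs no table, so there is no cache pressure, at the cost of a few multiplies and shifts per item. In a language with native 64-bit unsigned arithmetic (such as C# `ulong`), the masks are unnecessary because overflow wraps automatically.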

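Edit: given that the sets all have the same sums, don't choose f to be a degree-1 polynomial (e.g., the step function of a linear congruential generator). With op = addition and $f(x) = ax + c$, the hash is $\sum_i f(x_i) = a \sum_i x_i + nc$, which depends only on the sum and the number of items.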
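To make the failure concrete, here is a Python sketch using one LCG step as f. The constants are Knuth's MMIX LCG parameters, chosen only for illustration, and `additive_hash` is a hypothetical helper. Two different sets of the same size and sum always collide:

```python
MASK64 = (1 << 64) - 1
A, C = 6364136223846793005, 1442695040888963407  # MMIX LCG constants (illustrative)

def lcg_step(x):
    """Degree-1 polynomial f(x) = A*x + C mod 2^64, i.e. one LCG step."""
    return (A * x + C) & MASK64

def additive_hash(items, f):
    return sum(f(x) for x in items) & MASK64

# Same size (3) and same sum (12): the degree-1 f cannot tell them apart.
print(additive_hash([1, 5, 6], lcg_step) == additive_hash([2, 3, 7], lcg_step))  # True
```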
