散列一个浮点向量的好方法? [英] Good way to hash a float vector?

查看:204
本文介绍了散列一个浮点向量的好方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我很清楚所有比较花车的问题。这正是这个问题的原因。
我正在寻找为3D向量(3浮点数 - x,y,z)的值创建快速哈希表。可以假定矢量的长度总是1.0( sqrt(x * x + y * y + z * z)是1.0)



基本上这意味着我正在寻找一个散列函数,它的值几乎等于相同的无符号整型值,如果散列值相等,则相应的相等运算符为true不一定只有当它们相等时)

编辑 -

错误的正面(即不同的向量,相同的桶)是一个给定的,因为这是一个哈希表。

错误的负面(即向近,但映射到不同的桶)是不受欢迎的,但似乎没有办法避免它们。在我的情况下,他们不会导致完全破坏,只是一些我必须忍受的重复数据。

>我认为你所寻找的不是直接可能的。平等的一个重要特征是它是传递性的。 (即如果a == b和b == c,则a == c)。不过,距离衡量,你真的不想要这个属性。例如:

取一个浮点数(为了简单起见)。假设我们要散列每个浮点数,以便浮动小于1e-3的浮点数是相同的值。现在,假设我们将这个哈希表1000个浮点值全部加1e-4分开。任何邻近的2值应该散列到相同的浮点数,因为它们比1e-3更接近。但是,由于传递性,这些价值的邻居也应该具有相同的价值,和邻居等等。因此,所有1000个值(包括远离1e-3的对)将散列到相同的整数。如果您在图片上绘制这些点:

  ABCDEFGH ... YZ 

假设所有的差距都是< 1e-3分开,但A和Z分开> 1e-3(不按比例)。这不能被满足,因为如果对于所有对,散列(A)==散列(B)和散列(B)==散列(C)等等(因为它们相距< 1e-3) A)必须==散列(Z)。

一个可能的选择是定义你的向量空间的区域,在这个区域中所有向量都会散列到相同的值(也就是在散列它们之前围绕它们),但是你仍然可以得到2个向量在它们各自空间的边缘相邻,但哈希到不同的值。你可以通过搜索所有相邻的空间来寻找一个向量。 (即在上面的一维情况下,将所有向量四舍五入到最接近的1e-3的倍数,然后搜索邻居,因此5.3e-3将搜索5e-3,4e-3和6-e3。在维度较高的情况下,您必须在所有维度中搜索邻居。)


I am well aware of all the problems involved in comparing floats. This is exactly the reason for this question.
I'm looking to create a fast hash table for values that are 3D vectors (3 floats - x,y,z). It can be assumed that the length of the vector is always 1.0 (sqrt(x*x+y*y+z*z) is 1.0)

Essentially this means that I'm looking for a hash function that takes values that are almost equal to the same unsigned int value and a corresponding equality operator that is true if the hash values are equal (not not necessarily only if they are equal)

Edit -
False positives (i.e. vectors that are different but map to the same bucket) are a given since this is a hash table.
False negatives (i.e. Vectors that are close but map to different buckets) are undesirable but it seems that there is no way to avoid them. In my case, they will not causes total breakage, just some data duplication which is something I'll have to live with.

解决方案

I think what you're looking for is not directly possible. One important property of equality is that it is transitive. (ie. If a == b and b == c, then a == c). With a distance measure, though, you really don't want this property. Example:

Take a single float (for simplicity). Suppose we want to hash each float so that floats less than 1e-3 away are the same value. Now, suppose we add to this hash table 1000 float values all 1e-4 apart. Any neighboring 2 values should hash to the same float, since they are closer than 1e-3. However, because of transitivity, the neighbors of those values should also has to the same value, and their neighbors and so on. As a result, all 1000 values, including pairs farther than 1e-3 apart, would hash to the same integer. If you were to draw these points on a picture:

A  B  C  D  E  F  G  H ... Y Z

Suppose all the gaps are < 1e-3 apart, but A and Z are > 1e-3 apart (not to scale!). This can't be satisfied because if hash(A) == hash(B) and hash(B) == hash(C) and so on for all pairs, (since they are < 1e-3 apart) than hash(A) must == hash(Z).

One possible option is to define regions of your vector space in which all vectors would hash to the same value (ie round them before hashing them), but you could still get 2 vectors on the edges of their respective spaces that are close together but hash to a different value. You could get around that by searching all neighboring spaces for a vector. (ie in the 1-d case above, you'd round all vectors to the nearest multiple of 1e-3, and then search the neighbors, so 5.3e-3 would search 5e-3, 4e-3 and 6-e3. In higher-dimensional cases, you'd have to search neighbors in all dimensions.)

这篇关于散列一个浮点向量的好方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆