散列浮点向量的好方法? [英] Good way to hash a float vector?

查看:11
本文介绍了散列浮点向量的好方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我很清楚比较浮动所涉及的所有问题.这正是这个问题的原因.我正在寻找为 3D 向量(3 个浮点数 - x、y、z)的值创建一个快速哈希表.可以假设向量的长度总是1.0 (sqrt(x*x+y*y+z*z) is 1.0)

I am well aware of all the problems involved in comparing floats. This is exactly the reason for this question.
I'm looking to create a fast hash table for values that are 3D vectors (3 floats - x,y,z). It can be assumed that the length of the vector is always 1.0 (sqrt(x*x+y*y+z*z) is 1.0)

本质上,这意味着我正在寻找一个散列函数,该函数采用几乎等于相同无符号整数值的值和一个对应的相等运算符,如果散列值相等(不一定只有当它们相等时才为真)相等)

Essentially this means that I'm looking for a hash function that takes values that are almost equal to the same unsigned int value and a corresponding equality operator that is true if the hash values are equal (not not necessarily only if they are equal)

编辑 -
误报(即不同的向量但映射到同一个桶)是给定的,因为这是一个哈希表.
假阴性(即接近但映射到不同桶的向量)是不可取的,但似乎没有办法避免它们.就我而言,它们不会导致完全损坏,只会导致一些数据重复,这是我必须忍受的.

Edit -
False positives (i.e. vectors that are different but map to the same bucket) are a given since this is a hash table.
False negatives (i.e. Vectors that are close but map to different buckets) are undesirable but it seems that there is no way to avoid them. In my case, they will not causes total breakage, just some data duplication which is something I'll have to live with.

推荐答案

我认为您正在寻找的东西不是直接可能的.平等的一个重要特性是它是可传递的.(即,如果 a == b 和 b == c,则 a == c).但是,使用距离度量,您确实不想要此属性.示例:

I think what you're looking for is not directly possible. One important property of equality is that it is transitive. (ie. If a == b and b == c, then a == c). With a distance measure, though, you really don't want this property. Example:

取一个浮点数(为简单起见).假设我们想要散列每个浮点数,以便小于 1e-3 的浮点数是相同的值.现在,假设我们向这个哈希表添加 1000 个浮点值,所有 1e-4 相隔.任何相邻的 2 个值都应该散列到相同的浮点数,因为它们比 1e-3 更接近.但是,由于传递性,这些值的邻居也应该具有相同的值,以及它们的邻居等等.结果,所有 1000 个值,包括相距超过 1e-3 的对,都将散列到相同的整数.如果您要在图片上绘制这些点:

Take a single float (for simplicity). Suppose we want to hash each float so that floats less than 1e-3 away are the same value. Now, suppose we add to this hash table 1000 float values all 1e-4 apart. Any neighboring 2 values should hash to the same float, since they are closer than 1e-3. However, because of transitivity, the neighbors of those values should also has to the same value, and their neighbors and so on. As a result, all 1000 values, including pairs farther than 1e-3 apart, would hash to the same integer. If you were to draw these points on a picture:

A  B  C  D  E  F  G  H ... Y Z

假设所有的间隙都<相距 1e-3,但 A 和 Z 相距 > 1e-3(不按比例!).这不能被满足,因为如果 hash(A) == hash(B) 和 hash(B) == hash(C) 等等所有对,(因为它们相距 <1e-3)而不是 hash(A) 必须 == 哈希(Z).

Suppose all the gaps are < 1e-3 apart, but A and Z are > 1e-3 apart (not to scale!). This can't be satisfied because if hash(A) == hash(B) and hash(B) == hash(C) and so on for all pairs, (since they are < 1e-3 apart) than hash(A) must == hash(Z).

一种可能的选择是定义向量空间的区域,其中所有向量都将散列到相同的值(即在散列它们之前将它们舍入),但您仍然可以在它们各自空间的边缘上得到 2 个接近的向量在一起但散列到不同的值.您可以通过在所有相邻空间中搜索向量来解决这个问题.(即在上面的 1-d 情况下,您会将所有向量四舍五入到 1e-3 的最近倍数,然后搜索邻居,因此 5.3e-3 将搜索 5e-3、4e-3 和 6-e3.在更高维度的情况下,您必须在所有维度上搜索邻居.)

One possible option is to define regions of your vector space in which all vectors would hash to the same value (ie round them before hashing them), but you could still get 2 vectors on the edges of their respective spaces that are close together but hash to a different value. You could get around that by searching all neighboring spaces for a vector. (ie in the 1-d case above, you'd round all vectors to the nearest multiple of 1e-3, and then search the neighbors, so 5.3e-3 would search 5e-3, 4e-3 and 6-e3. In higher-dimensional cases, you'd have to search neighbors in all dimensions.)

这篇关于散列浮点向量的好方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆