Locality Sensitive Hash Implementation?
Question
Are there any relatively simple to understand (and simple to implement) locality-sensitive hash examples in C/C++/Java/C#?
I'd like to learn more about the concept, so I want to try an implementation on a few text files just to see how it works. I don't need anything high-performance; just an example of a hash function that returns similar hashes for similar inputs. I can learn more from it by example afterwards. :)
Recommended Answer
For strings, you can use an approximate string matching algorithm:
- Generate a random string to use as the shared reference string
- For every string, compute its distance from that shared reference string using an edit-distance algorithm such as Levenshtein distance (http://www.dotnetperls.com/levenshtein)
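The two steps above can be sketched in Java as follows. This is a minimal sketch, not the answerer's code: the class name `ReferenceDistance`, the lowercase a-z alphabet, the reference length, and the fixed seed are illustrative assumptions. The distance is the standard dynamic-programming Levenshtein edit distance, computed with two rolling rows.

```java
import java.util.Random;

// Hypothetical helper class; names, alphabet and seed are illustrative assumptions.
final class ReferenceDistance {

    // Standard dynamic-programming Levenshtein distance with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Step 1: generate a random reference string (assumed: lowercase a-z, fixed seed).
    static String randomReference(int length, long seed) {
        Random rng = new Random(seed);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    // Step 2: distance of every input string from the same shared reference.
    static int[] distancesFrom(String reference, String... inputs) {
        int[] result = new int[inputs.length];
        for (int k = 0; k < inputs.length; k++) {
            result[k] = levenshtein(inputs[k], reference);
        }
        return result;
    }
}
```

The particular seed does not matter; what matters is that every string is compared against the same reference. For example, `levenshtein("kitten", "sitting")` returns 3.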
If two strings are equidistant from the reference string, then chances are they are similar to each other. And there you go: you have a locality-sensitive hash implementation for strings.
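The reason this works can be stated in one line, because Levenshtein distance is a metric and therefore satisfies the triangle inequality. Writing d for the edit distance, r for the shared reference string, and a, b for two input strings:

```latex
d(a, r) \le d(a, b) + d(b, r) \quad\text{and}\quad d(b, r) \le d(a, b) + d(a, r)
\;\Longrightarrow\; \lvert d(a, r) - d(b, r) \rvert \le d(a, b)
```

So strings that are close to each other always get close distances to the reference; two strings one edit apart differ by at most 1 in their reference distance. The converse does not hold: two very different strings can happen to be equidistant from r, which is why equal distances only suggest, rather than prove, similarity.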
You can then create a separate hash bucket for each range of distances.
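Bucketing by distance range might look like this. It is a sketch with a hypothetical `DistanceBuckets` class, and the bucket width is an assumed tuning parameter. The distance function is passed in as a `ToIntFunction<String>` so that the Levenshtein distance to the reference string, or any other string distance, can be plugged in. Each bucket index is simply `distance / width`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical bucketing helper; the bucket width is an assumed tuning parameter.
final class DistanceBuckets {

    // Bucket index for one distance: 0..width-1 -> 0, width..2*width-1 -> 1, ...
    static int bucketOf(int distance, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        return distance / width;
    }

    // Groups strings by the bucket of their distance to the (implicit) reference.
    static Map<Integer, List<String>> group(List<String> inputs,
                                            ToIntFunction<String> distanceToReference,
                                            int width) {
        Map<Integer, List<String>> buckets = new TreeMap<>(); // sorted by bucket index
        for (String s : inputs) {
            int bucket = bucketOf(distanceToReference.applyAsInt(s), width);
            buckets.computeIfAbsent(bucket, k -> new ArrayList<>()).add(s);
        }
        return buckets;
    }
}
```

With a width of 3, distances 2 and 3 land in different buckets even though they are close, so fixed ranges have hard edges. One simple option is to also look in the neighbouring buckets when searching for similar strings.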
Edit: You can try other variations of string distance. A simpler algorithm would just return the number of characters two strings have in common.
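"Number of common characters" can be read several ways. The sketch below assumes one reading: characters are counted with multiplicity (a multiset intersection), so "hello" and "yellow" share e, l, l and o. The class `CommonChars` is hypothetical. Note that this is a similarity score (higher means more alike) rather than a distance, and it ignores character order, so it is coarser than Levenshtein distance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts characters shared by two strings, with multiplicity.
final class CommonChars {

    static int commonCharacterCount(String a, String b) {
        // Count how many times each character occurs in a.
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Each character of b that still has an unmatched copy in a counts once.
        int common = 0;
        for (char c : b.toCharArray()) {
            Integer left = counts.get(c);
            if (left != null && left > 0) {
                counts.put(c, left - 1);
                common++;
            }
        }
        return common;
    }
}
```

For example, `commonCharacterCount("hello", "yellow")` returns 4. Used in the scheme above, each string would be scored against the same shared reference string, and the scores bucketed by range just as with the edit distance.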