Optimal method to perform a best Levenshtein match against a Map in Java


Question

I have a Map in Java. I would like to compare a source string against all items in the map and return the best match based on a Levenshtein ratio algorithm. What would be the optimal way to perform this check on every element in the map?

Thanks, Matt

Recommended answer

You won't be able to get better than O(n) performance with a standard Map, since every entry has to be compared against the source string. Just use the naive approach of testing them sequentially.
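As a rough illustration of the sequential approach, here is a minimal sketch. It assumes the strings to match are the map's keys and that the "Levenshtein ratio" means the normalized similarity 1 - distance / max(length); the question specifies neither, so adjust both to your case. The class and method names (`BestMatch`, `levenshtein`, `ratio`, `bestMatch`) are made up for this example.

```java
import java.util.Map;
import java.util.Optional;

public class BestMatch {

    // Two-row dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed definition of the ratio: 1.0 for identical strings, 0.0 for completely different ones.
    public static double ratio(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Linear scan over all entries: O(n) distance computations. Ties keep the first entry seen.
    public static <V> Optional<Map.Entry<String, V>> bestMatch(Map<String, V> map, String source) {
        Map.Entry<String, V> best = null;
        double bestRatio = -1.0;
        for (Map.Entry<String, V> entry : map.entrySet()) {
            double r = ratio(source, entry.getKey());
            if (r > bestRatio) {
                bestRatio = r;
                best = entry;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Calling `BestMatch.bestMatch(map, "kittens")` returns the entry whose key has the highest ratio, or an empty `Optional` when the map is empty. With a `HashMap`, which entry wins a tie depends on iteration order.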

There are far more efficient ways to do this, though. One of them is called a BK-tree. Basically, you construct an n-way tree in which each edge is labelled with the Levenshtein distance between the two nodes it connects. Then you can use the triangle inequality to massively cut down the number of nodes you have to search. For short distances, it's very efficient. Here's a blog article I wrote some time ago, describing it in detail. With a little extra work, you can query it for the nearest neighbour, rather than repeatedly querying with distance 1, 2, etc.
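To make the idea concrete, here is a minimal BK-tree sketch. It is not the code from the blog article; the class name `BKTree` and its methods (`add`, `search`, `nearest`) are hypothetical, and it stores plain strings, so you would insert the map's keys and look the winner up in the map afterwards. The pruning relies on the fact that every word in the subtree under edge k is at distance exactly k from the parent's word, so by the triangle inequality it is at least |k - d| away from the query, where d is the query's distance to the parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BKTree {

    private static final class Node {
        final String word;
        // Child reached via edge k holds words at Levenshtein distance k from this word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    public void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = distance(word, node.word);
            if (d == 0) return; // already stored
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    // All stored words within maxDist of the query.
    public List<String> search(String query, int maxDist) {
        List<String> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = distance(query, node.word);
            if (d <= maxDist) result.add(node.word);
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                // Triangle inequality: only edges in [d - maxDist, d + maxDist] can lead to matches.
                if (Math.abs(e.getKey() - d) <= maxDist) stack.push(e.getValue());
            }
        }
        return result;
    }

    // Nearest neighbour: the search radius shrinks as better candidates are found.
    public String nearest(String query) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = distance(query, node.word);
            if (d < bestDist) { bestDist = d; best = node.word; }
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                // A subtree can only improve on the best if its lower bound is below bestDist.
                if (Math.abs(e.getKey() - d) < bestDist) stack.push(e.getValue());
            }
        }
        return best; // null when the tree is empty
    }

    // Two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Note that `nearest` minimizes the raw Levenshtein distance. If the best match is defined by a length-normalized ratio instead, the closest word by distance is not always the one with the highest ratio, so one option is to collect candidates with `search` and rank those by ratio.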

