算法＆QUOT;模糊匹配＆QUOT;字符串 [英] Algorithms for "fuzzy matching" strings

查看：168 发布时间：2015/11/30 13:46:36 algorithm search string fuzzy-search

本文介绍了算法＆QUOT;模糊匹配＆QUOT;字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

通过模糊匹配我的意思不是类似字符串由Levenshtein距离或类似的东西，但它在TextMate中/伊多文/冰柱使用方法：给定一个字符串列表，找到那些包括在搜索字符串中的所有字符，但可能与之间，preferring最合适的。

By fuzzy matching I don't mean similar strings by Levenshtein distance or something similar, but the way it's used in TextMate/Ido/Icicles: given a list of strings, find those which include all characters in the search string, but possibly with other characters between, preferring the best fit.

推荐答案

我终于明白了什么，你所期待的。这个问题很有趣但是在看2算法，你发现它似乎人们对规范广泛不同意见;）

I've finally understood what you were looking for. The issue is interesting however looking at the 2 algorithms you found it seems that people have widely different opinions about the specifications ;)

我认为这将是有益的陈述问题和要求更清楚。

I think it would be useful to state the problem and the requirements more clearly.

问题

我们正在寻找一种方式，允许用户只输入关键字他们居然打算的几个字母，并提出他们从中选择一个列表，加快打字。

We are looking for a way to speed up typing by allowing users to only type a few letters of the keyword they actually intended and propose them a list from which to select.

预期的输入的所有字母是在关键字
据预计，在输入的字母在同一个顺序在关键字
关键字的列表中返回应psented一致（reproductible）为了$ P $
在该算法应不区分大小写

分析

前两个要求可以总结一下像这样：对于输入 AXG 我们正在寻找的话匹配此正EX pression [^ A] * A [^ X] * X [^ G] * G *

The first two requirements can be sum up like such: for an input axg we are looking for words matching this regular expression [^a]*a[^x]*x[^g]*g.*

第三个要求是故意松。其中是一致的话应该出现在列表中需要的命令......但它很难猜测评分方法是否会比按字母顺序排列好。如果列表extremy长，则评分方法可能会更好，但对于短名单更容易让眼睛去寻找下一个列表排序明显地特定项目。

The third requirement is purposely loose. The order in which the words should appear in the list need being consistent... however it's difficult to guess whether a scoring approach would be better than alphabetical order. If the list is extremy long, then a scoring approach could be better, however for short list it's easier for the eye to look for a particular item down a list sorted in an obvious manner.

此外，按字母顺序排列具有一致性的优势打字时：即增加一个字母不完全列表重新排序（痛苦的眼睛和大脑），它只是过滤掉不符合任何更长的项目

Also, the alphabetical order has the advantage of consistency during typing: ie adding a letter does not completely reorder the list (painful for the eye and brain), it merely filters out the items that do not match any longer.

有一个关于处理UNI code字符，例如没有precision是 A 类似于 A 或其它字符完全？因为我知道，目前使用这些字符在他们的关键字没有语言的，我会让它溜走现在。

There is no precision about handling unicode characters, for example is à similar to a or another character altogether ? Since I know of no language that currently uses such characters in their keywords, I'll let it slip for now.

我的解决办法：

对于任何投入，我将建立早期pssed正规前pression EX $ P $。它适用于Python的，因为语言已经拥有不区分大小写匹配。

For any input, I would build the regular expression expressed earlier. It suitable for Python because the language already features case-insensitive matching.

我会再匹配关键字的我（按字母顺序排序）列表，并输出使过滤。

I would then match my (alphabetically sorted) list of keywords, and output it so filtered.

在伪code：

WORDS = ['Bar', 'Foo', 'FooBar', 'Other']

def GetList(input, words = WORDS):
  expr = ['[^' + i + ']*' + i for i in input]
  return [w for w in words if re.match(expr, w, re.IGNORECASE)]

我可以用一个一行但认为它会掩盖了code;）

I could have used a one-liner but thought it would obscure the code ;)

该解决方案非常适用于增量的情况下（例如，当您匹配的用户类型，从而保持重建），因为当用户添加，你可以简单地重新过滤结果你只计算一个字符。因此：

This solution works very well for incremental situations (ie, when you match as the user type and thus keep rebuilding) because when the user adds a character you can simply refilter the result you just computed. Thus:

无论有几个字符，从而匹配的是快速和列表的长度没有多大关系
要么也有许多字符，这意味着我们正在筛选候选名单，因此它没有太大的关系，如果匹配需要更长的时间逐元素

我也应该注意到，这种常规的前pression不涉及回溯，因此十分有效。它也可以被建模为一个简单的状态机。

I should also note that this regular expression does not involve back-tracking and is thus quite efficient. It could also be modeled as a simple state machine.

这篇关于算法＆QUOT;模糊匹配＆QUOT;字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

算法＆QUOT;模糊匹配＆QUOT;字符串 [英] Algorithms for "fuzzy matching" strings

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

算法＆QUOT;模糊匹配＆QUOT;字符串 [英] Algorithms for &quot;fuzzy matching&quot; strings

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

算法＆QUOT;模糊匹配＆QUOT;字符串 [英] Algorithms for "fuzzy matching" strings

登录关闭