Algorithms for fast approximate string matching
Problem description
Given a source string s and n strings of the same length as s, I need a fast algorithm that returns those strings that differ from s in at most k corresponding positions.
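For reference, the condition above is a bound on the Hamming distance, and the naive baseline is a linear scan over all n strings. The following is only a minimal sketch of that baseline; the function names hamming_within and filter_by_hamming are made up for illustration, and k is assumed to be non-negative.

```python
def hamming_within(s, t, k):
    """Return True if s and t differ in at most k positions.

    Assumes len(s) == len(t); stops as soon as the budget k is exceeded.
    """
    mismatches = 0
    for a, b in zip(s, t):
        if a != b:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def filter_by_hamming(s, candidates, k):
    """Linear scan: keep the candidates within Hamming distance k of s."""
    return [t for t in candidates
            if len(t) == len(s) and hamming_within(s, t, k)]


if __name__ == "__main__":
    words = ["kathrin", "karolin", "kerstin", "carolin"]
    print(filter_by_hamming("karolin", words, 1))
```

Each query costs O(n * m) character comparisons in the worst case for strings of length m, which is the cost any faster approach has to beat.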
What is a fast algorithm for doing this?
PS: I should note that this is an academic question. I want to find the most efficient algorithm, if possible.
I also left out one very important piece of information: the n equal-length strings form a dictionary, against which many source strings s will be queried. It seems some sort of preprocessing step should make this more efficient.
Recommended answer
In his book "Algorithms", Sedgewick writes that a ternary search tree allows one "to locate all words within a given Hamming distance of a query word". Article in Dr. Dobb's
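To make the idea concrete, here is a rough sketch of how such a search could look. It is not taken from the book or the article: the class and method names (TernarySearchTree, within_hamming) are assumptions for illustration. The dictionary is inserted once as the preprocessing step, and each query then walks the tree with a mismatch budget of k, abandoning a branch once the budget is spent.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False  # True if a dictionary word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:  # empty strings are not stored
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_end = True
        return node

    def within_hamming(self, query, k):
        """Return stored words of len(query) that differ in at most k positions."""
        results = []
        self._search(self.root, query, 0, k, [], results)
        return results

    def _search(self, node, query, i, budget, prefix, results):
        if node is None or i >= len(query):
            return
        qc = query[i]
        # lo/hi hold other candidate characters for the same position i.
        # With no budget left, only the side that can contain qc is useful.
        if budget > 0 or qc < node.ch:
            self._search(node.lo, query, i, budget, prefix, results)
        remaining = budget - (0 if node.ch == qc else 1)
        if remaining >= 0:
            prefix.append(node.ch)
            if i + 1 == len(query):
                if node.is_end:
                    results.append("".join(prefix))
            else:
                self._search(node.eq, query, i + 1, remaining, prefix, results)
            prefix.pop()
        if budget > 0 or qc > node.ch:
            self._search(node.hi, query, i, budget, prefix, results)


if __name__ == "__main__":
    tree = TernarySearchTree()
    for w in ["kathrin", "karolin", "kerstin", "carolin"]:
        tree.insert(w)
    print(tree.within_hamming("karolin", 1))
```

With k = 0 this degenerates to an ordinary exact lookup; as k grows, more of the tree is visited, so the benefit over a linear scan is largest for small k. The recursive insert is kept for brevity and may produce a deep, unbalanced tree if the words are inserted in sorted order.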