如何使用Trie进行拼写检查 [英] How do I use a Trie for spell checking

查看:98
本文介绍了如何使用Trie进行拼写检查的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个用单词字典构建的特里.我想将其用于拼写检查(并建议在字典中最接近的匹配项,也许对于给定的x编辑次数).我以为我会在目标词和字典中的词之间使用levenshtein距离,但是有一种聪明的方法来遍历trie而不实际在每个词上分别运行编辑距离逻辑吗?遍历和编辑距离应如何匹配?

I have a trie that I've built from a dictionary of words. I want to use this for spell checking( and suggest closest matches in the dictionary , maybe for a given number of edits x). I'm thinking I'd use levenshtein distance between the target word and words in my dictionary, but is there a smart way to traverse the trie without actually running the edit distance logic over each word separately? How should I do the traversal and the edit distance matching?

例如,如果我有单词MAN,MANE,我应该能够在MANE中的MAN上重用编辑距离计算.否则Trie不会有任何目的

For e.g, if I have words MAN, MANE, I should be able to reuse the edit distance computation on MAN in MANE. Otherwise the Trie wouldnt serve any purpose

推荐答案

尝试为每个树节点计算一个数组A,其中A [x]匹配树的前x个字母后,在trie中该位置的最小编辑距离目标词.

Try computing for each tree node an array A where A[x] the smallest edit distance to be at that position in the trie after matching the first x letters of the target word.

然后,如果数组中的每个元素都大于目标距离,则可以停止检查任何节点.

You can then stop examining any nodes if every element in the array is greater than your target distance.

例如,使用包含MAN和MANE的trie以及输入BANE:

For example, with a trie containing MAN and MANE and an input BANE:

Node 0 representing '', A=[0,1,2,3,4]
Node 1 representing 'M', A=[1,1,2,3,4]
Node 2 representing 'MA', A=[2,1,1,2,3]
Node 3 representing 'MAN' A=[3,2,2,1,2]
Node 4 representing 'MANE' A=[4,3,2,2,1]

A [end]的最小值是1个单词"MANE",因此这是最佳匹配.

The smallest value for A[end] is 1 reached with the word 'MANE' so this is the best match.

这篇关于如何使用Trie进行拼写检查的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆