Algorithm wanted: Find all words of a dictionary that are similar to words in a free text

Problem description

We have a list of about 150,000 words. When the user enters free text, the system should present a list of dictionary words that are very close to words in that text.

For instance, the user enters: "I would like to buy legoe toys in Walmart". If the dictionary contains "Lego", "Car" and "Walmart", the system should present "Lego" and "Walmart" in the list. "Walmart" is obvious because it is identical to a word in the sentence, but "Lego" is similar enough to "legoe" to be listed, too. However, no word in the sentence is similar to "Car", so that word is not shown.
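
To make "similar enough" concrete, one common measure (an assumption here; the question does not name one) is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions that turn one word into another. "legoe" is one deletion away from "lego", while "car" is at least three edits from every word in the example sentence. The sketch below assumes case-insensitive comparison and a hypothetical cut-off of one edit (`max_distance`); the helper names `levenshtein` and `similar_words` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similar_words(text: str, dictionary: list[str], max_distance: int = 1) -> list[str]:
    """Return dictionary entries within max_distance edits of some word in text.
    Comparison is case-insensitive; punctuation handling is deliberately naive."""
    words = [w for w in (t.strip('.,!?"').lower() for t in text.split()) if w]
    return [entry for entry in dictionary
            if any(levenshtein(entry.lower(), w) <= max_distance for w in words)]


print(similar_words("I would like to buy legoe toys in Walmart",
                    ["Lego", "Car", "Walmart"]))
# ['Lego', 'Walmart']
```

This version compares every dictionary entry against every text word, which is easy to follow but probably too slow to run over 150,000 entries while the user is typing.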

The list should be shown in real time: by the time the user has finished entering the sentence, the list of words must already be on the screen. Does anybody know a good algorithm for this?

The dictionary actually contains concepts, which may include spaces, for instance "Lego spaceship". The perfect solution would recognize these multi-word concepts, too.
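
For multi-word concepts, one option (assumed here, not taken from the question) is to slide a window over the text: a concept of n words is compared against every run of n consecutive words. This sketch swaps in a different similarity measure, the `ratio()` of Python's standard-library `difflib.SequenceMatcher` (1.0 means identical strings), with a hypothetical cut-off of 0.8. The function name `find_concepts` and the example sentence are illustrative.

```python
from difflib import SequenceMatcher


def find_concepts(text: str, concepts: list[str], threshold: float = 0.8) -> list[str]:
    """Return concepts (possibly multi-word) that approximately occur in text.

    A concept of n words is compared against every run of n consecutive
    words in the text; it matches if any run is similar enough.
    """
    tokens = [t.strip('.,!?"').lower() for t in text.split()]
    found = []
    for concept in concepts:
        words = concept.lower().split()
        target = " ".join(words)  # normalize internal whitespace
        n = len(words)
        # Every run of n consecutive text words, joined with single spaces.
        windows = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        if any(SequenceMatcher(None, w, target).ratio() >= threshold for w in windows):
            found.append(concept)
    return found


print(find_concepts("I want a lego spaceshp from Walmart",
                    ["Lego spaceship", "Car", "Walmart"]))
# ['Lego spaceship', 'Walmart']
```

Like any brute-force scan, the cost grows with dictionary size times text length, so a real-time system with 150,000 concepts would likely need an index in front of it.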

Any suggestions are appreciated.

Recommended answer

Take a look at http://norvig.com/spell-correct.html for a simple algorithm. The article uses Python, but there are links to implementations in other languages at the end.
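
The core idea of Norvig's corrector is to generate every string one edit away from a word (deletions, adjacent transpositions, replacements and insertions) and keep only those that are real words, which is a set lookup rather than a scan of the whole dictionary. The article then picks the single most probable correction using word frequencies; the question wants every close dictionary word instead, so this sketch adapts the idea by returning all matches and skipping the probability model. It assumes lowercase ASCII letters, case-insensitive matching and single words only; `edits1` mirrors the function of the same name in the article, while `dictionary_matches` is a hypothetical helper.

```python
import string


def edits1(word: str) -> set[str]:
    """All strings one edit away from word: a deletion, a transposition of
    adjacent letters, a replacement, or an insertion (ASCII lowercase only)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def dictionary_matches(text: str, dictionary: list[str]) -> list[str]:
    """Dictionary words equal to, or one edit away from, a word in text."""
    index = {entry.lower(): entry for entry in dictionary}  # lowercase -> original
    found = []
    for word in text.lower().split():
        word = word.strip('.,!?"')
        if not word:
            continue
        # The word itself plus every one-edit variant; keep those in the dictionary.
        for candidate in sorted({word} | edits1(word)):
            if candidate in index and index[candidate] not in found:
                found.append(index[candidate])
    return found


print(dictionary_matches("I would like to buy legoe toys in Walmart",
                         ["Lego", "Car", "Walmart"]))
# ['Lego', 'Walmart']
```

Because a word of length n has only about 54n + 25 such variants, each text word costs a few hundred set lookups regardless of dictionary size, which should fit the real-time requirement; applying `edits1` twice, as the article also does for distance-two candidates, catches more typos at a much higher cost.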
