Algorithm wanted: Find all words of a dictionary that are similar to words in a free text

Problem description

We have a list of about 150,000 words. When the user enters free text, the system should present a list of the dictionary words that are very close to words in that text.

For instance, the user enters: "I would like to buy legoe toys in Walmart". If the dictionary contains "Lego", "Car" and "Walmart", the system should present "Lego" and "Walmart" in the list. "Walmart" is obvious because it is identical to a word in the sentence, but "Lego" is similar enough to "Legoe" to be mentioned, too. However, nothing is similar to "Car", so that word is not shown.

The list should be shown in real time: once the user has entered the sentence, the list of words must already be on the screen. Does anybody know a good algorithm for this?

The dictionary actually contains concepts which may include a space. For instance, "Lego spaceship". The perfect solution recognizes these multi-word concepts, too.

Any suggestions are appreciated.

Solution

Take a look at http://norvig.com/spell-correct.html for a simple algorithm. The article uses Python, but there are links to implementations in other languages at the end.
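As a rough illustration of how that idea could be applied to this question, here is a minimal sketch, not the article's code. It borrows the article's one-edit candidate generation (`edits1`) and checks each candidate against the dictionary. The names `build_index`, `similar_concepts`, and `max_words` are made up for this example, and the sketch assumes that "similar" means at most one edit (delete, transpose, replace, insert) after lowercasing. Multi-word concepts are handled by also testing runs of adjacent words joined by a space.

```python
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """Return all strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def build_index(concepts):
    """Hypothetical helper: map lowercased concept -> original spelling."""
    return {concept.lower(): concept for concept in concepts}

def similar_concepts(text, index, max_words=2):
    """Return dictionary concepts within one edit of a word or phrase in `text`.

    `max_words` (an assumption of this sketch) is the longest concept,
    counted in words, that will be looked for.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            # Exact match plus every one-edit variant; sorted for stable order.
            for candidate in sorted({phrase} | edits1(phrase)):
                if candidate in index and index[candidate] not in found:
                    found.append(index[candidate])
    return found

index = build_index(["Lego", "Car", "Walmart", "Lego spaceship"])
print(similar_concepts("I would like to buy legoe toys in Walmart", index))
# prints ['Lego', 'Walmart']
```

Each word or phrase produces a few hundred candidates, and each lookup is a hash probe, so the cost depends on the length of the input text rather than on the 150,000 dictionary entries. Allowing two edits, as the article also discusses, would catch more misspellings at the price of many more candidates per word.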
