Spelling correction for data normalization in Java

Views: 142  Posted: 2020/5/4 7:25:36  Tags: java lucene spell-checking

Question

I am looking for a Java library to do some initial spell checking / data normalization on user-generated text content, such as the interests people enter in a Facebook profile.
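
Whatever library ends up doing the spell checking, exact-match keys usually benefit from a cheap normalization pass first, so that case, full-width characters, punctuation and stray whitespace do not split otherwise identical interests. A minimal sketch using only the JDK, assuming these particular rules suit the data (the class name `InterestNormalizer` is hypothetical):

```java
import java.text.Normalizer;
import java.util.Locale;

public class InterestNormalizer {
    // Hypothetical helper: turns raw user input into a canonical key for exact-match lookup.
    public static String normalize(String raw) {
        // NFKC folds compatibility forms (e.g. full-width letters, no-break spaces) into standard ones.
        String s = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // Locale.ROOT avoids locale-specific case rules such as the Turkish dotless i.
        s = s.toLowerCase(Locale.ROOT);
        // Assumed rule: anything that is not a letter, combining mark, digit or whitespace becomes a space.
        s = s.replaceAll("[^\\p{L}\\p{M}\\p{Nd}\\s]", " ");
        // Collapse runs of whitespace and trim both ends.
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Drinking   COFFEE!! ")); // drinking coffee
        System.out.println(normalize("Coffee & Tea"));           // coffee tea
    }
}
```

Running spell correction on these normalized strings keeps the correction step from having to deal with casing and punctuation noise.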

This text will be tokenized at some point (before or after spell correction, whichever works better), and some of the tokens will be used as keys for exact-match search. It would be nice to cut down on misspellings and similar variations so that more searches match. It would be even better if the correction worked well on tokens longer than a single word, e.g. "trinking coffee" should become "drinking coffee" and not "thinking coffee".
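
One way to get "drinking coffee" rather than "thinking coffee" is to score whole candidate phrases instead of single words: collect known words within a small edit distance of each token, then pick the sequence that best fits bigram counts gathered from existing profile data. The sketch below is a toy noisy-channel style corrector with a Viterbi-like search, not something any of the libraries mentioned here provides; the class `ContextSpeller`, the sample corpus, `EDIT_PENALTY` and `MAX_DIST` are all illustrative assumptions (Java 9+ for `List.of`).

```java
import java.util.*;

public class ContextSpeller {
    private static final double EDIT_PENALTY = 2.0; // assumed cost per character edit; tune on real data
    private static final int MAX_DIST = 2;          // assumed maximum edits considered per token

    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    // Collects word and adjacent-word-pair counts from existing phrases (e.g. other users' interests).
    public ContextSpeller(List<String> corpus) {
        for (String phrase : corpus) {
            String[] toks = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
            for (int i = 0; i < toks.length; i++) {
                unigrams.merge(toks[i], 1, Integer::sum);
                if (i > 0) bigrams.merge(toks[i - 1] + " " + toks[i], 1, Integer::sum);
            }
        }
    }

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Known words within MAX_DIST edits of the token, mapped to their distance;
    // a token with no close match is kept as its own only candidate.
    private Map<String, Integer> candidates(String token) {
        Map<String, Integer> out = new HashMap<>();
        for (String w : unigrams.keySet()) {
            int d = levenshtein(token, w);
            if (d <= MAX_DIST) out.put(w, d);
        }
        if (out.isEmpty()) out.put(token, 0);
        return out;
    }

    // Viterbi-style search: for each position keep, per candidate word, the best-scoring corrected
    // prefix ending in that word. Score = log(1 + unigram count) for the first word, plus
    // log(1 + bigram count) for each following word, minus EDIT_PENALTY per edit.
    public String correct(String phrase) {
        String[] toks = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Map<String, Double> best = new HashMap<>();
        Map<String, List<String>> path = new HashMap<>();
        for (Map.Entry<String, Integer> c : candidates(toks[0]).entrySet()) {
            best.put(c.getKey(), Math.log(1 + unigrams.getOrDefault(c.getKey(), 0)) - EDIT_PENALTY * c.getValue());
            path.put(c.getKey(), List.of(c.getKey()));
        }
        for (int i = 1; i < toks.length; i++) {
            Map<String, Double> nextBest = new HashMap<>();
            Map<String, List<String>> nextPath = new HashMap<>();
            for (Map.Entry<String, Integer> c : candidates(toks[i]).entrySet()) {
                String w = c.getKey();
                for (Map.Entry<String, Double> p : best.entrySet()) {
                    double s = p.getValue()
                            + Math.log(1 + bigrams.getOrDefault(p.getKey() + " " + w, 0))
                            - EDIT_PENALTY * c.getValue();
                    if (s > nextBest.getOrDefault(w, Double.NEGATIVE_INFINITY)) {
                        nextBest.put(w, s);
                        List<String> extended = new ArrayList<>(path.get(p.getKey()));
                        extended.add(w);
                        nextPath.put(w, extended);
                    }
                }
            }
            best = nextBest;
            path = nextPath;
        }
        String last = Collections.max(best.entrySet(), Map.Entry.comparingByValue()).getKey();
        return String.join(" ", path.get(last));
    }

    public static void main(String[] args) {
        ContextSpeller sp = new ContextSpeller(List.of(
                "drinking coffee", "drinking coffee",
                "critical thinking", "thinking positive", "design thinking"));
        System.out.println(sp.correct("trinking"));        // thinking
        System.out.println(sp.correct("trinking coffee")); // drinking coffee
    }
}
```

With this toy corpus, "trinking" is one edit from both "drinking" and "thinking". On its own it becomes "thinking" (the more frequent word), but "trinking coffee" becomes "drinking coffee" because that bigram was seen and "thinking coffee" was not. Candidate generation scans the whole vocabulary, which is fine for a sketch but would need an index (e.g. character n-grams or a BK-tree) at scale.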

I found the following Java libraries for doing spelling correction:

JAZZY does not seem to be under active development. Also, its dictionary-distance-based approach seems inadequate, given the non-standard language used in social network profiles and the multi-word tokens.
APACHE LUCENE seems to have a statistical spell checker that should be much better suited. The question here is: how do we create a good dictionary? (We are not using Lucene otherwise, so there is no existing index.)
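
On the dictionary question: for user-generated text, one common option is to build the dictionary from the users' own input, treating frequent words as valid, rare variants as probable typos, and ranking corrections by frequency. The sketch below follows Peter Norvig's well-known single-word corrector rather than Lucene's `SpellChecker` API; the class `FrequencySpeller` and the `minCount` threshold are hypothetical. If Lucene is used, a word list produced this way could be written out one word per line and loaded through Lucene's `PlainTextDictionary`.

```java
import java.util.*;

public class FrequencySpeller {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    private final Map<String, Integer> counts = new HashMap<>();
    private final int minCount;

    // Counts every word in the users' own texts; words seen fewer than minCount times
    // are not treated as dictionary words (assumption: rare spellings are likely typos).
    public FrequencySpeller(List<String> texts, int minCount) {
        this.minCount = minCount;
        for (String text : texts) {
            for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        }
    }

    private boolean known(String w) {
        return counts.getOrDefault(w, 0) >= minCount;
    }

    // Every string one deletion, adjacent transposition, replacement or insertion away from w
    // (ASCII letters only, an assumption that would need widening for other scripts).
    static Set<String> edits1(String w) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= w.length(); i++) {
            String left = w.substring(0, i);
            String right = w.substring(i);
            if (!right.isEmpty()) out.add(left + right.substring(1));
            if (right.length() > 1) out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2));
            for (char c : ALPHABET.toCharArray()) {
                if (!right.isEmpty()) out.add(left + c + right.substring(1));
                out.add(left + c + right);
            }
        }
        return out;
    }

    // Returns the word itself if known, else the most frequent known word one edit away,
    // else two edits away, else the lowercased input unchanged.
    public String correct(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (known(w)) return w;
        Comparator<String> byCount = Comparator.comparingInt(counts::get);
        Optional<String> best = edits1(w).stream().filter(this::known).max(byCount);
        if (best.isPresent()) return best.get();
        return edits1(w).stream()
                .flatMap(e -> edits1(e).stream())
                .filter(this::known)
                .max(byCount)
                .orElse(w);
    }

    public static void main(String[] args) {
        FrequencySpeller sp = new FrequencySpeller(List.of(
                "drinking coffee", "Coffee, cake", "coffee", "cofee", "drinking tea", "Drinking beer"), 2);
        System.out.println(sp.correct("cofee"));   // coffee
        System.out.println(sp.correct("drinkin")); // drinking
        System.out.println(sp.correct("xyz"));     // xyz
    }
}
```

With `minCount` set to 2, the single user who typed "cofee" does not turn it into a dictionary word, so it is corrected to "coffee"; a token with no frequent word within two edits (here "xyz") is returned lowercased but otherwise unchanged. The threshold trades coverage of rare but valid interests against tolerance for repeated typos.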

Any suggestions are welcome!
