Apache Lucene - Improving the results of Spell Checker
Problem description
I recently implemented a SpellChecker using Apache Lucene. My code is provided below:
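    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.spell.Dictionary;
    import org.apache.lucene.search.spell.PlainTextDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // Field shared by both methods (declaration not shown in the original post)
    private SpellChecker spellChecker;

    public void loadDictionary() {
        try {
            // Directory that holds the spell-check index
            File dir = new File("c:/spellchecker/");
            Directory directory = FSDirectory.open(dir);
            spellChecker = new SpellChecker(directory);
            // Plain-text word list, one word per line
            Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
            spellChecker.indexDictionary(dictionary, config, false);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String performSpellCheck(String word) {
        try {
            // Ask for the single best suggestion
            String[] suggestions = spellChecker.suggestSimilar(word, 1);
            if (suggestions.length > 0) {
                return suggestions[0];
            } else {
                // No suggestion found: return the input unchanged
                return word;
            }
        } catch (Exception e) {
            return "Error";
        }
    }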
The above code uses a dictionary of English words. I am having a problem with the accuracy. What I want it to do is suggest similar words to words that are spelled incorrectly (that is, words that do not appear in the dictionary being used). However, if I send the word "post" to the performSpellCheck method, it returns "poet", that is, it is correcting words that do not need to be corrected (these words exist in the dictionary file).
Any suggestions on how I can improve my results?
Recommended answer
I think you should use the SpellChecker.exist() method (http://lucene.apache.org/core/3_5_0/api/contrib-spellchecker/org/apache/lucene/search/spell/SpellChecker.html#exist%28java.lang.String%29). Call suggestSimilar only if the word does not exist in the dictionary.
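A minimal sketch of that guard follows. So that it runs without Lucene on the classpath, it uses a hypothetical WordIndex interface that mirrors only the two SpellChecker methods involved (exist and suggestSimilar), plus a toy in-memory demoIndex. Both are assumptions made for illustration and are not part of Lucene. The stand-in also omits the IOException that the real methods declare.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GuardedSpellCheck {

    /**
     * Hypothetical stand-in for the two SpellChecker methods used below.
     * The real Lucene methods also declare IOException, omitted here.
     */
    interface WordIndex {
        boolean exist(String word);
        String[] suggestSimilar(String word, int numSug);
    }

    /** Returns the word unchanged if it is in the dictionary, else the best suggestion. */
    static String performSpellCheck(WordIndex index, String word) {
        // Guard: a word that is already in the dictionary needs no correction.
        if (index.exist(word)) {
            return word;
        }
        String[] suggestions = index.suggestSimilar(word, 1);
        // Fall back to the input when there is nothing to suggest.
        return suggestions.length > 0 ? suggestions[0] : word;
    }

    /** Toy index (assumption): knows "post" and "poet", suggests "poet" for any "po..." input. */
    static WordIndex demoIndex() {
        final Set<String> words = new HashSet<>(Arrays.asList("post", "poet"));
        return new WordIndex() {
            public boolean exist(String word) {
                return words.contains(word);
            }
            public String[] suggestSimilar(String word, int numSug) {
                // Imitates the behaviour from the question, where "post" yields "poet".
                return word.startsWith("po") ? new String[] {"poet"} : new String[0];
            }
        };
    }

    public static void main(String[] args) {
        WordIndex index = demoIndex();
        System.out.println(performSpellCheck(index, "post")); // in dictionary, left as-is
        System.out.println(performSpellCheck(index, "poat")); // misspelled, gets a suggestion
    }
}
```

In the code from the question, the same check would presumably go at the top of performSpellCheck, calling spellChecker.exist(word) before spellChecker.suggestSimilar(word, 1).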