Apache Lucene - Improving the Results of a Spell Checker

Problem description

I recently implemented a SpellChecker using Apache Lucene. My code is provided below:

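import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Enclosing class name is hypothetical; the question shows only the two methods
public class SpellCheckService {
    private SpellChecker spellChecker;

    public void loadDictionary() {
        try {
            // Directory that holds the spell checker's own index
            File dir = new File("c:/spellchecker/");
            Directory directory = FSDirectory.open(dir);
            spellChecker = new SpellChecker(directory);
            // Plain-text word list, one word per line
            Dictionary dictionary = new PlainTextDictionary(new File("c:/dictionary.txt"));
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, null);
            spellChecker.indexDictionary(dictionary, config, false);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String performSpellCheck(String word) {
        try {
            // Called without an IndexReader, suggestSimilar does not return the input
            // word itself, so a correctly spelled word such as "post" is replaced too
            String[] suggestions = spellChecker.suggestSimilar(word, 1);
            if (suggestions.length > 0) {
                return suggestions[0];
            } else {
                return word;
            }
        } catch (Exception e) {
            return "Error";
        }
    }
}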

The above code uses a dictionary of English words, and I am having a problem with its accuracy. I want it to suggest similar words only for words that are misspelled (that is, words that do not appear in the dictionary). However, if I pass the word "post" to the performSpellCheck method, it returns "poet": it corrects words that do not need correcting, even though they exist in the dictionary file.

Any suggestions on how I can improve my results?

Recommended answer

I think you should use the SpellChecker.exist() method (http://lucene.apache.org/core/3_5_0/api/contrib-spellchecker/org/apache/lucene/search/spell/SpellChecker.html#exist%28java.lang.String%29). Call the suggestSimilar method only if the word does not exist in the dictionary.
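The answer's advice can be sketched without a Lucene dependency so that it runs on its own. Everything in this sketch is hypothetical: the class GuardedSpellCheck is made up, exist stands in for SpellChecker.exist(String), and suggestSimilar stands in for SpellChecker.suggestSimilar by returning the closest word other than the input, which reproduces the "post" to "poet" behavior described in the question. A brute-force Levenshtein search over the whole word list replaces Lucene's n-gram index purely for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, Lucene-free model of "check exist() first, suggest only if missing".
public class GuardedSpellCheck {
    // Sorted so that ties between equally close candidates resolve alphabetically
    private final SortedSet<String> dictionary;

    public GuardedSpellCheck(Collection<String> words) {
        this.dictionary = new TreeSet<>(words);
    }

    // Stand-in for Lucene's SpellChecker.exist(String): exact dictionary lookup
    public boolean exist(String word) {
        return dictionary.contains(word);
    }

    // Stand-in for SpellChecker.suggestSimilar(word, 1): closest dictionary word by
    // edit distance, never the input word itself. Returns null if no other word exists.
    public String suggestSimilar(String word) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            if (candidate.equals(word)) {
                continue; // the input word is never offered as its own suggestion
            }
            int distance = levenshtein(word, candidate);
            if (distance < bestDistance) { // strict '<' keeps the alphabetically first tie
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    // The answer's logic: only ask for a suggestion when the word is unknown
    public String performSpellCheck(String word) {
        if (exist(word)) {
            return word; // correctly spelled, leave it alone
        }
        String suggestion = suggestSimilar(word);
        return suggestion != null ? suggestion : word;
    }

    // Two-row dynamic-programming Levenshtein (edit) distance
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        GuardedSpellCheck checker =
                new GuardedSpellCheck(Arrays.asList("checker", "poet", "post", "spell"));
        System.out.println(checker.suggestSimilar("post"));     // poet: the reported problem
        System.out.println(checker.performSpellCheck("post"));  // post: known word kept
        System.out.println(checker.performSpellCheck("posst")); // post: misspelling corrected
    }
}
```

In the question's own code, the corresponding change would be an early return at the top of the try block in performSpellCheck: if spellChecker.exist(word) is true, return word before calling spellChecker.suggestSimilar(word, 1).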
