Fuzzy search with Lucene

Problem description

I implemented a fuzzy search with Lucene 4.3.1, but I'm not satisfied with the results. I would like to specify the number of results it should return: for example, if I ask for 10 results, it should return the 10 best matches, no matter how bad they are. Most of the time it returns nothing if the word I search for is very different from everything in the index. How can I get more (fuzzier) results?

Here is the code I have:

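    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Returns the stored "label" values of up to numberOfHits documents whose
    // "label" field contains a term within two edits of the query word.
    public String[] luceneQuery(String query, int numberOfHits, String path)
            throws IOException {
        Directory index = FSDirectory.open(new File(path));
        IndexReader reader = DirectoryReader.open(index);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);

            // FuzzyQuery takes the raw term text: appending "~" (QueryParser
            // syntax) would make the tilde part of the term itself. The term is
            // not analyzed, so differences from the indexed form (such as case)
            // count as edits. maxEdits = 2 is the largest value Lucene 4.x accepts.
            Query fuzzyQuery = new FuzzyQuery(new Term("label", query), 2);

            ScoreDoc[] fuzzyHits = searcher.search(fuzzyQuery, numberOfHits).scoreDocs;
            String[] fuzzyResults = new String[fuzzyHits.length];
            for (int i = 0; i < fuzzyHits.length; ++i) {
                Document d = searcher.doc(fuzzyHits[i].doc);
                fuzzyResults[i] = d.get("label");
            }
            return fuzzyResults;
        } finally {
            // Release the reader and the directory even if the search fails.
            reader.close();
            index.close();
        }
    }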

Solution

Large edit distances are no longer supported by FuzzyQuery in Lucene 4.x. The current FuzzyQuery implementation is a huge performance improvement over the Lucene 3.x implementation, but it supports at most two edits. Distances greater than two Damerau–Levenshtein edits are considered rarely useful.
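
To make the two-edit limit concrete, here is a minimal plain-Java sketch (no Lucene dependency) of the restricted Damerau–Levenshtein distance, also known as optimal string alignment: inserting, deleting, or substituting one character, or swapping two adjacent characters, each counts as one edit. It only illustrates the metric; Lucene 4.x finds matching terms with Levenshtein automata rather than by filling a table like this for every term. The class name `EditDistanceDemo` and the sample words are hypothetical.

```java
public class EditDistanceDemo {

    // Optimal string alignment (restricted Damerau–Levenshtein) distance:
    // each insertion, deletion, substitution, or swap of two adjacent
    // characters costs one edit.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"lucene", "lucnee"},   // one adjacent swap
            {"serch", "search"},    // one insertion
            {"lucinda", "lucene"},  // three edits
        };
        for (String[] p : pairs) {
            int dist = osaDistance(p[0], p[1]);
            // FuzzyQuery in Lucene 4.x accepts at most maxEdits = 2.
            System.out.printf("%s -> %s: %d edit(s), within 2: %b%n",
                    p[0], p[1], dist, dist <= 2);
        }
    }
}
```

The first two pairs are within reach of a `FuzzyQuery` with `maxEdits = 2`; the last pair is three edits apart, so such a query cannot match it no matter how many hits are requested.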

According to the FuzzyQuery documentation, if you really must have higher edit distances:

If you really want this, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.
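
The n-gram idea can also be sketched in plain Java. The hypothetical `NGramSuggester` below indexes each vocabulary word under its character n-grams, collects every word that shares at least one n-gram with the query, ranks the candidates by the Dice coefficient of their n-gram sets, and returns the top k with no similarity cutoff. It is an illustration of the general technique under those assumptions, not the API or the scoring of Lucene's `SpellChecker`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NGramSuggester {
    private final int n;
    // Inverted index: character n-gram -> vocabulary words containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    public NGramSuggester(int n, List<String> vocabulary) {
        this.n = n;
        for (String word : vocabulary) {
            for (String gram : grams(word)) {
                postings.computeIfAbsent(gram, g -> new HashSet<>()).add(word);
            }
        }
    }

    // Distinct character n-grams of a word; a word shorter than n is its own gram.
    Set<String> grams(String word) {
        Set<String> out = new HashSet<>();
        if (word.length() < n) {
            out.add(word);
            return out;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Dice coefficient: 2 * |shared grams| / (|grams of a| + |grams of b|).
    static double dice(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return 2.0 * shared.size() / (a.size() + b.size());
    }

    // Up to k words sharing at least one n-gram with the query, most similar
    // first (ties broken alphabetically). No minimum similarity is applied.
    public List<String> suggest(String query, int k) {
        Set<String> queryGrams = grams(query);
        Set<String> candidates = new HashSet<>();
        for (String gram : queryGrams) {
            candidates.addAll(postings.getOrDefault(gram, new HashSet<>()));
        }
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator
                .comparingDouble((String w) -> dice(queryGrams, grams(w)))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(2,
                Arrays.asList("lucene", "solr", "elasticsearch", "search"));
        System.out.println(s.suggest("lucinda", 10)); // [lucene]
        System.out.println(s.suggest("serch", 2));    // [search, elasticsearch]
    }
}
```

With bigrams, "lucinda" still returns "lucene" even though the two are three edits apart, because they share "lu" and "uc". A query that shares no n-gram with any vocabulary word still yields no candidates.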

The strong implication is that you should rethink what you're trying to accomplish and find a more useful approach.
