Lucene phrase query not working

Question

I am trying to write a simple program with Lucene 2.9.4 that searches for a phrase query, but I am getting 0 hits:

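import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {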

public static void main(String[] args) throws IOException, ParseException{
    // TODO Auto-generated method stub

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
    Directory index = new RAMDirectory();

    IndexWriter w = new IndexWriter(index,analyzer,true,IndexWriter.MaxFieldLength.UNLIMITED);
    addDoc(w, "Lucene in Action");
    addDoc(w, "Lucene for Dummies");
    addDoc(w, "Managing Gigabytes");
    addDoc(w, "The Art of Computer Science");
    w.close();      

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "in"),1);
    pq.setSlop(0);

    int hitsPerPage = 10;
    IndexSearcher searcher = new IndexSearcher(index,true);
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(pq, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    System.out.println("Found " + hits.length + " hits.");
    for(int i=0; i<hits.length; i++){
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i+1)+ "." + d.get("content"));
    }

    searcher.close();


}

public static void addDoc(IndexWriter w, String value)throws IOException{
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
    w.addDocument(doc);
}

}

Please tell me what is wrong. I have also tried using QueryParser, as follows:

    // Escape the inner quotes so the parser receives a quoted phrase
    String querystr = "\"Lucene in Action\"";
    Query q = new QueryParser(Version.LUCENE_29, "content", analyzer).parse(querystr);

But that does not work either.

Recommended answer

There are two issues with the code (neither has anything to do with your version of Lucene):

1) StandardAnalyzer does not index stop words (such as "in"), so a PhraseQuery can never find the phrase "Lucene in".

2) As Xodarap and Shashikant Kore mentioned, the field must be created with Field.Index.ANALYZED; otherwise Lucene does not run the Analyzer on that field and indexes the whole value as a single term. There is probably a neat way to make this work with Index.NOT_ANALYZED, but I'm not familiar with it.
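
To make the two issues concrete, here is a toy model in plain Java. It is not Lucene code: the class, the method names, and the four-word stop list are all made up for illustration, and the tokenizing is far cruder than what StandardAnalyzer really does. It only sketches which terms would end up in the index in each mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: NOT Lucene code, just an illustration of the two indexing modes.
final class ToyAnalysisDemo {

    // Hypothetical stop list; Lucene's real English stop set is larger.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "for", "the", "of"));

    // Roughly what Index.NOT_ANALYZED does: the whole value becomes one term.
    static List<String> notAnalyzed(String value) {
        List<String> terms = new ArrayList<String>();
        terms.add(value);
        return terms;
    }

    // Roughly what Index.ANALYZED with a stop-word analyzer does:
    // split on whitespace, lowercase, drop stop words.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<String>();
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(notAnalyzed("Lucene in Action")); // [Lucene in Action]
        System.out.println(analyzed("Lucene in Action"));    // [lucene, action]
    }
}
```

In the first case neither "lucene" nor "in" exists as a term, so the original PhraseQuery has nothing to match. In the second case "lucene" exists but "in" still does not, which is why fixing only the Index flag is not enough.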

For an easy fix, change your addDoc method to:

public static void addDoc(IndexWriter w, String value)throws IOException{
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}

and change the creation of the PhraseQuery to:

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "computer"),0);
    pq.add(new Term("content", "science"),1);
    pq.setSlop(0);

This gives the result below, since neither "computer" nor "science" is a stop word:

    Found 1 hits.
    1.The Art of Computer Science

If you want to find "Lucene in Action", you can increase the slop of the PhraseQuery (allowing a 'gap' between the two words):

    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("content", "lucene"),0);
    pq.add(new Term("content", "action"),1);
    pq.setSlop(1);
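
Why slop 1 is needed can be sketched with another toy model in plain Java. Again, this is not Lucene code and the names are invented. It assumes that a dropped stop word still consumes a position, which I believe is the default behaviour when the analyzer is created with Version.LUCENE_29, and it simplifies slop to "extra positions allowed between two in-order terms", looking only at the first occurrence of each term.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Lucene's matching algorithm, just the position arithmetic.
final class ToySlopDemo {

    // Hypothetical stop list; Lucene's real English stop set is larger.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "for", "the", "of"));

    // Position of the first occurrence of each kept term.
    // Assumption: stop words are dropped but still advance the position counter.
    static Map<String, Integer> positions(String text) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (!STOP_WORDS.contains(token) && !result.containsKey(token)) {
                result.put(token, pos);
            }
            pos++;
        }
        return result;
    }

    // Simplified in-order check for a two-term phrase whose query positions are 0 and 1.
    static boolean matches(String text, String first, String second, int slop) {
        Map<String, Integer> pos = positions(text);
        Integer a = pos.get(first);
        Integer b = pos.get(second);
        if (a == null || b == null) {
            return false; // a term that was never indexed can never match
        }
        int extraGap = (b - a) - 1; // positions beyond strict adjacency
        return extraGap >= 0 && extraGap <= slop;
    }

    public static void main(String[] args) {
        String title = "Lucene in Action";
        System.out.println(matches(title, "lucene", "in", 0));     // false
        System.out.println(matches(title, "lucene", "action", 0)); // false
        System.out.println(matches(title, "lucene", "action", 1)); // true
    }
}
```

Under these assumptions "lucene" sits at position 0 and "action" at position 2, so the pair is one position further apart than the query asks for, and a slop of 1 covers exactly that gap.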

If you really want to search for the phrase "lucene in", you need a different analyzer that keeps stop words (such as SimpleAnalyzer). In Lucene 2.9, just replace the StandardAnalyzer construction with:

    SimpleAnalyzer analyzer = new SimpleAnalyzer();

Or, if you are using version 3.1 or higher, you need to pass the version:

    SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);

Here is a helpful post on a similar issue (it will help you get going with PhraseQuery): "Exact Phrase search using Lucene?"; see WhiteFang34's answer.
