Lucene phrase query not working
Question
I am trying to write a simple program using Lucene 2.9.4 which searches for a phrase query but I am getting 0 hits
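import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class HelloLucene {
    public static void main(String[] args) throws IOException, ParseException {
        // Build a small in-memory index of four titles.
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        addDoc(w, "Lucene in Action");
        addDoc(w, "Lucene for Dummies");
        addDoc(w, "Managing Gigabytes");
        addDoc(w, "The Art of Computer Science");
        w.close();

        // Exact phrase "lucene in": the two terms at adjacent positions.
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("content", "lucene"), 0);
        pq.add(new Term("content", "in"), 1);
        pq.setSlop(0);

        // Run the search and print the hits.
        int hitsPerPage = 10;
        IndexSearcher searcher = new IndexSearcher(index, true);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(pq, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + "." + d.get("content"));
        }
        searcher.close();
    }

    public static void addDoc(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
        w.addDocument(doc);
    }
}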
Please tell me what is wrong. I have also tried using QueryParser as following
String querystr = "\"Lucene in Action\"";
Query q = new QueryParser(Version.LUCENE_29, "content", analyzer).parse(querystr);
But this does not work either.
Recommended answer
There are two issues with the code (and they have nothing to do with your version of Lucene):
1) The StandardAnalyzer does not index stopwords (like "in"), so the PhraseQuery will never be able to find the phrase "Lucene in".
2) As mentioned by Xodarap and Shashikant Kore, the Field you add to the document needs to use Index.ANALYZED; otherwise Lucene does not run the Analyzer on that field. There's probably a nifty way to do it with Index.NOT_ANALYZED, but I'm not familiar with it.
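To see why the original query finds nothing, it can help to model what actually ends up in the index. The sketch below is a plain-Java toy, not Lucene code: the class IndexedTermsSketch, its two methods, and the four-word stop list are made up for illustration, and the real StandardAnalyzer does more work and uses a longer stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of which terms get indexed; NOT Lucene's implementation.
final class IndexedTermsSketch {
    // Hypothetical, abbreviated stop list (the real one is longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Roughly Index.NOT_ANALYZED: the whole value becomes one term, unchanged.
    static List<String> notAnalyzed(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Roughly Index.ANALYZED: split on whitespace, lowercase, drop stop words.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(notAnalyzed("Lucene in Action")); // [Lucene in Action]
        System.out.println(analyzed("Lucene in Action"));    // [lucene, action]
    }
}
```

In this model the term "lucene" never exists in the NOT_ANALYZED index (only the single term "Lucene in Action" does), and the term "in" never exists in the analyzed one, which lines up with the two issues above.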
For an easy fix, change your addDoc method to:
public static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}
and modify your creation of the PhraseQuery to:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "computer"),0);
pq.add(new Term("content", "science"),1);
pq.setSlop(0);
This will give you the result below since both "computer" and "science" are not stopwords:
Found 1 hits.
1.The Art of Computer Science
If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (increasing the 'gap' between the two words):
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"),0);
pq.add(new Term("content", "action"),1);
pq.setSlop(1);
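Why a slop of 1 is enough here: assuming position increments are preserved when a stopword is dropped (which I believe is the default for StandardAnalyzer with Version.LUCENE_29), "action" keeps position 2 even though "in" is gone, so it sits one position further from "lucene" than the query asks for. The plain-Java toy below models that; SlopSketch, positions and matches are hypothetical names, the stop list is abbreviated, and the check only handles two terms in order, so it is not Lucene's actual sloppy phrase matching.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of term positions and slop; NOT Lucene's implementation.
final class SlopSketch {
    // Hypothetical, abbreviated stop list (the real one is longer).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("in", "for", "the", "of"));

    // Map each surviving term to the position of its first occurrence.
    // A dropped stop word still consumes a position (assumed behaviour).
    static Map<String, Integer> positions(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = 0;
        for (String token : value.trim().split("\\s+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                result.putIfAbsent(term, position);
            }
            position++;
        }
        return result;
    }

    // Simplified check: the query wants `second` directly after `first`;
    // slop is the number of extra positions tolerated between them.
    static boolean matches(Map<String, Integer> doc, String first, String second, int slop) {
        Integer p1 = doc.get(first);
        Integer p2 = doc.get(second);
        if (p1 == null || p2 == null) {
            return false; // a term that was never indexed can never match
        }
        int extraGap = p2 - p1 - 1;
        return extraGap >= 0 && extraGap <= slop;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = positions("Lucene in Action");
        System.out.println(doc);                                 // {lucene=0, action=2}
        System.out.println(matches(doc, "lucene", "action", 0)); // false
        System.out.println(matches(doc, "lucene", "action", 1)); // true
        System.out.println(matches(doc, "lucene", "in", 1));     // false
    }
}
```

Under the same assumption, keeping the slop at 0 and adding "action" at position 2 instead of 1, that is pq.add(new Term("content", "action"), 2), should match as well.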
If you really want to search for the phrase "lucene in", you will need to select a different analyzer, one that keeps stopwords (like the SimpleAnalyzer). In Lucene 2.9, just replace your call to the StandardAnalyzer with:
SimpleAnalyzer analyzer = new SimpleAnalyzer();
Or, if you're using version 3.1 or higher, you need to add the version information:
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
Here is a helpful post on a similar issue (this will help you get going with PhraseQuery): Exact Phrase search using Lucene? -- see WhiteFang34's answer.