Lucene search by URL


Question

I'm storing a Document which has a URL field:

Document doc = new Document();
doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
doc.add(new Field("html", CompressionTools.compressString(html), Field.Store.YES));

I'd like to be able to find a Document by its URL, but I get 0 results:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Query query = new QueryParser(LUCENE_VERSION, "url", analyzer).parse(url);
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// Display results
for (ScoreDoc hit : hits) {
  System.out.println("FOUND A MATCH");
}
searcher.close();

What can I do differently so that I can store an HTML document and find it by its URL?

Solution

You can skip the QueryParser and build the query directly as a TermQuery, so that the URL is not run through the analyzer:

Query query = new TermQuery(new Term("url", url));

Suggestion:

I suggest you use a BooleanQuery, since it performs well and is optimized internally.

TermQuery tq = new TermQuery(new Term("url", url));
// Occur.SHOULD is the operator for clauses that should appear in the matching documents.
BooleanQuery bq = new BooleanQuery();
bq.add(tq, BooleanClause.Occur.SHOULD);
IndexSearcher searcher = new IndexSearcher(index, true);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
searcher.search(bq, collector);

I see you are indexing the URL field as NOT_ANALYZED, which IMO is good for searching: since no analyzer is used, the value is indexed as a single term.

Now, if your business case is "I will give you a URL, find the EXACT one in the Lucene index", then you should look at querying with a different analyzer (KeywordAnalyzer etc.), because StandardAnalyzer splits the URL into several tokens, none of which matches that single term.
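
To illustrate why a tokenized query finds nothing in a field indexed as a single term, here is a toy model in plain Java. It does not use Lucene at all: the class `UrlTermLookup`, its methods, and the example URL are hypothetical, and `toyAnalyze` is only a crude approximation of what an analyzer does, not StandardAnalyzer's real tokenization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class UrlTermLookup {
    // Toy inverted index: term -> ids of documents containing that exact term.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Index the whole value as ONE term, mimicking a NOT_ANALYZED field.
    public void indexNotAnalyzed(int docId, String value) {
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
    }

    // Exact term lookup, the toy counterpart of a TermQuery.
    public List<Integer> termLookup(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    // Crude stand-in for an analyzer (assumption, not Lucene's behavior):
    // lowercase the text and split it on non-alphanumeric characters.
    public static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Look up each analyzed token separately; any hit counts as a match.
    public List<Integer> analyzedLookup(String text) {
        List<Integer> hits = new ArrayList<>();
        for (String token : toyAnalyze(text)) {
            hits.addAll(termLookup(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        UrlTermLookup index = new UrlTermLookup();
        String url = "http://example.com/page.html";
        index.indexNotAnalyzed(1, url);
        System.out.println("exact term lookup: " + index.termLookup(url));
        System.out.println("analyzed lookup: " + index.analyzedLookup(url));
    }
}
```

In this sketch the exact lookup finds document 1, while the analyzed lookup comes back empty, because tokens such as `http` or `example` are never equal to the full URL stored as one term. The same mismatch is the likely reason the QueryParser-based search in the question returns 0 results.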
