Multiple Field Query handling in Lucene
Question
I have written an index searcher in Lucene that searches multiple fields in the indexed database.
It takes the query as two strings: one is the title and the other is the city name.
The indexed database has three fields: title, address and city.
A hit should occur only if both the title and the city name match. For that purpose I wrote the following searcher code using MultiFieldQueryParser, with the help of a post:
    public void searchdb(String myQuery, String myCity) throws Exception
    {
        System.out.println("Searching in the database ...");
        String[] fields={"title","address","city"};
        MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, fields, new StandardAnalyzer(Version.LUCENE_CURRENT));
        parser.setDefaultOperator(QueryParser.Operator.AND);
        if(!myQuery.toLowerCase().contains(myCity.toLowerCase()))
        {
            myQuery="title:"+myQuery+" "+"address:"+myQuery+" "+myCity+" "+"city:"+myCity;
        }
        Query query=parser.parse(myQuery);
        if (query instanceof BooleanQuery)
        {
            BooleanClause.Occur[] flags ={BooleanClause.Occur.MUST,BooleanClause.Occur.SHOULD,BooleanClause.Occur.MUST};
            BooleanQuery booleanQuery = (BooleanQuery) query;
            BooleanClause[] clauses = booleanQuery.getClauses();
            System.out.println("Query="+booleanQuery.toString()+" and Number of clauses="+clauses.length);
            for (int i = 0; i < clauses.length; i++)
            {
                clauses[i].setOccur(flags[i]);
            }
            Directory dir=FSDirectory.open(new File("demoIndex"));
            IndexSearcher searcher = new IndexSearcher(dir, true);
            TopDocs hits = searcher.search(booleanQuery, 20);
            searcher.close();
            dir.close();
            System.out.println("Number of hits="+hits.totalHits);
        }
    }
But it does not work properly. For example, if the query is "Pizza Hut" and the city is "Mumbai", I want "Pizza Hut" to be searched only in the title field of the database and "Mumbai" only in the city field.
Instead, it also looks for "Hut" in the city field: the statement booleanQuery.toString() prints "+title:pizza +(title:hut city:hut) +city:mumbai".
As a result, the for loop throws an index out of bounds error.
I am new to Lucene, so I am asking for help to fix the problem.
Recommended answer
We use MultiFieldQueryParser only when we want to search the same keyword(s) in multiple fields.
Your use case is simpler, because you already have the city keyword and the title keyword as separate strings. Try the following code.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
// city query
QueryParser cityQP = new QueryParser(Version.LUCENE_CURRENT, "city", analyzer);
Query cityQuery = cityQP.parse(myCity);
// title query
QueryParser titleQP = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer);
Query titleQuery = titleQP.parse(myQuery);
// final query
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(cityQuery, Occur.MUST); // MUST means this clause has to match.
finalQuery.add(titleQuery, Occur.MUST); // Marking every clause as MUST is equivalent to the "AND" operator.
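As a side note on why "Hut" leaked into the other fields in the question: in the classic query syntax a field prefix such as title: applies only to the term (or quoted phrase, or parenthesized group) that directly follows it, so in title:Pizza Hut the word Hut falls back to the parser's default fields. The following is a minimal sketch in plain Java, with no Lucene dependency, of building a query string where each keyword is quoted so the prefix covers all of its words. The class and method names (FieldQueryStrings, escapePhrase, fieldPhrase, titleAndCity) are hypothetical, and the escaping assumes that only the backslash and the double quote need protecting inside a quoted phrase.

```java
// Hypothetical helper, not part of Lucene: builds field-scoped query strings.
class FieldQueryStrings {

    // Escape backslashes first, then double quotes, so that the text
    // can sit safely inside a quoted phrase.
    public static String escapePhrase(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Quote the whole keyword so the field prefix applies to every word in it.
    public static String fieldPhrase(String field, String text) {
        return field + ":\"" + escapePhrase(text) + "\"";
    }

    // Require both the title phrase and the city phrase with the + prefix.
    public static String titleAndCity(String title, String city) {
        return "+" + fieldPhrase("title", title) + " +" + fieldPhrase("city", city);
    }

    public static void main(String[] args) {
        // prints: +title:"Pizza Hut" +city:"Mumbai"
        System.out.println(titleAndCity("Pizza Hut", "Mumbai"));
    }
}
```

Note that a quoted phrase asks for the words to appear next to each other, which is stricter than parsing the title keyword on its own as in the answer above. Building the BooleanQuery programmatically, as the answer does, avoids string assembly altogether and is usually the safer route.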