Multiple Field Query handling in Lucene
Question
I have written an index searcher in Lucene that searches multiple fields in the indexed database.
It takes the query as two strings: one is the title and the other is the city name.
The indexed database has three fields: title, address and city.
A hit should occur only if both the title and the city name match. For that purpose I have written the following searcher code using MultiFieldQueryParser, with the help of a post:
public void searchdb(String myQuery, String myCity) throws Exception
{
    System.out.println("Searching in the database ...");
    String[] fields = {"title", "address", "city"};
    MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, fields, new StandardAnalyzer(Version.LUCENE_CURRENT));
    parser.setDefaultOperator(QueryParser.Operator.AND);
    if (!myQuery.toLowerCase().contains(myCity.toLowerCase()))
    {
        myQuery = "title:" + myQuery + " " + "address:" + myQuery + " " + myCity + " " + "city:" + myCity;
    }
    Query query = parser.parse(myQuery);
    if (query instanceof BooleanQuery)
    {
        BooleanClause.Occur[] flags = {BooleanClause.Occur.MUST, BooleanClause.Occur.SHOULD, BooleanClause.Occur.MUST};
        BooleanQuery booleanQuery = (BooleanQuery) query;
        BooleanClause[] clauses = booleanQuery.getClauses();
        System.out.println("Query=" + booleanQuery.toString() + " and Number of clauses=" + clauses.length);
        for (int i = 0; i < clauses.length; i++)
        {
            clauses[i].setOccur(flags[i]);
        }
        Directory dir = FSDirectory.open(new File("demoIndex"));
        IndexSearcher searcher = new IndexSearcher(dir, true);
        TopDocs hits = searcher.search(booleanQuery, 20);
        searcher.close();
        dir.close();
        System.out.println("Number of hits=" + hits.totalHits);
    }
}
But it is not working properly.
For example, if the query is "Pizza Hut" and the city is "Mumbai", I want "Pizza Hut" to be searched only in the title field of the database and "Mumbai" only in the city field.
But it is also finding "Hut" in the city field of the database, because the output of booleanQuery.toString() is "+title:pizza +(title:hut city:hut) +city:mumbai".
As a result, the for loop throws an index-out-of-bounds error.
I am new to Lucene, so I am asking for help to fix the problem.
Recommended answer
We use MultiFieldQueryParser only when we want to search the same keyword(s) in multiple fields.
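To illustrate why the query string in the question ends up searching "hut" in the city field, here is a plain-Java toy model of the expansion. It is not Lucene's implementation: the class FieldExpansionSketch and its expand method are hypothetical names, analysis is reduced to lowercasing and splitting on whitespace, and the default fields are assumed to be just title and city so that the output matches the string reported in the question. The point it shows is that a prefix such as title: binds only to the term directly after it, while every bare term is expanded over all default fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FieldExpansionSketch {

    // Toy model: a "field:" prefix applies to one term only; a bare term
    // is searched in every default field. Real Lucene parsing is richer.
    static String expand(String query, String[] defaultFields) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.contains(":")) {
                // Already fielded, e.g. "title:pizza": keep it as one required clause.
                clauses.add("+" + token);
            } else {
                // Bare term, e.g. "hut": one optional sub-clause per default field,
                // wrapped in a single required group.
                List<String> perField = new ArrayList<>();
                for (String field : defaultFields) {
                    perField.add(field + ":" + token);
                }
                clauses.add("+(" + String.join(" ", perField) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String[] fields = {"title", "city"}; // assumed default fields for this sketch
        // Only "Pizza" carries the title: prefix; "Hut" is a bare term.
        System.out.println(expand("title:Pizza Hut city:Mumbai", fields));
        // prints: +title:pizza +(title:hut city:hut) +city:mumbai
    }
}
```

Because the number of top-level clauses depends on how many words the user typed, indexing a fixed-size flags array by clause position is fragile.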
Your use case is simpler to handle because you already have the city keyword and the title keyword as separate references. Try the following code.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
// city query
QueryParser cityQP = new QueryParser(Version.LUCENE_CURRENT, "city", analyzer);
Query cityQuery = cityQP.parse(myCity);
// title query
QueryParser titleQP = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer);
Query titleQuery = titleQP.parse(myQuery);
// final query
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(cityQuery, Occur.MUST); // MUST implies that the keyword must occur.
finalQuery.add(titleQuery, Occur.MUST); // Using all "MUST" occurs is equivalent to "AND" operator.
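If a single query string is preferred instead, Lucene's classic query syntax allows grouping several terms under one field with parentheses, as in title:(pizza hut). The following is a minimal sketch of building such a string. FieldedQueryBuilder and its methods are hypothetical helpers, not part of Lucene, and the escaping is a simplified stand-in (Lucene provides QueryParser.escape for this purpose).

```java
final class FieldedQueryBuilder {

    // Characters treated as special by this sketch (a simplified list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Backslash-escape special characters so user input is read as plain terms.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a required clause whose terms all belong to one field,
    // e.g. +title:(Pizza Hut).
    static String fieldClause(String field, String keywords) {
        return "+" + field + ":(" + escape(keywords.trim()) + ")";
    }

    // Combines the title and city keywords into one query string.
    static String build(String title, String city) {
        return fieldClause("title", title) + " " + fieldClause("city", city);
    }

    public static void main(String[] args) {
        System.out.println(build("Pizza Hut", "Mumbai"));
        // prints: +title:(Pizza Hut) +city:(Mumbai)
    }
}
```

The resulting string could then be passed to an ordinary QueryParser. Whether every word inside a group is required still depends on the parser's default operator, so setDefaultOperator(QueryParser.Operator.AND) would be needed to require both "pizza" and "hut" in the title.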